Hits:
Indexed by:期刊论文
Date of Publication:2013-01-01
Journal:Journal of Theoretical and Applied Information Technology
Included Journals:Scopus
Volume:49
Issue:1
Page Number:214-221
ISSN No.:19928645
Abstract:A great deal of information included in Chinese text is invaluable asset for further text mining, but the difference between Chinese and the western languages imposes restrictions on further utilization of Chinese text. No distinction indication between words by using spaces is one of the major differences between Chinese, also some other Asian languages, such as Japanese, Thai, etc., and Western languages. Chinese segmentation and features extraction is essential in Chinese natural language processing because it is a precondition for further Chinese text information retrieval and knowledge discovery. Maximum matching and frequency statistics (MMFS) segmentation method based on length descending and string frequency statistics is an effective segmentation and extraction method for Chinese words and phrases, but there are still some shorter words and phrases included in the longer ones extracted by MMFS can't be obtained. In order to solve this problem, this paper presents a novel Chinese hierarchy feature extraction method combined MMFS with iterative learning algorithm. This method can extract hierarchy feature according to morphology with no need for lexicon support, no need for acquiring the probability between words in advance and no need for Chinese character index. Experimental results confirm the efficiency of this statistical method in extracting Chinese hierarchy feature. This method is also beneficial to feature extraction for other Asian languages similar to Chinese. ? 2005 - 2013 JATIT & LLS. All rights reserved.