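Associate Professor, Doctoral Supervisor, Master's Supervisor
Gender: Male
Alma Mater: Dalian University of Technology
Degree: Ph.D.
Department: Department of Construction Management
Discipline: Engineering Management
Office: Room 517, Comprehensive Laboratory Building No. 4
Email: shjiang@dlut.edu.cn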
Paper Type: Journal Article
Publication Date: 2013-01-01
Journal: Journal of Theoretical and Applied Information Technology
Indexed In: Scopus
Volume: 49
Issue: 1
Pages: 214-221
ISSN: 1992-8645
Abstract: The large amount of information contained in Chinese text is an invaluable asset for text mining, but differences between Chinese and Western languages restrict how that text can be used. One major difference is that Chinese, like some other Asian languages such as Japanese and Thai, does not use spaces to mark word boundaries. Chinese segmentation and feature extraction are therefore essential in Chinese natural language processing, because they are a precondition for Chinese text information retrieval and knowledge discovery. The maximum matching and frequency statistics (MMFS) segmentation method, which is based on descending length and string frequency statistics, is an effective way to segment and extract Chinese words and phrases, but it cannot obtain some of the shorter words and phrases contained in the longer ones it extracts. To solve this problem, this paper presents a novel Chinese hierarchy feature extraction method that combines MMFS with an iterative learning algorithm. The method extracts hierarchy features according to morphology, without lexicon support, without acquiring the probability between words in advance, and without a Chinese character index. Experimental results confirm the efficiency of this statistical method in extracting Chinese hierarchy features. The method is also beneficial to feature extraction for other Asian languages similar to Chinese. © 2005 - 2013 JATIT & LLS. All rights reserved.
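The idea in the abstract can be illustrated with a rough sketch. This is not the paper's MMFS algorithm, whose details are not given here. It only assumes that a length-descending pass keeps strings repeated at least `min_freq` times, and that the iteration step re-runs the same pass inside the strings just extracted in order to surface shorter repeated strings. The function names, the `max_len` and `min_freq` parameters, and the sample text are all hypothetical.

```python
from collections import Counter


def extract_level(segments, max_len, min_freq):
    """One length-descending pass (illustrative assumption, not the paper's MMFS).

    Keeps every n-gram seen at least min_freq times, longest first, and cuts
    each kept string out of the text so shorter passes see only the remainder.
    """
    found = {}
    for n in range(max_len, 1, -1):
        counts = Counter(
            seg[i:i + n] for seg in segments for i in range(len(seg) - n + 1)
        )
        hits = [s for s, c in counts.items() if c >= min_freq]
        for s in hits:
            found[s] = counts[s]
        for s in hits:
            segments = [part for seg in segments for part in seg.split(s) if part]
    return found


def extract_hierarchy(text, max_len=4, min_freq=2):
    """Repeat the pass inside the strings found so far (assumed iteration step)."""
    levels = []
    segments = [text]
    while max_len >= 2:
        found = extract_level(segments, max_len, min_freq)
        if not found:
            break
        levels.append(found)
        # The extracted strings become the text searched at the next level.
        segments = list(found)
        max_len = max(len(s) for s in found) - 1
    return levels


sample = "文本挖掘和文本检索，文本挖掘和文本检索"
levels = extract_hierarchy(sample, max_len=9, min_freq=2)
```

On this sample the first level holds the repeated nine-character phrase and the second level holds the two-character string repeated inside it. No lexicon is consulted at any point, which matches the abstract's description, but counting frequencies inside the extracted strings rather than in the full corpus is a simplification made for this sketch.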