
An approach based on iterative learning algorithm for Chinese text hierarchy feature extraction without lexicon

Indexed by:Journal Article

Date of Publication:2013-01-01

Journal:Journal of Theoretical and Applied Information Technology

Included Journals:Scopus

Volume:49

Issue:1

Page Number:214-221

ISSN No.:1992-8645

Abstract:Chinese text contains a great deal of information that is an invaluable asset for text mining, but differences between Chinese and Western languages restrict its further use. One major difference is that Chinese, like some other Asian languages such as Japanese and Thai, does not use spaces to mark word boundaries. Chinese segmentation and feature extraction are therefore essential in Chinese natural language processing, because they are a precondition for Chinese text information retrieval and knowledge discovery. The maximum matching and frequency statistics (MMFS) segmentation method, based on length descending and string frequency statistics, is an effective segmentation and extraction method for Chinese words and phrases, but it cannot obtain some of the shorter words and phrases contained in the longer ones it extracts. To solve this problem, this paper presents a novel Chinese hierarchy feature extraction method that combines MMFS with an iterative learning algorithm. The method extracts hierarchy features according to morphology, with no need for lexicon support, for acquiring the probability between words in advance, or for a Chinese character index. Experimental results confirm the efficiency of this statistical method in extracting Chinese hierarchy features. The method is also beneficial to feature extraction for other Asian languages similar to Chinese. © 2005 - 2013 JATIT & LLS. All rights reserved.
