Indexed by:Journal article
Date of Publication:2010-04-01
Journal:Journal of Information and Computational Science
Included Journals:EI, Scopus
Volume:7
Issue:4
Page Number:827-832
ISSN No.:1548-7741
Abstract:In information and natural language processing, lexical analysis is an important basic step for Chinese, Japanese, and other Asian languages. This paper presents a Chinese lexical analysis method that integrates word-level and character-level information through a hybrid model combining a word-based CRF model and a latent semi-CRF model. The word lattice, which represents all candidate outputs, is built using the system lexicon. A linear-chain CRF selects the final token sequence from the word lattice using rich and flexible predefined features. The latent semi-CRF model, which is character-based, handles unknown words and is invoked when no matching word can be found in the lexicon while building the lattice. This pragmatic method based on hybrid CRF models offers a solution to long-standing problems in corpus-based or statistical, word-based or character-based Chinese lexical analysis. First, flexible feature designs for hierarchical tag sets become possible. Second, the influences of label bias and length bias are minimized. Third, word-level information for known words and character-level information for unknown words can be combined and fully used. © 2010 Binary Information Press.
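Illustrative sketch (not from the paper): the lattice-then-select pipeline described in the abstract can be outlined roughly as below. Everything here is an assumption made for illustration: the function names `build_lattice` and `best_path`, the toy lexicon, and the hand-set weights are hypothetical. The per-word score is only a stand-in for the linear-chain CRF's learned features (it has no transition features), and the single-character fallback is only a placeholder for the character-based latent semi-CRF used for unknown words.

```python
from typing import Callable, List, Optional, Set, Tuple

Edge = Tuple[int, int, str]  # (start offset, end offset, surface form)


def build_lattice(text: str, lexicon: Set[str], max_len: int = 4) -> List[Edge]:
    """Collect every lexicon word found in `text` as a lattice edge."""
    edges: List[Edge] = []
    for i in range(len(text)):
        matched = False
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if text[i:j] in lexicon:
                edges.append((i, j, text[i:j]))
                matched = True
        if not matched:
            # Assumption: a one-character edge stands in for the
            # character-based unknown-word model invoked in the paper.
            edges.append((i, i + 1, text[i]))
    return edges


def best_path(text: str, edges: List[Edge],
              score: Callable[[str], float]) -> List[str]:
    """Viterbi-style search for the highest-scoring path through the lattice."""
    n = len(text)
    best = [float("-inf")] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    # Edges sorted by start offset: best[start] is final before it is used.
    for start, end, word in sorted(edges):
        if best[start] == float("-inf"):
            continue  # this edge is not reachable from the sentence start
        candidate = best[start] + score(word)
        if candidate > best[end]:
            best[end] = candidate
            back[end] = (start, word)
    tokens: List[str] = []
    pos = n
    while pos > 0:
        start, word = back[pos]
        tokens.append(word)
        pos = start
    return tokens[::-1]


# Hypothetical toy lexicon and weights, chosen only to show the ambiguity.
WEIGHTS = {"研究": -1.0, "研究生": -1.5, "生命": -1.0, "命": -3.0, "起源": -1.0}


def toy_score(word: str) -> float:
    return WEIGHTS.get(word, -5.0)  # unseen single characters are penalized


sentence = "研究生命起源"
lattice = build_lattice(sentence, set(WEIGHTS))
print(best_path(sentence, lattice, toy_score))
```

With these toy weights the search prefers 研究 / 生命 / 起源 over 研究生 / 命 / 起源, since the first path has the higher total score. A trained model would replace `toy_score` with feature-based potentials over the lattice.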