Title: Pragmatic Chinese lexical analysis based on word-character hybrid model
Paper type: Journal article
Journal: Journal of Information and Computational Science
Indexed in: EI, Scopus
Volume: 7
Issue: 4
Pages: 827-832
ISSN: 1548-7741
Abstract: In information processing and natural language processing, lexical analysis is an important basic step for Chinese, Japanese, and other Asian languages. This paper presents a Chinese lexical analysis method that integrates word-level and character-level information, based on a hybrid model combining a word-based CRF model and a latent semi-CRF model. The word lattice, which represents all candidate outputs, is built using the system lexicon. A linear-chain CRF selects the final token sequence from the word lattice using rich and flexible predefined features. The latent semi-CRF model, which is character-based, handles unknown words and is invoked when no matching word can be found in the lexicon while building the lattice. This pragmatic method based on hybrid CRF models offers a solution to long-standing problems in corpus-based or statistical, word-based or character-based Chinese lexical analysis. First, flexible feature designs for hierarchical tag sets become possible. Second, the influences of label bias and length bias are minimized. Third, the word-level information for known words and the character-level information for unknown words can be combined and fully used. © 2010 Binary Information Press.
Publication date: 2010-04-01