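Huang Degen (黄德根)

(Professor)

Doctoral Supervisor, Master's Supervisor
Degree: Ph.D.
Gender: Male
Alma mater: Dalian University of Technology
Affiliation: School of Computer Science and Technology
Email: huangdg@dlut.edu.cn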

Publications

Pragmatic Chinese lexical analysis based on word-character hybrid model

Posted: 2019-03-11

Title: Pragmatic Chinese lexical analysis based on word-character hybrid model
Type: Journal article
Journal: Journal of Information and Computational Science
Indexed in: EI, Scopus
Volume: 7
Issue: 4
Pages: 827-832
ISSN: 1548-7741
Abstract: In information processing and natural language processing, lexical analysis is an important basic step for Chinese, Japanese, and other Asian languages. This paper presents a Chinese lexical analysis method that integrates word-level and character-level information through a hybrid model combining a word-based CRF and a latent semi-CRF. A word lattice representing all candidate outputs is built from the system lexicon. A linear-chain CRF with rich, flexible predefined features then selects the final token sequence from the lattice. Unknown words are handled by the character-based latent semi-CRF, which is invoked when no matching lexicon word can be found while building the lattice. This pragmatic method based on hybrid CRF models addresses long-standing problems in statistical (corpus-based) Chinese lexical analysis, whether word-based or character-based. First, it allows flexible feature design for hierarchical tag sets. Second, it minimizes the effects of label bias and length bias. Third, it combines and fully exploits word-level information for known words and character-level information for unknown words. © 2010 Binary Information Press.
Publication date: 2010-04-01
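The lattice-then-decode pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no feature templates or training details, so hand-set hypothetical scores (`unigram`, `bigram`, `unknown_score`) stand in for learned CRF weights, and a single-character fallback edge stands in for the latent semi-CRF unknown-word model. The decoder is a Viterbi-style search over the lattice in which each token's score also depends on the previous token, mirroring first-order linear-chain decoding. The function names `build_lattice` and `best_path` and the toy lexicon are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

# An edge in the lattice: (end index, token, whether the token came from the lexicon)
Edge = Tuple[int, str, bool]


def build_lattice(text: str, lexicon: Set[str], max_len: int = 4) -> Dict[int, List[Edge]]:
    """Map each start index to the candidate tokens that begin there."""
    lattice: Dict[int, List[Edge]] = {}
    for i in range(len(text)):
        edges = [
            (j, text[i:j], True)
            for j in range(i + 1, min(len(text), i + max_len) + 1)
            if text[i:j] in lexicon
        ]
        if not edges:
            # No lexicon word starts here: add a single-character edge as a
            # placeholder for the paper's character-based unknown-word model.
            edges.append((i + 1, text[i], False))
        lattice[i] = edges
    return lattice


def best_path(
    text: str,
    lattice: Dict[int, List[Edge]],
    unigram: Dict[str, float],
    bigram: Dict[Tuple[str, str], float],
    unknown_score: float = -2.0,
) -> List[str]:
    """Viterbi-style search for the highest-scoring token sequence.

    The scores are hypothetical hand-set weights, not trained CRF parameters.
    """
    n = len(text)
    # chart[pos][last_token] = (best score of a path ending at pos with last_token, backpointer)
    chart: List[Dict[str, Tuple[float, Optional[Tuple[int, str]]]]] = [{} for _ in range(n + 1)]
    chart[0]["<s>"] = (0.0, None)
    for i in range(n):
        for prev, (score, _) in chart[i].items():
            for end, token, known in lattice[i]:
                emit = unigram.get(token, 0.0) if known else unknown_score
                total = score + emit + bigram.get((prev, token), 0.0)
                if token not in chart[end] or total > chart[end][token][0]:
                    chart[end][token] = (total, (i, prev))
    # Backtrack from the best-scoring state at the end of the sentence.
    token = max(chart[n], key=lambda t: chart[n][t][0])
    pos = n
    tokens: List[str] = []
    while token != "<s>":
        tokens.append(token)
        _, back = chart[pos][token]
        pos, token = back
    return tokens[::-1]


if __name__ == "__main__":
    lexicon = {"研究", "研究生", "生命", "起源"}  # toy lexicon (hypothetical)
    unigram = {"研究": 1.0, "研究生": 1.2, "生命": 1.0, "起源": 1.0}
    text = "研究生命起源"
    lattice = build_lattice(text, lexicon)
    print(best_path(text, lattice, unigram, bigram={}))  # ['研究', '生命', '起源']
```

In this toy run, 研究/生命/起源 ("study the origin of life") beats 研究生/命/起源 because the second path needs an out-of-lexicon fallback edge for 命; a sufficiently large hypothetical bigram weight on (研究生, 命) would flip the choice. Keying each chart cell by the previous token is what lets transition scores influence the decision, which is the property a linear-chain model adds over simply picking the highest-scoring words independently.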