Huang Degen (黄德根)

(Professor)

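Doctoral Supervisor  Master's Supervisor
Degree: Ph.D.
Gender: Male
Alma mater: Dalian University of Technology
Affiliation: School of Computer Science and Technology
Email: huangdg@dlut.edu.cn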

Publications

Context Information and Fragments Based Cross-Domain Word Segmentation

Posted: 2019-03-09

Paper title: Context Information and Fragments Based Cross-Domain Word Segmentation
Paper type: Journal article
Journal: CHINA COMMUNICATIONS
Indexed in: SCIE, CSCD, Scopus
Volume: 9
Issue: 3
Pages: 49-57
ISSN: 1673-5447
Keywords: cross-domain CWS; Conditional Random Fields (CRFs); joint decoding; context variables; segmentation fragments
Abstract: A new joint decoding strategy is proposed that combines character-based and word-based conditional random field (CRF) models. In this segmentation framework, fragments are used to generate candidate out-of-vocabulary (OOV) words. After the initial segmentation, the segmentation fragments are divided into two classes: "combination" (several fragments are combined into one unknown word) and "segregation" (fragments are separated into several words). This allows more OOV words to be recalled. In addition, to suit the characteristics of cross-domain segmentation, context information is used to guide Chinese Word Segmentation (CWS). Experiments on test data from SIGHAN Bakeoff 2007 and Bakeoff 2010 show that the method is effective: OOV recall improves and overall segmentation performance is good.
Publication date: 2012-03-01