Paper title: Using context and semantic resources for cross-domain word segmentation
Paper type: Conference paper
Indexed in: EI, Scopus
Pages: 227-232
Abstract: Chinese word segmentation (CWS) plays a fundamental role in Chinese language processing, because almost all Chinese language processing tasks assume segmented input. After many years of active research, most reports from evaluation tasks give impressive results. However, most of these results are limited to test corpora from a specific domain; once a system is applied to a different domain, its accuracy plummets. Domain-adaptive word segmentation was therefore introduced into the Bakeoffs. In this paper, we propose a new joint decoding strategy that combines the character-based and word-based conditional random field models and takes the parts of speech of dictionary words as important features of a segmentation path. Moreover, in keeping with the characteristics of cross-domain segmentation, context information is used to guide CWS. In addition, because synonyms occur in similar contexts, semantic information can be used to recall some out-of-vocabulary words (OOVs). Experiments on the simplified Chinese test data from SIGHAN Bakeoff 2010 show that the method is effective. Except in the literature domain, the F-scores are higher than the best performance in the corresponding open test. In addition, the OOV recall rate reaches 70.7%, 84.3%, 79.0% and 86.2%, respectively. © 2011 IEEE.
Publication date: 2011-11-27