Dalian University of Technology Homepage Platform Management System - Huang Degen (黄德根) - Mining English-Chinese named entity pairs from comparable corpora - Natural Language Processing

Publications

Mining English-Chinese named entity pairs from comparable corpora

Posted: 2019-03-11

Paper type: Journal article
Journal: ACM Transactions on Asian Language Information Processing
Indexed by: Scopus, EI
Volume: 10
Issue: 4
ISSN: 1530-0226
Abstract: Bilingual named entity (NE) pairs are valuable resources for many NLP applications. Since comparable corpora are more accessible, abundant, and up to date, recent research has concentrated on mining bilingual lexicons from them. This research presents a novel approach to mining English-Chinese NE translations from comparable corpora by combining multi-dimensional features from various information sources for every possible NE pair: the transliteration model, English-Chinese matching, Chinese-English matching, the translation model, length, and the context vector. These features are integrated into one model using linear combination and the minimum sample risk (MSR) algorithm. Because NE translation is highly type-dependent, we integrate different features for different NE types. We experiment with individual and integrated features to mine person NE (PN) pairs, location NE (LN) pairs, and organization NE (ON) pairs. Using transliteration and length to mine PN pairs yields the best performance, an F-score of 84.9%. LN pairs can be mined with the transliteration model, length, translation model, English-Chinese matching, and Chinese-English matching features, with a best F-score of 83.4%. ON pairs can be mined with the English-Chinese matching and Chinese-English matching features, reaching a best F-score of 84.1%. © 2011 ACM.
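The type-dependent linear combination described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, feature keys, weights, and feature values are all hypothetical, and the weights are simply supplied by hand, whereas the paper tunes them with the MSR algorithm, which is not shown here. Only the grouping of features by NE type follows the abstract.

```python
from typing import Dict, List

# Feature subsets per NE type, following the groupings reported in the abstract.
# The key names themselves are hypothetical labels chosen for this sketch.
FEATURES_BY_TYPE: Dict[str, List[str]] = {
    "PN": ["transliteration", "length"],
    "LN": ["transliteration", "length", "translation",
           "en_zh_matching", "zh_en_matching"],
    "ON": ["en_zh_matching", "zh_en_matching"],
}


def combined_score(ne_type: str,
                   features: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of the feature scores selected for this NE type.

    Features not listed for the NE type are ignored, even if present.
    """
    return sum(weights[name] * features[name]
               for name in FEATURES_BY_TYPE[ne_type])


def best_candidate(ne_type: str,
                   candidates: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> str:
    """Return the candidate translation with the highest combined score."""
    return max(candidates,
               key=lambda c: combined_score(ne_type, candidates[c], weights))


if __name__ == "__main__":
    # Hypothetical hand-set weights (assumption: the paper learns these with MSR).
    weights = {"en_zh_matching": 0.6, "zh_en_matching": 0.4}
    # Illustrative feature scores for two placeholder candidate translations.
    candidates = {
        "candidate_a": {"en_zh_matching": 0.9, "zh_en_matching": 0.7},
        "candidate_b": {"en_zh_matching": 0.4, "zh_en_matching": 0.8},
    }
    print(best_candidate("ON", candidates, weights))
```

Keeping the feature subsets in a single table makes the type dependence explicit: changing which features apply to an NE type requires no change to the scoring function.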
