马建军

Personal Information

Professor

Master's Supervisor

Gender: Female

Alma Mater: Dalian University of Technology

Degree: Doctorate

Affiliation: School of Foreign Languages

Discipline: Foreign Linguistics and Applied Linguistics

Office: Room 107, Liberal Arts Building (文科楼)

Contact: majian@dlut.edu.cn

Email: majian@dlut.edu.cn


Publications


An English part-of-speech tagger for machine translation in business domain


Paper type: Conference paper

Publication date: 2011-11-27

Indexed in: EI, Scopus

Pages: 183-189

Abstract: Part-of-speech (POS) tagging is a crucial preprocessing step for machine translation. Current studies focus mainly on methods, whether linguistic, statistical, machine-learning, or hybrid. So far, however, few serious attempts have been made to test the reported accuracy of taggers on different, perhaps domain-specific, corpora. This paper therefore presents an English POS tagger for English-Chinese machine translation in the business domain, demonstrating how an existing tagger can be adapted to learn from a small amount of data and to handle unknown words for the purpose of machine translation. A small 998k annotated English corpus in the business domain is built semi-automatically based on a new tagset; a maximum entropy model is adopted, and a rule-based approach is used in post-processing. Experiments show that the tagger achieves an accuracy of 99.08% in the closed test and 98.14% in the open test, a satisfactory result compared with the best reported open-test result of 97.18% for the Stanford English tagger. © 2011 IEEE.