Identification of English prepositional phrases within business domain for machine translation

Indexed by:Journal article

Date of Publication:2013-10-10

Journal:Journal of Information and Computational Science

Included Journals:EI, Scopus

Volume:10

Issue:15

Page Number:4849-4860

ISSN No.:1548-7741

Abstract:An MT-oriented system using Conditional Random Fields (CRFs) is presented to identify English Prepositional Phrases (PPs) within the business domain. For the purpose of English-Chinese Machine Translation (MT), and guided by the theory of Syntactic Functional Grammar (SFG), we refine PP function chunks into four types instead of binary attachment. To improve the identification of these chunk types, we revise the Penn Treebank tagset with four major changes. A small 998k annotated English corpus in the business domain is built semi-automatically with the new tagset, employing the Maximum Entropy model. Experiments show that our system achieves an accuracy of 88.45%, higher than other reported approaches. The adjustments made to the PP chunk types and the POS tagset yield increases of 4.11%, 4.25% and 4.15% in precision, recall and F-score, respectively. © 2013 Binary Information Press.
