Dalian University of Technology Homepage Platform Management System: 赵亮 (Zhao Liang)


Mining large-scale comparable corpora from Chinese-English news collections

Release Time: 2019-03-11

Indexed by: Conference Paper

Date of Publication: 2010-08-23

Included Journals: Scopus, EI

Volume: 2

Page Number: 472-480

Abstract: In this paper, we explore a CLIR-based approach to constructing large-scale Chinese-English comparable corpora, which are valuable for translation knowledge mining. The initial source and target document sets are crawled from news websites and normalized to a uniform format. Keywords are first extracted from each source document; the extracted keywords are then translated and combined into query words according to certain criteria, and the query is run against an index built from the target document set. The mapping between source and target documents is then established according to the similarity score computed by the retrieval tool. Two methods for filtering the comparable document pairs are evaluated to ensure the quality of the comparable corpora. Experimental results indicate that our approach is effective for constructing Chinese-English comparable corpora.
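The pipeline outlined in the abstract (keyword extraction, keyword translation, retrieval against a target index, similarity-based filtering) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the retrieval tool, the query-combination criteria, nor the two filtering methods, so TF-IDF keyword ranking, cosine similarity, and a single score threshold are stand-ins. The names `tfidf_vectors`, `cosine`, `mine_pairs`, `LEXICON`, `top_k`, and `threshold`, and the toy documents, are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (sparse TF-IDF vectors, idf table) for tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so that every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_pairs(source_docs, target_docs, lexicon, top_k=3, threshold=0.2):
    """Pair each source document with its best-scoring target document.

    Assumption: one threshold stands in for the paper's unspecified
    filtering methods, and `lexicon` is a hypothetical bilingual dictionary.
    """
    if not target_docs:
        return []
    src_vecs, _ = tfidf_vectors(source_docs)
    tgt_vecs, tgt_idf = tfidf_vectors(target_docs)  # plays the role of the index
    pairs = []
    for i, vec in enumerate(src_vecs):
        # 1. Extract the top-k keywords of the source document by TF-IDF.
        keywords = sorted(vec, key=lambda t: (-vec[t], t))[:top_k]
        # 2. Translate the keywords and combine them into a query vector.
        terms = [lexicon[k] for k in keywords if k in lexicon]
        query = {t: tgt_idf.get(t, 0.0) for t in terms}
        # 3. Retrieve: score every target document against the query.
        best_score, best_j = max(
            (cosine(query, tv), j) for j, tv in enumerate(tgt_vecs)
        )
        # 4. Filter: keep the pair only if the similarity is high enough.
        if best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs


# Toy, pre-tokenized data for illustration only.
SOURCE_DOCS = [["经济", "增长", "经济", "报告"], ["足球", "比赛", "足球", "冠军"]]
TARGET_DOCS = [
    ["football", "match", "champion", "football"],
    ["economy", "growth", "report", "economy"],
    ["weather", "rain", "forecast"],
]
LEXICON = {
    "经济": "economy", "增长": "growth", "报告": "report",
    "足球": "football", "比赛": "match", "冠军": "champion",
}

if __name__ == "__main__":
    for src, tgt, score in mine_pairs(SOURCE_DOCS, TARGET_DOCS, LEXICON):
        print(f"source {src} <-> target {tgt} (similarity {score:.3f})")
```

A source document whose keywords have no entry in the lexicon produces an empty query, scores zero against every target, and is dropped by the threshold; a real system would presumably need a larger dictionary and proper Chinese word segmentation before this step.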

