
Automatic foreign person names extraction from Chinese documents on the web

Indexed by:Journal article

Date of Publication:2010-02-01

Journal:ICIC Express Letters

Included Journals:EI, Scopus

Volume:4

Issue:1

Page Number:189-196

ISSN No.:1881-803X

Abstract:In this paper, a bootstrapping method for automatically extracting foreign person names (F-names) from Chinese web pages is presented. Starting from a small set of F-name characters, the method iteratively extracts text segments containing F-name characters from the web. A context cue-word set is used to improve the efficiency of extraction. Statistical information is used to recognize F-names in these text segments. A confidence measure is assigned to each possible F-name candidate, and a segmentation digraph is constructed for selecting F-names from the candidates. The method is used to extract 10,000 F-names from the Internet, and the recognition precision is about 87%. The results show that the proposed method is effective. © 2010 ICIC International.
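
The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a candidate is the text segment following a cue word up to the next delimiter, and it uses the share of already-known name characters as a stand-in for the paper's statistical confidence measure. The segmentation digraph is not modelled. All function names, the delimiter set, the threshold, and the sample sentences are hypothetical.

```python
# Toy bootstrapping sketch (assumptions: see the lead-in paragraph).

# Characters treated as segment delimiters in this toy example.
DELIMITERS = set(",。、;!?,.;!? ")


def segments_after_cues(text, cue_words):
    """Return the text segment that follows each cue word, up to the next delimiter."""
    segments = []
    for cue in cue_words:
        start = text.find(cue)
        while start != -1:
            begin = start + len(cue)
            end = begin
            while end < len(text) and text[end] not in DELIMITERS:
                end += 1
            if end > begin:
                segments.append(text[begin:end])
            start = text.find(cue, start + 1)
    return segments


def confidence(segment, name_chars):
    """Share of characters in the segment that are already known name characters."""
    return sum(ch in name_chars for ch in segment) / len(segment)


def bootstrap(docs, seed_chars, cue_words, threshold=0.3, max_rounds=10):
    """Grow the name-character set and the name list until no new name is accepted."""
    name_chars = set(seed_chars)
    names = set()
    for _ in range(max_rounds):
        new_names = set()
        for doc in docs:
            for seg in segments_after_cues(doc, cue_words):
                if seg not in names and confidence(seg, name_chars) >= threshold:
                    new_names.add(seg)
        if not new_names:
            break
        names |= new_names
        # Characters of accepted names become evidence in the next round.
        for name in new_names:
            name_chars.update(name)
    return names, name_chars


if __name__ == "__main__":
    docs = [
        "美国总统奥巴马,发表讲话。",
        "前总统布什。",
        "教授马克斯、学生。",
        "先生巴布,你好。",
    ]
    names, chars = bootstrap(docs, seed_chars="马斯", cue_words=["总统", "教授", "先生"])
    print(sorted(names))
```

In this toy corpus, some candidates share no character with the seed set at first and are accepted only in later rounds, once characters learned from earlier names raise their score. That gradual growth is the point of iterating.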
