Title: Automatic foreign person names extraction from Chinese documents on the web
Type: Journal article
Journal: ICIC Express Letters
Indexed in: EI, Scopus
Volume: 4
Issue: 1
Pages: 189-196
ISSN: 1881-803X
Abstract: In this paper, a bootstrapping method for automatically extracting foreign person names (F-names) from Chinese web pages is presented. Starting from a small set of F-name characters, the method iteratively extracts text segments containing F-name characters from the web. A context cue-word set is used to improve the efficiency of extraction. Statistical information is used to recognize F-names from these text segments. A confidence measure is assigned to each possible F-name candidate, and a segmentation digraph is constructed for selecting F-names from the F-name candidates. The method is used to extract 10000 F-names from the Internet, and the recognition precision is about 87%. The results show that the proposed method is effective. ICIC International © 2010.
Publication date: 2010-02-01
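
The abstract mentions a segmentation digraph used to select F-names from scored candidates but does not say how the graph is built or searched. The following is only a minimal sketch of one plausible reading, not the authors' method: character positions are nodes, each candidate span is an edge weighted by its confidence, and the highest-scoring path picks a set of non-overlapping names. The function name `select_names`, the candidate spans, and the confidence values are all hypothetical.

```python
from typing import Dict, List, Tuple


def select_names(segment: str,
                 candidates: Dict[Tuple[int, int], float]) -> List[str]:
    """Pick non-overlapping candidate spans with maximal total confidence.

    Nodes are character positions 0..len(segment). An edge either skips one
    character (score 0) or covers a candidate span (start, end) and
    contributes that candidate's confidence. Assumed design, not taken
    from the paper.
    """
    n = len(segment)
    best = [0.0] * (n + 1)                # best[i]: best score for segment[:i]
    back = [(0, False)] * (n + 1)         # (previous position, edge is a name)

    for end in range(1, n + 1):
        # Default edge: treat the character before `end` as non-name text.
        best[end] = best[end - 1]
        back[end] = (end - 1, False)
        # Candidate edges that finish exactly at `end`.
        for (start, stop), conf in candidates.items():
            if stop == end and 0 <= start < stop:
                score = best[start] + conf
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, True)

    # Walk the best path backwards and collect the name edges.
    names: List[str] = []
    pos = n
    while pos > 0:
        prev, is_name = back[pos]
        if is_name:
            names.append(segment[prev:pos])
        pos = prev
    names.reverse()
    return names


# Hypothetical segment and confidence scores, for illustration only.
segment = "记者采访了奥巴马和克林顿"
candidates = {
    (5, 8): 0.9,   # 奥巴马
    (6, 8): 0.4,   # 巴马 (overlaps the span above)
    (8, 10): 0.3,  # 和克 (spurious candidate)
    (9, 12): 0.8,  # 克林顿
}
selected = select_names(segment, candidates)
```

With these made-up scores, the overlapping and spurious candidates lose to the path through the two full names. In a real system the confidence values would come from the statistical recognition step the abstract describes.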