Huang Degen (黄德根)

(Professor)

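Doctoral Supervisor  Master's Supervisor
Degree: Ph.D.
Gender: Male
Alma mater: Dalian University of Technology
Affiliation: School of Computer Science and Technology
Email: huangdg@dlut.edu.cn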

Publications

Automatic Microblog-Oriented Unknown Word Recognition with Unsupervised Method

Publication date: 2019-03-11

Paper title: Automatic Microblog-Oriented Unknown Word Recognition with Unsupervised Method
Paper type: Journal article
Journal: CHINESE JOURNAL OF ELECTRONICS
Indexed in: SCIE
Volume: 27
Issue: 1
Pages: 1-8
ISSN: 1022-4653
Keywords: Low-frequency unknown words; Information entropy; Independence of strings; Modified usage of Accessor variety (AV)
Abstract: As a prerequisite task in Natural language processing (NLP), Chinese word segmentation (CWS) is challenged by unknown words. Aiming to effectively detect Chinese unknown words, especially low-frequency unknown words in unstructured microblog data, we modify the usage of Accessor variety (AV) to measure the context environments of core fragments and propose a novel variable, the Independence of strings, which is derived from the internal structure of segments. Our approach is unsupervised and uses no manually annotated resources. Because no manually annotated resources exist for microblog-oriented unknown word extraction, we use a sampling approach to assess the effectiveness of our method. Experimental results suggest that our best system outperforms both the baseline system and the state-of-the-art system, with significant improvements in F1-measure and in the recall of low-frequency unknown words.
Publication date: 2018-01-01
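
The abstract names Accessor variety (AV) and information entropy as context measures but does not spell out the modified usage or the Independence of strings. As a rough illustration only, the sketch below computes the standard, unmodified AV of a candidate string (the smaller of its distinct left-neighbor and right-neighbor counts, with each sentence boundary counted as its own context) and the Shannon entropy of its neighbors. The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def neighbor_counts(corpus, s):
    """Count the characters immediately left and right of each occurrence of s."""
    left, right = Counter(), Counter()
    for sent_id, sentence in enumerate(corpus):
        start = sentence.find(s)
        while start != -1:
            end = start + len(s)
            # Assumption: every sentence boundary is a distinct context,
            # so it is keyed by its position rather than by a shared symbol.
            left_key = sentence[start - 1] if start > 0 else ("<S>", sent_id, start)
            right_key = sentence[end] if end < len(sentence) else ("<E>", sent_id, start)
            left[left_key] += 1
            right[right_key] += 1
            start = sentence.find(s, start + 1)
    return left, right


def accessor_variety(corpus, s):
    """Standard AV: the minimum of distinct left and distinct right contexts."""
    left, right = neighbor_counts(corpus, s)
    return min(len(left), len(right))


def entropy(counts):
    """Shannon entropy (in bits) of a neighbor distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy corpus (hypothetical): three short sentences containing "自然语言".
corpus = ["我爱自然语言", "他学自然语言处理", "自然语言很难"]

print(accessor_variety(corpus, "自然语言"))  # 3: varied contexts on both sides
print(accessor_variety(corpus, "然语"))      # 1: always inside the same longer string
```

A string that appears in many different contexts, such as "自然语言" here, gets a high AV and is a plausible word candidate, while a fragment that always sits inside the same longer string, such as "然语", gets a low AV. How the paper adapts this measure to low-frequency words is not described in the abstract.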