Dalian University of Technology Homepage Platform Management System: Huang Degen -- Natural Language Processing -- Combining large-scale unlabeled corpus and lexicon for Chinese polysemous word similarity computation

Publications

Combining large-scale unlabeled corpus and lexicon for Chinese polysemous word similarity computation

Posted: 2019-03-11

Paper title: Combining large-scale unlabeled corpus and lexicon for Chinese polysemous word similarity computation
Paper type: Conference paper
Indexed by: EI
Volume: 10390 LNCS
Page range: 198-210
Abstract: Word embeddings have achieved outstanding performance in word similarity measurement. However, most prior work focuses on building models with one embedding per word, neglecting the fact that a word can have multiple senses. This paper proposes two sense embedding learning methods for Chinese polysemous words, based on a large-scale unlabeled corpus and a lexicon, respectively. The corpus-based method labels the senses of polysemous words by clustering their contexts with tf-idf weights, and uses HowNet to initialize the number of senses instead of simply inducing a fixed number for each polysemous word. The lexicon-based method extends AutoExtend to Tongyici Cilin with some related lexicon constraints for sense embedding learning. Furthermore, these two methods are combined for Chinese polysemous word similarity computation. Experiments on the Chinese Polysemous Word Similarity Dataset show the effectiveness and complementarity of the two sense embedding learning methods. The final Spearman rank correlation coefficient reaches 0.582, which outperforms the state-of-the-art performance on the evaluation dataset. © Springer International Publishing AG 2017.
Publication date: 2017-07-13
