个人信息Personal Information
副教授
博士生导师
硕士生导师
性别:女
毕业院校:大连理工大学
学位:博士
所在单位:计算机科学与技术学院
学科:人工智能
办公地点:大连理工大学创新园大厦B911
电子邮箱:zhouhuiwei@dlut.edu.cn
Combining large-scale unlabeled corpus and lexicon for Chinese polysemous word similarity computation
点击次数:
论文类型:会议论文
发表时间:2017-07-13
收录刊物:EI
卷号:10390 LNCS
页面范围:198-210
摘要:Word embeddings have achieved an outstanding performance in word similarity measurement. However, most prior works focus on building models with one embedding per word, neglect the fact that a word can have multiple senses. This paper proposes two sense embedding learning methods based on large-scale unlabeled corpus and Lexicon respectively for Chinese polysemous words. The corpus-based method labels the senses of polysemous words by clustering the contexts with tf-idf weight, and using the HowNet to initialize the number of senses instead of simply inducing a fixed number for each polysemous word. The lexicon-based method extends the AutoExtend to Tongyici Cilin with some related lexicon constraints for sense embedding learning. Furthermore, these two methods are combined for Chinese polysemous word similarity computation. The experiments on the Chinese Polysemous Word Similarity Dataset show the effectiveness and complementarity of our two sense embedding learning methods. The final Spearman rank correlation coefficient achieves 0.582, which outperforms the state-of-the-art performance on the evaluation dataset. © Springer International Publishing AG 2017.