
Combining large-scale unlabeled corpus and lexicon for Chinese polysemous word similarity computation

Indexed by: Conference Paper

Date of Publication:2017-07-13

Included Journals:EI

Volume:10390 LNCS

Page Number:198-210

Abstract: Word embeddings have achieved outstanding performance in word similarity measurement. However, most prior work builds models with one embedding per word, neglecting the fact that a word can have multiple senses. This paper proposes two sense embedding learning methods for Chinese polysemous words, based on a large-scale unlabeled corpus and a lexicon respectively. The corpus-based method labels the senses of polysemous words by clustering their contexts with tf-idf weighting, using HowNet to initialize the number of senses instead of simply assigning a fixed number to each polysemous word. The lexicon-based method extends AutoExtend to Tongyici Cilin with additional lexicon constraints for sense embedding learning. Furthermore, the two methods are combined for Chinese polysemous word similarity computation. Experiments on the Chinese Polysemous Word Similarity Dataset show the effectiveness and complementarity of the two sense embedding learning methods. The final Spearman rank correlation coefficient reaches 0.582, outperforming the state-of-the-art result on the evaluation dataset. © Springer International Publishing AG 2017.
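The corpus-based step described in the abstract can be sketched as follows: contexts of a polysemous word are represented as tf-idf vectors and clustered, with the cluster count k initialized from the lexicon's sense count rather than a fixed number. This is a minimal stand-alone illustration, not the authors' implementation; the toy contexts and the HowNet-derived sense count are assumptions for demonstration.

```python
import math
from collections import Counter

def tfidf_vectors(contexts):
    """Turn tokenized contexts into tf-idf weighted sparse vectors (dicts)."""
    n = len(contexts)
    df = Counter()
    for ctx in contexts:
        df.update(set(ctx))
    vecs = []
    for ctx in contexts:
        tf = Counter(ctx)
        vecs.append({w: (tf[w] / len(ctx)) * math.log(n / df[w]) for w in tf})
    return vecs

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_senses(contexts, k, iters=10):
    """Plain k-means over tf-idf vectors; k comes from the lexicon sense count."""
    vecs = tfidf_vectors(contexts)
    # Naive spread-out initialization of k centroids.
    centroids = [vecs[i * len(vecs) // k] for i in range(k)]
    labels = [0] * len(vecs)
    for _ in range(iters):
        # Assign each context to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vecs]
        # Recompute centroids as the mean of their member vectors.
        for c in range(k):
            members = [vecs[i] for i in range(len(vecs)) if labels[i] == c]
            if members:
                merged = Counter()
                for m in members:
                    merged.update(m)
                centroids[c] = {w: merged[w] / len(members) for w in merged}
    return labels

# Toy example: contexts of one ambiguous word ("bank"-like), as token lists.
contexts = [
    ["river", "water", "mud"],
    ["river", "fish", "water"],
    ["money", "loan", "teller"],
    ["money", "deposit", "teller"],
]
k_from_lexicon = 2  # assumed: the lexicon (e.g. HowNet) lists two senses
labels = cluster_senses(contexts, k_from_lexicon)
print(labels)
```

Each context then carries a sense label, so a separate embedding can be trained per (word, sense) pair instead of one vector per surface form.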
