Dalian University of Technology
Xianchao Zhang (张宪超)
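Professor   Doctoral Supervisor   Master's Supervisor

Primary appointment: Vice Dean, Research Institute for National Defense (Advanced) Science and Technology Development

Gender: Male

Alma mater: University of Science and Technology of China

Degree: Ph.D.

Employment status: Active

Affiliation: School of Software

Disciplines: Computer Application Technology; Software Engineering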

GLTM: A Global and Local Word Embedding-Based Topic Model for Short Texts

Paper type: Journal article

First author: Liang, Wenxin

Corresponding author: Zhang, XC (reprint author), Dalian Univ Technol, Sch Software, Dalian 116620, Peoples R China.

Co-authors: Feng, Ran; Liu, Xinyue; Li, Yuangang; Zhang, Xianchao

Publication date: 2018-01-01

Journal: IEEE ACCESS

Indexed in: SCIE

Volume: 6

Page range: 43612-43621

ISSN: 2169-3536

Keywords: Text mining; context modeling; natural language processing; topic model; short text

Abstract: Short texts have become a prevalent source of information, and discovering topical information from short text collections is valuable for many applications. Because short texts are limited in length, conventional topic models based on document-level word co-occurrence information often fail to distill semantically coherent topics from such collections. Word embeddings, meanwhile, have been applied successfully in natural language processing. Word embeddings trained on a large corpus encode general semantic and syntactic information about words, and hence can be leveraged to guide topic modeling for short text collections, supplementing the sparse co-occurrence patterns. However, such embeddings are trained on a large external corpus, and the information they encode is not necessarily suitable for the training data set of a topic model, a mismatch that most existing models ignore. In this article, we propose a novel global and local word embedding-based topic model (GLTM) for short texts. In the GLTM, we train global word embeddings on a large external corpus and employ the continuous skip-gram model with negative sampling (SGNS) to obtain local word embeddings. Using both the global and local word embeddings, the GLTM distills semantic relatedness information between words, which the Gibbs sampler then leverages during inference to strengthen the semantic coherence of topics. Compared with five state-of-the-art short text topic models on four real-world short text collections, the proposed GLTM is superior in most cases.
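
The abstract does not say how the global and local embeddings are combined into a relatedness score, so the following is only an illustrative sketch and not the GLTM formulation. It assumes cosine similarity in each embedding space and a hypothetical mixing parameter `weight`. The function names and the toy vectors are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(w1, w2, global_emb, local_emb, weight=0.5):
    """Blend global and local cosine similarity.

    Assumption: a linear blend controlled by `weight`; the paper may
    combine the two sources differently.
    """
    g = cosine(global_emb[w1], global_emb[w2])
    l = cosine(local_emb[w1], local_emb[w2])
    return weight * g + (1.0 - weight) * l


# Toy 2-d embeddings (made up): the external corpus places "apple" near
# "fruit", while the short-text collection uses it like "phone".
global_emb = {"apple": [1.0, 0.0], "fruit": [1.0, 0.0], "phone": [0.0, 1.0]}
local_emb = {"apple": [0.0, 1.0], "fruit": [1.0, 0.0], "phone": [0.0, 1.0]}

score_fruit = relatedness("apple", "fruit", global_emb, local_emb)
score_phone = relatedness("apple", "phone", global_emb, local_emb)
```

In a sketch like this, a score of this kind could bias which topic a Gibbs sampler assigns to a word, though how GLTM actually uses the relatedness information is specified in the paper itself, not here.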
