Professor, Doctoral Supervisor, Master's Supervisor
Gender: Male
Alma Mater: University of Science and Technology of China
Degree: Ph.D.
Affiliation: School of Software, International School of Information Science & Engineering
Disciplines: Computer Application Technology; Software Engineering
E-mail: xczhang@dlut.edu.cn
Paper Type: Journal Article
Publication Date: 2018-01-01
Journal: IEEE ACCESS
Indexed by: SCIE
Volume: 6
Page Range: 43612-43621
ISSN: 2169-3536
Keywords: Text mining; context modeling; natural language processing; topic model; short text
Abstract: Short texts have become a prevalent source of information, and discovering topical information from short text collections is valuable for many applications. Due to the length limitation, conventional topic models based on document-level word co-occurrence information often fail to distill semantically coherent topics from short text collections. On the other hand, word embeddings have been successfully applied as a powerful tool in natural language processing. Word embeddings trained on large corpora encode general semantic and syntactic information about words, and hence they can be leveraged to guide topic modeling for short text collections as supplementary information for sparse co-occurrence patterns. However, word embeddings are trained on a large external corpus, and the encoded information is not necessarily suitable for the training data set of the topic model; this mismatch is ignored by most existing models. In this article, we propose a novel global and local word embedding-based topic model (GLTM) for short texts. In the GLTM, we train global word embeddings on a large external corpus and employ the continuous skip-gram model with negative sampling (SGNS) to obtain local word embeddings. Utilizing both the global and local word embeddings, the GLTM can distill semantic relatedness information between words, which the Gibbs sampler can further leverage during inference to strengthen the semantic coherence of topics. Compared with five state-of-the-art short text topic models on four real-world short text collections, the proposed GLTM is superior in most cases.
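The abstract names two embedding sources: global vectors pretrained on a large external corpus and local vectors trained with SGNS on the short texts themselves. As a rough sketch of that preprocessing step (not the authors' implementation), the Python below trains local skip-gram vectors with gensim and combines global and local cosine similarities into a single word-relatedness score. The toy corpus, the averaging rule, and all function names are illustrative assumptions; how GLTM actually fuses the two signals and feeds them to the Gibbs sampler is defined in the paper.

# Minimal sketch: local SGNS embeddings plus (stand-in) global embeddings,
# combined into a word-word semantic relatedness score. Averaging the two
# cosine similarities is an illustrative assumption, not the GLTM rule.
import numpy as np
from gensim.models import Word2Vec

# Toy short-text collection; in practice, the topic model's training corpus.
short_texts = [
    ["apple", "releases", "new", "phone"],
    ["new", "phone", "camera", "review"],
    ["stock", "market", "hits", "record"],
]

# Local embeddings: continuous skip-gram with negative sampling (SGNS),
# trained directly on the short texts (gensim >= 4.0 API).
local = Word2Vec(
    sentences=short_texts,
    vector_size=50,   # small dimension for the toy corpus
    window=5,
    min_count=1,
    sg=1,             # 1 = skip-gram (vs. CBOW)
    negative=5,       # negative sampling
    epochs=50,
).wv

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def relatedness(w1, w2, global_vecs, local_vecs):
    # Combine global and local similarity; averaging is an assumption.
    sims = []
    if w1 in global_vecs and w2 in global_vecs:
        sims.append(cosine(global_vecs[w1], global_vecs[w2]))
    if w1 in local_vecs and w2 in local_vecs:
        sims.append(cosine(local_vecs[w1], local_vecs[w2]))
    return sum(sims) / len(sims) if sims else 0.0

# For the demo, reuse the local vectors as a stand-in for global vectors;
# in practice, load vectors pretrained on a large external corpus, e.g.:
#   from gensim.models import KeyedVectors
#   global_vecs = KeyedVectors.load_word2vec_format("global_vectors.txt")  # hypothetical path
global_vecs = local
print(relatedness("phone", "camera", global_vecs, local))

In the model, relatedness scores of this kind supplement the sparse word co-occurrence patterns during Gibbs sampling, nudging semantically related words toward the same topic.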