郭崇慧

Personal Information

Professor

Doctoral Supervisor

Master's Supervisor

Main position: Director of the Institute of Systems Engineering

Other positions: Director of the Dalian Key Laboratory of Data Science and Knowledge Management

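Gender: Male

Alma mater: Dalian University of Technology

Degree: Doctorate

Affiliation: Institute of Systems Engineering

Discipline: Management Science and Engineering; Systems Engineering

Office: Room D337, School of Economics and Management

Contact: 0411-84708007

Email: dlutguo@dlut.edu.cn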

Publications

A text semantic topic discovery method based on the conditional co-occurrence degree

Paper type: Journal article

Publication date: 2019-11-27

Journal: NEUROCOMPUTING

Indexed in: EI, SCIE

Volume: 368

Pages: 11-24

ISSN: 0925-2312

Keywords: Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree

Abstract: The topic discovery method, as an effective tool for semantic mining and a key means to extract new features from original text, plays an important role in the field of text mining and knowledge discovery. To solve the problems encountered in traditional topic models, such as the loss of semantic information and the ambiguity of topic concepts, as well as the crossover and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, every document is split into multiple subdocuments according to the semantic structure of the document and the independence decision rules. Second, combinatorial words with strong semantic relevance are extracted based on the conditional co-occurrence degree within the subdocuments. Based on these combinatorial words, new subdocuments are formed by feature expansion and content reconstruction. Third, "topic-word" distributions and "document-topic" distributions of new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, "document-topic" distributions of the original documents are obtained by merging new subdocuments' "document-topic" distributions with specific strategies. The numerical experiments are compared with six topic models and two evaluation methods on seven kinds of public corpora, and the experimental results verify the superiority of CCOD_STDM and its efficiency in topic discovery. More importantly, a case study illustrates that the combinatorial words can effectively avoid the polysemy problem and can facilitate the condensation and summary of topics. (C) 2019 Elsevier B.V. All rights reserved.
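
The second step of the abstract, extracting combinatorial words from co-occurrence within subdocuments, can be illustrated with a rough Python sketch. The abstract does not give the formula for the conditional co-occurrence degree, so the score used here (the share of subdocuments containing word a that also contain word b) is an assumption made purely for illustration and may differ from the paper's definition. The function names, the symmetric threshold rule, and the toy subdocuments are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def conditional_cooccurrence(subdocs):
    """Return {(a, b): share of subdocuments containing a that also contain b}.

    ASSUMPTION: this ratio is an illustrative stand-in, not the paper's
    definition of the conditional co-occurrence degree.
    """
    word_df = Counter()  # number of subdocuments containing each word
    pair_df = Counter()  # number of subdocuments containing both words
    for tokens in subdocs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    scores = {}
    for (a, b), n_ab in pair_df.items():
        scores[(a, b)] = n_ab / word_df[a]
        scores[(b, a)] = n_ab / word_df[b]
    return scores


def combinatorial_words(subdocs, threshold=0.8):
    """Keep word pairs whose score reaches the threshold in both directions.

    ASSUMPTION: the symmetric threshold rule is hypothetical.
    """
    scores = conditional_cooccurrence(subdocs)
    return sorted(
        (a, b)
        for (a, b), s in scores.items()
        if a < b and s >= threshold and scores[(b, a)] >= threshold
    )


# Toy subdocuments (hypothetical data): "apple" appears in two senses.
subdocs = [
    ["apple", "pie", "recipe"],
    ["apple", "pie", "oven"],
    ["apple", "iphone", "launch"],
    ["iphone", "launch", "event"],
]
scores = conditional_cooccurrence(subdocs)
pairs = combinatorial_words(subdocs, threshold=0.6)
print(pairs)
```

In this toy input the pair ("apple", "pie") is kept as a single unit, which hints at how combinatorial words might separate senses of an ambiguous term. The later steps described in the abstract (feature expansion, Gibbs-sampled topic modeling, and merging of "document-topic" distributions) are not covered by this sketch.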