A text semantic topic discovery method based on the conditional co-occurrence degree

Indexed by:Journal Papers

Date of Publication:2019-11-27

Journal:NEUROCOMPUTING

Included Journals:EI, SCIE

Volume:368

Page Number:11-24

ISSN No.:0925-2312

Key Words:Text mining; Topic discovery; Semantic information; Conditional co-occurrence degree

Abstract:The topic discovery method, as an effective tool for semantic mining and a key means to extract new features from original text, plays an important role in the field of text mining and knowledge discovery. To solve the problems encountered in traditional topic models, such as the loss of semantic information and the ambiguity of topic concepts, as well as the crossover and coverage among topics, we propose a semantic topic discovery method based on the conditional co-occurrence degree (CCOD_STDM). First, every document is split into multiple subdocuments according to the semantic structure of the document and the independence decision rules. Second, combinatorial words with strong semantic relevance are extracted based on the conditional co-occurrence degree within the subdocuments. Based on these combinatorial words, new subdocuments are formed by feature expansion and content reconstruction. Third, "topic-word" distributions and "document-topic" distributions of the new subdocuments are obtained by topic modeling with Gibbs sampling. Finally, "document-topic" distributions of the original documents are obtained by merging the new subdocuments' "document-topic" distributions with specific strategies. In numerical experiments on seven public corpora, CCOD_STDM is compared with six topic models using two evaluation methods, and the experimental results verify the superiority of CCOD_STDM and its efficiency in topic discovery. More importantly, a case study illustrates that the combinatorial words can effectively avoid the polysemy problem and can facilitate the condensation and summary of topics. © 2019 Elsevier B.V. All rights reserved.
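
The second and fourth steps described in the abstract can be illustrated with a rough Python sketch. The abstract does not give the formula for the conditional co-occurrence degree, the feature-expansion rule, or the merging strategies, so everything below is an assumption made for illustration: the score is taken to be the smaller of the two conditional probabilities P(a|b) and P(b|a) estimated from subdocument counts, expansion simply appends a joined token, and merging is a length-weighted average. The function names, the threshold and the minimum count are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def conditional_cooccurrence(subdocs):
    """Score word pairs by min(P(a|b), P(b|a)) over subdocuments.

    ASSUMPTION: this symmetric conditional score is a stand-in; the paper's
    actual definition of the conditional co-occurrence degree may differ.
    Returns (scores, pair_counts), both keyed by alphabetically sorted pairs.
    """
    word_df = Counter()  # number of subdocuments containing each word
    pair_df = Counter()  # number of subdocuments containing both words
    for tokens in subdocs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    scores = {
        (a, b): min(n_ab / word_df[a], n_ab / word_df[b])
        for (a, b), n_ab in pair_df.items()
    }
    return scores, pair_df


def select_combinatorial_words(subdocs, threshold=0.6, min_count=2):
    """Keep pairs that co-occur often enough and score above the threshold."""
    scores, pair_df = conditional_cooccurrence(subdocs)
    return sorted(
        pair for pair, score in scores.items()
        if score >= threshold and pair_df[pair] >= min_count
    )


def expand_subdoc(tokens, pairs):
    """Append a joined token 'a_b' for each selected pair found in the subdocument.

    ASSUMPTION: a simple form of feature expansion, chosen for illustration.
    """
    present = set(tokens)
    extra = [f"{a}_{b}" for a, b in pairs if a in present and b in present]
    return list(tokens) + extra


def merge_topic_distributions(distributions, lengths):
    """Length-weighted average of subdocument 'document-topic' distributions.

    ASSUMPTION: one plausible merging strategy, not the paper's own.
    """
    total = sum(lengths)
    n_topics = len(distributions[0])
    return [
        sum(dist[k] * n for dist, n in zip(distributions, lengths)) / total
        for k in range(n_topics)
    ]


subdocs = [
    ["apple", "pie", "recipe"],
    ["apple", "pie"],
    ["apple", "phone"],
    ["phone", "screen"],
]
pairs = select_combinatorial_words(subdocs)
expanded = [expand_subdoc(tokens, pairs) for tokens in subdocs]
merged = merge_topic_distributions([[1.0, 0.0], [0.0, 1.0]], [3, 1])
```

In this toy input only the pair ("apple", "pie") passes both filters, so the first two subdocuments gain the token "apple_pie" while the subdocument about phones does not. That is the intuition behind using combinatorial words to separate senses of a polysemous word. The expanded subdocuments would then be passed to a topic model trained with Gibbs sampling, which is not shown here.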
