
Dimension reduction of latent semantic indexing extracting from local feature space


Indexed by:Journal article

Date of Publication:2008-06-01

Journal:Journal of Computational Information Systems

Included Journals:EI

Volume:4

Issue:3

Page Number:915-922

ISSN No.:1553-9105

Abstract:Latent Semantic Indexing (LSI) is a successful information retrieval technique that tries to uncover the latent semantics implied by a query or a document by representing it in a dimension-reduced space, but it is not an optimal representation for text classification. It degrades classification performance when applied to the whole training set, because this completely unsupervised method concentrates on representation and ignores class discrimination. An improved method named Local Feature Latent Semantic Indexing (LFLSI) is proposed, which considers the local features of each word that represents a dimension of a text. It clarifies the meaning of each word in a specific text, so that the most discriminative basis vectors can be selected iteratively from the training data. kNN and SVM are adopted for training and classification. Experiments conducted on the Reuters-21578 dataset indicate that the method performs much better than traditional methods on classification, within a more representative and effective reduced dimension.
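
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of plain, unsupervised LSI followed by kNN classification. It is not the LFLSI method of the paper, whose details the abstract does not give. The toy vocabulary, term counts, labels, and the helper names `lsi_fit`, `lsi_project`, `knn_predict`, and `classify` are all assumptions made for illustration; they are not taken from the paper or from Reuters-21578.

```python
from collections import Counter

import numpy as np

# Hypothetical toy vocabulary (rows) and training documents (columns).
# Row order: oil, crude, barrel, wheat, grain, harvest
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 3],
], dtype=float)
labels = ["oil", "oil", "grain", "grain"]  # assumed class labels


def lsi_fit(matrix, k):
    """Return the top-k left singular vectors of a terms x documents matrix."""
    U, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k]


def lsi_project(U_k, docs):
    """Project term-count columns into the k-dimensional latent space."""
    return U_k.T @ docs


def knn_predict(train_latent, train_labels, query_latent, n_neighbors=3):
    """Majority vote over the n_neighbors most cosine-similar training documents."""
    eps = 1e-12  # guards against division by zero for all-zero vectors
    train_norms = np.linalg.norm(train_latent, axis=0) + eps
    query_norm = np.linalg.norm(query_latent) + eps
    sims = (train_latent.T @ query_latent) / (train_norms * query_norm)
    nearest = np.argsort(-sims)[:n_neighbors]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


U_k = lsi_fit(term_doc, k=2)
train_latent = lsi_project(U_k, term_doc)  # shape (2, 4)


def classify(term_counts):
    """Classify one document given its term-count vector over the toy vocabulary."""
    return knn_predict(train_latent, labels, lsi_project(U_k, term_counts))


# A document mentioning only "crude" and "barrel"
print(classify(np.array([0, 1, 1, 0, 0, 0], dtype=float)))
```

Because the SVD here is computed without looking at `labels`, the retained basis vectors are chosen purely for reconstruction quality, which is the limitation for classification that the abstract describes.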
