Indexed by:Journal Article
Date of Publication:2008-06-01
Journal:Journal of Computational Information Systems
Included Journals:EI
Volume:4
Issue:3
Page Number:915-922
ISSN No.:1553-9105
Abstract:Latent Semantic Indexing (LSI) is a successful information retrieval technique that explores the latent semantics of a query or a document by representing it in a dimension-reduced space. It is not, however, an optimal representation for text classification: when applied to the whole training set it always lowers classification performance, because this completely unsupervised method concentrates on representation and ignores class discrimination. We propose an improved method, Local Feature Latent Semantic Indexing (LFLSI), which considers the local features of each word representing the dimensionality of a text. It clarifies the meaning of each word in a specific text, so that the most discriminative basis vectors can be selected iteratively from the training data. kNN and SVM are used for training and classification. Experiments on the Reuters-21578 dataset indicate that the method classifies much better than traditional methods while working in a more representative and effective reduced dimension.
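For orientation, the following is a minimal sketch of the baseline pipeline the abstract contrasts against: plain, unsupervised LSI (a truncated SVD of the document-term matrix) followed by a cosine-similarity kNN classifier. It is not the LFLSI method, whose local-feature weighting and iterative basis selection are not specified in this record. The function names (`lsi_fit`, `lsi_project`, `knn_predict`), the six-term vocabulary, and the toy counts are all assumptions made up for illustration; they are not taken from the paper or from Reuters-21578.

```python
import numpy as np

def lsi_fit(X, k):
    """Return the top-k right singular vectors of a document-term matrix X.

    Rows of the result span the k-dimensional latent space (hypothetical helper).
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]

def lsi_project(X, basis):
    """Map documents (rows of X) to coordinates in the latent space."""
    return X @ basis.T

def knn_predict(train_Z, train_y, query_Z, n_neighbors=1):
    """Classify each query by majority vote among its most cosine-similar training docs."""
    def normalize(Z):
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        return Z / np.where(norms == 0, 1.0, norms)

    sims = normalize(query_Z) @ normalize(train_Z).T
    nearest = np.argsort(-sims, axis=1)[:, :n_neighbors]
    preds = []
    for row in nearest:
        labels, counts = np.unique(train_y[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy term counts over an invented vocabulary:
# columns = [oil, crude, barrel, wheat, grain, harvest]
X_train = np.array([
    [2, 1, 0, 0, 0, 0],   # class 0
    [1, 0, 2, 0, 0, 0],   # class 0
    [0, 0, 0, 2, 1, 0],   # class 1
    [0, 0, 0, 1, 0, 2],   # class 1
], dtype=float)
y_train = np.array([0, 0, 1, 1])

X_query = np.array([
    [0, 1, 1, 0, 0, 0],   # shares no term with every class-0 document
    [0, 0, 0, 0, 1, 1],
], dtype=float)

basis = lsi_fit(X_train, k=2)            # unsupervised: labels are never used here
Z_train = lsi_project(X_train, basis)
Z_query = lsi_project(X_query, basis)
predictions = knn_predict(Z_train, y_train, Z_query, n_neighbors=1)
print(predictions)
```

Note that `lsi_fit` never sees `y_train`: the basis is chosen purely to represent the data, which is the limitation the abstract attributes to standard LSI in a classification setting.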