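Senior Engineer
Gender: Male
Alma mater: Dalian University of Technology
Degree: PhD
Affiliation: School of Computer Science and Technology
Discipline: Computer Application Technology
Office: Room D0103, Innovation Park Building (创新园大厦)
Contact: QQ: 2407849530
Email: xukan@dlut.edu.cn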
Paper type: Journal article
Publication date: 2013-04-01
Journal: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY
Indexed in: SCIE, EI, SSCI, Scopus
Volume: 64
Issue: 4
Pages: 818-828
ISSN: 1532-2882
Keywords: machine learning; information retrieval; searching
Abstract: The central issue in language model estimation is smoothing, a technique for avoiding the zero-probability estimation problem and overcoming data sparsity. There are three representative smoothing methods: the Jelinek-Mercer (JM) method, Bayesian smoothing using Dirichlet priors (Dir), and absolute discounting (Dis). Their parameters are usually estimated empirically. Previous research in information retrieval (IR) on smoothing parameter estimation tends to select a single value from a set of candidate values for the whole collection, but that value may not be appropriate for all queries. The effectiveness of all the candidate values should be considered to improve ranking performance. Recently, learning to rank has become an effective approach to optimizing ranking accuracy by merging existing retrieval methods. In this article, the smoothing methods for language modeling in information retrieval (LMIR) with different parameters are treated as different retrieval methods, and a learning-to-rank approach is presented that learns a ranking model from the features extracted by these smoothing methods. During learning, the effectiveness of all the candidate smoothing parameters is taken into account for all queries. Experimental results on the Learning to Rank for Information Retrieval (LETOR) 3.0 and 4.0 data sets show that our approach is effective in improving the performance of LMIR.
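The idea in the abstract, treating each smoothing method at each parameter value as a separate retrieval feature, can be illustrated with a minimal Python sketch. It uses the standard textbook forms of JM, Dirichlet, and absolute-discounting smoothing and scores a document by its smoothed query log-likelihood. The function names, the `GRID` of parameter values, and the choice to skip query terms unseen in the collection are assumptions made for illustration; the abstract does not state the values or the exact feature definition used in the paper.

```python
import math
from collections import Counter

# Each smoother returns p(w | d) given: the term count in the document, the
# document length, the number of unique terms in the document, the collection
# probability p(w | C), and one smoothing parameter.

def jm(count, doc_len, unique, p_c, lam):
    # Jelinek-Mercer: linear interpolation with the collection model.
    return (1 - lam) * count / doc_len + lam * p_c

def dirichlet(count, doc_len, unique, p_c, mu):
    # Bayesian smoothing with a Dirichlet prior of pseudo-count mu.
    return (count + mu * p_c) / (doc_len + mu)

def abs_discount(count, doc_len, unique, p_c, delta):
    # Absolute discounting: subtract delta from each seen count and give
    # the freed mass (delta * unique / doc_len) to the collection model.
    return max(count - delta, 0.0) / doc_len + delta * unique / doc_len * p_c

# Hypothetical candidate parameter values, one feature per (method, value).
GRID = [
    (jm, (0.1, 0.5, 0.9)),
    (dirichlet, (500, 1000, 2000)),
    (abs_discount, (0.3, 0.7)),
]

def lm_features(query, doc, coll_tf, coll_len):
    """Return one query log-likelihood per (method, parameter) pair.

    Assumes `doc` is a non-empty list of tokens. Query terms that never
    occur in the collection are skipped to avoid log(0).
    """
    tf = Counter(doc)
    feats = []
    for smooth, params in GRID:
        for p in params:
            score = 0.0
            for w in query:
                p_c = coll_tf.get(w, 0) / coll_len
                if p_c == 0.0:
                    continue
                score += math.log(smooth(tf[w], len(doc), len(tf), p_c, p))
            feats.append(score)
    return feats

# Toy collection of two documents.
docs = [["a", "b", "a"], ["c", "d"]]
coll_tf = Counter(w for d in docs for w in d)
coll_len = sum(coll_tf.values())
features = lm_features(["a", "c"], docs[0], coll_tf, coll_len)
```

Each query-document pair then yields a fixed-length feature vector (eight values with the grid above) that could be passed to any learning-to-rank learner, so the ranker weighs all candidate smoothing settings instead of committing to a single one for the whole collection.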