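Personal Information
Professor
Doctoral Supervisor
Master's Supervisor
Gender: Male
Alma Mater: Dalian University of Technology
Degree: Doctorate
Affiliation: Institute of Systems Engineering
Discipline: Management Science and Engineering; Systems Engineering
Email: yzhdang@dlut.edu.cn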
Learning Domain Feature from Text Corpora
Paper Type: Conference paper
Publication Date: 2008-10-12
Indexed In: EI, CPCI-S, Scopus
Pages: 11478-11481
Keywords: domain feature; length first segment; DFP analysis
Abstract: To improve performance in automatic electronic document processing, this paper proposes the concept of a domain feature, defined as a term that can represent the topics of a certain domain. It then presents a non-lexicon-based approach for automatically learning domain features from text corpora. The approach combines the length-first segment algorithm and the domain feature possibility (DFP) algorithm. The former segments the domain foreground corpora and extracts words and phrases with a satisfactory recall rate, while the latter raises the precision of learning by comparing the different statistical properties that domain features show in foreground and background corpora. Experiments verify that, given appropriate foreground and background corpora, this approach significantly improves the efficiency of domain feature building and yields better results than manual building. The algorithms combined in this approach can be widely used in other research domains of knowledge management.
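The abstract does not give the formulas behind the length-first segment algorithm or the DFP score, so the following is only a rough sketch of the general idea of contrasting foreground and background corpora. It is not the authors' method: whitespace n-grams stand in for the segmentation step, and a smoothed log ratio of relative frequencies stands in for DFP. The names `candidate_terms`, `count_terms`, and `contrast_scores`, along with the `max_len` and `min_count` parameters, are all hypothetical.

```python
import math
from collections import Counter


def candidate_terms(tokens, max_len=3):
    """Yield every n-gram of up to max_len tokens as a candidate term.

    Assumption: a simple stand-in for the segmentation step, not the
    paper's length-first segment algorithm.
    """
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def count_terms(docs, max_len=3):
    """Count candidate terms over a list of whitespace-tokenized documents."""
    counts = Counter()
    for doc in docs:
        counts.update(candidate_terms(doc.lower().split(), max_len))
    return counts


def contrast_scores(foreground, background, max_len=3, min_count=2):
    """Score foreground terms by how much more frequent they are than in
    the background corpus.

    Assumption: an add-one smoothed log ratio of relative frequencies,
    used here as an illustrative substitute for the DFP score.
    """
    fg = count_terms(foreground, max_len)
    bg = count_terms(background, max_len)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab = len(set(fg) | set(bg))
    scores = {}
    for term, count in fg.items():
        if count < min_count:
            continue  # drop rare candidates to limit noise
        p_fg = (count + 1) / (fg_total + vocab)
        p_bg = (bg[term] + 1) / (bg_total + vocab)
        scores[term] = math.log(p_fg / p_bg)
    return scores
```

Under these assumptions, a term that is frequent in the foreground corpus but rare in the background corpus receives a positive score, while a common function word shared by both corpora scores near or below zero. Ranking candidates by this score and keeping the top entries mirrors, loosely, how a contrastive statistic can trade the high recall of candidate extraction for higher precision.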