Piao Yong (朴勇)

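Personal Information

Associate Professor

Master's Supervisor

Gender: Male

Alma mater: Dalian University of Technology

Degree: Doctorate

Affiliation: School of Software; International School of Information and Software

Office: School of Software, Dalian University of Technology, Dalian Economic Development Zone

Contact: 15641190702

Email: piaoy@dlut.edu.cn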

Publications

XML Structure Extraction from plain texts with Hidden Markov Model

Paper type: Conference paper

Publication date: 2010-10-23

Indexed in: EI, Scopus

Volume: 1

Pages: 560-564

Abstract: Information extraction is one way to convert unstructured text into structured records. Most previous work in this field is devoted to adding semantic tags to specific textual content, so the resulting structures are often flat and cannot illustrate relationships among semantic features. This paper proposes a novel approach, the Structure Information Extraction System based on Hidden Markov Model (SIEHMM), for extracting structure from plain text; it uses path information for HMM training and automatically generates XML. Experiments on a real-life dataset show that SIEHMM achieves high precision and recall, and that it can not only help solve problems of structured storage and text information retrieval but also take advantage of XML to meet future trends. © 2010 IEEE.
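
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the SIEHMM system itself. It assumes that each hidden state of the HMM is a path in the target XML tree, decodes the most likely path for each token with the standard Viterbi algorithm, and then nests the tokens into XML along those paths. The state names, the probabilities, and the helper names `viterbi` and `to_xml` are all hypothetical and chosen for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical hidden states: each one is a path in the target XML tree.
STATES = ["person/name", "person/address/city"]

# Toy parameters invented for illustration (not taken from the paper).
START = {"person/name": 0.8, "person/address/city": 0.2}
TRANS = {
    "person/name": {"person/name": 0.5, "person/address/city": 0.5},
    "person/address/city": {"person/name": 0.1, "person/address/city": 0.9},
}
EMIT = {
    "person/name": {"Alice": 0.5, "Smith": 0.4, "Boston": 0.1},
    "person/address/city": {"Boston": 0.9, "Alice": 0.05, "Smith": 0.05},
}
FLOOR = 1e-9  # small probability assigned to tokens a state has never emitted


def viterbi(tokens):
    """Return the most likely sequence of path states for the tokens."""
    scores = [{s: math.log(START[s]) + math.log(EMIT[s].get(tokens[0], FLOOR))
               for s in STATES}]
    back = []  # back[t][s] is the best predecessor of state s at step t + 1
    for tok in tokens[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = (prev[best] + math.log(TRANS[best][s])
                      + math.log(EMIT[s].get(tok, FLOOR)))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def to_xml(tokens, paths):
    """Nest tokens into XML along their decoded paths.

    Assumes every path shares the same root element.
    """
    root = ET.Element(paths[0].split("/")[0])
    for tok, path in zip(tokens, paths):
        node = root
        for tag in path.split("/")[1:]:
            child = node.find(tag)
            node = child if child is not None else ET.SubElement(node, tag)
        node.text = f"{node.text} {tok}" if node.text else tok
    return ET.tostring(root, encoding="unicode")


tokens = ["Alice", "Smith", "Boston"]
paths = viterbi(tokens)
print(to_xml(tokens, paths))
```

In a real system the start, transition, and emission probabilities would be estimated from training text annotated with paths rather than written by hand, and the emission model would need proper smoothing for unseen words.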