Pan Donghua (潘东华)

Personal Information

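Associate Professor

Master's Supervisor

Gender: Male

Alma mater: Changchun Institute of Optics and Fine Mechanics

Degree: Master's

Affiliation: 001173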

Publications

Web Page Content Extraction Method Based on Link Density and Statistic

Paper type: Conference paper

Publication date: 2008-10-12

Indexed in: Scopus, CPCI-S, EI

Pages: 11452-11455

Keywords: Knowledge Acquisition; Information Extraction; Web Page Content Extraction; Web Analysis

Abstract: Web page content extraction is a key step in knowledge acquisition from the Internet. A web page typically mixes useful information with advertising links and images, so extracting the right content while filtering out irrelevant information is an important task. Exploiting the differing properties of content nodes and non-content nodes in a web page represented as a tree, the paper presents an algorithm based on link density and statistics. The method improves the accuracy of content extraction, which in turn benefits the efficiency of information acquisition for corporations and organizations and supports knowledge acquisition more broadly.
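
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of link-density filtering over a page tree, not the authors' method. It assumes that link density means the share of a node's text that sits inside `<a>` elements, and it uses text length as a stand-in for the statistical part. The names `Node`, `TreeBuilder`, `extract_content`, the tag sets, and the thresholds `max_density` and `min_chars` are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch; the paper does not specify them.
BLOCK_TAGS = {"div", "p", "td", "li", "article", "section"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}
SKIP_TAGS = {"script", "style"}


class Node:
    """One element of the page tree, with text statistics for its subtree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Node objects and text strings, in order
        self.text_len = 0   # characters of text in this subtree
        self.link_len = 0   # characters of that text located inside <a>


class TreeBuilder(HTMLParser):
    """Builds a simple tree and accumulates text and link-text lengths."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root
        self.link_depth = 0  # > 0 while inside at least one <a>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        # Find the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is self.root:
            return
        # Close every element from the current one up to the match.
        cur = self.current
        while cur is not node.parent:
            if cur.tag == "a":
                self.link_depth -= 1
            cur = cur.parent
        self.current = node.parent

    def handle_data(self, data):
        if self.current.tag in SKIP_TAGS:
            return
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        self.current.children.append(text)
        node = self.current
        while node is not None:  # propagate counts to all ancestors
            node.text_len += len(text)
            if self.link_depth > 0:
                node.link_len += len(text)
            node = node.parent


def link_density(node):
    """Fraction of subtree text inside links; empty nodes count as 1.0."""
    return node.link_len / node.text_len if node.text_len else 1.0


def subtree_text(node):
    parts = [c if isinstance(c, str) else subtree_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def has_block_descendant(node):
    return any(
        isinstance(c, Node) and (c.tag in BLOCK_TAGS or has_block_descendant(c))
        for c in node.children
    )


def extract_content(html, max_density=0.3, min_chars=40):
    """Return text of innermost block nodes with enough text and few links."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    blocks = []

    def visit(node):
        if node.tag in BLOCK_TAGS and not has_block_descendant(node):
            if node.text_len >= min_chars and link_density(node) <= max_density:
                blocks.append(subtree_text(node))
            return
        for child in node.children:
            if isinstance(child, Node):
                visit(child)

    visit(builder.root)
    return blocks
```

Calling `extract_content(page_html)` on a page whose navigation bar and advertisements consist mostly of anchor text should return only the text of the low-link-density blocks. The sketch evaluates innermost block elements only, so text placed directly in non-block containers is ignored, and both thresholds would need tuning on real pages.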