Dalian University of Technology Homepage Platform Management System: Pan Donghua (潘东华)

Web Page Content Extraction Method Based on Link Density and Statistic

Release Time: 2019-03-11

Indexed by: Conference Paper

Date of Publication: 2008-10-12

Included Journals: Scopus, CPCI-S, EI

Page Number: 11452-11455

Key Words: Knowledge Acquisition; Information Extraction; Web Page Content Extraction; Web Analysis

Abstract: Web page content extraction is a key step in acquiring knowledge from the Internet. A web page's physical layout typically mixes useful information with advertising links and images, so extracting the right content while filtering out irrelevant information is an important task. Based on the differing properties of content nodes and non-content nodes when a web page is represented as a tree, an algorithm based on link density and statistics is presented. The method increases the accuracy of content extraction, which in turn improves the efficiency of information acquisition for corporations and organizations.
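The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general link-density idea, not the paper's method. It assumes that link density is the share of a node's text characters that sit inside `<a>` elements, and that the innermost block-level nodes are kept when that share is low and the text is long enough. The names `Node`, `TreeBuilder` and `extract_content`, the `BLOCK_TAGS` set, and the thresholds `max_density` and `min_chars` are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumed tag sets for this sketch; the paper's own node classes are not given.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style"}
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}


class Node:
    """One element of the page tree; children are strings or Nodes."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []

    def text(self):
        return " ".join(c if isinstance(c, str) else c.text()
                        for c in self.children)

    def char_count(self):
        return sum(len(c) if isinstance(c, str) else c.char_count()
                   for c in self.children)

    def link_char_count(self):
        # All text under an <a> element counts as link text.
        if self.tag == "a":
            return self.char_count()
        return sum(c.link_char_count()
                   for c in self.children if isinstance(c, Node))


class TreeBuilder(HTMLParser):
    """Builds a simple element tree, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, never past the root.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.stack[-1].tag in SKIP_TAGS:
            return
        text = " ".join(data.split())  # collapse whitespace
        if text:
            self.stack[-1].children.append(text)


def iter_nodes(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def extract_content(html, max_density=0.3, min_chars=40):
    """Return the text of innermost block nodes with low link density."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    kept = []
    for node in iter_nodes(builder.root):
        if node.tag not in BLOCK_TAGS:
            continue
        # Only score innermost blocks, so a wrapper div is not counted twice.
        if any(d.tag in BLOCK_TAGS for d in iter_nodes(node) if d is not node):
            continue
        total = node.char_count()
        if total == 0 or total < min_chars:
            continue
        if node.link_char_count() / total <= max_density:
            kept.append(node.text())
    return kept
```

Under these assumptions, a navigation bar made up entirely of links has a link density of 1.0 and is dropped, while a paragraph of running text with one short link stays well under the threshold and is kept. Both thresholds would need tuning on real pages.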

