HTML Tree Parsing Algorithm Based on Pre-extracted Data

Indexed by:Conference paper

Date of Publication:2009-06-27

Included Journals:EI, CPCI-S, CPCI-SSH, Scopus

Page Number:52-52

Abstract:This paper proposes a new method for extracting an HTML tree from web pages. The main idea is that the parts of a web page that are hard to parse, including tags and attributes, are extracted first; the remaining parts are then tidied and parsed; and finally both the pre-extracted parts and the parsed remainder are stored in the tree. By integrating the tidying process with the parsing process, the method both preserves the integrity of the web data and reduces the complexity of the algorithms. Tests show that it can parse all kinds of web pages and provides concrete fault-tolerance mechanisms. © 2009 IEEE.
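The three steps named in the abstract (pre-extract, tidy and parse, store back in the tree) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say which tags and attributes count as hard to parse, so the sketch assumes `<script>` and `<style>` blocks, and the names `HARD`, `Node`, `TreeBuilder`, `parse`, and the `x-stash` placeholder tag are all hypothetical. Tidying is reduced here to two simple fault-tolerance rules: stray end tags are ignored, and unclosed tags are closed implicitly by an enclosing end tag.

```python
import re
from html.parser import HTMLParser

# Assumption: <script>/<style> blocks stand in for the "hard to parse" parts.
HARD = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
VOID = {"br", "hr", "img", "input", "meta", "link"}  # elements with no end tag


class Node:
    def __init__(self, tag, attrs=None, parent=None, text=""):
        self.tag, self.attrs, self.parent = tag, dict(attrs or []), parent
        self.text, self.children = text, []
        if parent is not None:
            parent.children.append(self)


class TreeBuilder(HTMLParser):
    def __init__(self, stash):
        super().__init__()
        self.stash = stash            # pre-extracted fragments, by index
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag == "x-stash":          # placeholder: store the fragment verbatim
            raw = self.stash[int(dict(attrs)["id"])]
            Node("#raw", parent=self.current, text=raw)
            return
        node = Node(tag, attrs, self.current)
        if tag not in VOID:
            self.current = node       # descend into non-void elements

    def handle_endtag(self, tag):
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent        # unclosed descendants are closed implicitly
        if node is not None and node.parent is not None:
            self.current = node.parent
        # otherwise: stray end tag (or placeholder end tag), ignored

    def handle_data(self, data):
        if data.strip():
            Node("#text", parent=self.current, text=data)


def parse(html):
    stash = []

    def keep(match):                  # step 1: pre-extract hard parts
        stash.append(match.group(0))
        return f'<x-stash id="{len(stash) - 1}"></x-stash>'

    remaining = HARD.sub(keep, html)
    builder = TreeBuilder(stash)      # steps 2 and 3: parse, then store back
    builder.feed(remaining)
    builder.close()
    return builder.root
```

Because the script body never reaches the parser, markup-like text inside it (such as a `"<b>"` string literal) cannot distort the tree, and the original fragment is kept byte for byte in a `#raw` node.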
