Journal of Internet Computing and Services
ISSN 2287 - 1136 (Online) / ISSN 1598 - 0170 (Print)
http://jics.or.kr/

Study on the Improvement of Extraction Performance for Domain Knowledge based Wrapper Generation


Jeong Chang-Hoo, Choi Yun-Soo, Seo Jeong-Hyeon, Yoon Hwa-Mook, Journal of Internet Computing and Services, Vol. 7, No. 4, pp. 67-78, Aug. 2006
Keywords: Domain Knowledge, Wrapper, Information Extraction

Abstract

Wrappers play an important role in extracting specified information from various sources. The wrapper rules by which information is extracted are often created from domain-specific knowledge. Domain-specific knowledge helps recognize the meaning of text representing various entities and values and helps detect their formats. However, such domain knowledge becomes powerless when value-representing data are not labeled with appropriate textual descriptions, or when there is nothing but a hyperlink where text labels or values are expected. To alleviate these problems, we propose a probabilistic method for recognizing the entity type, i.e. generating wrapper rules, when there is no label associated with value-representing text. In addition, we have devised a method for using the information reachable by following hyperlinks when textual data are not immediately available on the target web page. Our experimental work shows that the proposed methods help increase the precision of the resulting wrapper, particularly in extracting the title information, the most important entity on a web page. The proposed methods can be useful in building a more efficient and correct information extraction system for various sources of information without user intervention.
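
The abstract does not specify the probabilistic model, so the following is only an illustrative sketch of how an entity type might be guessed for an unlabeled value. It assumes a Bernoulli naive Bayes classifier over a handful of coarse format features; the feature names, the entity types (title, date, price), and the class EntityTypeGuesser are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter, defaultdict


def surface_features(text):
    """Map a raw value string to a set of coarse format features (hypothetical)."""
    feats = set()
    if re.search(r"\d{4}[-./]\d{1,2}[-./]\d{1,2}", text):
        feats.add("date_like")
    if re.search(r"\$\s?\d", text):
        feats.add("currency")
    feats.add("long" if len(text.split()) >= 4 else "short")
    if any(ch.isdigit() for ch in text):
        feats.add("has_digit")
    return feats


class EntityTypeGuesser:
    """Bernoulli naive Bayes over surface features (an assumed stand-in model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.type_counts = Counter()            # entity type -> number of examples
        self.feat_counts = defaultdict(Counter) # entity type -> feature -> count
        self.vocab = set()                      # all features seen in training

    def fit(self, examples):
        for text, etype in examples:
            self.type_counts[etype] += 1
            for feat in surface_features(text):
                self.feat_counts[etype][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, text):
        feats = surface_features(text)
        total = sum(self.type_counts.values())
        best_type, best_score = None, -math.inf
        for etype, n in self.type_counts.items():
            score = math.log(n / total)  # log prior
            for feat in self.vocab:
                # Smoothed probability that this feature is present for this type.
                p = (self.feat_counts[etype][feat] + self.alpha) / (n + 2 * self.alpha)
                score += math.log(p if feat in feats else 1.0 - p)
            if score > best_score:
                best_type, best_score = etype, score
        return best_type


# Tiny made-up training set of (value text, entity type) pairs.
training = [
    ("A Study of Wrapper Induction for Web Data", "title"),
    ("Information Extraction from Semi-structured Pages", "title"),
    ("2006-08-31", "date"),
    ("2005.12.01", "date"),
    ("$ 25", "price"),
    ("$12.50", "price"),
]
guesser = EntityTypeGuesser().fit(training)
print(guesser.predict("2004/03/15"))
print(guesser.predict("Domain Knowledge based Wrapper Generation"))
```

In a wrapper generator, a prediction of this kind could stand in for the missing textual label when a rule is created for a value; how the paper actually integrates its probabilities into rule generation is not stated in the abstract.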



Cite this article
[APA Style]
Jeong Chang-Hoo, Choi Yun-Soo, Seo Jeong-Hyeon, & Yoon Hwa-Mook (2006). Study on the Improvement of Extraction Performance for Domain Knowledge based Wrapper Generation. Journal of Internet Computing and Services, 7(4), 67-78.

[IEEE Style]
J. Chang-Hoo, C. Yun-Soo, S. Jeong-Hyeon and Y. Hwa-Mook, "Study on the Improvement of Extraction Performance for Domain Knowledge based Wrapper Generation," Journal of Internet Computing and Services, vol. 7, no. 4, pp. 67-78, 2006.

[ACM Style]
Jeong Chang-Hoo, Choi Yun-Soo, Seo Jeong-Hyeon, and Yoon Hwa-Mook. 2006. Study on the Improvement of Extraction Performance for Domain Knowledge based Wrapper Generation. Journal of Internet Computing and Services, 7, 4, (2006), 67-78.