Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
    http://jics.or.kr/

An Automated Topic Specific Web Crawler Calculating Degree of Relevance


Seo Hae-Sung, Choi Young-Soo, Choi Kyung-Hee, Jung Gi-Hyun, Noh Sang-Uk, Journal of Internet Computing and Services, Vol. 7, No. 3, pp. 155-168, Jun. 2006
Keywords: topic specific Web crawler (focused crawler), degree of relevance, Web page classification, Machine Learning, compiled rules

Abstract

Users surfing the Internet should be able to find Web pages that match their interests as closely as possible. To this end, this paper presents a topic specific Web crawler that computes the degree of relevance, collects a cluster of pages for a given topic, and refines the preliminary set of related Web pages using term frequency/document frequency, entropy, and compiled rules. In the experiments, we evaluated our topic specific crawler in terms of classification accuracy, crawling efficiency, and crawling consistency. First, the set of rules compiled by CN2 gave the best classification accuracy, ahead of the C4.5 and back-propagation learning algorithms. Second, we measured the classification efficiency to determine the best threshold value for the degree of relevance. Third, we measured the consistency of the crawler as the number of resulting URLs that overlapped across runs with different starting URLs. The experimental results suggest that our topic specific crawler was fairly consistent, regardless of the randomly chosen starting URLs.
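The abstract does not give the paper's actual formulas, so the following is only a minimal sketch of the general idea: score a page against a topic, keep it if the score passes a threshold, and compare the URL sets produced from different starting points. The weighting scheme (fraction of topic pages containing a term), the function names, the default threshold of 0.5, and the Jaccard-style overlap ratio are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def topic_term_weights(topic_pages):
    """Hypothetical weighting: the fraction of topic pages containing each term."""
    n = len(topic_pages)
    doc_freq = Counter()
    for page in topic_pages:
        doc_freq.update(set(tokenize(page)))
    return {term: count / n for term, count in doc_freq.items()}


def degree_of_relevance(page, weights):
    """Average topic weight per token of the page; always in [0, 1]."""
    tokens = tokenize(page)
    if not tokens:
        return 0.0
    term_freq = Counter(tokens)
    weighted = sum(freq * weights.get(term, 0.0) for term, freq in term_freq.items())
    return weighted / len(tokens)


def is_relevant(page, weights, threshold=0.5):
    """Keep a page when its score reaches the (assumed) threshold."""
    return degree_of_relevance(page, weights) >= threshold


def url_overlap(urls_a, urls_b):
    """Shared URLs divided by all distinct URLs from two crawls."""
    a, b = set(urls_a), set(urls_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Illustrative sample data, not taken from the paper.
topic_pages = ["web crawler index", "web crawler frontier"]
weights = topic_term_weights(topic_pages)
on_topic = is_relevant("web crawler news", weights)
off_topic = is_relevant("cooking pasta", weights)
overlap = url_overlap(["a", "b", "c"], ["b", "c", "d"])
```

In this sketch, raising the threshold trades recall for precision, which is the kind of trade-off the second experiment tunes; the overlap ratio is one simple way to express the consistency measured in the third experiment.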


Cite this article
[APA Style]
Seo Hae-Sung, Choi Young-Soo, Choi Kyung-Hee, Jung Gi-Hyun, & Noh Sang-Uk (2006). An Automated Topic Specific Web Crawler Calculating Degree of Relevance. Journal of Internet Computing and Services, 7(3), 155-168.

[IEEE Style]
S. Hae-Sung, C. Young-Soo, C. Kyung-Hee, J. Gi-Hyun and N. Sang-Uk, "An Automated Topic Specific Web Crawler Calculating Degree of Relevance," Journal of Internet Computing and Services, vol. 7, no. 3, pp. 155-168, 2006.

[ACM Style]
Seo Hae-Sung, Choi Young-Soo, Choi Kyung-Hee, Jung Gi-Hyun, and Noh Sang-Uk. 2006. An Automated Topic Specific Web Crawler Calculating Degree of Relevance. Journal of Internet Computing and Services 7, 3 (2006), 155-168.