• Journal of Internet Computing and Services
    ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
    http://jics.or.kr/

An effective approach to generate Wikipedia infobox of movie domain using semi-structured data


Hanif Bhuiyan, Kyeong-Jin Oh, Myung-Duk Hong, Geun-Sik Jo, Journal of Internet Computing and Services, Vol. 18, No. 3, pp. 49-61, Jun. 2017
DOI: 10.7472/jksii.2017.18.3.49
Keywords: Wikipedia, Semantic relation, Identification, Infobox Template, Information Extraction, Semi-structured data.

Abstract

Wikipedia infoboxes have emerged as an important source of structured information on the web. Composing an infobox for an article requires considerable manual effort from its author. Because of this manual involvement, infoboxes suffer from inconsistency, data heterogeneity, incompleteness, schema drift, and related problems. Prior work has attempted to solve these problems by generating infoboxes automatically from the corresponding article text. However, many Wikipedia articles do not contain enough text to generate an infobox. In this paper, we present an automated approach to generating infoboxes for the movie domain of Wikipedia that extracts information from several web sources instead of relying on the article text alone. The proposed methodology uses semantic relations in the article content together with semi-structured information available on the web. It passes the article text through a series of classification steps to identify the appropriate template from a large pool of candidate templates. Finally, it extracts values for that template's attributes from the web and thereby generates the infobox. A comprehensive experimental evaluation demonstrates that the proposed scheme is an effective and efficient approach to generating Wikipedia infoboxes.
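The abstract does not give implementation details. As a rough illustration of the final step only (turning attribute values extracted from a semi-structured source into an infobox), here is a minimal Python sketch. It assumes the source exposes simple `Label: value` lines; the label-to-parameter mapping, the function names `extract_attributes` and `render_infobox`, and the first-value-wins conflict rule are all hypothetical and are not the authors' method. The classification step that selects the template is not modeled; the output simply targets the `{{Infobox film}}` template used on English Wikipedia.

```python
"""Hypothetical sketch: map semi-structured movie facts onto {{Infobox film}}."""

# Hypothetical mapping from labels seen on a source web page to
# parameters of Wikipedia's {{Infobox film}} template.
LABEL_TO_PARAM = {
    "directed by": "director",
    "director": "director",
    "produced by": "producer",
    "starring": "starring",
    "cast": "starring",
    "release date": "released",
    "running time": "runtime",
    "runtime": "runtime",
    "country": "country",
    "language": "language",
}

# Order in which parameters are emitted in the generated infobox.
PARAM_ORDER = ["name", "director", "producer", "starring",
               "released", "runtime", "country", "language"]


def extract_attributes(lines):
    """Parse 'Label: value' lines, keeping only labels we know how to map."""
    attrs = {}
    for line in lines:
        label, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a key-value line; skip it
        param = LABEL_TO_PARAM.get(label.strip().lower())
        value = value.strip()
        if param and value and param not in attrs:
            attrs[param] = value  # assumption: the first value seen wins
    return attrs


def render_infobox(title, attrs):
    """Render attribute-value pairs as {{Infobox film}} wikitext."""
    fields = {"name": title, **attrs}
    rows = [f"| {p} = {fields[p]}" for p in PARAM_ORDER if p in fields]
    return "\n".join(["{{Infobox film", *rows, "}}"])


if __name__ == "__main__":
    # Illustrative, made-up input; unmapped labels such as "Box office" are dropped.
    page = [
        "Directed by: Jane Doe",
        "Running time: 120 minutes",
        "Country: United States",
        "Box office: unknown",
    ]
    print(render_infobox("Example Movie", extract_attributes(page)))
```

Run as a script, this prints a four-row `{{Infobox film}}` block (name, director, runtime, country). A fixed label dictionary keeps the sketch short; a real system would need far broader label coverage and a principled way to reconcile conflicting values from several sources.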

Cite this article
[APA Style]
Bhuiyan, H., Oh, K.-J., Hong, M.-D., & Jo, G.-S. (2017). An effective approach to generate Wikipedia infobox of movie domain using semi-structured data. Journal of Internet Computing and Services, 18(3), 49-61. https://doi.org/10.7472/jksii.2017.18.3.49

[IEEE Style]
H. Bhuiyan, K.-J. Oh, M.-D. Hong, and G.-S. Jo, "An effective approach to generate Wikipedia infobox of movie domain using semi-structured data," Journal of Internet Computing and Services, vol. 18, no. 3, pp. 49-61, Jun. 2017, doi: 10.7472/jksii.2017.18.3.49.

[ACM Style]
Hanif Bhuiyan, Kyeong-Jin Oh, Myung-Duk Hong, and Geun-Sik Jo. 2017. An effective approach to generate Wikipedia infobox of movie domain using semi-structured data. Journal of Internet Computing and Services 18, 3 (2017), 49-61. https://doi.org/10.7472/jksii.2017.18.3.49