Journal of Internet Computing and Services
    ISSN 2287 - 1136 (Online) / ISSN 1598 - 0170 (Print)
    https://jics.or.kr/

Building Training Data for Public Sentiment Analysis Using FoodTech Consumer Reactions on Social Media


Su-rak Son, Yi-na Jeong, Journal of Internet Computing and Services, Vol. 27, No. 1, pp. 181-188, Feb. 2026
DOI: 10.7472/jksii.2026.27.1.181
Keywords: sentiment analysis, FoodTech, social media comments, dataset construction, automatic dataset construction

Abstract

This study presents a fully automated pipeline that builds a training-ready dataset from public YouTube comments in the FoodTech domain, labeled for sentiment (positive/neutral/negative) and for multi-label topics (10 categories). The pipeline collects comments through a whitelist of reviewer channels and topic-context search queries; applies privacy masking and text normalization; and removes exact and near duplicates using character 3–5-gram TF-IDF vectors with cosine-radius filtering. Sentiment labels are assigned at the clause level using domain lexicons (including slang and emojis), intensifiers and attenuators, negation scope, and anchor terms; topics are assigned by a rule-based dictionary. A confidence mixture (conf_mix) combines the sentiment and topic confidences, and a two-tier core–buffer acceptance policy balances precision against coverage. After a second deduplication pass, the accepted data are split 80/10/10 into training, development, and test sets. In our run, the cleaned corpus contains 18,456 rows; at an acceptance rate of 18.96%, the pipeline yields 3,499 instances (Train 2,799 / Dev 350 / Test 350, with matched distributions across splits). A baseline combining character n-gram TF-IDF features with logistic regression achieves an accuracy of 0.871 and a macro-F1 of 0.854 for sentiment, and a micro-F1 of 0.845 and a macro-F1 of 0.583 for topics. We report 95% confidence intervals from 1,000 bootstrap resamples, which indicate that these estimates are statistically stable. The resulting dataset provides a practical, reproducible starting point for domain-specific sentiment modeling without manual annotation; further gains are expected from collecting more negative samples and expanding the lexicons for long-tail topics.
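The near-duplicate step can be pictured with a small sketch. The Python below is an illustration under stated assumptions, not the authors' code: it builds character 3–5-gram TF-IDF vectors with a smoothed IDF and keeps a comment only if its cosine distance to every previously kept comment exceeds a radius. The function names, the lowercase and whitespace normalization, and the radius of 0.1 are hypothetical choices, since the abstract does not give the actual threshold or weighting. Only the standard library is used, so the sketch runs without scikit-learn.

```python
import math
from collections import Counter


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def char_ngrams(text: str, n_min: int = 3, n_max: int = 5) -> Counter:
    """Count character n-grams of lengths n_min..n_max (inclusive)."""
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def tfidf_vectors(texts):
    """Build L2-normalized TF-IDF vectors (smoothed IDF) as sparse dicts."""
    counts = [char_ngrams(t) for t in texts]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency of each n-gram
    n_docs = len(texts)
    vectors = []
    for c in counts:
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two L2-normalized sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def dedup(texts, radius: float = 0.1):
    """Greedily keep a text only if its cosine distance to every kept text exceeds `radius`.

    Exact duplicates (distance 0) and near duplicates (distance <= radius) are dropped;
    the original, unnormalized strings are returned.
    """
    vectors = tfidf_vectors([normalize(t) for t in texts])
    kept = []
    for i, v in enumerate(vectors):
        if all(1.0 - cosine(v, vectors[j]) > radius for j in kept):
            kept.append(i)
    return [texts[i] for i in kept]
```

For example, `dedup(["Great app, love it!", "Great app, love it!!", "Delivery was late again."])` drops the second comment as a near duplicate of the first. The greedy pairwise scan is quadratic in the number of comments; at the scale of tens of thousands of rows, a production pipeline would more likely rely on sparse matrix operations or an approximate nearest-neighbor index.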
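The confidence intervals can likewise be sketched as a percentile bootstrap over the test set. The block below is a minimal standard-library illustration, assuming that the test items are resampled with replacement 1,000 times and that macro-F1 is recomputed on each resample. The helper names, the fixed seed, and the percentile-index convention are hypothetical and may differ from the authors' procedure.

```python
import random


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over `labels`."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # absent class scores 0
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample items with replacement and recompute `metric`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]          # e.g. index 25 of 1,000
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]  # e.g. index 974 of 1,000
    return lo, hi
```

Called as `bootstrap_ci(y_true, y_pred, lambda t, p: macro_f1(t, p, ["positive", "neutral", "negative"]))`, it returns the lower and upper bounds of an approximate 95% interval. This sketch scores a class that has neither true nor predicted instances in a resample as 0; that convention is a choice, and other conventions would shift the lower bound slightly.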

Cite this article
[APA Style]
Son, S., & Jeong, Y. (2026). Building Training Data for Public Sentiment Analysis Using FoodTech Consumer Reactions on Social Media. Journal of Internet Computing and Services, 27(1), 181-188. DOI: 10.7472/jksii.2026.27.1.181.

[IEEE Style]
S. Son and Y. Jeong, "Building Training Data for Public Sentiment Analysis Using FoodTech Consumer Reactions on Social Media," Journal of Internet Computing and Services, vol. 27, no. 1, pp. 181-188, 2026. DOI: 10.7472/jksii.2026.27.1.181.

[ACM Style]
Su-rak Son and Yi-na Jeong. 2026. Building Training Data for Public Sentiment Analysis Using FoodTech Consumer Reactions on Social Media. Journal of Internet Computing and Services, 27, 1, (2026), 181-188. DOI: 10.7472/jksii.2026.27.1.181.