Journal of Internet Computing and Services
    ISSN 2287 - 1136 (Online) / ISSN 1598 - 0170 (Print)
    https://jics.or.kr/

A Survey of Semi-Supervised Learning in Cybersecurity: Methods, Domains, and Guidelines for Fair Evaluation


Suchul Lee, Journal of Internet Computing and Services, Vol. 26, No. 6, pp. 1-12, Dec. 2025
DOI: 10.7472/jksii.2025.26.6.1
Keywords: Semi-supervised learning, cybersecurity, Pseudo-labeling, Consistency regularization

Abstract

Cybersecurity persistently exposes the limits of purely supervised learning because labels are scarce and data distributions shift rapidly. This paper presents a comprehensive survey of semi-supervised learning (SSL) in cybersecurity, organized along two axes: methods (pseudo-labeling/teacher–student, consistency regularization, and graph-based approaches) and application domains (malware, encrypted traffic, intrusion detection/IoT, and web/phishing). Numerous SSL studies empirically support the low-density separation principle: when labels are limited, SSL stabilizes predictions within clusters and pushes the decision boundary toward low-density regions, improving both label efficiency and generalization. By domain: (1) in malware and phishing detection, where clusters are comparatively well separated, pseudo-labeling/teacher–student methods are effective; (2) for traffic flows and image-like signals, consistency regularization is advantageous; and (3) for relational data, where interactions are the primary signal (e.g., host–process–file behavior graphs, device-to-device communication/session graphs, and URL–domain–certificate link graphs), graph-based SSL has clear strengths. We introduce the notion of a label snapshot (the availability and reliability of labels as of a specific point in time) to make evaluation and reporting explicitly time-aware, and we propose practical guidelines on calibrated confidence thresholds, augmentation–data alignment, and graph smoothing/propagation settings. Finally, for fair and reproducible evaluation, we recommend time-aware splits, label snapshots, and operating-point metrics as core assessment practices.
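The abstract recommends calibrated confidence thresholds for pseudo-labeling. The following is a minimal, stdlib-only Python sketch of one common way to realize this, assuming temperature scaling as the calibration method and a simple grid search for the temperature. The function names, the 0.95 default threshold, and the temperature grid are illustrative assumptions, not settings prescribed by the survey.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; T > 1 softens overconfident scores."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fit_temperature(val_logits: Sequence[Sequence[float]],
                    val_labels: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Choose the T that minimizes negative log-likelihood on held-out labeled data."""
    if grid is None:
        grid = [0.5 + 0.25 * i for i in range(19)]  # 0.5, 0.75, ..., 5.0

    def nll(t: float) -> float:
        return -sum(math.log(softmax(z, t)[y] + 1e-12)
                    for z, y in zip(val_logits, val_labels))

    return min(grid, key=nll)


def select_pseudo_labels(unlabeled_logits: Sequence[Sequence[float]],
                         temperature: float,
                         threshold: float = 0.95) -> List[Tuple[int, int, float]]:
    """Return (index, pseudo_label, confidence) for samples whose calibrated
    top-class probability reaches the threshold; the rest stay unlabeled."""
    selected = []
    for i, z in enumerate(unlabeled_logits):
        p = softmax(z, temperature)
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            selected.append((i, label, p[label]))
    return selected
```

In a teacher–student loop, one reasonable design is to re-run `fit_temperature` on a held-out labeled split after each round, so that a fixed threshold keeps roughly the same meaning as the model's raw confidence drifts.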
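Consistency regularization asks a model to give the same prediction for two perturbed views of one input. The sketch below assumes a FixMatch-style formulation, chosen here as one well-known instance rather than as the survey's method: a confident prediction on a weak view becomes the hard target for a strong view. `model` is any function from a feature vector to class logits, and `jitter` and `feature_dropout` are hypothetical flow-feature augmentations meant only to illustrate augmentation–data alignment: for tabular flow statistics, perturbations that keep values plausible (small noise, masking) stand in for the image crops and flips such methods were originally designed around.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def jitter(x: Sequence[float], rng: random.Random, scale: float = 0.01) -> Vector:
    """Weak view (hypothetical): small multiplicative noise on flow features."""
    return [v * (1.0 + rng.uniform(-scale, scale)) for v in x]


def feature_dropout(x: Sequence[float], rng: random.Random, p: float = 0.3) -> Vector:
    """Strong view (hypothetical): zero out a random subset of flow features."""
    return [0.0 if rng.random() < p else v for v in x]


def consistency_loss(model: Callable[[Vector], Sequence[float]],
                     batch: Sequence[Sequence[float]],
                     weak_aug: Callable[[Sequence[float]], Vector],
                     strong_aug: Callable[[Sequence[float]], Vector],
                     threshold: float = 0.95) -> float:
    """Masked cross-entropy between the weak-view pseudo-label and the
    strong-view prediction, averaged over the whole unlabeled batch.
    No gradients here; in a real framework the weak-view target is detached."""
    total = 0.0
    for x in batch:
        p_weak = softmax(model(weak_aug(x)))
        target = max(range(len(p_weak)), key=p_weak.__getitem__)
        if p_weak[target] < threshold:
            continue  # masked out, but still counted in the average below
        p_strong = softmax(model(strong_aug(x)))
        total += -math.log(p_strong[target] + 1e-12)
    return total / max(len(batch), 1)
```

For example, with `rng = random.Random(0)`, the weak and strong views would be passed as `lambda x: jitter(x, rng)` and `lambda x: feature_dropout(x, rng)`.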
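For relational data, the "graph smoothing/propagation settings" mentioned in the abstract can be made concrete with label spreading, a classic graph-based SSL iteration used here as an illustrative stand-in rather than a method the survey singles out. The sketch assumes an undirected weighted graph (for example, hosts linked by communication sessions) given as an edge list, a small set of seed labels, and hypothetical defaults `alpha=0.9` and `n_iter=100`. Pure Python keeps it dependency-free, so it only suits small graphs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def label_spreading(n_nodes: int,
                    edges: Sequence[Tuple[int, int, float]],
                    seeds: Dict[int, int],
                    n_classes: int,
                    alpha: float = 0.9,
                    n_iter: int = 100) -> List[int]:
    """Iterate F <- alpha * S F + (1 - alpha) * Y with S = D^-1/2 W D^-1/2.
    Larger alpha trusts the graph structure more than the seed labels."""
    # Symmetric weighted adjacency matrix W.
    w = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v, weight in edges:
        w[u][v] += weight
        w[v][u] += weight
    # Symmetric degree normalization; isolated nodes get an all-zero row.
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in w]
    s = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n_nodes)]
         for i in range(n_nodes)]
    # One-hot seed matrix Y; unlabeled nodes start as all-zero rows.
    y = [[0.0] * n_classes for _ in range(n_nodes)]
    for node, label in seeds.items():
        y[node][label] = 1.0
    f = [row[:] for row in y]
    for _ in range(n_iter):
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n_nodes))
              + (1.0 - alpha) * y[i][c]
              for c in range(n_classes)]
             for i in range(n_nodes)]
    # Nodes no seed can reach keep all-zero scores and default to class 0.
    return [max(range(n_classes), key=f[i].__getitem__) for i in range(n_nodes)]
```

On two triangles joined by a single bridge edge, with one seed per triangle, each label stays on its own side of the bridge: `label_spreading(6, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)], {0: 0, 5: 1}, 2)` returns `[0, 0, 0, 1, 1, 1]`.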
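The label-snapshot idea can be sketched as a function that answers "which label was known for this sample at time t", which then drives a time-aware split. The sketch assumes each record carries a first-seen timestamp and a history of (time, label) revisions, as happens when antivirus verdicts for a malware sample change after it is first observed. The `Record` fields, integer day timestamps, and the choice of TPR at a fixed false-positive budget as the operating-point metric are illustrative assumptions; `eval_snapshot` is assumed to be no earlier than `split_time`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Record:
    sample_id: str
    first_seen: int  # e.g. day index when the sample was first observed
    # (time the label became known, label) revisions; 1 = malicious, 0 = benign
    label_history: List[Tuple[int, int]] = field(default_factory=list)


def label_at(record: Record, snapshot_time: int) -> Optional[int]:
    """Label snapshot: the most recent label known at snapshot_time, else None."""
    known = [(t, y) for t, y in record.label_history if t <= snapshot_time]
    return max(known, key=lambda ty: ty[0])[1] if known else None


def time_aware_split(records: Sequence[Record], split_time: int, eval_snapshot: int):
    """Train on the past with labels known at split_time; test on the future
    with labels taken from a later evaluation snapshot."""
    train_labeled, train_unlabeled, test = [], [], []
    for r in records:
        if r.first_seen < split_time:
            y = label_at(r, split_time)  # never peek at later relabelings
            if y is None:
                train_unlabeled.append(r.sample_id)
            else:
                train_labeled.append((r.sample_id, y))
        else:
            y = label_at(r, eval_snapshot)
            if y is not None:  # future samples without any label cannot be scored
                test.append((r.sample_id, y))
    return train_labeled, train_unlabeled, test


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.01) -> float:
    """Operating-point metric: detection rate within a false-positive budget."""
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one positive and one negative")
    k = int(max_fpr * len(neg))  # negatives allowed to score above the threshold
    threshold = neg[k] if k < len(neg) else float("-inf")
    # Strictly-greater comparison keeps at most k negatives above the threshold.
    return sum(s > threshold for s in pos) / len(pos)
```

Reporting the split time, the evaluation snapshot, and the false-positive budget alongside the result keeps the number interpretable: the same model can score very differently if a later snapshot relabels training samples or if the budget changes.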


Cite this article
[APA Style]
Lee, S. (2025). A Survey of Semi-Supervised Learning in Cybersecurity: Methods, Domains, and Guidelines for Fair Evaluation. Journal of Internet Computing and Services, 26(6), 1-12. DOI: 10.7472/jksii.2025.26.6.1.

[IEEE Style]
S. Lee, "A Survey of Semi-Supervised Learning in Cybersecurity: Methods, Domains, and Guidelines for Fair Evaluation," Journal of Internet Computing and Services, vol. 26, no. 6, pp. 1-12, 2025. DOI: 10.7472/jksii.2025.26.6.1.

[ACM Style]
Suchul Lee. 2025. A Survey of Semi-Supervised Learning in Cybersecurity: Methods, Domains, and Guidelines for Fair Evaluation. Journal of Internet Computing and Services, 26, 6, (2025), 1-12. DOI: 10.7472/jksii.2025.26.6.1.