Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
https://jics.or.kr/

Analyzing the Impact of Data Anonymization on Artificial Intelligence Model Performance


Soyeon Park, Seongjin Ahn, Journal of Internet Computing and Services, Vol. 26, No. 4, pp. 69-78, Aug. 2025
DOI: 10.7472/jksii.2025.26.4.69
Keywords: Personal Data, De-Identification, K-anonymity, Logistic Regression

Abstract

Artificial intelligence (AI) models trained on personal data provide practical functions for real-world applications in many fields. However, personal data breaches have emerged as a critical challenge that AI services must address. Consequently, many countries have established laws and guidelines that mandate de-identification when personal data is used in AI systems. While de-identification protects personal information, training on de-identified data can significantly affect model performance. This study proposes a method for determining the optimal de-identification level that balances privacy protection and model performance. In experiments with logistic regression models trained on de-identified data satisfying various levels of the k-anonymity privacy model, accuracy at k = 2 was approximately 82.1%, comparable to that obtained with the original data. When the de-identification level was increased to k = 5, accuracy dropped sharply to approximately 74.9% and then stabilized in the 74–76% range. Notably, recall for the minority class declined drastically, but applying class weighting and the SMOTE technique effectively improved performance, demonstrating that imbalanced data conditions can be addressed through additional adjustments or by tuning the de-identification level.
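
To make the k-anonymity levels mentioned above concrete, the following minimal sketch checks whether a table satisfies k-anonymity, that is, whether every combination of quasi-identifier values is shared by at least k records. It is not the authors' procedure: the abstract does not describe their dataset or generalization rules, so the records, the field names (`age`, `zip`, `label`), and the age-bucketing helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def satisfies_k_anonymity(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def generalize_age(record, width):
    """Replace an exact age with a range label (a hypothetical generalization rule)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy records; field names and values are invented for illustration only.
records = [
    {"age": 23, "zip": "130", "label": 0},
    {"age": 27, "zip": "130", "label": 1},
    {"age": 31, "zip": "130", "label": 0},
    {"age": 38, "zip": "130", "label": 0},
]
QI = ["age", "zip"]

# Exact ages make every record unique, so the raw table fails k = 2.
raw_ok = satisfies_k_anonymity(records, QI, k=2)

# Bucketing ages into 10-year ranges merges records into larger classes.
generalized = [generalize_age(r, 10) for r in records]
gen_ok = satisfies_k_anonymity(generalized, QI, k=2)
```

Raising k forces coarser generalization of the quasi-identifiers, which removes detail a classifier could otherwise use. That is the general mechanism behind the trade-off between privacy level and accuracy that the abstract reports.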



Cite this article
[APA Style]
Park, S. & Ahn, S. (2025). Analyzing the Impact of Data Anonymization on Artificial Intelligence Model Performance. Journal of Internet Computing and Services, 26(4), 69-78. DOI: 10.7472/jksii.2025.26.4.69.

[IEEE Style]
S. Park and S. Ahn, "Analyzing the Impact of Data Anonymization on Artificial Intelligence Model Performance," Journal of Internet Computing and Services, vol. 26, no. 4, pp. 69-78, 2025. DOI: 10.7472/jksii.2025.26.4.69.

[ACM Style]
Soyeon Park and Seongjin Ahn. 2025. Analyzing the Impact of Data Anonymization on Artificial Intelligence Model Performance. Journal of Internet Computing and Services, 26, 4, (2025), 69-78. DOI: 10.7472/jksii.2025.26.4.69.