Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
http://jics.or.kr/

Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification


Sung-Sam Hong, Dong-Wook Kim, Myung-Mook Han, Journal of Internet Computing and Services, Vol. 20, No. 1, pp. 1-10, Feb. 2019
DOI: 10.7472/jksii.2019.20.1.01
Keywords: Security, Unstructured Data, Intelligent Data Analysis, Feature selection, Attack Mail

Abstract

Big-data text mining extracts a large number of features, so clustering and classification can incur high computational complexity and yield unreliable analysis results. In particular, the term-document matrix obtained through text mining represents term-document features but is sparse. We designed an advanced genetic algorithm (GA) to extract features in text mining for a detection model. Term frequency-inverse document frequency (TF-IDF) is used to reflect document-term relationships during feature extraction, and a predetermined number of features is selected through an iterative process. We also use a sparsity score to improve the performance of the detection model: if a spam mail data set has high sparsity, the detection model performs poorly and an optimal detection model is difficult to find. In addition, by using s(F) in the numerator of the fitness function, we find a low-sparsity model that also has a high TF-IDF score. We verified the algorithm's performance by applying it to text classification and found that it achieves higher performance (speed and accuracy) in attack mail classification.
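
The general idea in the abstract (a GA that picks a fixed number of terms, scored by TF-IDF weight and penalized for sparsity) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the fitness function or define s(F), so the fitness below (mean TF-IDF of the selected columns multiplied by the fraction of non-zero cells) is an assumption made for illustration, as are the function names, the toy corpus, and all GA parameters.

```python
import math
import random


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[d][t] is the TF-IDF weight of term t in document d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    df = [0] * len(vocab)  # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[index[term]] += 1
    matrix = []
    for toks in tokenized:
        tf = [0.0] * len(vocab)
        for tok in toks:
            tf[index[tok]] += 1.0 / len(toks)
        # Smoothed IDF, so a term present in every document keeps a non-zero weight.
        matrix.append([w * (math.log((1 + n_docs) / (1 + df[j])) + 1.0)
                       for j, w in enumerate(tf)])
    return vocab, matrix


def fitness(features, matrix):
    """ASSUMED fitness: mean TF-IDF of the selected columns times their density."""
    cells = [row[j] for row in matrix for j in features]
    density = sum(1 for v in cells if v > 0) / len(cells)  # 1 - sparsity
    return (sum(cells) / len(cells)) * density


def select_features(matrix, k, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Evolve k-term feature subsets; returns (best subset, best fitness per generation)."""
    rng = random.Random(seed)
    n_terms = len(matrix[0])
    population = [rng.sample(range(n_terms), k) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda f: fitness(f, matrix), reverse=True)
        history.append(fitness(population[0], matrix))
        survivors = population[: pop_size // 2]  # elitism: the top half is kept unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            pool = list(dict.fromkeys(a + b))  # union of both parents' terms
            child = rng.sample(pool, k)        # crossover: draw k distinct terms
            if rng.random() < mutation_rate:   # mutation: swap in one unused term
                unused = [j for j in range(n_terms) if j not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda f: fitness(f, matrix))
    history.append(fitness(best, matrix))
    return sorted(best), history


# Toy corpus (hypothetical): two phishing-like mails and two ordinary mails.
docs = [
    "urgent verify your account password now",
    "verify your password to unlock your account",
    "meeting agenda for monday project review",
    "project review notes and meeting minutes",
]
vocab, matrix = tfidf_matrix(docs)
best, history = select_features(matrix, k=4)
print([vocab[j] for j in best])
```

Because the top half of each generation is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. Multiplying by density rewards terms that occur in several documents, which is one simple way to steer the search away from very sparse feature subsets.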


Cite this article
[APA Style]
Sung-Sam Hong, Dong-Wook Kim, & Myung-Mook Han (2019). Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification. Journal of Internet Computing and Services, 20(1), 1-10. DOI: 10.7472/jksii.2019.20.1.01.

[IEEE Style]
S. Hong, D. Kim and M. Han, "Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification," Journal of Internet Computing and Services, vol. 20, no. 1, pp. 1-10, 2019. DOI: 10.7472/jksii.2019.20.1.01.

[ACM Style]
Sung-Sam Hong, Dong-Wook Kim, and Myung-Mook Han. 2019. Feature-selection algorithm based on genetic algorithms using unstructured data for attack mail identification. Journal of Internet Computing and Services, 20, 1, (2019), 1-10. DOI: 10.7472/jksii.2019.20.1.01.