Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
http://jics.or.kr/

Research on text mining based malware analysis technology using string information


Ji-hee Ha, Tae-jin Lee, Journal of Internet Computing and Services, Vol. 21, No. 1, pp. 45-55, Feb. 2020
DOI: 10.7472/jksii.2020.21.1.45
Keywords: Malware, Malware analysis, String, Text Mining, TFIDF

Abstract

With the development of information and communication technology, the number of new and variant malware samples is increasing rapidly every year, and the spread of Internet of Things and cloud computing technology is diversifying the types of malware in circulation. Attackers can easily create malware by reusing existing code or by using automated authoring tools, and malware produced this way behaves much like existing malware. In this paper, we propose a malware analysis method based on string information, which can be used regardless of operating system environment and which captures library call information related to malicious behavior. Because most of the strings that can be extracted from malicious code are closely related to malicious behavior, we weight them with a text mining based method (TF-IDF) so that they serve as effective features for malware analysis. From the processed data we build models with various machine learning algorithms and run experiments on malware detection (malicious or not) and on classification into malware groups. The approach was compared and verified on files from both the Windows and Linux operating systems. Detection accuracy is about 93.5%, and group classification accuracy is about 90%. The proposed technique is widely applicable because it is relatively simple, fast, and operating system independent, and because a single model suffices: no separate model has to be built for each group when classifying malware groups. In addition, since the string information is extracted through static analysis, it can be processed faster than analysis methods that execute the code directly.
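
The string extraction and weighting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the minimum string length, the tokenization, or the TF-IDF variant. The function names `extract_strings` and `tfidf`, the 4-character threshold, and the plain `tf * log(N / df)` weighting are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII bytes of at least min_len characters.

    This mimics the general idea of the Unix `strings` tool; the threshold
    of 4 is an assumption, not a value taken from the paper.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term per document with tf * log(N / df).

    tf is the term count divided by the document length, and df is the
    number of documents containing the term. A term that appears in every
    document therefore gets weight 0.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical byte blobs standing in for two binaries.
samples = [
    b"\x00\x01CreateFileA\x00kernel32.dll\x00",
    b"\x00socket\x00kernel32.dll\x00",
]
features = tfidf([extract_strings(s) for s in samples])
```

In this toy input, `kernel32.dll` occurs in both samples and is weighted 0, while `CreateFileA` and `socket` receive positive weights, which is the sense in which such weighting favors strings that distinguish one file from another. The resulting per-file weight dictionaries would then be vectorized and passed to a classifier of choice.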


Cite this article
[APA Style]
Ji-hee Ha and Tae-jin Lee (2020). Research on text mining based malware analysis technology using string information. Journal of Internet Computing and Services, 21(1), 45-55. DOI: 10.7472/jksii.2020.21.1.45.

[IEEE Style]
J. Ha and T. Lee, "Research on text mining based malware analysis technology using string information," Journal of Internet Computing and Services, vol. 21, no. 1, pp. 45-55, 2020. DOI: 10.7472/jksii.2020.21.1.45.

[ACM Style]
Ji-hee Ha and Tae-jin Lee. 2020. Research on text mining based malware analysis technology using string information. Journal of Internet Computing and Services, 21, 1, (2020), 45-55. DOI: 10.7472/jksii.2020.21.1.45.