Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
https://jics.or.kr/

The Identification Framework for source code author using Authorship Analysis and CNN


Gun-Yoon Shin, Dong-Wook Kim, Sung-sam Hong, Myung-Mook Han, Journal of Internet Computing and Services, Vol. 19, No. 5, pp. 33-41, Oct. 2018
DOI: 10.7472/jksii.2018.19.5.33
Keywords: Author Identification, Authorship Analysis, Convolutional Neural Network, Machine Learning, Code Analysis

Abstract

As Internet technology has developed, more and more programs are being created, and a growing body of source code is being written by many different authors. Some authors pass off a program or code written by another author as their own, reuse other authors' code indiscriminately, or fail to indicate exactly which code they have used. This makes code increasingly difficult to protect. In this paper, we propose an author identification framework that uses Authorship Analysis theory and Natural Language Processing (NLP) based on a Convolutional Neural Network (CNN). We apply Authorship Analysis theory to extract author-identifying features from source code, combine them with features used in text mining, and perform author identification with machine learning. In addition, we apply a CNN-based NLP method to source code to classify code authors. Author identification requires features that distinguish one author from another, and the CNN-based NLP method can be applied to a language with its own particular structure, such as source code, to identify the author. Identification accuracy is 95.1% with the Authorship Analysis approach and 98% with the CNN approach.
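The abstract does not list the exact features or classifier used, so the following is only a minimal sketch of the feature-based branch under stated assumptions: a few layout metrics (blank-line, comment, tab-indentation and brace-placement ratios, average line length) stand in for the Authorship Analysis features, and character-trigram frequencies stand in for the text-mining features. The combined vector is fed to a nearest-centroid classifier with cosine similarity, which is a simple stand-in for whatever machine-learning model the paper uses. All function names (`layout_features`, `trigram_features`, `train`, `predict`) and the toy "alice"/"bob" samples are hypothetical, and the CNN branch is not shown.

```python
import math
from collections import Counter
from statistics import mean


# Hypothetical layout features standing in for Authorship Analysis metrics:
# they describe how an author formats code rather than what the code does.
def layout_features(code: str) -> dict[str, float]:
    lines = code.splitlines() or [""]
    n = len(lines)
    return {
        "blank_ratio": sum(not ln.strip() for ln in lines) / n,
        "comment_ratio": sum(ln.strip().startswith(("#", "//")) for ln in lines) / n,
        "tab_indent_ratio": sum(ln.startswith("\t") for ln in lines) / n,
        # Average line length divided by an assumed 80-column width.
        "avg_line_len": mean(len(ln) for ln in lines) / 80.0,
        "brace_own_line_ratio": sum(ln.strip() == "{" for ln in lines) / n,
    }


# Text-mining features: relative frequencies of the most common character trigrams.
def trigram_features(code: str, top_k: int = 50) -> dict[str, float]:
    grams = Counter(code[i:i + 3] for i in range(len(code) - 2))
    total = sum(grams.values()) or 1
    return {"tri:" + g: c / total for g, c in grams.most_common(top_k)}


def features(code: str) -> dict[str, float]:
    """Combine both feature families into one sparse vector."""
    vec = layout_features(code)
    vec.update(trigram_features(code))
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Average of sparse vectors; missing keys count as 0."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}


def train(samples: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """samples maps an author name to code strings written by that author."""
    return {author: centroid([features(c) for c in codes])
            for author, codes in samples.items()}


def predict(model: dict[str, dict[str, float]], code: str) -> str:
    """Return the author whose centroid is most cosine-similar to the code."""
    vec = features(code)
    return max(model, key=lambda author: cosine(vec, model[author]))


# Toy training data: "alice" puts braces on their own line and indents with
# tabs; "bob" writes one-line functions with trailing comments.
model = train({
    "alice": ["int add(int a, int b)\n{\n\treturn a + b;\n}\n",
              "int sub(int a, int b)\n{\n\treturn a - b;\n}\n"],
    "bob": ["int add(int a,int b){ return a+b; } // add\n",
            "int mul(int a,int b){ return a*b; } // mul\n"],
})
print(predict(model, "int div(int a, int b)\n{\n\treturn a / b;\n}\n"))  # prints alice
```

In this toy setup the layout features dominate the vector because their values are larger than individual trigram frequencies; a real system would normalize or weight the two feature families before classification.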




Cite this article
[APA Style]
Shin, G., Kim, D., Hong, S., & Han, M. (2018). The Identification Framework for source code author using Authorship Analysis and CNN. Journal of Internet Computing and Services, 19(5), 33-41. DOI: 10.7472/jksii.2018.19.5.33.

[IEEE Style]
G. Shin, D. Kim, S. Hong, and M. Han, "The Identification Framework for source code author using Authorship Analysis and CNN," Journal of Internet Computing and Services, vol. 19, no. 5, pp. 33-41, 2018. DOI: 10.7472/jksii.2018.19.5.33.

[ACM Style]
Gun-Yoon Shin, Dong-Wook Kim, Sung-sam Hong, and Myung-Mook Han. 2018. The Identification Framework for source code author using Authorship Analysis and CNN. Journal of Internet Computing and Services, 19, 5, (2018), 33-41. DOI: 10.7472/jksii.2018.19.5.33.