Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
http://jics.or.kr/

Multi-source information integration framework using self-supervised learning-based language model


Hanmin Kim, Jeongbin Lee, Gyudong Park, Mye Sohn, Journal of Internet Computing and Services, Vol. 22, No. 6, pp. 141-150, Dec. 2021
DOI: 10.7472/jksii.2021.22.6.141
Keywords: Self-Supervised Learning, Language Model, similar relationship between sentences, Multi-source information integration

Abstract

AI-enabled warfare, built on artificial intelligence (AI) technology, is expected to become a central issue in future warfare. Natural language processing (NLP) is a core AI technology, and it can significantly reduce the burden on commanders and staff of understanding reports, information objects, and intelligence written in natural language. In this paper, we propose a Language model-based Multi-source Information Integration (LAMII) framework to reduce commanders' information overload and support rapid decision-making. The LAMII framework consists of two key steps: self-supervised representation learning based on language models, and document integration using autoencoders. In the first step, self-supervised learning is used to learn representations that can identify the similarity relationship between two heterogeneous sentences. In the second step, the learned model is used to find and integrate documents from multiple sources that share similar content or topics. During integration, an autoencoder measures the information redundancy of sentences so that duplicate sentences can be removed. To demonstrate the superiority of the proposed framework, we conducted comparative experiments against existing language models on the benchmark sets used to evaluate their performance. The experimental results show that the LAMII framework predicts similarity relationships between heterogeneous sentences more effectively than the other language models.
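
The abstract gives no implementation details, so the following Python sketch only illustrates the general shape of the two steps under stated assumptions; it is not the authors' code. `make_pairs`, `cosine`, and `integrate` are hypothetical helpers. `make_pairs` shows one common way to obtain training pairs without manual labels (adjacent sentences in the same document treated as similar, sentences from different documents treated as dissimilar); the paper's actual self-supervised objective may differ. `integrate` assumes sentence embeddings have already been produced by a learned model and uses a plain cosine-similarity threshold as a stand-in for the autoencoder-based redundancy measure the abstract describes. The embedding vectors below are made-up toy values.

```python
import math
from typing import List, Sequence, Tuple


def make_pairs(documents: List[List[str]]) -> List[Tuple[str, str, int]]:
    """Build labelled sentence pairs without manual annotation (hypothetical heuristic).

    Label 1: two adjacent sentences from the same document (assumed similar).
    Label 0: a sentence paired with one from the next document, chosen round-robin
    (assumed dissimilar). The paper's actual self-supervised objective may differ.
    """
    pairs: List[Tuple[str, str, int]] = []
    n = len(documents)
    for d, doc in enumerate(documents):
        for k in range(len(doc) - 1):
            pairs.append((doc[k], doc[k + 1], 1))
            other = documents[(d + 1) % n]
            if n > 1 and other:
                pairs.append((doc[k], other[k % len(other)], 0))
    return pairs


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def integrate(sentences: List[str],
              embeddings: List[Sequence[float]],
              threshold: float = 0.9) -> List[str]:
    """Greedy de-duplication over parallel lists of sentences and their embeddings.

    A sentence is kept only if its cosine similarity to every already-kept sentence
    is below `threshold`. NOTE: cosine similarity is a stand-in for the paper's
    autoencoder-based redundancy measure, whose details are not given here.
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [sentences[i] for i in kept]


if __name__ == "__main__":
    print(make_pairs([["a1", "a2", "a3"], ["b1", "b2"]]))
    reports = [
        "Enemy convoy spotted at bridge A.",
        "A hostile convoy was seen near bridge A.",
        "Weather over the sector is clear.",
    ]
    # Toy vectors standing in for embeddings from a learned language model.
    vectors = [[1.0, 0.1, 0.0], [0.98, 0.12, 0.01], [0.0, 0.2, 1.0]]
    print(integrate(reports, vectors))  # keeps the first and third sentences
```

Because the filter is greedy, it keeps the first occurrence of each near-duplicate, so the result depends on input order; the 0.9 threshold is an arbitrary illustrative value, not one reported in the paper.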


Cite this article
[APA Style]
Hanmin Kim, Jeongbin Lee, Gyudong Park, & Mye Sohn (2021). Multi-source information integration framework using self-supervised learning-based language model. Journal of Internet Computing and Services, 22(6), 141-150. DOI: 10.7472/jksii.2021.22.6.141.

[IEEE Style]
H. Kim, J. Lee, G. Park and M. Sohn, "Multi-source information integration framework using self-supervised learning-based language model," Journal of Internet Computing and Services, vol. 22, no. 6, pp. 141-150, 2021. DOI: 10.7472/jksii.2021.22.6.141.

[ACM Style]
Hanmin Kim, Jeongbin Lee, Gyudong Park, and Mye Sohn. 2021. Multi-source information integration framework using self-supervised learning-based language model. Journal of Internet Computing and Services, 22, 6, (2021), 141-150. DOI: 10.7472/jksii.2021.22.6.141.