• Journal of Internet Computing and Services
    ISSN 2287 - 1136 (Online) / ISSN 1598 - 0170 (Print)
    https://jics.or.kr/

Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs


Youngtae Kim, Doo-Han Kim, Myoung-Han Yu, Journal of Internet Computing and Services, Vol. 14, No. 6, pp. 41-48, Dec. 2013
DOI: 10.7472/jksii.2013.14.6.41
Keywords: LU decomposition, GP-GPU, Nvidia CUDA, Parallel program

Abstract

GP-GPUs are general-purpose GPUs, originally designed for graphics processing, that perform numerical computation using a large number of threads. Unlike typical cache memory, GP-GPUs provide cache in the form of shared memory that user programs can access directly. In this research, we implemented a parallel blocked LU decomposition program to utilize this cache memory on GP-GPUs. The parallel blocked LU decomposition program, written in Nvidia CUDA C, ran 7~8 times faster than the non-blocked LU decomposition program in the same GP-GPU computing environment.
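The blocking idea behind the paper can be illustrated with a minimal sketch (this is not the authors' CUDA C code; it is a hypothetical serial Python/numpy version of a standard right-looking blocked LU factorization without pivoting, shown only to clarify how the matrix is processed block-by-block so each block can fit in fast shared memory):

```python
import numpy as np

def _lu_inplace(A):
    """Unblocked LU (no pivoting) on a square view, stored as L\\U in place."""
    n = A.shape[0]
    for j in range(n - 1):
        A[j + 1:, j] /= A[j, j]                              # column of L
        A[j + 1:, j + 1:] -= np.outer(A[j + 1:, j], A[j, j + 1:])

def lu_blocked(A, b=4):
    """Right-looking blocked LU (no pivoting); returns L\\U packed in one matrix."""
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, b):
        e = min(k + b, n)
        _lu_inplace(A[k:e, k:e])                             # factor diagonal block
        # U12 <- L11^{-1} A12 (forward substitution, unit lower diagonal)
        for j in range(k, e):
            A[j + 1:e, e:] -= np.outer(A[j + 1:e, j], A[j, e:])
        # L21 <- A21 U11^{-1} (column-by-column back substitution)
        for j in range(k, e):
            A[e:, j] -= A[e:, k:j] @ A[k:j, j]
            A[e:, j] /= A[j, j]
        # trailing-submatrix update: one large matrix-matrix product
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A
```

The key point for the GPU version is that the trailing-submatrix update is dominated by matrix-matrix products over small b-by-b tiles, which can be staged in the GP-GPU's user-managed shared memory, whereas the non-blocked algorithm streams the whole matrix through slow global memory at every step.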




Cite this article
[APA Style]
Kim, Y., Kim, D., & Yu, M. (2013). Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs. Journal of Internet Computing and Services, 14(6), 41-48. DOI: 10.7472/jksii.2013.14.6.41.

[IEEE Style]
Y. Kim, D. Kim, M. Yu, "Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs," Journal of Internet Computing and Services, vol. 14, no. 6, pp. 41-48, 2013. DOI: 10.7472/jksii.2013.14.6.41.

[ACM Style]
Youngtae Kim, Doo-Han Kim, and Myoung-Han Yu. 2013. Implementation of parallel blocked LU decomposition program for utilizing cache memory on GP-GPUs. Journal of Internet Computing and Services, 14, 6, (2013), 41-48. DOI: 10.7472/jksii.2013.14.6.41.