Journal of Internet Computing and Services
ISSN 2287-1136 (Online) / ISSN 1598-0170 (Print)
https://jics.or.kr/

An Application-Level Fault Tolerant System For Synchronous Parallel Computation


Pil-Seong Park, Journal of Internet Computing and Services, Vol. 9, No. 5, pp. 185-0, Oct. 2008
Keywords: fault-tolerant, MPI (Message Passing Interface), synchronous algorithm, checkpointing/rollback

Abstract

The MTBF (mean time between failures) of large-scale parallel systems is known to be on the order of only several hours, so large computations sometimes waste a huge amount of CPU time. However, MPI (Message Passing Interface), the de facto standard for message-passing parallel programming, provides no means of handling such failures. In this paper, we propose an application-level fault-tolerant computation system, built purely on the current MPI standard without any non-standard fault-tolerant MPI library, that can be used for general synchronous parallel scientific computation.
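
The abstract does not describe the system's design, so the following is only a minimal single-process sketch of the general checkpointing/rollback idea named in the keywords, not the paper's method. It uses no MPI at all: the failure is simulated with an exception, and every name here (`run`, `iterate`, `save_checkpoint`, `load_checkpoint`, `SimulatedFailure`, `ckpt_every`, `fail_at`) is hypothetical and chosen for illustration.

```python
import json
import os


class SimulatedFailure(Exception):
    """Stands in for a detected process or node failure (hypothetical)."""


def save_checkpoint(path, step, state):
    # Write to a temporary file, then rename atomically, so a crash during
    # the write never corrupts the previous checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def iterate(state):
    # One synchronous step: every interior entry is computed from the
    # previous iterate only (a toy smoothing sweep, boundaries held fixed).
    new = state[:]
    for i in range(1, len(state) - 1):
        new[i] = 0.5 * (state[i - 1] + state[i + 1])
    return new


def run(initial, n_steps, ckpt_path, ckpt_every=5, fail_at=()):
    """Iterate n_steps times, rolling back to the last checkpoint on failure."""
    pending_failures = set(fail_at)  # steps at which one failure is injected
    step, state = 0, list(initial)
    save_checkpoint(ckpt_path, step, state)
    rollbacks = 0
    while step < n_steps:
        try:
            if step in pending_failures:
                pending_failures.discard(step)
                raise SimulatedFailure(step)
            state = iterate(state)
            step += 1
            if step % ckpt_every == 0:
                save_checkpoint(ckpt_path, step, state)
        except SimulatedFailure:
            # Rollback: discard work done since the last checkpoint.
            step, state = load_checkpoint(ckpt_path)
            rollbacks += 1
    return state, rollbacks
```

Because each step depends only on the previous iterate, a run that rolls back and recomputes reaches the same final state as a failure-free run; the cost of a failure is bounded by the work done since the last checkpoint, which is the trade-off `ckpt_every` controls. In a real MPI program, all processes would additionally have to agree on a consistent checkpoint and on when to roll back, which this sketch does not attempt.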



Cite this article
[APA Style]
Park, P. (2008). An Application-Level Fault Tolerant System For Synchronous Parallel Computation. Journal of Internet Computing and Services, 9(5), 185-0.

[IEEE Style]
P. Park, "An Application-Level Fault Tolerant System For Synchronous Parallel Computation," Journal of Internet Computing and Services, vol. 9, no. 5, pp. 185-0, 2008.

[ACM Style]
Pil-Seong Park. 2008. An Application-Level Fault Tolerant System For Synchronous Parallel Computation. Journal of Internet Computing and Services, 9, 5, (2008), 185-0.