SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Refining Fortran Failed Images


Workshop: ESPM2 2020: Fifth International Workshop on Extreme Scale Programming Models and Middleware

Authors: Nathan T. Weeks (Iowa State University, Harvard University); Glenn R. Luecke and Gurpur M. Prabhu (Iowa State University)


Abstract: The Fortran 2018 standard introduced syntax and semantics that allow a parallel application to recover from failed images (fail-stop processes) during execution. Teams are a key new language feature that facilitates this capability for applications that use collective subroutines: when a team of images is partitioned into one or more sets of new teams, only active images comprise the new teams; failed images are excluded.

This paper summarizes the language facilities for handling failed images specified in the Fortran 2018 standard and subsequent interpretations by the US Fortran Programming Language Standards Technical Committee. To enable the development of portable fault-tolerant parallel Fortran applications, we propose standardizing some semantics that have been left processor (implementation) dependent.

Finally, we present a prototype implementation (based on GFortran, OpenCoarrays, and ULFM2) of a substantial subset of the Fortran 2018 failed images functionality, including semantic changes proposed herein.




