Current and Future Converged Cloud-HPC Workflows at LLNL
Session: HPC in the Cloud
Event Type
State of the Practice Talk
Best Practices
Cloud and Distributed Computing
TP
Time: Tuesday, 17 November 2020, 2pm - 2:30pm EDT
Location: Track 7
Description: Current and emerging scientific workflows at the Lawrence Livermore National Laboratory (LLNL) require the integration of cloud technologies with traditional HPC to make discoveries. In this talk, we present prominent workflow examples, trends in these converged workflows, and the gaps they face at one of the world's largest computing centers. Based on application examples, we will describe successful workflow patterns that make use of loose convergence between HPC clusters and on-premises container orchestration clusters. While the converged approach is making significant strides, we still find critical gaps, such as a lack of integration with resource and job management software, that keep it from realizing its full potential. We will discuss how LLNL is co-designing its critical software infrastructure with workflow teams, the computing facility, and industry partners. Finally, we will highlight some of the key techniques we use to address outstanding challenges in resource expression and scheduling in a converged environment.

