SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Characterizing Scientific Workflows on HPC Systems Using Logs


Workshop: WORKS20: 15th Workshop on Workflows in Support of Large-Scale Science

Authors: Devarshi Ghoshal, Brian Austin, Deborah Bard, Christopher Daley, Glenn Lockwood, Nicholas J. Wright, and Lavanya Ramakrishnan (Lawrence Berkeley National Laboratory)


Abstract: Scientific advances depend on the ability to use high performance computing (HPC) systems effectively and efficiently to manage and run large, complex scientific workflows. To understand the characteristics of these workflows, we propose two methods to identify workflows with temporal connections and data dependencies from the batch queue and I/O logs available at HPC systems. We use the two methods to characterize and correlate workflow runtime with node requests, I/O patterns, and resource usage on three months of log data from Cori, a supercomputer at NERSC. A key result of our analyses is that single-job workflows often do not use all allocated CPUs, which provides an opportunity to consider allocating resources at a finer granularity.
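The abstract does not spell out how temporally connected jobs are identified, so the following is only an illustrative sketch of one plausible approach, not the authors' method. It assumes a batch queue log has already been parsed into records with hypothetical fields `job_id`, `user`, `start`, and `end` (times in seconds), and it groups a user's jobs into one candidate workflow when each job starts within an assumed threshold, `max_gap_seconds`, of the latest end time seen so far in the group.

```python
from collections import defaultdict


def group_jobs_by_time_gap(jobs, max_gap_seconds=600):
    """Group each user's jobs into candidate workflows by temporal proximity.

    `jobs` is an iterable of dicts with the hypothetical keys 'job_id',
    'user', 'start', and 'end'. The 600-second default is an arbitrary
    assumption for illustration, not a value taken from the paper.
    """
    by_user = defaultdict(list)
    for job in jobs:
        by_user[job["user"]].append(job)

    workflows = []
    for user_jobs in by_user.values():
        user_jobs.sort(key=lambda j: j["start"])
        current, latest_end = [], None
        for job in user_jobs:
            # Start a new group when the gap since the group's latest
            # end time exceeds the threshold.
            if current and job["start"] - latest_end > max_gap_seconds:
                workflows.append([j["job_id"] for j in current])
                current, latest_end = [], None
            current.append(job)
            latest_end = job["end"] if latest_end is None else max(latest_end, job["end"])
        if current:
            workflows.append([j["job_id"] for j in current])
    return workflows


# Made-up sample records, for illustration only.
sample = [
    {"job_id": 1, "user": "alice", "start": 0, "end": 100},
    {"job_id": 2, "user": "alice", "start": 150, "end": 300},
    {"job_id": 3, "user": "alice", "start": 5000, "end": 5100},
    {"job_id": 4, "user": "bob", "start": 10, "end": 20},
]
print(group_jobs_by_time_gap(sample))
```

Under this sketch, groups containing a single job ID correspond to what the abstract calls single-job workflows. A data-dependency method would additionally need I/O log records linking files written by one job to files read by another, which is not shown here.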




