Accelerating Fusion Energy Experimental Workflows Using HPC Resources
Time: Friday, 13 November 2020, 3:50pm - 4:15pm EST
Description: Experiments on magnetic fusion energy routinely generate high-velocity, large-volume datasets. In experiments, which last about one minute with cool-down phases of about 30 minutes between them, numerous diagnostics sample high-temperature fusion plasmas with ever-increasing spatial and temporal resolution. Analyzing these measurements and presenting them to the science team in near real-time aids the rapid assessment of just-concluded experiments. It also allows the team to make more informed decisions when setting up follow-up experiments, thereby accelerating scientific discovery. Facilitating this workflow on HPC facilities requires, besides raw computational power, consistently available high network throughput and a database that is accessible both from inside the HPC facility and externally.
We are developing the DELTA framework, which aims to tackle these challenges specific to fusion energy sciences. For one, the workflows are often non-static: depending on the data source and situation, different analyses need to be performed. DELTA is therefore designed to be highly configurable and to facilitate a broad range of workflows. Porting data analysis routines, which are often run on workstations, to HPC settings also requires choosing between data and task parallelism. Finally, DELTA aims to provide real-time analysis results. As such, it needs to span the full path from the data source to the visualization output. Implementing this software architecture on modern HPC systems requires coordinating the interaction of multiple software components.
In this paper we describe the implementation and performance of DELTA on Cori, a Cray XC40 supercomputer operated by the National Energy Research Scientific Computing Center (NERSC) in California. Leveraging the ADIOS2 I/O library, DELTA routinely streams measurement data from the KSTAR fusion facility in Korea to Cori at more than 500 MByte/sec. Distributing data analysis tasks among Cori compute nodes makes it possible to perform routine correlation analysis over 100 times faster than on a traditional single-core workstation. The analyzed data is stored in a local NoSQL database instance. There it is consumed by a single-page web application, running on NERSC's Spin container service, to provide real-time visualization to the science team. We further describe current efforts to incorporate machine learning for data compression and for selecting relevant portions of the data stream to analyze.
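The task-parallel side of the correlation analysis mentioned above can be illustrated with a minimal sketch. This is not DELTA's implementation: the names `pearson`, `correlate_pairs`, and the channel labels are hypothetical, the input is synthetic, and a thread pool stands in for the compute nodes only to keep the example self-contained. Each channel pair is treated as one independent task.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlate_pairs(channels, max_workers=4):
    """Task parallelism: submit one independent task per channel pair."""
    pairs = list(combinations(sorted(channels), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = pool.map(
            lambda p: pearson(channels[p[0]], channels[p[1]]), pairs
        )
        # Consume the results while the pool is still open.
        return dict(zip(pairs, values))


# Synthetic stand-in for one chunk of diagnostic data (not real measurements).
t = [i / 100 for i in range(200)]
channels = {
    "ch0": [math.sin(2 * math.pi * s) for s in t],
    "ch1": [2.0 * math.sin(2 * math.pi * s) + 1.0 for s in t],
    "ch2": [-math.sin(2 * math.pi * s) for s in t],
}
result = correlate_pairs(channels)
```

Because the pairs share no state, the same decomposition could in principle be mapped onto processes or compute nodes. A data-parallel alternative would instead split each time series into chunks and combine partial sums.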