Workshop: CANOPIE-HPC: Containers and New Orchestration Paradigms for Isolated Environments in HPC
Authors: Jayjeet Chakraborty, Carlos Maltzahn, and Ivo Jimenez (University of California, Santa Cruz)
Abstract: Researchers in many fields of computational science often find it difficult to reproduce experiments from the artifacts, such as code, data, diagrams, and results, left behind by previous researchers. Code developed on one machine often fails to run on another because of differences in hardware architecture, operating system, and software dependencies, among other factors. Compounding this is the difficulty of understanding how artifacts are organized and of using them in the correct order. Software containers can address some of these problems, and researchers and developers have therefore built scientific workflow engines that execute each step of a workflow in a separate container. Existing container-native workflow engines, however, assume the availability of infrastructure deployed in the cloud or in HPC centers. In this paper, we present Popper, a container-native workflow engine that does not assume the presence of a Kubernetes cluster or any service other than a container engine such as Docker or Podman. We introduce the design and architecture of Popper and describe how it abstracts away the complexity of multiple container engines and resource managers, enabling users to focus on writing workflow logic. With Popper, researchers can build and validate workflows easily in almost any environment of their choice, including local machines, SLURM-based HPC clusters, CI services, and Kubernetes-based cloud computing environments. To demonstrate the suitability of this workflow engine, we present three case studies in which we take examples from machine learning and high-performance computing and turn them into Popper workflows.