Authors: Hariharan Devarajan (Illinois Institute of Technology), Huihuo Zheng (Argonne National Laboratory (ANL)), Xian-He Sun (Illinois Institute of Technology), and Venkatram Vishwanath (Argonne National Laboratory (ANL))
Abstract: Deep learning has been widely utilized in various science domains to achieve unprecedented results. These applications typically rely on massive datasets to train the networks. As dataset sizes grow rapidly, I/O becomes a major bottleneck in large-scale distributed training. We characterize the I/O behavior of several scientific deep learning applications running on our production machine, Theta, at the Argonne Leadership Computing Facility, with the goal of identifying potential bottlenecks and providing guidance for developing an efficient parallel I/O library for scientific deep learning. We found that workloads utilizing the TensorFlow Data Pipeline can achieve efficient I/O by overlapping I/O with computation; however, they face potential scaling issues at larger scales because POSIX I/O is used underneath without parallel I/O. We also identified directions for I/O optimization for workloads that utilize a custom data streaming function; these workloads can potentially benefit from data prefetching, data sieving, and asynchronous I/O.
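To illustrate the I/O-compute overlap the abstract attributes to the TensorFlow Data Pipeline, the sketch below shows a minimal tf.data input pipeline that prefetches batches while training proceeds. The file paths, parsing schema, and batch size are hypothetical placeholders for illustration, not details taken from the poster; the underlying file reads still go through POSIX I/O, which is the scaling concern the abstract raises.

```python
import tensorflow as tf

# Hypothetical TFRecord file list; the actual datasets in the study differ.
files = tf.data.Dataset.list_files("/path/to/records/*.tfrecord")

def parse_example(serialized):
    # Placeholder feature schema; real features depend on the application.
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    return tf.io.parse_single_example(serialized, features)

dataset = (
    tf.data.TFRecordDataset(files)            # reads files via POSIX I/O underneath
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(buffer_size=1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)               # overlaps I/O with computation
)
```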
Best Poster Finalist (BP): no