A 1024-Member Ensemble Data Assimilation with 3.5-km Mesh Global Weather Simulations Hisashi Yashiro (National Institute for Environmental Studies, Japan; RIKEN Center for Computational Science (R-CCS)); Koji Terasaki, Yuta Kawai, Shuhei Kudo, Takemasa Miyoshi, Toshiyuki Imamura, and Kazuo Minami (RIKEN Center for Computational Science (R-CCS)); Hikaru Inoue and Tatsuo Nishiki (Fujitsu Laboratories Ltd); Takayuki Saji (Metro Inc, Japan); Masaki Satoh (University of Tokyo, Atmosphere and Ocean Research Institute); and Hirofumi Tomita (RIKEN Center for Computational Science (R-CCS))
Accelerating Large-Scale Excited-State GW Calculations on Leadership HPC Systems Mauro Del Ben and Charlene Yang (Lawrence Berkeley National Laboratory); Zhenglu Li (University of California, Berkeley; Lawrence Berkeley National Laboratory); Felipe H. da Jornada (Stanford University); Steven G. Louie (University of California, Berkeley; Lawrence Berkeley National Laboratory); and Jack Deslippe (Lawrence Berkeley National Laboratory)
Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity Cong Guo (Shanghai Jiao Tong University); Bo Yang Hsueh (Nvidia Corporation); Jingwen Leng (Shanghai Jiao Tong University, Shanghai Qi Zhi Institute); Yuxian Qiu and Yue Guan (Shanghai Jiao Tong University); Zehuan Wang, Xiaoying Jia, and Xipeng Li (Nvidia Corporation); Minyi Guo (Shanghai Jiao Tong University, Shanghai Qi Zhi Institute); and Yuhao Zhu (University of Rochester)
Acceleration of Fusion Plasma Turbulence Simulations using the Mixed-Precision Communication-Avoiding Krylov Method Yasuhiro Idomura (Japan Atomic Energy Agency), Takuya Ina (RIKEN Center for Computational Science (R-CCS)), Yussuf Ali (Japan Atomic Energy Agency), and Toshiyuki Imamura (RIKEN Center for Computational Science (R-CCS))
AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics Lorenzo Casalino, Abigail Dommer, Zied Gaieb, Emilia P. Barros, Terra Sztain, and Surl-Hee Ahn (University of California, San Diego); Anda Trifan (University of Illinois); Alexander Brace (Argonne National Laboratory (ANL)); Anthony Bogetti (University of Pittsburgh); Heng Ma (Argonne National Laboratory (ANL)); Hyungro Lee and Matteo Turilli (Rutgers University); Syma Khalid (University of Southampton); Lillian Chong (University of Pittsburgh); Carlos Simmerling (Stony Brook University); David Hardy, Julio Maia, and James Phillips (University of Illinois); Thorsten Kurth and Abraham Stern (Nvidia Corporation); Lei Huang and John McCalpin (University of Texas); Mahidhar Tatineni (San Diego Supercomputer Center); Tom Gibbs (Nvidia Corporation); John Stone (University of Illinois); Shantenu Jha (Brookhaven National Laboratory); Arvind Ramanathan (Argonne National Laboratory (ANL)); and Rommie E Amaro (University of California, San Diego)
Alias-Free, Matrix-Free and Quadrature-Free Discontinuous Galerkin Algorithms for (Plasma) Kinetic Equations Ammar Hakim (Princeton Plasma Physics Laboratory) and James Juno (University of Maryland)
Alita: Comprehensive Performance Isolation through Bias Resource Management for Public Clouds Quan Chen, Shuai Xue, and Shang Zhao (Shanghai Jiao Tong University, Alibaba Cloud); Shanpei Chen, Yihao Wu, Yu Xu, Zhuo Song, Tao Ma, and Yong Yang (Alibaba Cloud); and Minyi Guo (Shanghai Jiao Tong University)
ANT-Man: Towards Agile Power Management in the Microservice Era Xiaofeng Hou, Chao Li, Jiacheng Liu, and Lu Zhang (Shanghai Jiao Tong University); Yang Hu (University of Texas, Dallas); and Minyi Guo (Shanghai Jiao Tong University)
Architecture and Performance Studies of 3D-Hyper-FleX-LION for Reconfigurable All-to-All HPC Networks Gengchen Liu, Roberto Proietti, Marjan Fariborz, Pouya Fotouhi, Xian Xiao, and S. J. Ben Yoo (University of California, Davis)
BATCH: Machine Learning Inference Serving on Serverless Platforms with Adaptive Batching Ahsan Ali (University of Nevada, Reno); Riccardo Pinciroli (College of William & Mary); Feng Yan (University of Nevada, Reno); and Evgenia Smirni (College of William & Mary)
BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-Based Quantized DNNs Yongkweon Jeon, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, and Dongsoo Lee (Samsung)
BORA: A Bag Optimizer for Robotic Analysis Jian Zhang (ShanghaiTech University); Tao Xie (San Diego State University); Yuzhuo Jing, Yanjie Song, and Guanzhou Hu (ShanghaiTech University); Si Chen (West Chester University of Pennsylvania); and Shu Yin (ShanghaiTech University)
C-SAW: A Framework for Graph Sampling and Random Walk on GPUs Santosh Pandey (Stevens Institute of Technology), Lingda Li and Adolfy Hoisie (Brookhaven National Laboratory), Xiaoye S. Li (Lawrence Berkeley National Laboratory), and Hang Liu (Stevens Institute of Technology)
CAB-MPI: Exploring Interprocess Work-Stealing towards Balanced MPI Communication Kaiming Ouyang (University of California, Riverside); Min Si (Argonne National Laboratory (ANL)); Atsushi Hori (RIKEN Center for Computational Science (R-CCS)); Zizhong Chen (University of California, Riverside); and Pavan Balaji (Argonne National Laboratory (ANL))
CCAMP: An Integrated Translation and Optimization Framework for OpenACC and OpenMP Jacob Lambert (University of Oregon, Oak Ridge National Laboratory (ORNL)); Seyong Lee and Jeffrey S. Vetter (Oak Ridge National Laboratory (ORNL)); and Allen D. Malony (University of Oregon)
Cell-List based Molecular Dynamics on Many-Core Processors: A Case Study on Sunway TaihuLight Supercomputer Xiaohui Duan, Ping Gao, Meng Zhang, and Tingjian Zhang (Shandong University; National Supercomputing Center, Wuxi); Hongsong Meng (National Supercomputing Center, Wuxi); Yuxuan Li (Tsinghua University, China; National Supercomputing Center, Wuxi); Bertil Schmidt (Johannes Gutenberg University Mainz); Haohuan Fu, Lin Gan, and Wei Xue (Tsinghua University, China; National Supercomputing Center, Wuxi); Weiguo Liu (Shandong University; National Supercomputing Center, Wuxi); and Guangwen Yang (Tsinghua University, China; National Supercomputing Center, Wuxi)
Chronicles of Astra: Challenges and Lessons from the First Petascale Arm Supercomputer Kevin Pedretti, Andrew J. Younge, Simon D. Hammond, James H. Laros III, Matthew L. Curry, Michael J. Aguilar, Robert J. Hoekstra, and Ron Brightwell (Sandia National Laboratories)
Co-Design for A64FX Manycore Processor and "Fugaku" Mitsuhisa Sato, Yutaka Ishikawa, Hirofumi Tomita, Yuetsu Kodama, Tetsuya Odajima, Miwako Tsuji, and Hisashi Yashiro (RIKEN Center for Computational Science (R-CCS)) and Masaki Aoki, Naoyuki Shida, Ikuo Miyoshi, Kouichi Hirai, Atsushi Furuya, Akira Asato, Kuniki Morita, and Toshiyuki Shimizu (Fujitsu Ltd)
Compiler-Based Timing For Extremely Fine-Grain Preemptive Parallelism Souradip Ghosh, Michael Cuevas, Simone Campanoni, and Peter Dinda (Northwestern University)
Compiling Generalized Histograms for GPU Troels Henriksen and Sune Hellfritzsch (University of Copenhagen), Ponnuswamy Sadayappan (University of Utah), and Cosmin Oancea (University of Copenhagen)
Convolutional Neural Network Training with Distributed K-FAC J. Gregory Pauloski (University of Texas); Zhao Zhang, Lei Huang, and Weijia Xu (Texas Advanced Computing Center (TACC)); and Ian T. Foster (University of Chicago, Argonne National Laboratory (ANL))
Cost-Aware Prediction of Uncorrected DRAM Errors in the Field Isaac Boixaderas, Darko Zivanovic, Sergi Moré, Javier Bartolome, David Vicente, Marc Casas, and Paul M. Carpenter (Barcelona Supercomputing Center) and Petar Radojković and Eduard Ayguadé (Barcelona Supercomputing Center, Polytechnic University of Catalonia)
CRAC: Checkpoint-Restart Architecture for CUDA with Streams and UVM Twinkle Jain and Gene Cooperman (Northeastern University)
Density Matrix Quantum Circuit Simulation via the BSP Machine on Modern GPU Clusters Ang Li and Omer Subasi (Pacific Northwest National Laboratory (PNNL)); Xiu Yang (Pacific Northwest National Laboratory (PNNL), Lehigh University); and Sriram Krishnamoorthy (Pacific Northwest National Laboratory (PNNL), Washington State University)
Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices Oguz Selvitopi (Lawrence Berkeley National Laboratory); Saliya Ekanayake (Microsoft Corporation); Giulia Guidi (University of California, Berkeley); Georgios A. Pavlopoulos (Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming"); Ariful Azad (Indiana University); and Aydin Buluc (Lawrence Berkeley National Laboratory)
Distributed-Memory DMRG via Sparse and Dense Parallel Tensor Contractions Ryan Levy, Edgar Solomonik, and Bryan Clark (University of Illinois)
Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization Srinivas Eswar and Koby Hayashi (Georgia Institute of Technology), Grey Ballard (Wake Forest University), Ramakrishnan Kannan (Oak Ridge National Laboratory (ORNL)), and Richard Vuduc and Haesun Park (Georgia Institute of Technology)
DRCCTPROF: A Fine-Grained Call Path Profiler for ARM-Based Clusters Qidong Zhao (William & Mary), Xu Liu (North Carolina State University), and Milind Chabbi (Scalable Machines Research)
Efficient 2D Tensor Network Simulation of Quantum Systems Yuchen Pang, Tianyi Hao, Annika Dugad, Yiqing Zhou, and Edgar Solomonik (University of Illinois at Urbana-Champaign)
An Efficient and Non-Intrusive GPU Scheduling Framework for Deep Learning Training Systems Shaoqi Wang (University of Colorado, Colorado Springs); Oscar J. Gonzalez (Nokia Bell Labs); Xiaobo Zhou (University of Colorado, Colorado Springs); and Thomas Williams, Brian D. Friedman, Martin Havemann, and Thomas Woo (Nokia Bell Labs)
Efficient Tiled Sparse Matrix Multiplication through Matrix Signatures Sureyya Emre Kurt (University of Utah), Aravind Sukumaran-Rajam (Washington State University), Fabrice Rastello (French Institute for Research in Computer Science and Automation (INRIA)), and Ponnuswamy Sadayappan (University of Utah)
Enabling Rapid COVID-19 Small Molecule Drug Design Through Scalable Deep Learning of Generative Models Sam Ade Jacobs, Tim Moon, Kevin McLoughlin, Derek Jones, David Hysom, Dong H. Ahn, John Gyllenhaal, Pythagoras Watson, Felice C. Lightstone, Jonathan E. Allen, Ian Karlin, and Brian Van Essen (Lawrence Livermore National Laboratory)
Evaluation of a Minimally Synchronous Algorithm for 2:1 Octree Balance Hansol Suh and Tobin Isaac (Georgia Institute of Technology, School of Computational Science and Engineering)
Experimental Evaluation of NISQ Quantum Computers: Error Measurement, Characterization, and Implications Tirthak Patel, Abhay Potharaju, Baolin Li, Rohan Basu Roy, and Devesh Tiwari (Northeastern University)
Fast Stencil-Code Computation on a Wafer-Scale Processor Kamil Rocki (Cerebras Systems); Dirk Van Essendelft (National Energy Technology Laboratory, Morgantown, WV); Ilya Sharapov, Robert Schreiber, Michael Morrison, Vladimir Kibardin, and Andrey Portnoy (Cerebras Systems); Jean Francois Dieteker (National Energy Technology Laboratory, Morgantown, WV; Leidos Research Support Team, Pittsburgh); Madhava Syamlal (National Energy Technology Laboratory, Morgantown, WV); and Michael James (Cerebras Systems)
FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short Maciej Besta and Marcel Schneider (ETH Zurich); Marek Konieczny and Karolina Cynk (AGH University of Science and Technology, Poland); and Erik Henriksson, Salvatore Di Girolamo, Ankit Singla, and Torsten Hoefler (ETH Zurich)
FBLAS: Streaming Linear Algebra on FPGA Tiziano De Matteis, Johannes de Fine Licht, and Torsten Hoefler (ETH Zurich)
FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems Yuwei Hu (Cornell University); Zihao Ye, Minjie Wang, Jiali Yu, Da Zheng, Mu Li, and Zheng Zhang (Amazon Web Services); Zhiru Zhang (Cornell University); and Yida Wang (Amazon Web Services)
Foresight: Analysis That Matters for Data Reduction Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Arvind T. Mohan, Ayan Biswas, John Patchett, Terece L. Turton, David H. Rogers, Daniel Livescu, and James Ahrens (Los Alamos National Laboratory)
GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks Guyue Huang, Guohao Dai, Yu Wang, and Huazhong Yang (Tsinghua University, China)
GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distributed DNN Training Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, and Anil Parwani (Ohio State University)
GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliability George Ostrouchov, Don Maxwell, Rizwan A. Ashraf, and Christian Engelmann (Oak Ridge National Laboratory (ORNL)); Mallikarjun Shankar (Oak Ridge National Laboratory); and James H. Rogers (Oak Ridge National Laboratory (ORNL))
GPU-Trident: Efficient Modeling of Error Propagation in GPU Programs Abdul Rehman Anwer (University of British Columbia); Guanpeng Li (University of Iowa); Karthik Pattabiraman (University of British Columbia); and Michael B. Sullivan, Timothy Tsai, and Siva Kumar Sastry Hari (Nvidia Corporation)
GraphPi: High Performance Graph Pattern Matching through Effective Redundancy Elimination Tianhui Shi, Mingshu Zhai, Yi Xu, and Jidong Zhai (Tsinghua University, China)
GVPROF: A Value Profiler for GPU-Based Clusters Keren Zhou (Rice University), Yueming Hao (North Carolina State University), John Mellor-Crummey and Xiaozhu Meng (Rice University), and Xu Liu (North Carolina State University)
Herring: Rethinking the Parameter Server at Scale for the Cloud Indu Thangakrishnan, Derya Cavdar, Can Karakus, Piyush Ghai, Yauheni Selivonchyk, and Cory Pruce (Amazon Web Services)
A Hierarchical and Load-Aware Design for Large Message Neighborhood Collectives Seyedeh Mahdieh Ghazimirsaeed, Qinghua Zhou, Amit Ruhela, Mohammadreza Bayatpour, Hari Subramoni, and Dhabaleswar K. (DK) Panda (Ohio State University)
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth and Quality Maciej Besta and Armon Carigiet (ETH Zurich); Kacper Janda (AGH University of Science and Technology); and Zur Vonarburg-Shmaria, Lukas Gianinazzi, and Torsten Hoefler (ETH Zurich)
High-Throughput Virtual Laboratory for Drug Discovery Using Massive Datasets Jens Glaser, Josh V. Vermaas, and David M. Rogers (Oak Ridge National Laboratory); Jeff Larkin and Scott LeGrand (Nvidia Corporation); Swen Boehm and Matthew B. Baker (Oak Ridge National Laboratory); Aaron Scheinberg (Jubilee Development); Andreas F. Tillack (Scripps Research); and Mathialakan Thavappiragasam, Ada Sedova, and Oscar Hernandez (Oak Ridge National Laboratory)
HPC I/O Throughput Bottleneck Analysis with Explainable Local Models Mihailo Isakov and Eliakin del Rosario (Texas A&M University); Sandeep Madireddy, Prasanna Balaprakash, Phillip H. Carns, and Robert Ross (Argonne National Laboratory (ANL)); and Michel A. Kinsy (Texas A&M University)
Improving All-to-Many Personalized Communication in Two-Phase I/O Qiao Kang (Northwestern University); Robert Ross and Robert Latham (Argonne National Laboratory (ANL)); and Sunwoo Lee, Ankit Agrawal, Alok Choudhary, and Wei-keng Liao (Northwestern University)
An In-Depth Analysis of the Slingshot Interconnect Daniele De Sensi and Salvatore Di Girolamo (ETH Zurich), Kim H. McMahon and Duncan Roweth (Hewlett Packard Enterprise), and Torsten Hoefler (ETH Zurich)
INEC: Fast and Coherent In-Network Erasure Coding Haiyang Shi and Xiaoyi Lu (Ohio State University)
Iris: Allocation Banking and Identity and Access Management for the Exascale Era Gabor Torok, Mark R. Day, Rebecca Hartman-Baker, and Cory Snavely (National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory)
Job Characteristics on Large-Scale Systems: Long-Term Analysis, Quantification and Implications Tirthak Patel (Northeastern University); Zhengchun Liu, Rajkumar Kettimuthu, Paul Rich, and William Allcock (Argonne National Laboratory (ANL)); and Devesh Tiwari (Northeastern University)
Kraken: Memory-Efficient Continual Learning for Large-Scale Real-Time Recommendation Minhui Xie (Tsinghua University, China; Kuaishou Technology); Kai Ren (Kuaishou Technology); Youyou Lu (Tsinghua University, China); Guangxu Yang, Qingxing Xu, and Bihai Wu (Kuaishou Technology); Jiazhen Lin (Tsinghua University, China); Hongbo Ao and Wanhong Xu (Kuaishou Technology); and Jiwu Shu (Tsinghua University, China)
Live Forensics for HPC Systems: A Case Study on Distributed Storage Systems Saurabh Jha, Shengkun Cui, Subho S. Banerjee, Tianyin Xu, Jeremy Enos, Mike Showerman, Zbigniew Kalbarczyk, and Ravishankar K. Iyer (University of Illinois)
Massive Parallelization for Finding Shortest Lattice Vectors Based on Ubiquity Generator Framework Nariaki Tateiwa (Kyushu University); Yuji Shinano (Zuse Institute Berlin); Satoshi Nakamura (Nippon Telegraph and Telephone Corporation); and Akihiro Yoshida, Shizuo Kaji, Masaya Yasuda, and Katsuki Fujisawa (Kyushu University)
MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework Chiyu Jiang (University of California, Berkeley); Soheil Esmaeilzadeh (Stanford University); Kamyar Azizzadenesheli (California Institute of Technology); Karthik Kashinath and Mustafa Mustafa (Lawrence Berkeley National Laboratory); Hamdi A. Tchelepi (Stanford University); Philip S. Marcus (University of California, Berkeley); Prabhat (Lawrence Berkeley National Laboratory); and Anima Anandkumar (California Institute of Technology, Nvidia Corporation)
Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale Luping Wang, Qizhen Weng, and Wei Wang (Hong Kong University of Science and Technology); Chen Chen (Hong Kong University of Science and Technology, Huawei Technologies Ltd); and Bo Li (Hong Kong University of Science and Technology)
MoHA: A Composable System for Efficient In-Situ Analytics on Heterogeneous HPC Systems Haoyuan Xing (Ohio State University), Gagan Agrawal (Augusta University), and Rajiv Ramnath (Ohio State University)
Multi-Node Multi-GPU Diffeomorphic Image Registration for Large-Scale Imaging Problems Malte Brunn (University of Stuttgart), Naveen Himthani and George Biros (University of Texas), Miriam Mehl (University of Stuttgart), and Andreas Mang (University of Houston)
Newton-ADMM: A Distributed GPU-Accelerated Optimizer for Multiclass Classification Problems Chih-Hao Fang and Sudhir B. Kylasa (Purdue University); Fred Roosta (University of Queensland); Michael W. Mahoney (University of California, Berkeley); and Ananth Grama (Purdue University)
OMPRacer: A Scalable and Precise Static Race Detector for OpenMP Programs Bradley Swain (Texas A&M University, Coderrect Inc); Yanze Li and Peiming Liu (Texas A&M University); Ignacio Laguna and Giorgis Georgakoudis (Lawrence Livermore National Laboratory); and Jeff Huang (Texas A&M University)
Optimizing Deep Learning Recommender Systems Training on CPU Cluster Architectures Dhiraj Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, and Alexander Heinecke (Intel Corporation)
A Parallel Framework for Constraint-Based Bayesian Network Learning via Markov Blanket Discovery Ankit Srivastava, Sriram Chockalingam, and Srinivas Aluru (Georgia Institute of Technology)
Pencil: A Pipelined Algorithm for Distributed Stencils Hengjie Wang and Aparna Chandramowlishwaran (University of California, Irvine)
A Performance-Portable Nonhydrostatic Atmospheric Dycore for the Energy Exascale Earth System Model Running at Cloud-Resolving Resolutions Luca Bertagna, Oksana Guba, Mark A. Taylor, and James G. Foucar (Sandia National Laboratories); Jeff Larkin (Nvidia Corporation); and Andrew M. Bradley, Sivasankaran Rajamanickam, and Andrew G. Salinger (Sandia National Laboratories)
Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes Mert Hidayetoglu (University of Illinois); Tekin Bicer (Argonne National Laboratory); Simon Garcia de Gonzalo (Barcelona Supercomputing Center); Bin Ren (College of William & Mary); Vincent De Andrade, Doga Gursoy, Rajkumar Kettimuthu, and Ian T. Foster (Argonne National Laboratory); and Wen-mei W. Hwu (University of Illinois)
pLiner: Isolating Lines of Floating-Point Code for Compiler-Induced Variability Hui Guo (University of California, Davis); Ignacio Laguna (Lawrence Livermore National Laboratory); and Cindy Rubio-González (University of California, Davis)
A Population Data-Driven Workflow for COVID-19 Modeling and Learning Jonathan Ozik, Justin M. Wozniak, Nicholson Collier, and Charles M. Macal (Argonne National Laboratory (ANL)) and Mickael Binois (French Institute for Research in Computer Science and Automation (INRIA))
PREEMPT: Scalable Epidemic Interventions Using Submodular Optimization on Multi-GPU Systems Marco Minutoli (Pacific Northwest National Laboratory (PNNL)); Prathyush Sambaturu (University of Virginia); Mahantesh Halappanavar and Antonino Tumeo (Pacific Northwest National Laboratory (PNNL)); Ananth Kalyanaraman (Washington State University); and Anil Vullikanti (University of Virginia)
Preparing Nuclear Astrophysics for Exascale Max Katz (Nvidia Corporation); Ann Almgren (Lawrence Berkeley National Laboratory); Maria Barrios Sazo and Kiran Eiden (Stony Brook University); Kevin Gott (Lawrence Berkeley National Laboratory); Alice Harpole (Stony Brook University); Jean Sexton, Don Willcox, and Weiqun Zhang (Lawrence Berkeley National Laboratory); and Michael Zingale (Stony Brook University)
Processing Full-Scale Square Kilometre Array Data on the Summit Supercomputer Ruonan Wang (Oak Ridge National Laboratory); Rodrigo Tobar and Markus Dolensky (University of Western Australia; International Center for Radio Astronomy Research, Australia); Tao An (Shanghai Astronomical Observatory); Andreas Wicenec and Chen Wu (University of Western Australia; International Center for Radio Astronomy Research, Australia); Fred Dulwich (University of Oxford); Norbert Podhorszki, Valentine Anantharaj, and Eric Suchyta (Oak Ridge National Laboratory); Baoqiang Lao (Shanghai Astronomical Observatory); and Scott Klasky (Oak Ridge National Laboratory)
Pushing the Limit of Molecular Dynamics with Ab Initio Accuracy to 100 Million Atoms with Machine Learning Weile Jia (University of California, Berkeley); Han Wang (Institute of Applied Physics and Computational Mathematics, China); Mohan Chen and Denghui Lu (Peking University); Lin Lin (University of California, Berkeley; Lawrence Berkeley National Laboratory); and Roberto Car, Weinan E, and Linfeng Zhang (Princeton University)
RDMP-KV: Designing Remote Direct Memory Persistence Based Key-Value Stores with PMEM Tianxi Li, Dipti Shankar, Shashank Gugnani, and Xiaoyi Lu (Ohio State University)
Recurrent Neural Network Architecture Search for Geophysical Emulation Romit Maulik (Argonne National Laboratory (ANL)); Romain Egele (École polytechnique, Polytechnic Institute of Paris); Bethany Lusch (Argonne National Laboratory (ANL)); and Prasanna Balaprakash (Argonne National Laboratory (ANL), Polytechnic Institute of Paris)
Reducing Communication in Graph Neural Network Training Alok Tripathy (University of California, Berkeley); Katherine Yelick (University of California, Berkeley; Lawrence Berkeley National Laboratory); and Aydin Buluc (Lawrence Berkeley National Laboratory; University of California, Berkeley)
RLScheduler: An Automated HPC Batch Job Scheduler Using Reinforcement Learning Di Zhang and Dong Dai (University of North Carolina, Charlotte); Youbiao He and Forrest Sheng Bao (Iowa State University); and Bing Xie (Oak Ridge National Laboratory (ORNL))
Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms Stijn Heldens (Netherlands eScience Center, University of Amsterdam); Pieter Hijma (Vrije University Amsterdam, University of Amsterdam); Ben van Werkhoven and Jason Maassen (Netherlands eScience Center); Henri Bal (Vrije University Amsterdam); and Rob van Nieuwpoort (Netherlands eScience Center, University of Amsterdam)
Runtime-Guided ECC Protection using Online Estimation of Memory Vulnerability Luc Jaulmes, Miquel Moretó, and Mateo Valero (Barcelona Supercomputing Center, Polytechnic University of Catalonia); Mattan Erez (University of Texas); and Marc Casas (Barcelona Supercomputing Center, Polytechnic University of Catalonia)
Scalable Heterogeneous Execution of a Coupled-Cluster Model with Perturbative Triples Jinsung Kim (University of Utah); Ajay Panyala, Bo Peng, and Karol Kowalski (Pacific Northwest National Laboratory (PNNL)); Ponnuswamy Sadayappan (University of Utah, Pacific Northwest National Laboratory (PNNL)); and Sriram Krishnamoorthy (Pacific Northwest National Laboratory (PNNL), Washington State University)
Scalable Knowledge Graph Analytics at 136 PetaFLOPS Ramakrishnan Kannan, Piyush Sao, Hao Lu, and Drahomira Herrmannova (Oak Ridge National Laboratory (ORNL)); Vijay Thakkar (Georgia Institute of Technology); Robert Patton (Oak Ridge National Laboratory (ORNL)); Richard Vuduc (Georgia Institute of Technology); and Thomas Potok (Oak Ridge National Laboratory (ORNL))
Scalable yet Rigorous Floating-Point Error Analysis Arnab Das, Ian Briggs, and Ganesh Gopalakrishnan (University of Utah); Sriram Krishnamoorthy (Pacific Northwest National Laboratory (PNNL), Washington State University); and Pavel Panchekha (University of Utah)
ScalAna: Automating Scaling Loss Detection with Graph Analysis Yuyang Jin and Haojie Wang (Tsinghua University, China); Teng Yu (Tsinghua University, China; University of St Andrews); Xiongchao Tang (Tsinghua University, China); Torsten Hoefler (ETH Zurich); Xu Liu (North Carolina State University); and Jidong Zhai (Tsinghua University, China)
Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA Mohamed Wahib (National Institute of Advanced Industrial Science and Technology (AIST), RIKEN Center for Computational Science (R-CCS)); Haoyu Zhang (miHoYo Ltd); Truong Thao Nguyen (National Institute of Advanced Industrial Science and Technology (AIST)); Aleksandr Drozd and Jens Domke (RIKEN Center for Computational Science (R-CCS)); Lingqi Zhang (Tokyo Institute of Technology); Ryousei Takano (National Institute of Advanced Industrial Science and Technology (AIST)); and Satoshi Matsuoka (RIKEN Center for Computational Science (R-CCS), Tokyo Institute of Technology)
Scaling the Hartree-Fock Matrix Build on Summit Giuseppe M. J. Barca (Australian National University); David Poole, Jorge Galvez Vallejo, and Melisa Alkan (Iowa State University); Colleen Bertoni (Argonne National Laboratory (ANL)); Alistair P. Rendell (Flinders University, Australia); and Mark S. Gordon (Iowa State University)
SEFEE: Lightweight Storage Error Forecasting in Large-Scale Enterprise Storage Systems Amirhessam Yazdi (University of Nevada, Reno); Xing Lin (NetApp Inc); and Lei Yang and Feng Yan (University of Nevada, Reno)
SegAlign: A Scalable GPU-Based Whole Genome Aligner Sneha D. Goenka (Stanford University); Yatish Turakhia and Benedict Paten (University of California, Santa Cruz); and Mark Horowitz (Stanford University)
Smart-PGSim: Using Neural Network to Accelerate AC-OPF Power Grid Simulation Wenqian Dong and Zhen Xie (University of California, Merced); Gokcen Kestor (Pacific Northwest National Laboratory (PNNL)); and Dong Li (University of California, Merced)
Sparse GPU Kernels for Deep Learning Trevor Gale (Stanford University, Google Brain); Matei Zaharia (Stanford University); Cliff Young (Google Brain); and Erich Elsen (DeepMind)
Speeding Up SpMV for Power-Law Graph Analytics by Enhancing Locality & Vectorization Serif Yesil and Azin Heidarshenas (University of Illinois), Adam Morrison (Tel Aviv University), and Josep Torrellas (University of Illinois)
SpTFS: Sparse Tensor Format Selection for MTTKRP via Deep Learning Qingxiao Sun, Yi Liu, Ming Dun, Hailong Yang, and Zhongzhi Luan (Beihang University); Lin Gan and Guangwen Yang (Tsinghua University, China); and Depei Qian (Beihang University)
A Submatrix-Based Method for Approximate Matrix Function Evaluation in the Quantum Chemistry Code CP2K Michael Lass, Robert Schade, Thomas D. Kühne, and Christian Plessl (Paderborn University)
TAGO: Rethinking Routing Design in High Performance Reconfigurable Networks Min Yee Teh and Yu-Han Hung (Columbia University), George Michelogiannakis (Lawrence Berkeley National Laboratory), Shijia Yan and Madeleine Glick (Columbia University), John Shalf (Lawrence Berkeley National Laboratory), and Keren Bergman (Columbia University)
Taming I/O Variation on QoS-Less HPC Storage: What Can Applications Do? Zhenbo Qiao (New Jersey Institute of Technology); Qing Liu (New Jersey Institute of Technology, Oak Ridge National Laboratory (ORNL)); and Norbert Podhorszki, Scott Klasky, and Jieyang Chen (Oak Ridge National Laboratory (ORNL))
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance Elliott Slaughter (SLAC National Accelerator Laboratory); Wei Wu (Los Alamos National Laboratory); Yuankun Fu (Purdue University); Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, and Kaleb S. Morris (Stanford University); Qinglei Cao and George Bosilca (University of Tennessee); Seema Mirchandaney (SLAC National Accelerator Laboratory); Wonchan Lee and Sean Treichler (Nvidia Corporation); Patrick McCormick (Los Alamos National Laboratory); and Alex Aiken (Stanford University)
Term Quantization: Furthering Quantization at Run Time HT Kung (Harvard University), Bradley McDanel (Franklin and Marshall College), and Sai Qian Zhang (Harvard University)
TOSS-2020: A Commodity Software Stack for HPC Edgar A. Leon, Trent D'Hooge, Nathan Hanford, Ian Karlin, Ramesh Pankajakshan, Jim Foraker, Chris Chambreau, and Matthew L. Leininger (Lawrence Livermore National Laboratory)
Toward Realization of Numerical Towing-Tank Tests by Wall-Resolved Large Eddy Simulation Based on 32 Billion Grid Finite-Element Computation Chisachi Kato (University of Tokyo), Yoshinobu Yamade and Katsuhiro Nagano (Mizuho Information and Research Institute Inc), Kiyoshi Kumahata and Kazuo Minami (RIKEN Center for Computational Science (R-CCS)), and Tatsuo Nishikawa (Shipbuilding Research Centre of Japan)
Tuning Floating-Point Precision Using Dynamic Program Information and Temporal Locality Hugo Brunie, Costin Iancu, and Khaled Z. Ibrahim (Lawrence Berkeley National Laboratory); Philip Brisk (University of California, Riverside); and Brandon Cook (Lawrence Berkeley National Laboratory)
VERITAS: Accurately Estimating the Correct Output on Noisy Intermediate-Scale Quantum Computers Tirthak Patel and Devesh Tiwari (Northeastern University)
Waiting Game: Optimally Provisioning Fixed Resources for Cloud-Enabled Schedulers Pradeep Ambati, Noman Bashir, David Irwin, and Prashant Shenoy (University of Massachusetts, Amherst)
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He (Microsoft Corporation)
Zerospy: Exploring Software Inefficiency with Redundant Zeros Xin You, Hailong Yang, Zhongzhi Luan, and Depei Qian (Beihang University) and Xu Liu (North Carolina State University)