BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160555Z
LOCATION:Track 8
DTSTART;TZID=America/New_York:20201112T100000
DTEND;TZID=America/New_York:20201112T183000
UID:submissions.supercomputing.org_SC20_sess214@linklings.com
SUMMARY:11th Workshop on Latest Advances in Scalable Algorithms for Large-
 Scale Systems
DESCRIPTION:Workshop\n\nReplacing Pivoting in Distributed Gaussian Elimina
 tion with Randomized Techniques\n\nLindquist, Luszczek, Dongarra\n\nGaussi
 an elimination is a key technique for solving\ndense, non-symmetric system
 s of linear equations. Pivoting is\nused to ensure numerical stability but
  can introduce significant\noverheads. We propose replacing pivoting with 
 recursive butterfly\ntransforms (RBTs) and iterative refinement. RBTs use\
 nan...\n\n---------------------\nA Survey of Singular Value Decomposition 
 Methods for Distributed Tall/Skinny Data\n\nSchmidt\n\nThe Singular Value
  Decomposition (SVD) is one of the most important matrix \nfactorizations,
  enjoying a wide variety of applications across numerous \napplication dom
 ains. In statistics and data analysis, the common applications of \nSVD su
 ch as Principal Components Analysis (PCA) and linear regression...\n\n----
 -----------------\nPerformance Analysis of a Quantum Monte Carlo Applicati
 on on Multiple Hardware Architectures Using the HPX Runtime\n\nWei, Chatte
 rjee, Huck, Hernandez, Kaiser\n\nThis paper describes how we successfully 
 used the HPX programming model to port the DCA++ application on multiple a
 rchitectures that include POWER9, x86, ARM v8, and NVIDIA GPUs. We describ
 e the lessons we can learn from this experience as well as the benefits of
  enabling the HPX in the application ...\n\n---------------------\nRecursi
 ve Basic Linear Algebra Operations on TensorCore GPU\n\nZhang, Karihaloo, 
 Wu\n\nEncouraged by the requirement of high speed matrix computations and 
 training deep neural networks, TensorCore was introduced in NVIDIA GPU\nto
  further accelerate matrix-matrix multiplication. It supports very fast ha
 lf precision general matrix matrix multiplications (GEMMs), which is aroun
 d 8x faster...\n\n---------------------\nAn Integer Arithmetic-Based Spars
 e Linear Solver Using a GMRES Method and Iterative Refinement\n\nIwashita,
  Suzuki, Fukaya\n\nIn this paper, we develop a (preconditioned) GMRES solv
 er based on integer arithmetic, and introduce an iterative refinement fram
 ework for the solver. We describe the data format for the coefficient matr
 ix and vectors for the solver that is based on integer or fixed-point numb
 ers.\nTo avoid overflow ...\n\n---------------------\nHigh-Order Finite El
 ement Method Using Standard and Device-Level Batch GEMM on GPUs\n\nBeams, 
 Abdelfattah, Tomov, Dongarra, Kolev...\n\nWe present new GPU implementatio
 ns of the tensor contractions arising from basis-related computations for 
 high-order finite element methods.  We consider both tensor and non-tensor
  bases. In the case of tensor bases, we introduce new kernels based on\na 
 series of fused device-level matrix multiplicat...\n\n--------------------
 -\nImplementation and Numerical Techniques for One Eflop/s HPL-AI Benchmar
 k on Fugaku\n\nImamura, Kudo, Nitadori, Ina\n\nOur performance benchmark o
 f HPL-AI on the supercomputer Fugaku was awarded the 55th Top500. The effe
 ctive performance was 1.42 EFlop/s, and the world's first achievement to e
 xceed the wall of exascale in a floating-point arithmetic benchmark. Becau
 se HPL-AI is brand new and has no reference code fo...\n\n----------------
 -----\nRevisiting Exponential Integrator Methods for HPC with a Mini-Appli
 cation\n\nShanks\n\nIn this work we look at employing communication-avoidi
 ng techniques commonly used in Krylov methods in the context of exponentia
 l integrators for the solution of stiff partial differential equations. We
  choose an exponential integrator method based on polynomial approximation
 s, as compared to those ...\n\n---------------------\nScalA – Introduction
 : 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale 
 Systems\n\nAlexandrov, Dongarra, Geist, Engelmann\n\nNovel scalable scient
 ific algorithms are needed to enable key science applications to exploit t
 he computational power of large-scale systems.  These extreme-scale algori
 thms need to hide network and memory latency, have very high computation/c
 ommunication overlap and minimal communication and have n...\n\n----------
 -----------\nTwo-Stage Asynchronous Iterative Solvers for Multi-GPU Cluste
 rs\n\nNayak, Cojean, Anzt\n\nGiven the trend of supercomputers accumulatin
 g much of their compute power in \nGPU accelerators composed of thousand
 s of cores and operating in streaming\nmode, global synchronization poin
 ts become a bottleneck, severely confining \nthe performance of applicat
 ions. In consequence, asynchronous m...\n\n---------------------\nScalA – 
 Closing\n\nAlexandrov\n\n---------------------\nScalA – Break\n\n\n\n-----
 ----------------\nScalA – Break\n\n\n\n---------------------\nKeynote 3: E
 CP – Recent Experiences in Porting Complex Applications to Accelerator-Bas
 ed Systems\n\nSiegel\n\nThe U.S. Department of Energy's Exascale Computin
 g Project (ECP) represents a broad effort to enable mission critical scien
 ce
  and engineering on next generation HPC systems. As part of this, ECP incl
 udes 24 application development teams spanning a broad range of science an
 d engineering domains. The t...\n\n---------------------\nA Fast Scalable 
 Iterative Implicit Solver with Green's Function-Based Neural Networks\n\nI
 chimura, Fujita, Hori, Maddegedara, Ueda...\n\nBased on the Green's functi
 ons that reflect mathematical properties of partial differential equations
  (PDE), we developed a novel preconditioner using neural networks (NNs) wi
 th high accuracy and small computational cost for improving the convergenc
 e property of an iterative implicit solver. As the ...\n\n----------------
 -----\nScalA – Break\n\n\n\n---------------------\nScalA – Keynote: Perfor
 mance Evaluation of the Supercomputer "Fugaku" and A64FX Manycore Processo
 r\n\nSato\n\nWe have been carrying out the FLAGSHIP 2020 to develop the Ja
 panese next-generation flagship supercomputer, Post-K, named “Fugaku”. In 
 the project, we have designed a new Arm-SVE enabled processor, called A64F
 X, as well as the system, including interconnect, with the industry partne
 r, Fujitsu. The p...\n\n---------------------\nScalA – Keynote: High Perfo
 rmance Data Analytics and Some Applications\n\nEmad\n\nIn most areas of sc
 ience, data production is now faster than compute capabilities. The comput
 ational modeling and data analysis associated with high-performance comput
 ing techniques are used to make these huge amounts of data effectively tal
 k. In this talk, we highlight some challenges in the ecosys...\n\n\nTag: A
 lgorithms, Extreme Scale Computing, Performance/Productivity Measurement a
 nd Evaluation, Scalable Computing, Scientific Computing\n\nRegistration Ca
 tegory: Workshop Reg Pass
END:VEVENT
END:VCALENDAR

