BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160555Z
LOCATION:Track 8
DTSTART;TZID=America/New_York:20201112T100000
DTEND;TZID=America/New_York:20201112T183000
UID:submissions.supercomputing.org_SC20_sess214@linklings.com
SUMMARY:11th Workshop on Latest Advances in Scalable Algorithms for Large-
Scale Systems
DESCRIPTION:Workshop\n\nReplacing Pivoting in Distributed Gaussian Elimina
tion with Randomized Techniques\n\nLindquist, Luszczek, Dongarra\n\nGaussi
an elimination is a key technique for solving\ndense, non-symmetric system
s of linear equations. Pivoting is\nused to ensure numerical stability but
can introduce significant\noverheads. We propose replacing pivoting with
recursive butterfly\ntransforms (RBTs) and iterative refinement. RBTs use\
nan...\n\n---------------------\nA Survey of Singular Value Decomposition
Methods for Distributed Tall/Skinny Data\n\nSchmidt\n\nThe Singular Value
Decomposition (SVD) is one of the most important matrix \nfactorizations,
enjoying a wide variety of applications across numerous \napplication dom
ains. In statistics and data analysis, the common applications of \nSVD su
ch as Principal Components Analysis (PCA) and linear regression...\n\n----
-----------------\nPerformance Analysis of a Quantum Monte Carlo Applicati
on on Multiple Hardware Architectures Using the HPX Runtime\n\nWei, Chatte
rjee, Huck, Hernandez, Kaiser\n\nThis paper describes how we successfully
used the HPX programming model to port the DCA++ application on multiple a
rchitectures that include POWER9, x86, ARM v8, and NVIDIA GPUs. We describ
e the lessons we can learn from this experience as well as the benefits of
enabling the HPX in the application ...\n\n---------------------\nRecursi
ve Basic Linear Algebra Operations on TensorCore GPU\n\nZhang, Karihaloo,
Wu\n\nEncouraged by the requirement of high speed matrix computations and
training deep neural networks, TensorCore was introduced in NVIDIA GPU\nto
further accelerate matrix-matrix multiplication. It supports very fast ha
lf precision general matrix matrix multiplications (GEMMs), which is aroun
d 8x faster...\n\n---------------------\nAn Integer Arithmetic-Based Spars
e Linear Solver Using a GMRES Method and Iterative Refinement\n\nIwashita,
Suzuki, Fukaya\n\nIn this paper, we develop a (preconditioned) GMRES solv
er based on integer arithmetic, and introduce an iterative refinement fram
ework for the solver. We describe the data format for the coefficient matr
ix and vectors for the solver that is based on integer or fixed-point numb
ers.\nTo avoid overflow ...\n\n---------------------\nHigh-Order Finite El
ement Method Using Standard and Device-Level Batch GEMM on GPUs\n\nBeams,
Abdelfattah, Tomov, Dongarra, Kolev...\n\nWe present new GPU implementatio
ns of the tensor contractions arising from basis-related computations for
high-order finite element methods. We consider both tensor and non-tensor
bases. In the case of tensor bases, we introduce new kernels based on\na
series of fused device-level matrix multiplicat...\n\n--------------------
-\nImplementation and Numerical Techniques for One Eflop/s HPL-AI Benchmar
k on Fugaku\n\nImamura, Kudo, Nitadori, Ina\n\nOur performance benchmark o
f HPL-AI on the supercomputer Fugaku was awarded the 55th Top500. The effe
ctive performance was 1.42 EFlop/s, and the world's first achievement to e
xceed the wall of exascale in a floating-point arithmetic benchmark. Becau
se HPL-AI is brand new and has no reference code fo...\n\n----------------
-----\nRevisiting Exponential Integrator Methods for HPC with a Mini-Appli
cation\n\nShanks\n\nIn this work we look at employing communication-avoidi
ng techniques commonly used in Krylov methods in the context of exponentia
l integrators for the solution of stiff partial differential equations. We
choose an exponential integrator method based on polynomial approximation
s, as compared to those ...\n\n---------------------\nScalA – Introduction
: 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale
Systems\n\nAlexandrov, Dongarra, Geist, Engelmann\n\nNovel scalable scient
ific algorithms are needed to enable key science applications to exploit t
he computational power of large-scale systems. These extreme-scale algori
thms need to hide network and memory latency, have very high computation/c
ommunication overlap and minimal communication and have n...\n\n----------
-----------\nTwo-Stage Asynchronous Iterative Solvers for Multi-GPU Cluste
rs\n\nNayak, Cojean, Anzt\n\nGiven the trend of supercomputers accumulatin
g much of their compute power in \nGPU accelerators composed of thousand
s of cores and operating in streaming\nmode, global synchronization poin
ts become a bottleneck, severely confining \nthe performance of applicat
ions. In consequence, asynchronous m...\n\n---------------------\nScalA –
Closing\n\nAlexandrov\n\n---------------------\nScalA – Break\n\n\n\n-----
----------------\nScalA – Break\n\n\n\n---------------------\nKeynote 3: E
CP – Recent Experiences in Porting Complex Applications to Accelerator-Bas
 ed Systems\n\nSiegel\n\nThe U.S. Department of Energy's Exascale
  Computing Project (ECP) represents a broad effort to enable mission
  critical science
and engineering on next generation HPC systems. As part of this, ECP incl
udes 24 application development teams spanning a broad range of science an
d engineering domains. The t...\n\n---------------------\nA Fast Scalable
Iterative Implicit Solver with Green's Function-Based Neural Networks\n\nI
chimura, Fujita, Hori, Maddegedara, Ueda...\n\nBased on the Green's functi
ons that reflect mathematical properties of partial differential equations
(PDE), we developed a novel preconditioner using neural networks (NNs) wi
th high accuracy and small computational cost for improving the convergenc
e property of an iterative implicit solver. As the ...\n\n----------------
-----\nScalA – Break\n\n\n\n---------------------\nScalA – Keynote: Perfor
mance Evaluation of the Supercomputer "Fugaku" and A64FX Manycore Processo
r\n\nSato\n\nWe have been carrying out the FLAGSHIP 2020 to develop the Ja
panese next-generation flagship supercomputer, Post-K, named “Fugaku”. In
the project, we have designed a new Arm-SVE enabled processor, called A64F
X, as well as the system, including interconnect, with the industry partne
r, Fujitsu. The p...\n\n---------------------\nScalA – Keynote: High Perfo
rmance Data Analytics and Some Applications\n\nEmad\n\nIn most areas of sc
ience, data production is now faster than compute capabilities. The comput
ational modeling and data analysis associated with high-performance comput
ing techniques are used to make these huge amounts of data effectively tal
k. In this talk, we highlight some challenges in the ecosys...\n\n\nTag: A
lgorithms, Extreme Scale Computing, Performance/Productivity Measurement a
nd Evaluation, Scalable Computing, Scientific Computing\n\nRegistration Ca
tegory: Workshop Reg Pass
END:VEVENT
END:VCALENDAR