BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160554Z
LOCATION:Track 2
DTSTART;TZID=America/New_York:20201112T100000
DTEND;TZID=America/New_York:20201112T175000
UID:submissions.supercomputing.org_SC20_sess208@linklings.com
SUMMARY:PMBS20: The 11th International Workshop on Performance Modeling, B
 enchmarking and Simulation of High-Performance Computer Systems
DESCRIPTION:Workshop\n\nPMBS20 – Lunch Break\n\n\n\n---------------------\
 nPMBS20 – Introduction: The 11th International Workshop on Performance Mod
 eling, Benchmarking, and Simulation of High-Performance Computer Systems\n
 \nWright, Jarvis, Hammond\n\nThe PMBS20 workshop is concerned with the com
 parison of high-performance computing systems through performance modeling
 , benchmarking or through the use of tools such as simulators. We are part
 icularly interested in research which reports the ability to measure and m
 ake tradeoffs in software/hardwar...\n\n---------------------\nWarwick Dat
 a Store: A Data Structure Abstraction Library\n\nKirk, Nolten, Kevis, Law,
  Maheswaran...\n\nWith the increasing complexity of memory architectures a
 nd scientific applications, developing data structures that are performant
 , portable, scalable, and support developer productivity, is a challenging
  task. In this paper, we present Warwick Data Store (WDS), a lightweight a
 nd extensible C++ tem...\n\n---------------------\nPerformance Tradeoffs i
 n GPU Communication: A Study of Host and Device-Initiated Approaches\n\nG
 roves, Brock, Chen, Ibrahim, Oliker...\n\nNetwork communication on GPU-bas
 ed systems is a significant roadblock for many applications with small but
  frequent messaging requirements. One common question for application dev
 elopers is, 'How can they reduce the overheads and achieve the best commu
 nication performance on GPUs?' This work exam...\n\n---------------------
 \nLightweight Measurement and Analysis of HPC Performance Variability\n\nD
 ominguez-Trujillo, Haskins, Jafari Khouzani, Leap, Tashakkori...\n\nPerfor
 mance variation deriving from hardware and software sources is common in m
 odern scientific and data-intensive computing systems, and synchronization
  in parallel and distributed programs often exacerbates their impacts at s
 cale. The decentralized and emergent effects of such variation are, unfo..
 .\n\n---------------------\nThe Performance and Energy Efficiency Potentia
 l of FPGAs in Scientific Computing\n\nNguyen, Williams, Siracusa, MacLean,
  Doerfler...\n\nHardware specialization is a promising direction for the f
 uture of digital computing. Reconfigurable technologies enable hardware sp
 ecialization with modest non-recurring engineering cost. In this paper, we
  use FPGAs to evaluate the benefits of building specialized hardware for n
 umerical kernels fou...\n\n---------------------\nAccelerating High-Order 
 Stencils on GPUs\n\nSai, Mellor-Crummey, Meng, Araya-Polo, Meng\n\nWhile i
 mplementation strategies for low-order stencils on GPUs have been well-stu
 died in the literature, not all of the techniques studied work well for hi
 gh-order stencils, such as those used for seismic imaging. In this paper, 
 we study practical seismic imaging computations on GPUs using high-orde...
 \n\n---------------------\nAutotuning PolyBench Benchmarks with LLVM Clang
 /Polly Loop Optimization Pragmas Using Bayesian Optimization\n\nWu, Kruse,
  Balaprakash, Finkel, Taylor...\n\nAutotuning is an approach that explores
  a search space of possible implementations/configurations of a kernel
  or an application by selecting and evaluating a subset of implementations
 /configurations on a target platform and/or use models to identify a high 
 performance implementation/configuratio...\n\n---------------------\nEvalu
 ating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched C
 omputations\n\nAnzt, Tsai, Abdelfattah, Cojean, Dongarra\n\nGPU accelerato
 rs have become an important backbone for scientific high performance-compu
 ting, and the performance advances obtained from adopting new GPU hardware
  are significant. In this paper, we take a first look at NVIDIA's newest s
 erver-line GPU, the A100 architecture, part of the Ampere genera...\n\n---
 ------------------\nPMBS20 – Wrapup\n\n\n\n---------------------\nDevelopi
 ng Models for the Runtime of Programs with Exponential Runtime Behavior\n\
 nBurger, Nguyen, Bischof\n\nIn this paper, we present a new approach to ge
 nerate runtime models for programs whose runtime grows exponentially with 
 the value of one input parameter. Such programs are, e.g., of high interes
 t for cryptanalysis to analyze practical security of traditional and post-
 quantum secure schemes. The mode...\n\n---------------------\nExploiting t
 he Potentials of the Second Generation SX-Aurora TSUBASA\n\nEgawa, Fujimot
 o, Yamashita, Sasaki, Isobe...\n\nNEC SX-series vector supercomputers have
  provided outstanding memory bandwidths to meet the strong demands for eff
 icient execution of memory-intensive scientific applications in practice. 
 Inheriting the advantage, the 2nd generation SX-Aurora TSUBASA, Type 20B, 
 provides an extremely high memory band...\n\n---------------------\nEvalua
 tion of the Communication Motif for a Distributed Eigensolver Using the SS
 T Network Simulation Tool\n\nAfibuzzaman, Maris, Groves, Oryspayev, Cook..
 .\n\nA new motif that corresponds to the communication operations of the d
 istributed LOBPCG eigensolver used in the Many-Fermion Dynamics--nuclear, 
 or MFDn, code is constructed. The impact of communication strategy and pro
 cess placement are evaluated on current and future architectures using the
  SST netw...\n\n---------------------\nBenchmarking Julia’s Communication 
 Performance: Is Julia HPC Ready or Full HPC?\n\nHunold, Steiner\n\nJulia h
 as quickly become one of the main programming languages for computational 
 sciences, mainly due to its speed and flexibility. The speed and efficienc
 y of Julia are the main reasons why researchers in the field of High Perfo
 rmance Computing have started porting their applications to Julia.\n\nSin.
 ..\n\n---------------------\nPMBS20 – Break\n\n\n\n---------------------\n
 Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multipl
 ication on A64FX\n\nAlappat, Laukemann, Gruber, Hager, Wellein...\n\nThe A
 64FX CPU powers the current #1 supercomputer on the Top500 list. Although 
 it is a traditional cache-based multicore processor, its peak performance 
 and memory bandwidth rival accelerator devices. Generating efficient code 
 for such a new architecture requires a good understanding of its performa.
 ..\n\n---------------------\nPMBS20 – Break\n\n\n\n\nRegistration Category
 : Workshop Reg Pass
END:VEVENT
END:VCALENDAR