BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160544Z
LOCATION:Poster Module
DTSTART;TZID=America/New_York:20201119T083000
DTEND;TZID=America/New_York:20201119T170000
UID:submissions.supercomputing.org_SC20_sess337@linklings.com
SUMMARY:Research Poster Display
DESCRIPTION:Posters, Research Posters\n\nTaskWorks: A Task Engine for Empo
 wering Asynchronous Operations in HPC Applications\n\nHOU, Koziol, Byna\n\
 nTaskWorks is a portable, high-level task engine designed for HPC workload
 s. Applications can create tasks and define dependencies between them with
  the task engine. Once the task is defined and submitted to TaskWorks, the
  TaskWorks engine will execute it according to the specified dependencies,
  with...\n\n---------------------\nModest Scale HPC on Azure Using CGYRO\n
 \nSfiligoi, Candy\n\nFusion simulations have traditionally required the us
 e of leadership-scale HPC resources in order to produce advances in physic
 s. One such package is CGYRO, a premier tool for multi-scale plasma turbul
 ence simulation. CGYRO is a typical HPC application that will not fit into
  a single node, as it req...\n\n---------------------\nStreamBrain: An HPC
  DSL for Brain-Like Neural Networks on Heterogeneous Systems\n\nPodobas, S
 vedin, Chien, Peng, Markidis...\n\nWe introduce StreamBrain: a high-perfor
 mance DSL for brain-like neural networks. StreamBrain supports multiple ba
 ckends such as FPGAs, GPUs and CPUs on heterogeneous HPC systems while pro
 viding a convenient Keras-like interface to users. We show that training a
 n MNIST dataset on the BCPNN model only...\n\n---------------------\nMater
 ial Interface Reconstruction Using Machine Learning\n\nFenn, Lewis, Doutri
 aux\n\nIn multi-material simulations, it is important to track material in
 terfaces. These are frequently not tracked explicitly and must be reconstr
 ucted from zone data. Current methods provide either material conservation
  or interface continuity, but not both, meaning that many interfaces may b
 e construct...\n\n---------------------\nAchieving the Performance of Glob
 al Adaptive Routing Using Local Information on Dragonfly through Deep Lear
 ning\n\nChaulagain, Liza, Chunduri, Yuan, Lang\n\nThe Universal Globally A
 daptive Load-balance Routing (UGAL) with global information, referred to a
 s UGAL-G, represents an ideal form of adaptive routing on Dragonfly. UGA
 L-G i
 s impractical to implement, however, since the global information cannot b
 e maintained accurately. Practical adaptive routing sc...\n\n-------------
 --------\nState of I/O in HPC 2020\n\nBateman, Herbein, Kougkas, Sun\n\nMo
 dern supercomputers are designed to allow users access to computing and I/
 O resources that exhibit better speed, higher scales and increased reliabi
 lity compared to that available in commodity hardware (i.e., cloud infrast
 ructures). The current era is one of transition from petascale to exascale
 , ...\n\n---------------------\nTowards Optimizing Memory Mapping of Persi
 stent Memory in UMap\n\nYoussef, Iwabuchi, Feng, Gokhale, Pearce\n\nThe ex
 ponential growth in data set sizes across multiple domains creates challen
 ges in terms of storing data efficiently as well as performing scalable co
 mputations on such data. Memory mapping files on different storage types o
 ffers a uniform interface as well as programming productivity for applic
 a..
 .\n\n---------------------\nA Simulation Study of Hardware Parameters for 
 GPU-based HPC Platforms\n\nBhowmik, Jain, Yuan, Bhatele\n\nHigh-performanc
 e computing (HPC) platforms are switching to GPU-based compute nodes; the 
 resulting trend is the increase in per-node computational capacity and the
  reduction of the number of endpoints in the system. This trend changes th
 e computation and communication balance in comparison to the pr...\n\n----
 -----------------\nSemantic Search for Self-Describing Scientific Data For
 mats\n\nNiu, Zhang, Byna, Chen\n\nIt is often a daunting and challenging t
 ask for scientists to find datasets relevant to their needs. This is espec
 ially true for self-describing file formats, which are often used for data
  storage in scientific applications. Existing solutions extract the metada
 ta and process search queries with mat...\n\n---------------------\nThe Ar
 ithmetic Intensity of High-Order Discontinuous Galerkin Methods for Hyperb
 olic Systems\n\nTandon, Johnsen\n\nHigh-fidelity numerical simulations of 
 complex flow problems require high-resolution capabilities, which can be a
 chieved by employing high-order methods. A class of recovery-assisted disc
 ontinuous Galerkin (RADG) methods can achieve high orders of accuracy by s
 trategically combining degrees of freed...\n\n---------------------\nChara
 cterizing and Approximating I/O Behavior of HDF5 Applications\n\nRajesh, H
 eber, Kougkas, Sun\n\nWe aim to characterize the I/O of an application, an
 d by using this understanding, try to determine if we can decompose an app
 lication profile into a simple sequence or "genome" sequence that describe
 s the application, and if we can compare these different "genomes" using a
  simple similarity metric....\n\n---------------------\nEvaluation of Tsun
 ami Inundation Simulation using Vector-Scalar Hybrid MPI on SX-Aurora TSUB
 ASA\n\nMusa, Soga, Abe, Sato, Komatsu...\n\nA real-time tsunami inundation
  forecast system has been developed since the 2011 Great East Japan earthq
 uake occurred. Reducing the processing time and downsizing the system have
  been required for the forecast system. To this end, we develop a vector-s
 calar hybrid MPI code of a tsunami inundation si...\n\n-------------------
 --\nEvaluation of Power Controls and Counters on General-Purpose Graphics 
 Processing Units\n\nAli, Bhalachandra, Wright, Sill, Chen\n\nGeneral-purpo
 se graphics processing units (GPUs) are becoming increasingly important
  in high-performance computing (HPC) systems due to their massive computat
 ional performance. Although GPUs are high-performing, modern GPU archite
 cture
 s consume a lot of power, making it imperative to improve their e...\n\n--
 -------------------\nSaddlebagX: High Performance Data Processing with PGA
 S and UPC++\n\nB. Ovesen, M. Khan, Chau, Ha\n\nThe ever-growing scale of t
 he data and the emerging challenges of processing it efficiently at that s
 cale are driving the need for high-performance data processing frameworks 
 that are efficient yet highly programmable across big data analytics (BDA
 ) and high-performance computing (HPC) communities. ...\n\n---------------
 ------\nOptimization of Tensor-Product Operations in Nekbone on GPUs\n\nKa
 rp, Jansson, Podobas, Schlatter, Markidis\n\nIn the CFD solver Nek5000, th
 e computation is dominated by the evaluation of small tensor operations. N
 ekbone is a proxy app for Nek5000 and has previously been ported to GPUs w
 ith a mixed OpenACC and CUDA approach. In this work, we continue this effo
 rt and further optimize the main tensor-product o...\n\n------------------
 ---\nAI Meets HPC: Learning the Cell Motion in Biofluids\n\nZhang, Zhang, 
 Han, Cong, Yang...\n\nWe generalized the century-old Jeffery orbits equati
 on, by a novel biomechanics-informed online learning framework using simul
 ation data at atomic resolutions, to a new equation of motion for flowing 
 cells to account for the fluid conditions and the cell deformable structur
 es. To validate, we examin...\n\n---------------------\nQuantum Circuit Op
 timization with SPIRAL: A First Look\n\nMionis, Franchetti, Larkin\n\nOpti
 mization of quantum circuits is an integral part of the quantum computing 
 toolchain. In many Noisy Intermediate-Scale Quantum (NISQ) devices, only l
 oose connectivity between qubits is maintained, meaning a valid quantum ci
 rcuit often requires swapping physical qubits in order to satisfy adjacenc
 ...\n\n---------------------\nDistributed BERT Pre-Training And Fine-Tunin
 g with Intel-Optimized TensorFlow On Intel Xeon Scalable Processors\n\nOzt
 urk, Wang, Szankin, Shao\n\nDistributed computing has become a key compone
 nt in the field of data science, allowing for faster prototyping and accel
 erated time to market of numerous workloads. This work examines the distri
 buted training performance of BERT, a state-of-the-art language model for 
 natural language processing (NLP)...\n\n---------------------\nQuantifying
  the Overheads of the Modern Linux I/O Stack\n\nLogan, Kougkas, Sun\n\nThe
  performance of the Linux I/O stack is critical to the performance of dis
 tr
 ibuted storage applications. Recent research has shown that the Linux I/O 
 stack introduces multiple overheads that significantly reduce and randomiz
 e the performance of I/O operations. A lesser amount of research has been 
 ...\n\n---------------------\nSynChrono: An MPI-Based, Scalable Physics-Ba
 sed Simulation Framework for Autonomous Vehicles Operating in Off-Road Con
 ditions\n\nTaves, Young, Elmquist, Negrut, Serban...\n\nIn this contributi
 on we outline the MPI-based, scalable, physics-based simulation framework 
 SynChrono, and its use in autonomous vehicle studies in off-road condition
 s. SynChrono builds on the simulation capabilities of Chrono, but shows be
 tter scaling behavior, making it a useful environment for mu...\n\n-------
 --------------\nxBGAS: An Address Space Extension for Scalable High-Perfor
 mance Computing\n\nWang, Leidel, Williams, Ehret, Mark...\n\nThe tremendou
 s expansion of data volume has driven the transition from monolithic archi
 tectures towards systems integrated with discrete and distributed subcompo
 nents in modern scalable high-performance computing (HPC). As such, multi-
 layered software infrastructures have become essential to bridge ...\n\n--
 -------------------\nXPSI: X-ray Free Electron Laser-Based Protein Structu
 re Identifier\n\nOlaya García, Wyatt II, Caino-Lores, Tama, Miyashita...\n
 \nA protein's structure determines its function. Different proteins have d
 ifferent structures; proteins in the same family share similar substructur
 es and thus may share similar functions. Additionally, one protein may exh
 ibit several structural states, also named conformations. Identifying diff
 erent ...\n\n---------------------\nMiniVite + Metall: A Case Study of Acc
 elerating Graph Analytics Using Persistent Memory Allocator\n\nIwabuchi, G
 hosh, Pearce, Halappanavar, Gokhale\n\nParallel graph generation has add
 itional computation and communication overheads that often exceed the exec
 ution time for solving the original problem for which the graph was actual
 ly generated. Substantial performance improvements and cost reductions hav
 e occurred in persistent memory technology....\n\n---------------------\nI
 ntegrating FPGAs in a Heterogeneous and Portable Parallel Programming Mode
 l\n\nRodriguez-Canal, Torres, Gonzalez-Escribano\n\nThe programmability of
  FPGAs has been simplified by high-level synthesis languages (HLS) and tec
 hniques, like OpenCL. These reduce the programming effort, but the user ha
 s to take care of details related to command queue management, data transf
 ers and synchronization. The Controller heterogeneous pr...\n\n-----------
 ----------\nScalable Comparative Visualization of Ensembles of Call Graphs
  Using CallFlow\n\nP. Kesavan, Bhatia, Bhatele, Brink, Pearce...\n\nOptimi
 zing the performance of large-scale parallel codes is critical for efficie
 nt utilization of computing resources. Code developers often explore multi
 ple execution parameters and are interested in detecting and understanding
  bottlenecks in different executions. They usually collect hierarchical ..
 .\n\n---------------------\nParallel Implementation of a Hybrid Particle-C
 ontinuum Finite Element Framework for Blood Clot Biomechanics\n\nTeeraratk
 ul, Mukherjee\n\nPathological blood clotting is the primary cause of major
  cardiovascular diseases. Here we present a distributed-memory parallelize
 d implementation of a hybrid particle-continuum fictitious domain finite e
 lement framework which is used to study flow and transport around a pathol
 ogically formed blood...\n\n---------------------\nResource-Efficient FPGA
  Pseudorandom Number Generation\n\nCilasun, Peng, Gokhale\n\nProbability d
 istributions play a critical role in diverse application domains. In simul
 ations, probability distributions model phenomena such as physical propert
 ies of materials, of processes, or of behaviors. For instance, molecular d
 ynamics codes often utilize the Maxwell-Boltzmann distribution fo...\n\n--
 -------------------\nDaLI: A Data Lifecycle Instrument Toward the Reproduc
 ibility of Scientific Research\n\nRunesha, Munakami\n\nWith the ever-incre
 asing volume, velocity and variety of data that is created from research l
 ab instruments come the challenges of meeting the increased demands for tr
 ansferring, storing, analyzing, sharing and publishing these data. To addr
 ess these challenges in data management and computation, th...\n\n--------
 -------------\nUnderstanding I/O behavior of Scientific Deep Learning Appl
 ications in HPC systems\n\nDevarajan, Zheng, Sun, Vishwanath\n\nDeep learn
 ing has been widely utilized in various science domains to achieve unprece
 dented results. These applications typically rely on massive datasets to t
 rain the networks. As dataset sizes grow rapidly, I/O becomes a maj
 or bottleneck in large-scale distributed training. We characterize t...\n\
 n---------------------\nDFS on a Diet: Enabling Reduction Schemes on Distr
 ibuted File Systems\n\nWidodo, Abe, Kato\n\nThe selection of data reductio
 n schemes, crucial for data footprints on a distributed file system (DFS) 
 and for transferring big data, is usually limited to the schemes supported
  by the underlying platforms. If the platform's source code is available, 
 it might be possible to add user-favorite reduct...\n\n-------------------
 --\nAnalyzing Interconnect Congestion on a Production Dragonfly-Based Syst
 em\n\nKitson, Chunduri, Bhatele\n\nAs the HPC community continues along th
 e road to exascale, and HPC systems grow ever larger and busier, the quest
 ion of how network traffic on these systems affects application performanc
 e looms large. In order to fully address this question, the HPC community 
 needs a broadened understanding of the ...\n\n---------------------\nAccel
 erating GMRES with Deep Learning in Real-Time\n\nLuna, Blaschke\n\nDeep le
 arning methods show great promise; however, applications where simulation
  data is expensive to obtain do not lend themselves easily to applications
  of deep learning without incurring a high cost to produce data. Real-time
  online learning is a novel strategy to minimize this cost: a model "lea
 ...
 \n\n---------------------\nFast Scalable Implicit Solver with Convergence 
 of Physics-Based Simulation and Data-Driven Learning: Toward High-Fidelity
  Simulation with Digital Twin City\n\nIchimura, Fujita, Koyama, Kusakabe, 
 Minami...\n\nWe propose an HPC-based scalable implicit low-order unstructu
 red nonlinear finite-element solver that uses data generated during physic
 s-based simulations for data-driven learning. Here, a cost-efficient preco
 nditioner is developed using the data-driven learning method for accelerat
 ing the iterative...\n\n---------------------\nEvaluating Adaptive Routing
  Performance on Large-Scale Megafly Topology\n\nNewaz, Mollah, Faizi
 an, Tong\n\nThe Megafly topology has recently been proposed as an efficie
 nt, hierarchical way to interconnect large-scale high-performance computi
 ng systems. Megafly networks may be constructed in various group sizes an
 d configurations, but it is challenging to maintain hi
 gh throughput pe...\n\n---------------------\nOrchestration of a Forecasti
 ng Chain for Forest Fire Prevention Using the LEXIS Cloud/HPC Platform\n\n
 Hayek, Ganne, Parodi, Parodi, D’Andrea...\n\nWe describe a first successfu
 l application of the LEXIS platform for advanced orchestration of complex 
 simulation and data-analysis workflows in mixed cloud/HPC environments. A 
 workflow for forest fire risk assessment based on the models WRF and RISIC
 O was executed, using IaaS cloud resources at LRZ...\n\n------------------
 ---\nMachine Learning for Data Transfer Anomaly Detection\n\nBhuiyan, Coop
 er, Arslan\n\nData transfer performance is critical for many science appli
 cations that rely on remote clusters to process the data. Despite the pres
 ence of high-speed research networks with up to 100 Gbps speeds, most data
  transfers obtain only a fraction of network bandwidth, due to a variety o
 f reasons. This pr...\n\n---------------------\nAnalysis of FastEddy Model
  Data on GPUs\n\nLiu, Suresh, Miller, Sauer\n\nData analysis of atmospheri
 c model outputs is often embarrassingly parallel and compute-intensive, an
 d is traditionally done on central processing units (CPUs). FastEddy is a
  General-Purpose Graphics Processing Unit (GPGPU)-based Large Eddy Simu
 lation (LES) atmospheric model developed at NCAR-Re...\n\n----------------
 -----\nA Workflow Hierarchy-Aware Fault Tolerance System\n\nBehera, Ahn, H
 erbein, Mueller, Rountree\n\nComplex scientific workflows present unpreced
 ented challenges to fault tolerance support in high-performance computing 
 (HPC). While existing solutions such as checkpoint/restart (C/R) and resou
 rce over-provisioning work well at the application level, they do not scal
 e to the demands of complex workfl...\n\n---------------------\nLong-Time
  S
 imulation of Temperature-Varying Conformations of COVID-19 Spike Glycoprot
 ein on IBM Supercomputers\n\nSong, Zhang, Han, Zhang, Deng\n\nWe investiga
 ted the conformational variations and phase transition properties of the s
 pike glycoprotein (S-protein) of the coronavirus SARS-CoV-2 at temperature
 s ranging from 3℃ to 300℃ on the IBM Power9-based AI supercomp
 uters. Our microsecond time-scale molecular dynamics simulations o...\n\n-
 --------------------\nA Hybrid Approach to Scientific Software Package Man
 agement on an HPC Cluster\n\nHu, Huang\n\nWe present a practical approach 
 for managing the scientific packages in the HPC cluster environment of a r
 esearch university that has a diverse user base. The primary g
 oal is to minimize the HPC operational team’s burden of installing and mai
 ntaining a large number of software packages t...\n\n---------------------
 \nEnabling Faster NGS Analysis on Optane-Based Heterogeneous Memory\n\nLuo
 , Guo, Ren, Wu, Li\n\nNext-Generation Sequencing (NGS) analysis technologi
 es are a pioneering approach for genome sequencing. The computation of NGS
  analysis exhibits a unique pattern, in which the execution requests a hig
 h density of small I/Os in the process of de novo genome assembly. The sm
 all I/Os can have a huge impac...\n\n\nRegistration Category: Tech Progra
 m Reg Pass, Exhibits Reg Pass
END:VEVENT
END:VCALENDAR