Flexible Runtime Reconfigurable Computing Overlay Architecture and Optimization for Dataflow Applications
Event Type
Workshop
Extreme Scale Computing
Heterogeneous Systems
Parallel Programming Languages, Libraries, and Models
Portability
Resource Management and Scheduling
Scalable Computing
Time: Wednesday, 11 November 2020, 12:30pm - 12:55pm EDT
Location: Track 1
Description: Many computationally intensive applications are accelerated on FPGAs following the stream computing paradigm, also called dataflow computing. In this paradigm, data is streamed through the different components of an application in wide, deep pipelines to maximize throughput. One of its main drawbacks is that it consumes a large number of hardware resources.
Thus, in this work, we propose a partial runtime reconfigurable overlay onto which any computationally intensive application can be mapped, provided it is given as a behavioral description for High-Level Synthesis (HLS) composed of multiple stages, as is typical of the stream computing paradigm. The overlay uses the FPGA's internal BlockRAM to store the intermediate results of each stage in order to speed up the computation, and it time-multiplexes the different stages by reconfiguring the computational part.
This work also includes a design methodology to optimize the micro-architectural implementation of each stage in order to balance the dataflow architecture and to generate systems with unique area vs. performance trade-offs. The proposed architecture and methodology have been prototyped on a Xilinx Zedboard with a Zynq FPGA using a variety of synthetic dataflows, and a case study of a JPEG encoder is presented to highlight their benefits. The overlay will be made public and open source after the publication of this paper.
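The time-multiplexing idea described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' overlay: the function `run_overlay`, the three example stages, and the use of a Python list in place of BlockRAM are all hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A stage reads the intermediate buffer and returns the next intermediate result.
Stage = Callable[[List[int]], List[int]]


def run_overlay(stages: Sequence[Stage], data: List[int]) -> Tuple[List[int], int]:
    """Run the stages one at a time on a single modeled compute region.

    The list `buffer` stands in for on-chip BlockRAM (an assumption of this
    sketch); each loop iteration models one reconfiguration of the region.
    """
    buffer = list(data)
    reconfigurations = 0
    for stage in stages:
        reconfigurations += 1   # model loading this stage into the region
        buffer = stage(buffer)  # stage consumes and replaces the intermediate data
    return buffer, reconfigurations


# Hypothetical three-stage dataflow, used only to exercise the model.
def scale(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def offset(xs: List[int]) -> List[int]:
    return [x + 1 for x in xs]


def clamp(xs: List[int]) -> List[int]:
    return [min(x, 6) for x in xs]


result, count = run_overlay([scale, offset, clamp], [1, 2, 3])
print(result, count)
```

In this model only one stage occupies the compute region at any time, which is how a time-multiplexed design trades reconfiguration overhead for a smaller area than a fully spatial pipeline would need.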