SW/HW Framework for GASNet-Enabled FPGA Hardware

Researchers from KAIST and Flapmax published a new technical paper titled “FSHMEM: Supporting Partitioned Global Address Space on FPGAs for Large-Scale Hardware Acceleration Infrastructure.”


“By providing highly efficient one-sided communication with globally shared memory space, Partitioned Global Address Space (PGAS) has become one of the most promising parallel computing models in high-performance computing (HPC). Meanwhile, FPGA is getting attention as an alternative compute platform for HPC systems with the benefit of custom computing and design flexibility. However, the exploration of PGAS has not been conducted on FPGAs, unlike the traditional message passing interface. This paper proposes FSHMEM, a software/hardware framework that enables the PGAS programming model on FPGAs. We implement the core functions of GASNet specification on FPGA for native PGAS integration in hardware, while its programming interface is designed to be highly compatible with legacy software. Our experiments show that FSHMEM achieves the peak bandwidth of 3813 MB/s, which is more than 95% of the theoretical maximum, outperforming the prior works by 9.5×. It records 0.35us and 0.59us latency for remote write and read operations, respectively. Finally, we conduct a case study on the two Intel D5005 FPGA nodes integrating Intel’s deep learning accelerator. The two-node system programmed by FSHMEM achieves 1.94× and 1.98× speedup for matrix multiplication and convolution operation, respectively, showing its scalability potential for HPC infrastructure.”
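To put the abstract's headline numbers in context, the short Python sketch below derives two quantities the abstract implies but does not state: the upper bound on the theoretical maximum bandwidth (if 3813 MB/s is more than 95% of it) and the per-node parallel efficiency behind the two-node speedups. The input figures come from the abstract; the function and variable names are illustrative only, and efficiency is computed with the usual speedup-divided-by-nodes definition, which the paper may or may not use.

```python
# Figures quoted in the abstract; everything derived from them is our own arithmetic.
PEAK_BANDWIDTH_MB_S = 3813.0   # reported peak bandwidth
MIN_FRACTION_OF_MAX = 0.95     # "more than 95% of the theoretical maximum"
NODES = 2                      # two Intel D5005 FPGA nodes in the case study
SPEEDUPS = {"matrix multiplication": 1.94, "convolution": 1.98}


def implied_max_bandwidth(peak: float, fraction: float) -> float:
    """Upper bound on the theoretical maximum when peak exceeds fraction * maximum."""
    return peak / fraction


def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Conventional parallel efficiency: achieved speedup divided by node count."""
    return speedup / nodes


bound = implied_max_bandwidth(PEAK_BANDWIDTH_MB_S, MIN_FRACTION_OF_MAX)
efficiencies = {name: parallel_efficiency(s, NODES) for name, s in SPEEDUPS.items()}

print(f"Theoretical maximum is below about {bound:.0f} MB/s")
for name, eff in efficiencies.items():
    print(f"{name}: {eff:.0%} parallel efficiency on {NODES} nodes")
```

Under these assumptions, the theoretical maximum is below roughly 4014 MB/s, and the two-node system runs at about 97% and 99% efficiency for matrix multiplication and convolution, respectively.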

Find the technical paper here. Published July 2022.

Authors: Yashael Faith Arthanto, David Ojika, and Joo-Young Kim.

Publication: arXiv:2207.04625v1
