Feeding High-Bandwidth Streaming-Based FPGA Accelerators

Mulder, Y.T.B.

Author

Mulder, Y.T.B. (TU Delft Electrical Engineering, Mathematics and Computer Science)

Contributor

Hofstee, Peter (mentor)

Degree granting institution

Delft University of Technology

Programme

Computer Engineering

Date

2018-01-29

Abstract

A new class of accelerator interfaces has significant implications for system architecture. An order of magnitude more bandwidth forces us to reconsider FPGA design. OpenCAPI is a new interconnect standard that enables FPGAs to be attached coherently through a high-bandwidth, low-latency interface. Keeping up with this bandwidth poses new challenges for the design of accelerators and of the logic feeding them.

This thesis was conducted as part of a group project in which three other master's students investigate database operator accelerators. This thesis focuses on the logic that feeds the accelerators, by designing a reconfigurable multi-stream buffer architecture. Generalizing across several common streaming-like accelerator access patterns leads to the desired interface: multiple read ports, each with a granularity smaller than a cache line. At the same time, every read port may request any stream, including reads that cross a cache-line boundary.
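To make this access pattern concrete, the following Python sketch models a multi-stream buffer in which any read port may request a sub-cache-line chunk of any stream, including a chunk that crosses a cache-line boundary. It is an illustrative behavioral model, not the hardware design from the thesis: the class names, the 128-byte line size, and the rule that ports requesting the same stream in one cycle are served in port order are all assumptions.

```python
from collections import deque

# ASSUMPTION: 128-byte cache lines; the abstract does not state the line size.
LINE_BYTES = 128


class StreamBuffer:
    """Hypothetical per-stream FIFO of consecutive cache lines."""

    def __init__(self):
        self.lines = deque()  # buffered cache lines, oldest first
        self.offset = 0       # read pointer (bytes) into the oldest line

    def push_line(self, line):
        if len(line) != LINE_BYTES:
            raise ValueError("expected exactly one cache line")
        self.lines.append(line)

    def read(self, nbytes):
        """Return the next nbytes of the stream (1..LINE_BYTES bytes)."""
        if not 0 < nbytes <= LINE_BYTES:
            raise ValueError("read size must be 1..LINE_BYTES bytes")
        # A read that runs past the end of the oldest line needs the next line too.
        needed = 1 if self.offset + nbytes <= LINE_BYTES else 2
        if len(self.lines) < needed:
            raise BufferError("requested bytes are not buffered yet")
        window = b"".join(self.lines[i] for i in range(needed))
        data = window[self.offset:self.offset + nbytes]
        self.offset += nbytes
        # Retire the oldest line once the read pointer has moved past it.
        if self.offset >= LINE_BYTES:
            self.lines.popleft()
            self.offset -= LINE_BYTES
        return data


class MultiStreamBuffer:
    """Hypothetical model: any read port may request any stream."""

    def __init__(self, num_streams=64, num_ports=8):
        self.streams = [StreamBuffer() for _ in range(num_streams)]
        self.num_ports = num_ports

    def serve(self, requests):
        """Serve one cycle: one (stream_id, nbytes) tuple or None per read port.

        ASSUMPTION: ports that request the same stream in one cycle are
        served in port order, so they receive consecutive chunks.
        """
        if len(requests) != self.num_ports:
            raise ValueError("expected one request slot per read port")
        return [None if req is None else self.streams[req[0]].read(req[1])
                for req in requests]
```

For example, after pushing two lines into stream 3, `serve([(3, 100), (3, 64)])` returns the first 100 bytes to port 0 and the next 64 bytes, which straddle the line boundary, to port 1. The model simply concatenates the two oldest lines; in hardware this case would typically be handled by a shifter or multiplexer network, which is a general observation rather than a detail taken from the thesis.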

The proposed architecture exploits the different memory primitives available on the latest generation of Xilinx FPGAs. Combining a traditional multi-read-port approach based on data duplication with a second level of buffering, a hierarchy typically found in caches, yields an architecture that can supply data from 64 streams to eight read ports without any access-pattern restrictions.
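The multi-read-port approach based on data duplication can be illustrated with a small behavioral model: each read port gets its own copy of a memory that has only one read port, and every write is broadcast to all copies. This is the generic replication technique rather than the implementation from the thesis; `ReplicatedMemory` and its methods are hypothetical names, and the second buffering level is not modeled.

```python
class ReplicatedMemory:
    """Hypothetical model: N read ports built from N single-read-port copies.

    Every write is broadcast to all copies, and read port p only ever reads
    copy p, so all ports can read different addresses in the same cycle.
    """

    def __init__(self, depth, num_read_ports):
        self.copies = [[0] * depth for _ in range(num_read_ports)]

    def write(self, addr, value):
        # Duplicate the data: keep every copy identical.
        for copy in self.copies:
            copy[addr] = value

    def read(self, addrs):
        """One cycle of reads: addrs[p] is the address requested by port p."""
        if len(addrs) != len(self.copies):
            raise ValueError("expected one address per read port")
        return [copy[a] for copy, a in zip(self.copies, addrs)]

    def storage_words(self):
        # Total storage grows linearly with the number of read ports.
        return sum(len(copy) for copy in self.copies)
```

The model makes the cost of duplication explicit: eight copies of a 16-entry memory hold 128 words to serve 16 distinct entries. That trade-off is one general reason to pair duplication with a smaller second buffering level, although the abstract does not describe how the thesis partitions the two levels.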

A correct-by-construction design methodology was used to simplify validation of the design and to speed up the implementation phase. The design methodology is also documented, and examples are provided for ease of adoption. Using this methodology, the proposed architecture has been implemented and is accompanied by a validation framework.

Various configurations of the multi-stream buffer have been tested. Configurations of up to 64 streams with four read ports meet timing with an AFU request-to-response latency of five cycles. The largest configuration, with 64 streams and eight read ports, fails timing. The limiting factor is the inherent architecture of FPGAs, in which memories are physically located in specific columns. This makes extracting data complex, especially at the target frequencies of 200 MHz and 400 MHz: wires are scattered across the FPGA and wire delay becomes dominant.
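The target frequencies translate into tight per-cycle time budgets, which helps explain why wire delay dominates. The abstract does not state at which clock the five-cycle latency is measured, so both cases are worked out:

```latex
T_{\mathrm{clk}} = \frac{1}{f}
\quad\Rightarrow\quad
T_{200} = \frac{1}{200\,\mathrm{MHz}} = 5\,\mathrm{ns},
\qquad
T_{400} = \frac{1}{400\,\mathrm{MHz}} = 2.5\,\mathrm{ns}

t_{\mathrm{lat}} = 5\,T_{\mathrm{clk}} =
\begin{cases}
  25\,\mathrm{ns}   & \text{at } 200\,\mathrm{MHz},\\
  12.5\,\mathrm{ns} & \text{at } 400\,\mathrm{MHz}.
\end{cases}
```

At 400 MHz, every register-to-register path, including routing from the memory columns to the read-port logic, must fit within roughly 2.5 ns minus setup time and clock uncertainty.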

FPGA design at increasing bandwidths requires new design approaches. Synthesis results are no guarantee of what the implemented design achieves and, depending on the design size, can indicate an overly optimistic operating frequency. Designing accelerators that keep up with an order of magnitude more bandwidth than the current state of the art is therefore complex, and requires carefully designed accelerator cores combined with an interface capable of feeding them.

Subject

OpenCAPI
FPGA
Streaming
HPC
Heterogeneous
Low-latency
High-bandwidth
Streaming-based

To reference this document use:

http://resolver.tudelft.nl/uuid:75dd920a-0e50-49c9-9982-70ef7dab7a92

Bibliographical note

ISBN 978-94-6186-886-2

Part of collection

Student theses

Document type

master thesis

Files

PDF

YTB_Mulder_MSc_Thesis_v1.0.1.pdf

5.07 MB