Title Streaming Integer Extensions for Snitch
Author Sun, Chen (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor Wong, J.S.S.M. (mentor); Mazzola, Sergio (graduation committee); van Leuken, T.G.R.M. (graduation committee)
Degree granting institution Delft University of Technology
Programme Computer Engineering
Date 2022-09-30

Abstract
The proliferation of the Internet-of-Things (IoT) places increasing demands on the performance and energy efficiency of microcontroller-based endpoint devices. These MCUs must process raw sensor data with integer-based workloads, such as digital signal processing (DSP) algorithms and quantized neural network (QNN) inference. Snitch is a tiny RV32I control core based on the open-source RISC-V instruction set architecture. Currently, the Snitch system built around the Snitch core targets high performance in floating-point applications. Novel hardware extensions, such as stream semantic registers (SSRs) and the floating-point repetition (FREP) hardware loop, have been implemented in its floating-point subsystem to achieve high floating-point unit (FPU) utilization. For integer computation, however, it supports only the RV32IM instruction set, which does not satisfy the growing demand from these integer workloads. In this work, we present a unified Snitch architecture with integer extensions targeting integer workload acceleration. Custom extensions have already been proposed to address performance bottlenecks in DSP and QNN applications: the Xpulpimg ISA and a sub-byte single-instruction-multiple-data (SIMD) ISA, respectively. Both extensions were built on an outdated version of Snitch used in another many-core system, Mempool. In our work, we first integrated the DSP-oriented ISA extension Xpulpimg and the sub-byte SIMD ISA extension into the mainline Snitch.
Then we extended the existing floating-point SSRs with integer support. To evaluate the proposed extensions, we benchmarked the Snitch core complex (CC) with integer matrix multiplication algorithms and compared the performance of the baseline RV32IM against that of our extensions. We measured speedups of 5.9×, 22.6×, and 77.4× in MACs/cycle over the baseline for 32-bit, 8-bit, and 4-bit data sizes, respectively. Post-synthesis area and timing figures were obtained in GlobalFoundries 22 nm technology. Our integer extensions introduce only 12% area overhead compared with the original FP-capable Snitch CC, and they have no measurable impact on the maximum effective frequency with FP extensions enabled.

Subject RISC-V, Snitch, ISA extensions, Computer Architecture, Stream Semantic Registers
To reference this document use: http://resolver.tudelft.nl/uuid:8c35b3a9-60c3-4989-9583-d8020bcfa31f
Part of collection Student theses
Document type master thesis
Rights © 2022 Chen Sun
Files PDF _2022_10_04_updated_final ... ension.pdf 1.66 MB
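
As a rough illustration of why narrower data sizes allow more multiply-accumulates (MACs) per operation, the following Python sketch emulates a packed dot product over one 32-bit word. It is not the thesis implementation and does not model the actual Xpulpimg or sub-byte SIMD instructions; the function names (`pack_lanes`, `unpack_lanes`, `packed_dot`, `macs_per_word`) and the lane ordering are assumptions made for this example only.

```python
# Hypothetical emulation of a packed (SIMD-style) integer dot product.
# All names and the lane layout are illustrative assumptions, not the
# instruction semantics of any real ISA extension.

WORD_BITS = 32


def macs_per_word(bits):
    """Number of lanes (and hence MACs) that fit in one 32-bit word."""
    return WORD_BITS // bits


def pack_lanes(values, bits):
    """Pack signed integers into one 32-bit word, lane 0 in the low bits."""
    if len(values) != macs_per_word(bits):
        raise ValueError("wrong number of lanes for this element width")
    mask = (1 << bits) - 1
    word = 0
    for i, v in enumerate(values):
        if not -(1 << (bits - 1)) <= v < (1 << (bits - 1)):
            raise ValueError("value does not fit in a signed lane")
        word |= (v & mask) << (i * bits)  # two's-complement truncation
    return word


def unpack_lanes(word, bits):
    """Split a 32-bit word into sign-extended lanes."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    lanes = []
    for i in range(macs_per_word(bits)):
        lane = (word >> (i * bits)) & mask
        lanes.append(lane - (1 << bits) if lane & sign else lane)
    return lanes


def packed_dot(word_a, word_b, bits, acc=0):
    """One emulated dot-product-accumulate step over two packed words."""
    a = unpack_lanes(word_a, bits)
    b = unpack_lanes(word_b, bits)
    return acc + sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    for bits in (32, 8, 4):
        print(bits, "bit elements:", macs_per_word(bits), "MAC(s) per word")
```

Under these assumptions, one 32-bit operation covers 1, 4, or 8 MACs for 32-bit, 8-bit, and 4-bit elements. The measured speedups in the abstract are larger than these lane counts, so packing alone does not account for them.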