Viterbi Faculty of Electrical Engineering, Technion
Neural Network Quantization for Integer-Only Inference on FPGA
Convolutional Neural Networks (CNNs) are widely used in many fields, including computer vision, speech recognition, natural language processing, and autonomous vehicles. Although deep learning achieves groundbreaking performance in these domains, the networks involved are computationally demanding and far from real-time even on a GPU, which is not power efficient and therefore unsuitable for low-power systems such as mobile devices. To address this challenge, several solutions have been proposed for quantizing the weights and activations of these networks, which accelerates inference significantly. This acceleration comes at the cost of a larger error. In this work, we survey several approaches to tackling these challenges. We introduce noise injection as a means of mitigating quantization error. We investigate both uniform and non-uniform noise injection and quantization through extensive experiments on tasks such as classification and regression. We demonstrate an FPGA implementation of both regression and classification tasks using uniform quantization, while maintaining integer-only arithmetic. Finally, we present a multi-FPGA implementation as a means of reducing runtime.
* MSc seminar under supervision of Prof. Avi Mendelson.
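To make the two central ideas of the abstract concrete, the following minimal sketch (not the seminar's actual implementation) illustrates symmetric uniform quantization of weights to signed integers, together with the common training-time proxy of injecting uniform noise of one quantization step in place of the non-differentiable rounding operation. The function names `uniform_quantize` and `noise_proxy` are illustrative choices, not names from the work itself.

```python
import numpy as np

def uniform_quantize(x, n_bits=8):
    # Symmetric uniform quantizer: map floats to signed n_bits integer codes.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax            # one quantization step
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                              # integer codes + dequantization scale

def noise_proxy(x, n_bits=8, rng=None):
    # Training-time surrogate: additive uniform noise of one quantization
    # step stands in for rounding, keeping the operation differentiable.
    rng = rng or np.random.default_rng(0)
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return x + rng.uniform(-0.5, 0.5, size=x.shape) * scale

w = np.array([-1.0, -0.3, 0.05, 0.7, 1.0])
q, s = uniform_quantize(w)
print(q)       # integer codes, suitable for integer-only arithmetic on FPGA
print(q * s)   # dequantized values, within half a step of the originals
```

After quantization, all multiply-accumulate operations can run on the integer codes alone, with the floating-point scale factored out and applied once at the end, which is what enables integer-only inference on the FPGA.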