Design of Efficient Deep Neural Networks on FPGAs for Low-Power and Low-Latency Applications, 10-R6211

Principal Investigators

Mike Koets

Inclusive Dates

10/01/21 to 04/01/23

Background

Machine learning has evolved to provide effective solutions to previously unsolvable problems in computer vision. While effective, machine learning is characterized by its reliance on powerful processors to compute large matrix multiplications requiring millions of computations for a single image in computer vision applications. This can be a limiting factor for deploying machine learning based algorithms onboard satellites which have more constrained size, weight, power and cost (SWaP-C) considerations than most terrestrial machine learning systems. Recent research has explored the use of low-bit precision mathematical representations to accelerate machine learning inference on radiation-tolerant Field Programmable Gate Arrays (FPGA). Using low precision mathematics, with four or less bits representing each value, allows for greater efficiency through reduced memory movement and greatly simplified mathematics which can be effectively accelerated on FPGAs. This greater efficiency does come at the cost of reduced accuracy if standard quantization techniques are used.

Approach

SwRI researched which low precision quantization techniques would achieve the desired level of accuracy on an object detection machine learning task while being more efficient than current, higher-precision based techniques. To accomplish this, the research explored various types of low-precision quantization, weight compression in the form of weight sparsity and novel deep learning architectures that minimize the overall data movement. We evaluated the performance of each technique by how much the accuracy of the network would decrease, as well as the performance gains achieved from each technique.

Accomplishments

This research resulted in the understanding of techniques for reducing an off-the-shelf efficient deep neural network to use low-bit precision. The accuracy, resource and performance final results are shown in Table 1. The algorithms compared are a yolov3 (you only look once) model that comes in the commercially available FPGA machine learning offering, and two machine learning models made under this research with activation precision of 2/3 and weight precision of six (6) which was found to be the best trade-off in resources to accuracy. The results show that the reduced precision models use significantly less resources and run faster on a space-rated FPGA. The accuracy is less than the non-low precision model, but that plans to be mitigated as the team explores better training techniques.

Table 1: Comparison of commercially available FPGA machine learning deployment compared to the one developed in this research.

Algorithm	Accuracy	Frames per second	Look-up Tables (LUT)	Block RAM (BRAM)	Digital Signal Processing (DSP)
yolov3	77.44	37.82	104322	510	1420
Yolo finn-a2w6	49.61	70.6	47441	283	0
Yolo Finn-a3w6	45.64	63.4	66325	306	0

Select Year:

Design of Efficient Deep Neural Networks on FPGAs for Low-Power and Low-Latency Applications, 10-R6211

Background

Approach

Accomplishments