Design of high parallel CNN accelerator based on FPGA for AIoT - Details

Indexed by:

EI, Scopus, CSCD

Abstract:

To tackle the challenge of deploying convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs), which stems from their computational complexity, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. The accelerator uses a pipelined architecture with three dimensions of parallelism: input channels, output channels, and convolution kernels. Firstly, two multiply-and-accumulate (MAC) operations were packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the CNN accelerator. Secondly, feature map block partitioning and a special memory arrangement were proposed to optimize the total amount of off-chip memory access and reduce the pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiply-add tree with the Winograd fast convolution algorithm was designed to balance hardware resource consumption and computational performance. The highly parallel CNN accelerator was deployed on an Alinx ZU3EG platform, using the YOLOv3-tiny algorithm as the test object. The average computing performance of the CNN accelerator is 127.5 giga operations per second (GOPS). The experimental results show that the hardware architecture effectively improves the computational power of the CNN and provides better performance than other existing schemes in terms of power consumption and the efficiency of DSPs and block random access memories (BRAMs). © 2022, Beijing University of Posts and Telecommunications. All rights reserved.
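
Two of the techniques named in the abstract can be illustrated in software. The sketch below is not the authors' Verilog design. It assumes signed 8-bit operands, an 18-bit offset between the two packed operands, and the one-dimensional F(2,3) Winograd tile; none of these parameters are stated in the abstract, and the function names are hypothetical.

```python
# Illustrative sketch only. Bit widths, the packing offset, and the
# Winograd tile size are assumptions, not values taken from the paper.

SHIFT = 18  # assumed offset between the two operands packed into one word


def packed_dual_multiply(a, b, c):
    """Return (a*c, b*c) using a single wide multiplication.

    a, b, c are assumed to be signed 8-bit integers. Because b*c fits
    in 18 signed bits, the two products do not overlap in the result.
    """
    for v in (a, b, c):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in signed 8 bits")
    packed = (a << SHIFT) + b          # one wide operand carrying a and b
    product = packed * c               # equals (a*c << SHIFT) + b*c
    low = product & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):        # sign-extend the low field
        low -= 1 << SHIFT
    high = (product - low) >> SHIFT    # undo the borrow from a negative b*c
    return high, low


def winograd_f23(d, g):
    """1-D Winograd F(2,3): two outputs of a 3-tap filter over 4 inputs.

    Uses 4 multiplications between data and transformed filter terms,
    where the direct computation needs 6.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return m1 + m2 + m3, m2 - m3 - m4
```

For example, `packed_dual_multiply(3, -4, 5)` recovers both products from one multiplication, and `winograd_f23((1, 2, 3, 4), (5, -6, 7))` matches the directly computed sliding dot products. In hardware, the filter-side terms such as `(g0 + g1 + g2) / 2` would normally be precomputed, so only the four data-side multiplications remain per tile.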

Keywords:

Acceleration; Computational efficiency; Computer hardware description languages; Convolution; Convolutional neural networks; Digital signal processing; Energy efficiency; Field programmable gate arrays (FPGA); Integrated circuit design; Logic gates; Memory architecture; Network architecture; Random access storage; Trees (mathematics)

Affiliations:

[1] [Zhijian, Lin] School of Advanced Manufacturing, Fuzhou University, Quanzhou 362251, China
[2] [Zhijian, Lin] College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
[3] [Xuewei, Gao] School of Advanced Manufacturing, Fuzhou University, Quanzhou 362251, China
[4] [Xiaopei, Chen] School of Advanced Manufacturing, Fuzhou University, Quanzhou 362251, China
[5] [Zhipeng, Zhu] School of Advanced Manufacturing, Fuzhou University, Quanzhou 362251, China
[6] [Xiaoyong, Du] School of Advanced Manufacturing, Fuzhou University, Quanzhou 362251, China
[7] [Pingping, Chen] College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China

Source:

Journal of China Universities of Posts and Telecommunications

ISSN: 1005-8885

CN: 11-3486/TN

Year: 2022

Issue: 5

Volume: 29

Page: 1-9

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 3

ESI Highly Cited Papers on the List: 0

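Affiliated Colleges:

College of Physics and Information Engineering, School of Microelectronics (records not attributed to a specific unit within the college)

School of Advanced Manufacturing (records not attributed to a specific unit within the college)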
