Abstract:
To tackle the challenge of deploying convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) despite their computational complexity, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. The accelerator adopts a pipelined architecture with three dimensions of parallelism: input channels, output channels, and convolution kernels. First, two multiply-accumulate (MAC) operations are packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the accelerator. Second, feature-map block partitioning and a dedicated memory arrangement are proposed to reduce the total amount of off-chip memory access and relieve the pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiply-add tree with the Winograd fast convolution algorithm is designed to balance hardware resource consumption against computational performance. The highly parallel CNN accelerator was deployed on an Alinx ZU3EG board with the YOLOv3-tiny algorithm as the test workload, achieving an average computing performance of 127.5 giga operations per second (GOPS). The experimental results show that the hardware architecture effectively improves the computational power of CNNs and outperforms existing schemes in power consumption and in the efficiency of DSPs and block random access memories (BRAMs). © 2022, Beijing University of Posts and Telecommunications. All rights reserved.
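As an informal illustration of two techniques named in the abstract, the sketch below models, in software, the Winograd F(2,3) minimal-filtering transform (2 outputs of a 3-tap filter with 4 multiplications instead of 6) and the general idea of packing two multiplications into one wide multiplier. The function names, the F(2,3) tile size, and the 16-bit field width are assumptions chosen for demonstration only; this is not the paper's Verilog implementation or its DSP packing scheme.

```python
# Illustrative software model only; not the accelerator's RTL.
import numpy as np

# Standard Winograd F(2,3) transform matrices (Lavin & Gray style).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """1D Winograd F(2,3): d is a 4-element input tile, g a 3-tap kernel."""
    m = (G @ g) * (BT @ d)   # 4 element-wise multiplications
    return AT @ m            # 2 filter outputs

def packed_two_macs(a, b, c, shift=16):
    """Model of packing two unsigned multiplies into one wide multiplier:
    (a << shift) + b multiplied by c yields a*c in the high field and
    b*c in the low field, provided b*c < 2**shift (signs not handled)."""
    p = ((a << shift) + b) * c
    return p >> shift, p & ((1 << shift) - 1)

if __name__ == "__main__":
    d = np.array([1.0, 2.0, 3.0, 4.0])
    g = np.array([0.5, 1.0, -1.0])
    print(winograd_f23(d, g))                     # Winograd result
    print(np.convolve(d, g[::-1], mode="valid"))  # direct filtering reference
    print(packed_two_macs(7, 9, 13))              # (91, 117) = (7*13, 9*13)
```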
Source: Journal of China Universities of Posts and Telecommunications
ISSN: 1005-8885
Year: 2022
Issue: 5
Volume: 29
Page: 1-9
SCOPUS Cited Count: 3
ESI Highly Cited Papers on the List: 0