
Author:

Zhijian, Lin [1] (Scholars: 林志坚) | Xuewei, Gao [2] | Xiaopei, Chen [3] | Zhipeng, Zhu [4] | Xiaoyong, Du [5] | Pingping, Chen [6] (Scholars: 陈平平)

Indexed by:

EI Scopus CSCD

Abstract:

Deploying convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) is challenging because of their computational complexity. To address this, a high-performance CNN hardware accelerator was designed in the Verilog hardware description language. It uses a pipelined architecture that is parallel in three dimensions: input channels, output channels, and convolution kernels. First, two multiply-and-accumulate (MAC) operations were packed into one digital signal processing (DSP) block of the FPGA to double the computation rate of the accelerator. Second, feature map block partitioning and a special memory arrangement were proposed to optimize the total amount of off-chip memory access and relieve the pressure on FPGA bandwidth. Finally, an efficient computational array combining a multiply-add tree with the Winograd fast convolution algorithm was designed to balance hardware resource consumption and computational performance. The highly parallel CNN accelerator was deployed on the ZU3EG platform from Alinx, with the YOLOv3-tiny algorithm as the test workload. The average computing performance of the accelerator is 127.5 giga operations per second (GOPS). The experimental results show that the hardware architecture effectively improves CNN computing performance and outperforms other existing schemes in power consumption and in the efficiency of DSP and block random access memory (BRAM) usage. © 2022, Beijing University of Posts and Telecommunications. All rights reserved.
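The two arithmetic ideas named in the abstract, packing two multiplications into one wide multiplier and Winograd fast convolution, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' Verilog design: the record does not give operand widths or the Winograd tile size, so signed 8-bit operands, a shared weight, an 18-bit field offset (`SHIFT`), and the 1-D F(2,3) Winograd tile are all assumed here, and the function names are hypothetical.

```python
SHIFT = 18  # assumed field offset; wide enough to hold one signed 8x8-bit product


def packed_dual_multiply(a: int, b: int, w: int) -> tuple:
    """Return (a*w, b*w) for signed 8-bit a, b, w using ONE wide multiplication."""
    for v in (a, b, w):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in signed 8 bits")
    packed = (a << SHIFT) + b            # a in the upper field, b in the lower field
    product = packed * w                 # single multiply: (a*w << SHIFT) + b*w
    low = product & ((1 << SHIFT) - 1)   # low field, read as unsigned
    if low >= 1 << (SHIFT - 1):          # reinterpret the low field as signed
        low -= 1 << SHIFT
    high = (product - low) >> SHIFT      # subtracting b*w leaves exactly a*w << SHIFT
    return high, low


def winograd_f23(d, g) -> list:
    """Two outputs of a 3-tap 1-D convolution from 4 inputs, using 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2)      # kept scaled by 2 to stay in integers
    m3 = (d2 - d1) * (g0 - g1 + g2)      # kept scaled by 2 to stay in integers
    m4 = (d1 - d3) * g2
    y0 = m1 + (m2 + m3) // 2             # m2 + m3 is always even
    y1 = (m2 - m3) // 2 - m4             # m2 - m3 is always even
    return [y0, y1]


def direct_f23(d, g) -> list:
    """Reference sliding-window result, using 6 multiplies."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    print(packed_dual_multiply(-7, 5, 3))         # (-21, 15)
    print(winograd_f23([1, 2, 3, 4], [1, 0, -1])) # [-2, -2]
    print(direct_f23([1, 2, 3, 4], [1, 0, -1]))   # [-2, -2]
```

In this model the packed multiply yields two products per multiplier use, which is the sense in which the computation rate doubles, and the Winograd tile needs four data multiplications where the direct form needs six (the filter-side sums can be computed once per kernel). How the accelerator itself combines these with its multiply-add tree is not described in this record.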

Keyword:

Acceleration; Computational efficiency; Computer hardware description languages; Convolution; Convolutional neural networks; Digital signal processing; Energy efficiency; Field programmable gate arrays (FPGA); Integrated circuit design; Logic gates; Memory architecture; Network architecture; Random access storage; Trees (mathematics)

Affiliation:

  • [ 1 ] [Zhijian, Lin]School of Advanced Manufacturing, Fuzhou University, Quanzhou; 362251, China
  • [ 2 ] [Zhijian, Lin]College of Physics and Information Engineering, Fuzhou University, Fuzhou; 350108, China
  • [ 3 ] [Xuewei, Gao]School of Advanced Manufacturing, Fuzhou University, Quanzhou; 362251, China
  • [ 4 ] [Xiaopei, Chen]School of Advanced Manufacturing, Fuzhou University, Quanzhou; 362251, China
  • [ 5 ] [Zhipeng, Zhu]School of Advanced Manufacturing, Fuzhou University, Quanzhou; 362251, China
  • [ 6 ] [Xiaoyong, Du]School of Advanced Manufacturing, Fuzhou University, Quanzhou; 362251, China
  • [ 7 ] [Pingping, Chen]College of Physics and Information Engineering, Fuzhou University, Fuzhou; 350108, China


Source:

Journal of China Universities of Posts and Telecommunications

ISSN: 1005-8885

CN: 11-3486/TN

Year: 2022

Issue: 5

Volume: 29

Page: 1-9

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 3

ESI Highly Cited Papers on the List: 0
