
Author:

Huang, Shizhen [1] (Scholars: 黄世震) | Tang, Enhao [2] | Li, Shun [3] | Ping, Xiangzhan [4] | Chen, Ruiqi [5]

Indexed by:

SCIE

Abstract:

The transformer model has recently become a milestone in artificial intelligence, raising the performance of tasks such as machine translation and computer vision to levels previously unattainable. However, while the transformer delivers strong performance, it also incurs large memory overhead and demands enormous computing power, which significantly hinders the deployment of energy-efficient transformer systems. Owing to their high parallelism, low latency, and low power consumption, field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) achieve higher energy efficiency than graphics processing units (GPUs) and central processing units (CPUs), and they are therefore widely used to accelerate deep learning algorithms. Several papers have addressed deploying the transformer on dedicated hardware for acceleration, but comprehensive studies of this area are lacking. We therefore summarize hardware-oriented transformer model compression algorithms and their accelerator implementations to provide a comprehensive overview of this research domain. This paper first introduces the transformer model framework and its computation process. Secondly, it discusses hardware-friendly compression algorithms for self-attention and the transformer, along with a review of state-of-the-art hardware accelerator frameworks. Finally, we consider some promising topics in transformer hardware acceleration, such as high-level design frameworks and selecting the optimal device using reinforcement learning.
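
As a minimal illustration of the computation the survey targets, the sketch below implements scaled dot-product self-attention together with a simple symmetric int8 post-training quantization of a weight matrix, the kind of hardware-friendly compression the abstract refers to. This is a hedged NumPy sketch, not code from the surveyed paper or any particular accelerator; all function names, tensor shapes, and the random seed are illustrative assumptions.

```python
# Minimal sketch (illustrative only, not from the surveyed paper):
# scaled dot-product self-attention plus symmetric int8 post-training
# quantization of a weight matrix.
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (seq_len, seq_len)
    return softmax(scores) @ V                 # (seq_len, d_head)

def quantize_int8(W):
    # Symmetric per-tensor quantization: W ~= scale * W_int8.
    scale = np.abs(W).max() / 127.0
    W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return W_q, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 64))           # 8 tokens, d_model = 64
    Wq, Wk, Wv = (rng.standard_normal((64, 64)) * 0.1 for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)
    Wq_int8, s = quantize_int8(Wq)
    err = np.abs(Wq - Wq_int8.astype(np.float32) * s).max()
    print(out.shape, err)                      # (8, 64) and a small reconstruction error
```

In a real FPGA or ASIC deployment the quantized weights would be stored in on-chip memory and the matrix multiplications mapped to integer arithmetic units; the float reference here only illustrates the arithmetic being approximated.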

Keyword:

compression; FPGA; hardware accelerators; self-attention; transformer

Community:

  • [ 1 ] [Huang, Shizhen]Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350116, Peoples R China
  • [ 2 ] [Tang, Enhao]Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350116, Peoples R China
  • [ 3 ] [Li, Shun]Fuzhou Univ, Coll Phys & Informat Engn, Fuzhou 350116, Peoples R China
  • [ 4 ] [Ping, Xiangzhan]Chongqing Univ Posts & Telecommun, Dept Optoelect Informat Engn, Chongqing 400065, Peoples R China
  • [ 5 ] [Chen, Ruiqi]Fudan Univ, Zhangjiang Fudan Int Innovat Ctr, Shanghai 200433, Peoples R China

Reprint Author's Address:

Source:

ELECTRONIC RESEARCH ARCHIVE

ISSN: 2688-1594

Year: 2022

Issue: 10

Volume: 30

Page: 3755-3785

Impact Factor: 0.8 (JCR@2022), 1.000 (JCR@2023)

ESI Discipline: MATHEMATICS;

ESI HC Threshold: 24

JCR Journal Grade: 3

CAS Journal Grade: 4

Cited Count:

WoS CC Cited Count: 2

SCOPUS Cited Count: 3

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:
