Multi-modal feature fusion based on variational autoencoder for visual question answering

Author：

Chen, L. ^[1] | Zhuo, Y. ^[2] | Wu, Y. ^[3] | Wang, Y. ^[4] | Zheng, X. ^[5]

Indexed by：

Scopus

Abstract：

Visual Question Answering (VQA) tasks must provide correct answers to the questions posed by given images. Such requirement has been a wide concern since this task was presented. VQA consists of four steps: image feature extraction, question text feature extraction, multi-modal feature fusion and answer reasoning. During multimodal feature fusion, outer product calculation is used in existing models, which leads to excessive model parameters, high training overhead, and slow convergence. To avoid these problems, we applied the Variational Autoencoder (VAE) method to calculate the probability distribution of the hidden variables of image and question text. Furthermore, we designed a question feature hierarchy method based on the traditional attention mechanism model and VAE. The objective is to investigate deep questions and image correlation features to improve the accuracy of VQA tasks. © Springer Nature Switzerland AG 2019.
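
The parameter growth the abstract attributes to outer-product fusion can be illustrated with a small sketch. It is not the authors' architecture: the feature sizes, the latent size, the answer-vocabulary size, and all function names below are hypothetical, chosen only to compare a linear classifier over a flattened outer product against one that reads per-modality Gaussian latent codes, as a VAE encoder would produce. The last two helpers show the standard reparameterized sampling step and the KL term against a standard normal prior.

```python
import math
import random


def outer_product_fusion_params(d_img, d_txt, n_answers):
    """Weights of a linear classifier over the flattened outer product."""
    return d_img * d_txt * n_answers


def latent_fusion_params(d_img, d_txt, d_latent, n_answers):
    """Weights when each modality is first encoded to a Gaussian latent.

    Each encoder uses two linear maps (mean and log-variance); the
    classifier reads the two latent vectors concatenated together.
    """
    encoders = 2 * d_latent * (d_img + d_txt)
    classifier = 2 * d_latent * n_answers
    return encoders + classifier


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


# Hypothetical sizes, not taken from the paper.
D_IMG, D_TXT, D_LATENT, N_ANSWERS = 2048, 1024, 256, 3000

outer = outer_product_fusion_params(D_IMG, D_TXT, N_ANSWERS)
latent = latent_fusion_params(D_IMG, D_TXT, D_LATENT, N_ANSWERS)
print(f"outer-product fusion weights: {outer:,}")
print(f"latent fusion weights:        {latent:,}")

# One reparameterized sample and its KL penalty for a toy latent code.
rng = random.Random(0)
z = reparameterize([0.5, -0.5], [0.0, 0.0], rng)
print("sample:", z, "KL:", kl_to_standard_normal([0.5, -0.5], [0.0, 0.0]))
```

Under these assumed sizes the outer-product classifier needs several orders of magnitude more weights than the latent variant, which is the kind of overhead the abstract describes.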

Keyword：

Attention mechanism; Multi-modal feature fusion; Variational Autoencoder; Visual Question Answering

Affiliation：

[1] [Chen, L.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
[2] [Zhuo, Y.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
[3] [Wu, Y.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
[4] [Wang, Y.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
[5] [Zheng, X.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China

Reprint Address：

[Wang, Y.] College of Mathematics and Computer Science, Fuzhou University, China

Email：

yilei@fzu.edu.cn

Source：

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ISSN： 0302-9743

Year： 2019

Volume： 11858 LNCS

Page： 657-669

Language： English

0.402 (JCR@2005)

Cited Count：

WoS CC Cited Count：

SCOPUS Cited Count： 1

ESI Highly Cited Papers on the List： 0
