Abstract:
Visual Question Answering (VQA) requires a model to provide correct answers to questions posed about given images, a requirement that has drawn wide attention since the task was introduced. VQA consists of four steps: image feature extraction, question text feature extraction, multimodal feature fusion, and answer reasoning. During multimodal feature fusion, existing models rely on outer-product calculations, which lead to excessive model parameters, high training overhead, and slow convergence. To avoid these problems, we applied the Variational Autoencoder (VAE) method to compute the probability distributions of the latent variables of the image and the question text. Furthermore, we designed a question feature hierarchy method based on the traditional attention mechanism and the VAE. The objective is to investigate deep question-image correlation features and thereby improve the accuracy of VQA tasks. © Springer Nature Switzerland AG 2019.
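To make the fusion idea concrete, here is a minimal PyTorch sketch of VAE-style multimodal fusion for VQA, assuming each modality is encoded as a Gaussian over a shared latent space and fused by an element-wise product. All class names, dimensions, and the fusion operator are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch: VAE-based fusion replacing outer-product fusion in VQA.
# Dimensions and the element-wise fusion step are assumptions for illustration.
import torch
import torch.nn as nn


class VAEFusion(nn.Module):
    def __init__(self, img_dim=2048, q_dim=1024, latent_dim=256, n_answers=3000):
        super().__init__()
        # Each modality is mapped to the parameters (mean, log-variance)
        # of a Gaussian over a shared latent space.
        self.img_mu = nn.Linear(img_dim, latent_dim)
        self.img_logvar = nn.Linear(img_dim, latent_dim)
        self.q_mu = nn.Linear(q_dim, latent_dim)
        self.q_logvar = nn.Linear(q_dim, latent_dim)
        self.classifier = nn.Linear(latent_dim, n_answers)

    @staticmethod
    def reparameterize(mu, logvar):
        # Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick).
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    @staticmethod
    def kl_divergence(mu, logvar):
        # KL(N(mu, sigma^2) || N(0, I)), the usual VAE regularizer during training.
        return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

    def forward(self, img_feat, q_feat):
        z_img = self.reparameterize(self.img_mu(img_feat), self.img_logvar(img_feat))
        z_q = self.reparameterize(self.q_mu(q_feat), self.q_logvar(q_feat))
        # Element-wise product in the shared latent space stands in for the
        # parameter-heavy outer product used by earlier fusion schemes.
        fused = z_img * z_q
        return self.classifier(fused)


# Example usage: logits = VAEFusion()(torch.randn(8, 2048), torch.randn(8, 1024))
```

Because the latent dimension is small relative to the raw feature dimensions, this kind of fusion needs far fewer parameters than an outer product over the original feature vectors, which is the efficiency argument the abstract makes.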
ISSN: 0302-9743
Year: 2019
Volume: 11858 LNCS
Page: 657-669
Language: English
Impact Factor (JCR@2005): 0.402
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0