Abstract:
Visual Question Answering (VQA) requires a model to provide correct answers to questions posed about given images, a requirement that has drawn wide attention since the task was introduced. VQA consists of four steps: image feature extraction, question text feature extraction, multimodal feature fusion, and answer reasoning. During multimodal feature fusion, existing models rely on outer-product calculations, which lead to excessive model parameters, high training overhead, and slow convergence. To avoid these problems, we applied the Variational Autoencoder (VAE) method to compute the probability distributions of the latent variables of the image and the question text. Furthermore, we designed a question feature hierarchy method based on the traditional attention mechanism and the VAE. The objective is to investigate deep question-image correlation features and thereby improve the accuracy of VQA tasks. © Springer Nature Switzerland AG 2019.
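To make the fusion idea concrete, here is a minimal PyTorch sketch of VAE-style multimodal fusion for VQA, assuming each modality is encoded as a Gaussian over a shared latent space and fused by an element-wise product. All class names, dimensions, and the fusion operator are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch: VAE-based fusion replacing outer-product fusion in VQA.
# Dimensions and the element-wise fusion step are assumptions for illustration.
import torch
import torch.nn as nn


class VAEFusion(nn.Module):
    def __init__(self, img_dim=2048, q_dim=1024, latent_dim=256, n_answers=3000):
        super().__init__()
        # Each modality is mapped to the parameters (mean, log-variance)
        # of a Gaussian over a shared latent space.
        self.img_mu = nn.Linear(img_dim, latent_dim)
        self.img_logvar = nn.Linear(img_dim, latent_dim)
        self.q_mu = nn.Linear(q_dim, latent_dim)
        self.q_logvar = nn.Linear(q_dim, latent_dim)
        self.classifier = nn.Linear(latent_dim, n_answers)

    @staticmethod
    def reparameterize(mu, logvar):
        # Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick).
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    @staticmethod
    def kl_divergence(mu, logvar):
        # KL(N(mu, sigma^2) || N(0, I)), the usual VAE regularizer during training.
        return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()

    def forward(self, img_feat, q_feat):
        z_img = self.reparameterize(self.img_mu(img_feat), self.img_logvar(img_feat))
        z_q = self.reparameterize(self.q_mu(q_feat), self.q_logvar(q_feat))
        # Element-wise product in the shared latent space stands in for the
        # parameter-heavy outer product used by earlier fusion schemes.
        fused = z_img * z_q
        return self.classifier(fused)


# Example usage: logits = VAEFusion()(torch.randn(8, 2048), torch.randn(8, 1024))
```

Because the latent dimension is small relative to the raw feature dimensions, this kind of fusion needs far fewer parameters than an outer product over the original feature vectors, which is the efficiency argument the abstract makes.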
ISSN: 0302-9743
Year: 2019
Volume: 11858 LNCS
Page: 657-669
Language: English
Impact Factor (JCR@2005): 0.402
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0