Author:

Chen, L. [1] | Zhuo, Y. [2] | Wu, Y. [3] | Wang, Y. [4] | Zheng, X. [5]

Indexed by:

Scopus

Abstract:

Visual Question Answering (VQA) requires a model to correctly answer questions posed about a given image, and the task has attracted wide attention since it was introduced. VQA consists of four steps: image feature extraction, question text feature extraction, multi-modal feature fusion, and answer reasoning. Existing models use outer product calculation during multi-modal feature fusion, which leads to excessive model parameters, high training overhead, and slow convergence. To avoid these problems, we apply the Variational Autoencoder (VAE) method to calculate the probability distribution of the hidden variables of the image and the question text. Furthermore, we design a question feature hierarchy method based on the traditional attention mechanism model and VAE. The objective is to explore deep correlation features between questions and images in order to improve the accuracy of VQA tasks. © Springer Nature Switzerland AG 2019.

Keyword:

Attention mechanism; Multi-modal feature fusion; Variational Autoencoder; Visual Question Answering

Community:

  • [1] [Chen, L.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
  • [2] [Zhuo, Y.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
  • [3] [Wu, Y.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
  • [4] [Wang, Y.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China
  • [5] [Zheng, X.] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian Province, China

Reprint Author's Address:

  • [Wang, Y.] College of Mathematics and Computer Science, Fuzhou University, China

Source:

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ISSN: 0302-9743

Year: 2019

Volume: 11858 LNCS

Page: 657-669

Language: English

0.402 (JCR@2005)

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:
