The primary purpose of this workshop is to hold the 2nd edition of the Visual Question Answering (VQA) Challenge on the 2nd edition (v2.0) of the VQA dataset, introduced in Goyal et al., CVPR 2017. The 1st edition of the VQA Challenge was organized at CVPR 2016 on the 1st edition (v1.0) of the VQA dataset, introduced in Antol et al., ICCV 2015. The VQA v2.0 dataset is a more balanced version of VQA v1.0 that significantly reduces language biases, and it is about twice the size of VQA v1.0.
Our idea in creating this new "balanced" VQA dataset is the following: for every (image, question, answer) triplet (I, Q, A) in the VQA v1.0 dataset, we identify a semantically similar image I' whose answer A' to Q differs from A. Both the original triplet (I, Q, A) and the new triplet (I', Q, A') are included in the VQA v2.0 dataset, balancing VQA v1.0 on a per-question basis. Since I and I' are semantically similar, a VQA model must understand the subtle differences between the two images to answer correctly for both; it can no longer succeed simply by "guessing" from the language alone.
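To make the pairing concrete, the following is a minimal Python sketch of the balancing procedure, under stated assumptions: the Triplet type and the find_complementary_image callback are hypothetical illustrations, with the callback standing in for the human annotation step that actually produced VQA v2.0; this is not the dataset's real format or pipeline.

    from typing import Callable, List, NamedTuple, Optional, Tuple

    class Triplet(NamedTuple):
        image_id: str
        question: str
        answer: str

    def build_balanced_pairs(
        triplets: List[Triplet],
        find_complementary_image: Callable[[str, str], Optional[Tuple[str, str]]],
    ) -> List[Triplet]:
        """For each (I, Q, A), look up a semantically similar image I'
        whose answer A' to Q differs from A, and keep both triplets.
        `find_complementary_image` is a hypothetical oracle standing in
        for the human annotation used to build VQA v2.0."""
        balanced = []
        for t in triplets:
            result = find_complementary_image(t.image_id, t.question)
            if result is None:
                continue  # no complementary image found for this question
            comp_image, comp_answer = result
            if comp_answer != t.answer:  # answers must differ for balance
                balanced.append(t)
                balanced.append(Triplet(comp_image, t.question, comp_answer))
        return balanced

The key design point the sketch illustrates is that balancing happens per question: every retained question appears with two similar images and two different answers, so a language-only prior over answers gains nothing.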
This workshop will provide an opportunity to benchmark algorithms on VQA v2.0 and to identify the state-of-the-art algorithms, which must truly understand the image content in order to perform well on this balanced dataset. A secondary goal of the workshop is to continue to bring together researchers interested in Visual Question Answering to share state-of-the-art approaches, best practices, and future directions in multi-modal AI.
In addition to invited talks from established researchers, we invite submissions of extended abstracts (at most 2 pages) describing work in areas relevant to Visual Question Answering, such as: Visual Question Answering, (textual) Question Answering, Dialog, Commonsense Knowledge, Video Question Answering, Image/Video Captioning, and Language + Vision. Accepted submissions will be presented as posters at the workshop. The workshop will be held on July 26th, 2017 at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017.