Optimizing Average Precision using Weakly Supervised Data
Aseem Behl, Pritish Mohapatra, C. V. Jawahar and M. Pawan Kumar
Overview
We develop a novel latent AP-SVM that minimizes a carefully designed upper bound on the AP-based loss function over a weakly supervised dataset. Our approach is based on the hypothesis that in the challenging setting of weakly supervised learning, it becomes crucial to optimize the right accuracy measure. Using publicly available datasets, we demonstrate the advantage of our approach over standard loss-based binary classifiers on challenging problems in computer vision.
Optimizing Average Precision
* Given a training dataset consisting of n samples X along with the ground-truth pairwise ranking matrix Y*, we would like to learn the parameters w of the classifier such that the Average Precision (AP) over the training dataset is maximized.
* In order to maximize the AP, we minimize a loss function defined as Δ(Y;Y*) = 1 - AP(Y;Y*), where AP(Y;Y*) denotes the AP of the ranking matrix Y with respect to the true ranking Y*.
* The additional annotations HP and HN for the positive and negative samples respectively are unknown. Since the best value of the additional annotation for each positive sample needs to be imputed automatically, we treat these as latent variables. However, since a negative sample remains negative regardless of the value of its additional annotation, we must consider all possible values of HN during parameter estimation.
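To make the objective concrete, the following minimal sketch computes the AP of a ranking induced by classifier scores and the corresponding loss Δ(Y; Y*) = 1 − AP(Y; Y*). The function names are illustrative, not the names used in our code release.

```python
import numpy as np

def average_precision(scores, labels):
    """AP of the ranking induced by `scores` w.r.t. binary ground-truth `labels`."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # rank by decreasing score
    ranked = np.asarray(labels)[order]
    hits = np.cumsum(ranked)                              # positives seen up to each rank
    precisions = hits / (np.arange(len(ranked)) + 1.0)    # precision at each rank
    return precisions[ranked == 1].mean()                 # average over positive ranks

def ap_loss(scores, labels):
    """The AP-based loss: Delta(Y; Y*) = 1 - AP(Y; Y*)."""
    return 1.0 - average_precision(scores, labels)
```

A perfect ranking (all positives ahead of all negatives) gives zero loss; the loss grows as positives slip down the ranking.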
Our Method: Latent AP-SVM
A significant difference between the optimization problem corresponding to latent AP-SVM and that of the standard latent structured SVM (latent SSVM) is that the constraints of latent AP-SVM are specified using the same values of the additional annotations for the correct and incorrect rankings. In contrast, in the constraints of the latent SSVM problem, the values of the additional annotations for a correct and an incorrect ranking are independent of each other. This modification in our formulation provides an intuitive prediction rule, a tighter upper bound on the AP loss, and lends itself to efficient optimization.
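The resulting prediction rule is the intuitive one: score each sample by maximizing over its putative latent annotations, then rank samples by score. A hedged sketch, with illustrative function names and features passed as plain vectors:

```python
import numpy as np

def sample_score(w, boxes_features):
    """Score of one sample: maximize the linear score over its putative latent boxes."""
    return max(float(np.dot(w, phi)) for phi in boxes_features)

def predict_ranking(w, samples):
    """Rank sample indices by their best latent-box score, in decreasing order."""
    scores = [sample_score(w, boxes) for boxes in samples]
    return sorted(range(len(samples)), key=lambda i: -scores[i])
```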
A local minimum or saddle-point solution of the latent AP-SVM problem can be obtained using the concave-convex procedure (CCCP). The algorithm involves two main steps. In the first step, it imputes the best additional annotations HP of the positive samples given the current estimate of the parameters. In the second step, given the imputed values of HP, CCCP updates the parameters by solving the resulting convex optimization problem.
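The two-step alternation can be sketched as follows. This is only the outer CCCP loop; the convex parameter update and the latent imputation are abstracted as callbacks (`update_w`, `impute_h`), since their exact form depends on the structured solver used.

```python
import numpy as np

def cccp(samples, labels, init_w, update_w, impute_h, max_iters=20):
    """Alternate between (1) imputing latent annotations HP for the positive
    samples and (2) solving the resulting convex problem for the parameters w."""
    w = np.asarray(init_w, dtype=float)
    for _ in range(max_iters):
        # Step 1: impute the best latent annotation for each positive sample
        h_pos = {i: impute_h(w, samples[i])
                 for i, y in enumerate(labels) if y == 1}
        # Step 2: re-estimate the parameters with HP held fixed (convex subproblem)
        w_new = np.asarray(update_w(samples, labels, h_pos), dtype=float)
        if np.allclose(w, w_new):   # stop once the parameters converge
            break
        w = w_new
    return w
```

Each iteration is guaranteed not to increase the CCCP objective, so the procedure converges to a local minimum or saddle point.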
Results
We tested latent AP-SVM on two computer vision problems that can be formulated as weakly supervised learning. Traditionally, the parameters are learned by optimizing a surrogate such as the 0-1 loss (latent SVM) or the standard latent structured SVM (latent SSVM). We therefore compare latent AP-SVM with latent SVM and latent SSVM. The details of all the experiments are provided in our CVPR 2014 paper.
Action Classification
The input is a set of person images X along with the category of the action being performed in each image. The hidden variables H model the bounding box of the person in each image. We use standard poselet-based activations to specify the features.
We use the PASCAL VOC 2011 action classification dataset, which consists of 4846 images depicting 10 action classes. The dataset is divided into two subsets: 2424 'trainval' images for which we are provided the bounding boxes of the persons in the image together with their action class; and 2422 'test' images for which we are only provided with the person bounding boxes. The putative values of each latent variable are restricted to boxes obtained by a standard person detector.
Character Recognition in Natural Images
The input is a set of word images X along with the characters present in each image. The hidden variables H model the bounding boxes of the characters in each image. We use histogram of oriented gradients (HOG) features.
We use the IIIT 5K-WORD scene text dataset, which consists of 5000 cropped word images from scene texts and born-digital images, divided into 2000 'trainval' images and 3000 'test' images. Each image is annotated with the corresponding word. We treat the bounding boxes of the characters as latent variables whose putative values are restricted to boxes obtained by a standard character detector. Using this dataset, we perform binary classification for the 22 classes that contain at least 150 samples in the 'trainval' dataset.
Resources
All the code used in our experiments is available here.
Poselet features and meta files for PASCAL VOC 2011 action classification used in the experiments are available here and here respectively.
Please feel free to email me at aseem dot behl at gmail dot com in case of any difficulties while running the experiment.
Publications
Aseem Behl, C. V. Jawahar and M. Pawan Kumar
Optimizing Average Precision using Weakly Supervised Data
In Computer Vision and Pattern Recognition (CVPR), 2014
Aseem Behl, C. V. Jawahar and M. Pawan Kumar
Optimizing Average Precision using Weakly Supervised Data (supplementary material)
In Computer Vision and Pattern Recognition (CVPR), 2014
Aseem Behl, Pritish Mohapatra, C. V. Jawahar and M. Pawan Kumar
Optimizing Average Precision using Weakly Supervised Data
In Pattern Analysis and Machine Intelligence (PAMI), 2015