
Huntersxsx/RIS-Learning-List


RIS-Learning-List

Introduction

This repository introduces the Referring Image Segmentation (RIS) task and collects related works.

Content

Definition

Referring Image Segmentation (RIS) is a challenging problem at the intersection of computer vision and natural language processing. Given an image and a natural language expression, the goal is to produce a segmentation mask in the image corresponding to the object(s) referred to by the expression.
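The task interface can be sketched as follows. This is only an illustrative placeholder, not any particular model: `segment_referred_object` is a hypothetical function name, and a real RIS model would fuse visual and linguistic features instead of returning an empty mask.

```python
import numpy as np

def segment_referred_object(image: np.ndarray, expression: str) -> np.ndarray:
    """Hypothetical RIS interface: an (H, W, 3) image plus a referring
    expression in, a binary (H, W) mask of the referent out.

    Placeholder body: returns an all-False mask. A real model would encode
    the image and the expression, fuse the two modalities, and decode a
    pixel-wise mask for the referred object.
    """
    h, w = image.shape[:2]
    return np.zeros((h, w), dtype=bool)

# Example call on a dummy image.
mask = segment_referred_object(np.zeros((480, 640, 3), dtype=np.uint8),
                               "the woman in the red coat")
```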

Datasets

  • RefCOCO: It contains 19,994 images with 142,210 referring expressions for 50,000 objects, collected from MSCOCO via a two-player game. The dataset is split into 120,624 train, 10,834 validation, 5,657 test A, and 5,095 test B samples.
  • RefCOCO+: It contains 141,564 language expressions for 49,856 objects in 19,992 images. The dataset is split into train, validation, test A, and test B with 120,624, 10,758, 5,726, and 4,889 samples, respectively. Compared with RefCOCO, absolute-location words are excluded from RefCOCO+.
  • G-Ref: It includes 104,560 referring expressions for 54,822 objects in 26,711 images.
  • Expressions in RefCOCO and RefCOCO+ are very succinct (3.5 words on average). In contrast, expressions in G-Ref are more complex (8.4 words on average). Conversely, RefCOCO and RefCOCO+ tend to have more objects of the same category per image (3.9 on average) than G-Ref (1.6 on average).
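As a quick consistency check, the four RefCOCO split sizes listed above sum to the reported total of 142,210 referring expressions (the same check is not asserted for RefCOCO+, whose reported totals differ across sources):

```python
# Reported RefCOCO split sizes (number of expressions per split), as in the
# dataset description above.
refcoco_splits = {"train": 120_624, "val": 10_834, "testA": 5_657, "testB": 5_095}

total_expressions = sum(refcoco_splits.values())
print(total_expressions)  # 142210, matching the reported expression count
```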

Evaluation Metric

  • overall IoU: It is the total intersection area divided by the total union area, where both intersection area and union area are accumulated over all test samples (each test sample is an image and a referential expression).
  • mean IoU: It is the IoU between the prediction and ground truth averaged across all test samples.
  • Precision@X: It measures the percentage of test samples with an IoU score higher than the threshold X ∈ {0.5, 0.6, 0.7, 0.8, 0.9}.
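The three metrics above can be sketched with NumPy as follows. This is a minimal reference implementation over binary masks, assuming one predicted mask and one ground-truth mask per test sample; the function names are illustrative, not from any RIS codebase.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union > 0 else 0.0

def evaluate(preds, gts, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Overall IoU, mean IoU, and Precision@X over a list of test samples.

    overall IoU accumulates intersection and union areas over all samples
    before dividing; mean IoU averages the per-sample IoU scores instead.
    """
    total_inter = 0
    total_union = 0
    ious = []
    for p, g in zip(preds, gts):
        total_inter += np.logical_and(p, g).sum()
        total_union += np.logical_or(p, g).sum()
        ious.append(iou(p, g))
    ious = np.array(ious)
    metrics = {
        "overall_iou": float(total_inter) / total_union,
        "mean_iou": float(ious.mean()),
    }
    for t in thresholds:
        # Precision@X: fraction of samples whose IoU exceeds threshold t.
        metrics[f"P@{t}"] = float((ious > t).mean())
    return metrics
```

Note that overall IoU and mean IoU generally differ: overall IoU weights samples by object area, while mean IoU weights every sample equally.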

Related Works

Performance

Reference

MarkMoHR / Awesome-Referring-Image-Segmentation
