
Huntersxsx/RIS-Learning-List


RIS-Learning-List

Introduction

This repository introduces the Referring Image Segmentation (RIS) task and collects related works.

Content

Definition

Referring Image Segmentation (RIS) is a challenging problem at the intersection of computer vision and natural language processing. Given an image and a natural language expression, the goal is to produce a segmentation mask in the image corresponding to the object(s) referred to by the expression.
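The task interface can be sketched as follows. This is only an illustrative placeholder, not any particular model: `segment_referred_object` is a hypothetical function name, and a real RIS model would fuse visual and linguistic features instead of returning an empty mask.

```python
import numpy as np

def segment_referred_object(image: np.ndarray, expression: str) -> np.ndarray:
    """Hypothetical RIS interface: an (H, W, 3) image plus a referring
    expression in, a binary (H, W) mask of the referent out.

    Placeholder body: returns an all-False mask. A real model would encode
    the image and the expression, fuse the two modalities, and decode a
    pixel-wise mask for the referred object.
    """
    h, w = image.shape[:2]
    return np.zeros((h, w), dtype=bool)

# Example call on a dummy image.
mask = segment_referred_object(np.zeros((480, 640, 3), dtype=np.uint8),
                               "the woman in the red coat")
```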

Datasets

  • RefCOCO: It contains 19,994 images with 142,210 referring expressions for 50,000 objects, collected from MSCOCO via a two-player game. The dataset is split into 120,624 train, 10,834 validation, 5,657 test A, and 5,095 test B samples.
  • RefCOCO+: It contains 141,564 language expressions for 49,856 objects in 19,992 images. The dataset is split into train, validation, test A, and test B with 120,624, 10,758, 5,726, and 4,889 samples, respectively. Compared with RefCOCO, absolute-location words are excluded from RefCOCO+.
  • G-Ref: It includes 104,560 referring expressions for 54,822 objects in 26,711 images.
  • Expressions in RefCOCO and RefCOCO+ are very succinct (3.5 words on average). In contrast, expressions in G-Ref are more complex (8.4 words on average). Conversely, RefCOCO and RefCOCO+ tend to have more objects of the same category per image (3.9 on average) than G-Ref (1.6 on average).
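As a quick consistency check, the four RefCOCO split sizes listed above sum to the reported total of 142,210 referring expressions (the same check is not asserted for RefCOCO+, whose reported totals differ across sources):

```python
# Reported RefCOCO split sizes (number of expressions per split), as in the
# dataset description above.
refcoco_splits = {"train": 120_624, "val": 10_834, "testA": 5_657, "testB": 5_095}

total_expressions = sum(refcoco_splits.values())
print(total_expressions)  # 142210, matching the reported expression count
```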

Evaluation Metric

  • overall IoU: It is the total intersection area divided by the total union area, where both intersection area and union area are accumulated over all test samples (each test sample is an image and a referential expression).
  • mean IoU: It is the IoU between the prediction and ground truth averaged across all test samples.
  • Precision@X: It measures the percentage of test samples with an IoU score higher than the threshold X ∈ {0.5, 0.6, 0.7, 0.8, 0.9}.
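The three metrics above can be sketched with NumPy as follows. This is a minimal reference implementation over binary masks, assuming one predicted mask and one ground-truth mask per test sample; the function names are illustrative, not from any RIS codebase.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two binary masks of the same shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union > 0 else 0.0

def evaluate(preds, gts, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Overall IoU, mean IoU, and Precision@X over a list of test samples.

    overall IoU accumulates intersection and union areas over all samples
    before dividing; mean IoU averages the per-sample IoU scores instead.
    """
    total_inter = 0
    total_union = 0
    ious = []
    for p, g in zip(preds, gts):
        total_inter += np.logical_and(p, g).sum()
        total_union += np.logical_or(p, g).sum()
        ious.append(iou(p, g))
    ious = np.array(ious)
    metrics = {
        "overall_iou": float(total_inter) / total_union,
        "mean_iou": float(ious.mean()),
    }
    for t in thresholds:
        # Precision@X: fraction of samples whose IoU exceeds threshold t.
        metrics[f"P@{t}"] = float((ious > t).mean())
    return metrics
```

Note that overall IoU and mean IoU generally differ: overall IoU weights samples by object area, while mean IoU weights every sample equally.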

Related Works

Performance

Reference

MarkMoHR / Awesome-Referring-Image-Segmentation
