Lesion segmentation is an essential task in medical imaging that supports the diagnosis and assessment of pulmonary diseases. While deep learning models have shown success in various domains, their reliance on large-scale annotated datasets limits their applicability in the medical domain because labeling is costly. To address this issue, recent studies in medical image segmentation have used clinical texts as complementary semantic cues that require no additional annotations. However, most existing methods use a single textual embedding and fail to capture hierarchical interactions between language and visual features, which limits their ability to exploit the fine-grained cues essential for precise segmentation. To this end, we propose the Hierarchical Visual-Textual Mixing Network (HiMix), a novel multi-modal segmentation framework that mixes multi-scale image and text representations throughout the mask decoding process. HiMix progressively injects hierarchical text embeddings, from high-level semantics to fine-grained spatial details, into the corresponding image decoder layers to bridge the modality gap and enhance visual feature refinement at multiple levels of abstraction. Experiments on the QaTa-COV19 and MosMedData+ datasets demonstrate that HiMix consistently outperforms uni-modal and multi-modal methods. Furthermore, HiMix generalizes well to unstructured textual formats, highlighting its practical applicability in real-world clinical scenarios.
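The idea of injecting a different text embedding at each decoder stage can be illustrated with a small PyTorch sketch. This is not the HiMix implementation: the abstract does not specify the mixing operator, so cross-attention is used here as an assumed stand-in, skip connections are omitted, and every name (`TextMixBlock`, `ToyHierarchicalDecoder`, the channel widths, the text dimension of 768) is hypothetical.

```python
import torch
import torch.nn as nn


class TextMixBlock(nn.Module):
    """Hypothetical mixing block: image tokens attend to text tokens."""

    def __init__(self, img_dim: int, txt_dim: int, num_heads: int = 4):
        super().__init__()
        # Project text tokens to the channel width of this decoder stage.
        self.txt_proj = nn.Linear(txt_dim, img_dim)
        self.attn = nn.MultiheadAttention(img_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        b, c, h, w = img.shape
        query = img.flatten(2).transpose(1, 2)      # (B, H*W, C)
        key_value = self.txt_proj(txt)              # (B, T, C)
        mixed, _ = self.attn(query, key_value, key_value)
        out = self.norm(query + mixed)              # residual connection, then norm
        return out.transpose(1, 2).reshape(b, c, h, w)


class ToyHierarchicalDecoder(nn.Module):
    """Assumed layout: one text level per decoder stage, coarse to fine."""

    def __init__(self, img_dims=(256, 128, 64), txt_dim: int = 768):
        super().__init__()
        self.mix = nn.ModuleList([TextMixBlock(d, txt_dim) for d in img_dims])
        # Each upsampling step doubles the resolution and narrows the channels.
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(d_in, d_out, kernel_size=2, stride=2)
            for d_in, d_out in zip(img_dims[:-1], img_dims[1:])
        ])
        self.head = nn.Conv2d(img_dims[-1], 1, kernel_size=1)  # mask logits

    def forward(self, feat: torch.Tensor, txt_levels) -> torch.Tensor:
        # txt_levels[i] is the text embedding paired with decoder stage i.
        for i, mix in enumerate(self.mix):
            feat = mix(feat, txt_levels[i])
            if i < len(self.up):
                feat = self.up[i](feat)
        return self.head(feat)


if __name__ == "__main__":
    decoder = ToyHierarchicalDecoder()
    feat = torch.randn(2, 256, 8, 8)                        # coarsest image feature
    txt_levels = [torch.randn(2, 12, 768) for _ in range(3)]
    print(decoder(feat, txt_levels).shape)
```

How the per-level text embeddings are produced (for example, from different layers of a text encoder) is left open in this sketch; the random tensors above only stand in for them.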

Requirements

Environment:

python=3.10.11
torch=2.0.1 
torchvision=0.15.2 
pytorch_lightning=1.9.0 
torchmetrics=1.6.1 
transformers=4.24.0 
monai=1.0.0 
pandas=2.2.3 
einops=0.8.0
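To confirm that an existing environment matches these pins, a short check along the following lines may help. It is a convenience sketch, not part of the repository: `check_environment` and `PINNED` are hypothetical names, and the distribution names assume the packages were installed under their usual PyPI names.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
# "pytorch-lightning" is the assumed PyPI distribution name for pytorch_lightning.
PINNED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "pytorch-lightning": "1.9.0",
    "torchmetrics": "1.6.1",
    "transformers": "4.24.0",
    "monai": "1.0.0",
    "pandas": "2.2.3",
    "einops": "0.8.0",
}


def check_environment(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Local build tags such as "2.0.1+cu118" still count as a match.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_environment()
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

The Python version itself (3.10.11) is not checked here; `sys.version_info` can be inspected separately if needed.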

Citation

If you find our work useful for your research, please cite our paper:

@inproceedings{hwang2026himix,
 title={HiMix : Hierarchical Visual-Textual Mixing Network for Lesion Segmentation},
 author={Hwang, Soojing and Sim, Jaeyoon and Kim, Won Hwa},
 booktitle={2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
 year={2026},
 organization={IEEE}
}

Acknowledgements

Our work builds on MMI-UNet and GuideDecoder. We thank the authors for making their source code publicly available.

About

[WACV'26] Official PyTorch code for HiMix : Hierarchical Visual-Textual Mixing Network for Lesion Segmentation
