Accepted to ICML 2026

FlowSeg: Dynamic Semantic Flow for LLM-Conditioned Segmentation

Bidirectional semantic guidance for better language-mask alignment in query-based LLM-conditioned segmentation.

Zekang Zhang1, Guangyu Gao1, Youyun Tang2, ChengJing Wu2, Xiaochao Qu2, Chi Harold Liu1, Jianbo Jiao3, Yunchao Wei4,5, Luoqi Liu2,*, Ting Liu2,*

1School of Computer Science, Beijing Institute of Technology
2Meitu Inc.
3School of Computer Science, University of Birmingham
4WEI Lab, Institute of Information Science, Beijing Jiaotong University
5Beijing Key Laboratory of Advanced Information Science and Network
*Corresponding authors

Abstract

LLM-conditioned segmentation has recently advanced by coupling large language models with iterative mask generation frameworks. However, current query-based propose-then-select pipelines can generate high-quality mask candidates yet still fail to select the one that matches the linguistic condition. FlowSeg addresses this semantic misalignment by introducing dynamic semantic guidance through a bidirectional semantic flow between intermediate decoding states and LLM-derived condition embeddings.

Language conditions actively guide mask refinement at each decoding stage, while condition embeddings are progressively updated by emerging visual evidence. A lightweight boundary-aware refinement module further enhances uncertain regions without perturbing confident interiors. Experiments on referring expression segmentation and reasoning segmentation demonstrate consistent improvements and state-of-the-art performance.

Motivation

In query-based LLM-conditioned segmentation, the model may already produce a candidate mask that overlaps well with the target object, but the final matching step can select a semantically wrong candidate. FlowSeg treats language grounding as part of the generation dynamics rather than only a post-hoc selection signal.
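To make this failure mode concrete, here is a toy sketch of a generic propose-then-select step. It assumes candidates are ranked by a dot product between each decoder query embedding and the condition embedding; the scoring rule, the function names, and all numbers are hypothetical illustrations, not the selection head of FlowSeg or of any specific baseline.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def propose_then_select(query_embs: List[List[float]],
                        cond_emb: List[float]) -> Tuple[int, List[float]]:
    """Score every candidate query against the condition embedding and
    return the index of the highest-scoring candidate (assumed rule)."""
    scores = [dot(q, cond_emb) for q in query_embs]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Made-up data: three candidate queries, one condition embedding, and the
# IoU of each candidate's mask with the ground-truth mask.
query_embs = [[1.0, 0.2], [0.3, 0.8], [0.5, 0.1]]
cond_emb = [0.9, 0.4]
ious = [0.2, 0.9, 0.4]

selected, scores = propose_then_select(query_embs, cond_emb)
oracle = max(range(len(ious)), key=lambda i: ious[i])
print(selected, oracle)  # 0 1
```

The best mask (candidate 1) exists among the proposals, but the matching score picks candidate 0. This is the gap FlowSeg targets by letting language shape the queries during decoding rather than only at selection time.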

Motivation of FlowSeg
Existing query-based propose-then-select pipelines often generate accurate mask candidates but fail to select the one that matches the linguistic condition.

Method

FlowSeg is built on a standard LLM-segmentor scaffold with dual visual encoders and a query-based segmentation decoder. Its key contribution is the decoder-side Bidirectional Semantic Flow, where condition embeddings guide query refinement and are updated by decoder queries throughout the generation process.
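A minimal sketch of the two directions of flow inside a decoder loop. It assumes plain single-head scaled dot-product cross-attention with residual updates, and it omits the learned projections, multi-head attention, normalization, feed-forward layers, and mask head that typical query-based decoders use. The function names and the within-layer ordering (queries first, then conditions) are assumptions, not details confirmed by this page.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # a list of embedding vectors, all of length d


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(targets: Matrix, sources: Matrix) -> Matrix:
    """Each target vector attends over all source vectors (single head,
    scaled dot product, no learned projections) and adds the result back."""
    d = len(targets[0])
    updated = []
    for t in targets:
        logits = [sum(a * b for a, b in zip(t, s)) / math.sqrt(d) for s in sources]
        weights = softmax(logits)
        attended = [sum(w * s[k] for w, s in zip(weights, sources)) for k in range(d)]
        updated.append([t[k] + attended[k] for k in range(d)])  # residual update
    return updated


def bidirectional_semantic_flow(queries: Matrix, conditions: Matrix,
                                num_layers: int = 3) -> Tuple[Matrix, Matrix]:
    """Hypothetical decoder loop: at every layer, queries read from the
    condition embeddings, then the conditions read back from the refined queries."""
    for _ in range(num_layers):
        queries = cross_attend(queries, conditions)     # condition -> queries
        conditions = cross_attend(conditions, queries)  # queries -> condition
    return queries, conditions


# Toy usage: three decoder queries and one condition embedding, d = 2.
queries = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
conditions = [[0.8, 0.2]]
q_out, c_out = bidirectional_semantic_flow(queries, conditions)
print(len(q_out), len(c_out))  # 3 1
```

The key contrast with a one-way design is the second line of the loop: the condition embedding entering layer l+1 is no longer the raw LLM output but one that has already absorbed evidence from the queries refined at layer l.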

Semantic Cross-Attention: Queries attend to LLM-derived condition embeddings at each decoder layer, injecting linguistic constraints during mask generation.
Condition Refinement: Condition embeddings absorb emerging visual evidence from refined queries, making language representations visually grounded.
Boundary-Aware Refinement: A lightweight refinement module selectively improves uncertain boundary regions while preserving confident mask interiors.
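This page does not specify how uncertain regions are identified or how they are refined. The sketch below assumes a common heuristic, treating pixels whose foreground probability is close to 0.5 as uncertain (similar in spirit to PointRend-style point selection), and uses a 3x3 neighborhood mean as a hypothetical stand-in for the learned refinement module. The names `boundary_aware_refine`, `neighbor_mean`, and the `threshold` value are illustrative only.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]  # per-pixel foreground probabilities in [0, 1]


def uncertainty(p: float) -> float:
    """1.0 when p == 0.5 (most uncertain), 0.0 when p is 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def neighbor_mean(probs: Grid, i: int, j: int) -> float:
    """Hypothetical stand-in refiner: mean of the 3x3 neighborhood (clipped at borders)."""
    h, w = len(probs), len(probs[0])
    vals = [probs[y][x]
            for y in range(max(0, i - 1), min(h, i + 2))
            for x in range(max(0, j - 1), min(w, j + 2))]
    return sum(vals) / len(vals)


def boundary_aware_refine(probs: Grid, threshold: float = 0.5,
                          refine: Callable[[Grid, int, int], float] = neighbor_mean
                          ) -> Tuple[Grid, List[Tuple[int, int]]]:
    """Re-predict only pixels whose uncertainty exceeds `threshold`;
    confident pixels are copied through unchanged."""
    refined = [row[:] for row in probs]
    touched = []
    for i, row in enumerate(probs):
        for j, p in enumerate(row):
            if uncertainty(p) > threshold:
                refined[i][j] = refine(probs, i, j)  # reads the original, unrefined map
                touched.append((i, j))
    return refined, touched


probs = [[0.98, 0.97, 0.02],
         [0.96, 0.55, 0.03],
         [0.95, 0.40, 0.01]]
refined, touched = boundary_aware_refine(probs)
print(touched)  # [(1, 1), (2, 1)]
```

Only the two ambiguous pixels along the object edge are rewritten; every confident interior or background pixel passes through bit-for-bit, which is the "without perturbing confident interiors" property the module is described as having.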
Main pipeline of FlowSeg
Bidirectional Semantic Flow enables language condition embeddings to guide mask generation at each decoding stage, while progressively updating them with emerging query embeddings.

Results

Referring Expression Segmentation

FlowSeg improves over prior methods on RefCOCO, RefCOCO+, and RefCOCOg, with stronger gains on more challenging splits.

Referring expression segmentation results on RefCOCO, RefCOCO+, and RefCOCOg.

Reasoning Segmentation

On the ReasonSeg test set, FlowSeg reaches 54.7 cIoU, outperforming the baseline by 13.7 points.
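cIoU (cumulative IoU) sums intersections and unions over the whole split before dividing, so large objects carry more weight than in gIoU, the mean of per-image IoUs that ReasonSeg also reports. A small sketch of both metrics with made-up masks follows; it reflects the usual definitions for these benchmarks, not FlowSeg's actual evaluation script, and treating an empty prediction against an empty target as IoU 1.0 is an assumed convention.

```python
from typing import List, Tuple

Mask = List[int]  # flattened binary mask (0 = background, 1 = foreground)


def inter_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def ciou(preds: List[Mask], gts: List[Mask]) -> float:
    """Cumulative IoU: total intersection over total union across the dataset."""
    total_i = total_u = 0
    for p, g in zip(preds, gts):
        i, u = inter_union(p, g)
        total_i += i
        total_u += u
    return total_i / total_u if total_u else 0.0


def giou(preds: List[Mask], gts: List[Mask]) -> float:
    """Mean of per-image IoUs; an empty-vs-empty pair counts as 1.0 (assumed)."""
    ious = []
    for p, g in zip(preds, gts):
        i, u = inter_union(p, g)
        ious.append(i / u if u else 1.0)
    return sum(ious) / len(ious)


# Image 1: large object, mostly found. Image 2: tiny object, missed.
preds = [[1, 1, 1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0]]
gts = [[1, 1, 1, 1, 1, 1, 0, 0], [0, 1, 0, 0, 0, 0, 0, 0]]
print(round(ciou(preds, gts), 3), round(giou(preds, gts), 3))  # 0.5 0.333
```

Here cIoU = (4 + 0) / (6 + 2) = 0.5, while gIoU = (4/6 + 0) / 2 ≈ 0.333: missing the small object hurts gIoU much more than cIoU.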

Reasoning segmentation results on ReasonSeg validation and test sets.

Qualitative Results

FlowSeg produces more accurate masks with finer details compared with prior work, especially for ambiguous referring expressions and complex object boundaries.

Qualitative comparison on RefCOCO/+/g.
Qualitative results on ReasonSeg.

Citation

@inproceedings{flowseg2026,
  title     = {FlowSeg: Dynamic Semantic Flow for LLM-Conditioned Segmentation},
  author    = {Zekang Zhang and Guangyu Gao and Youyun Tang and ChengJing Wu and Xiaochao Qu and Chi Harold Liu and Jianbo Jiao and Yunchao Wei and Luoqi Liu and Ting Liu},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year      = {2026}
}