Autonomous Synthetic Data Construction via VLM-Guided Iterative Rendering
Build photorealistic, scene-aware training data with a closed loop of 3D rendering, VLM review, and targeted parameter repair.
Paper: arXiv • Project website • License: Apache-2.0 • Python 3.10+ • Blender 4.2+
🇨🇳 中文 • 🌐 Website • 📄 Paper • 🏆 Recognition • 🧩 Dataset: DataEvolver-Rotate
🧠 VLM free-form feedback • 🎬 3D render-review-fix loop • 📦 Training-ready multimodal data
DataEvolver turns synthetic data construction into a goal-driven optimization loop: a VLM reviews rendered scenes in natural language, an agent diagnoses concrete rendering issues, and the scene is re-rendered with targeted parameter updates until the result is worth keeping.
- Perceive beyond rigid scores — free-form VLM feedback catches scene context, object placement, lighting, and material issues that fixed rules miss.
- Repair with structured actions — 24 bounded atomic actions adjust lighting, object pose, scene environment, and material appearance without uncontrolled drift.
- Export training-ready data — RGB, masks, depth, normals, geometry metadata, and object-disjoint splits are produced for downstream model training.
- Goal-Driven Loop Agents — VLM reviewer provides semantic feedback ("flat lighting", "floating object") → AI agent selects targeted actions → re-render → repeat until quality goals are met
- 24 Atomic Actions — Structured action space across 5 groups (lighting, object placement, scene environment, material properties, and a reserved camera group), with anti-oscillation control and step-scale scheduling
- Scene-Aware Rendering — Objects placed in real Blender scenes with HDRI environments, raycast ground detection, and preserved scene lighting
- Multi-Modal Output — RGB, mask, depth, normal maps, and geometry metadata
- End-to-End Automation — From natural language seed concept to training-ready dataset, zero human intervention
| Stage | What it does | Model / Tool |
|---|---|---|
| 1. Text Expansion | LLM expands seed concept into detailed T2I prompt | Claude API (Anthropic) |
| 2. T2I Generation | Generate 1024×1024 object image | Qwen-Image-2512 |
| 2.5. Segmentation | Extract RGBA foreground, remove background | SAM3 |
| 3. 3D Reconstruction | Reconstruct textured mesh from single image | Hunyuan3D-2.1 |
| 4. Scene Rendering | Scene-aware Blender insertion with configurable EEVEE/Cycles rendering | Blender 4.2+ |
| 5. VLM Review Loop | Free-form review → agent action → re-render until keep | Qwen3.5-35B-A3B |
The core innovation: a goal-driven loop agent that iteratively improves rendering quality.
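The render-review-fix loop can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `Review`, `run_loop`, and the `toy_*` callables are hypothetical names, and the toy reviewer stands in for Blender and the VLM so the example runs anywhere.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    verdict: str                                     # "keep" or "revise"
    issues: list[str] = field(default_factory=list)  # free-form issue phrases


def run_loop(render, review, select_actions, apply_actions, params, max_rounds=5):
    """Render, review, and repair until the reviewer says keep or rounds run out."""
    history = []
    for round_idx in range(max_rounds):
        image = render(params)
        result = review(image)
        history.append((round_idx, dict(params), result.verdict))
        if result.verdict == "keep":
            return params, history, True
        actions = select_actions(result.issues)
        params = apply_actions(params, actions)
    return params, history, False


# Toy stand-ins (assumptions) so the sketch runs without Blender or a VLM.
def toy_render(params):
    return dict(params)  # the "image" is just a copy of the parameters


def toy_review(image):
    if image["key_light"] < 1.0:
        return Review("revise", ["flat lighting"])
    return Review("keep")


def toy_select_actions(issues):
    return ["key_light_up"] if "flat lighting" in issues else []


def toy_apply_actions(params, actions):
    updated = dict(params)
    for action in actions:
        if action == "key_light_up":
            updated["key_light"] += 0.25
    return updated


final_params, history, kept = run_loop(
    toy_render, toy_review, toy_select_actions, toy_apply_actions,
    params={"key_light": 0.5},
)
print(kept, final_params, len(history))  # True {'key_light': 1.0} 3
```

The loop stops only on an explicit "keep" verdict or when the round budget is exhausted, so a caller can tell an accepted render from one that simply ran out of rounds.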
Anti-oscillation control prevents parameter thrashing:
- Sign-flip tracking: freeze a parameter after 3 direction reversals
- Step-scale scheduling: Round 0 → 100%, Round 1 → 70%, Round 2 → 50%, Round 3+ → 40%
- Score-adaptive boost: ×1.2 when hybrid_score < 0.65
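The three rules above can be expressed as a small helper. This is a sketch under stated assumptions: the names `step_scale` and `SignFlipTracker` are hypothetical, the boost is assumed to multiply the scheduled scale, and a parameter is assumed to freeze on its third reversal. The actual logic lives in the repository's feedback-apply stage and may differ.

```python
STEP_SCALES = {0: 1.0, 1: 0.7, 2: 0.5}  # round index -> scale; round 3+ uses 0.4
LATE_ROUND_SCALE = 0.4
BOOST = 1.2
BOOST_THRESHOLD = 0.65
MAX_SIGN_FLIPS = 3


def step_scale(round_idx: int, hybrid_score: float) -> float:
    """Scheduled step scale, boosted when the hybrid score is low."""
    scale = STEP_SCALES.get(round_idx, LATE_ROUND_SCALE)
    if hybrid_score < BOOST_THRESHOLD:
        scale *= BOOST  # assumption: the boost composes multiplicatively
    return scale


class SignFlipTracker:
    """Counts direction reversals per parameter and freezes after too many."""

    def __init__(self, max_flips: int = MAX_SIGN_FLIPS):
        self.max_flips = max_flips
        self.last_sign: dict[str, int] = {}
        self.flips: dict[str, int] = {}

    def is_frozen(self, name: str) -> bool:
        return self.flips.get(name, 0) >= self.max_flips

    def record(self, name: str, delta: float) -> bool:
        """Record a proposed delta; return True if the parameter may still move."""
        if self.is_frozen(name):
            return False
        sign = (delta > 0) - (delta < 0)
        if sign == 0:
            return True  # a zero delta is not a direction change
        previous = self.last_sign.get(name)
        if previous is not None and previous != sign:
            self.flips[name] = self.flips.get(name, 0) + 1
        self.last_sign[name] = sign
        return not self.is_frozen(name)


tracker = SignFlipTracker()
allowed = [tracker.record("key_light", d) for d in (+1.0, -1.0, +1.0, -1.0)]
print(allowed)  # [True, True, True, False]
```

Shrinking steps in later rounds and freezing oscillating parameters both push the loop toward convergence instead of repeatedly undoing its own edits.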
- OS: Linux (tested on Ubuntu 20.04+)
- GPU: NVIDIA GPU with ≥24 GB VRAM for rendering; ≥80 GB for VLM inference
- Python: 3.10+
- Blender: 4.2+
- CUDA: Compatible with your PyTorch version
| Model | Purpose | Approx. Size |
|---|---|---|
| Qwen-Image-2512 | T2I generation | ~56 GB |
| SAM3 | Foreground segmentation | ~2 GB |
| Hunyuan3D-2.1 | Image-to-3D reconstruction | ~20 GB |
| Qwen3.5-35B-A3B | VLM quality reviewer | ~35 GB |
| Blender 4.2+ | 3D rendering engine | ~300 MB |
Start with the lightweight onboarding dry-run. It checks the shape of your environment, prints the setup and model download plan, and writes local non-sensitive config files. It does not install dependencies, download model weights, write tokens, or launch GPU jobs.
git clone https://github.com/PRIS-CV/DataEvolver.git
cd DataEvolver
For the fastest first pass, use the quick profile:
bash scripts/bootstrap_dataevolver_default.sh \
  --profile quick \
  --dry-run \
  --write-local-config
For the default DataEvolver pipeline plan, choose a model root and run:
bash scripts/bootstrap_dataevolver_default.sh \
--profile default \
--model-root /path/to/dataevolver-models \
--workspace-root "$PWD" \
--python-backend auto \
--dry-run \
--write-local-config
The script prints four sections:
| Section | What it tells you |
|---|---|
| preflight | Linux, GPU, Python, uv/conda, Blender, and Hugging Face CLI availability |
| env plan | The preferred uv environment plan and conda fallback |
| model plan | Hugging Face repos, target paths, gated-access notes, and printed download commands |
| config plan | Non-sensitive environment variables that the pipeline can read |
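To make the preflight idea concrete, here is a rough Python equivalent of that kind of availability check. It is not the bootstrap script's code: `preflight` is a hypothetical helper, and the executable names (`nvidia-smi`, `uv`, `conda`, `blender`, `hf`) are assumptions about what is on your `PATH`.

```python
import platform
import shutil
import sys


def preflight(tools=("nvidia-smi", "uv", "conda", "blender", "hf")):
    """Report OS, Python version, and whether each tool is found on PATH."""
    report = {
        "os": platform.system(),
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info >= (3, 10),  # README requires Python 3.10+
    }
    for tool in tools:
        # shutil.which returns the resolved path, or None when the tool is missing.
        report[tool] = shutil.which(tool)
    return report


for key, value in preflight().items():
    print(f"{key:12s} {value}")
```

Like the dry-run itself, this only inspects the environment; it installs nothing and launches no GPU work.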
Available profiles:
| Profile | Use it when |
|---|---|
| quick | You only want environment discovery and a dry-run route |
| default | You want the current core pipeline plan: Qwen-Image-2512, SAM3, Hunyuan3D-2.1, DINOv2 Giant, Qwen3.5-35B-A3B, and Blender |
| full | You also want the default Edit/T2V plan: Qwen-Image-Edit-2511 and Wan2.1-T2V |
| custom | You already have replacement models and want them recorded for later compatibility review |
When --write-local-config is used, the generated local files are:
- .dataevolver/local/ENVIRONMENT.md: human-readable setup summary
- .dataevolver/local/env.config.json: structured config for agents and scripts
- .dataevolver/local/env.sh.example: sourceable non-sensitive path variables
.dataevolver/local/ is ignored by Git. Do not put Hugging Face, Anthropic,
OpenAI, SSH, cookie, or API tokens in these files.
If you are working with an agent environment that supports skills, ask it to use
the dataevolver-onboarding skill. It should interview you for only five setup
areas: target route, runtime location, install policy, model strategy, and
workspace/output paths, then run the dry-run script above.
Before a real run, review the generated path variables:
sed -n '1,160p' .dataevolver/local/env.sh.example
source .dataevolver/local/env.sh.example
The pipeline now reads these environment overrides while preserving the legacy fallback paths:
| Variable | Used by |
|---|---|
| QWEN_IMAGE_MODEL_PATH | Stage 2 text-to-image model path |
| SAM3_CKPT, SAM3_DIR | Stage 2.5 SAM3 checkpoint and package/source path |
| HUNYUAN3D_REPO, MODEL_HUB, PAINT_MODEL_HUB, DINO_MODEL_PATH, REALESRGAN_CKPT | Stage 3 Hunyuan3D, paint, DINO, and RealESRGAN paths |
| VLM_MODEL_PATH | Stage 5.5 VLM reviewer model path |
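An override-with-fallback lookup of this kind can be sketched as follows. The helper `resolve_path` and the values in `LEGACY_DEFAULTS` are placeholders for illustration only; the pipeline's real fallback paths are defined in its own stage scripts.

```python
import os
from pathlib import Path

# Placeholder fallbacks (assumptions); the real legacy paths live in the pipeline.
LEGACY_DEFAULTS = {
    "QWEN_IMAGE_MODEL_PATH": "/path/to/Qwen-Image-2512",
    "VLM_MODEL_PATH": "/path/to/Qwen3.5-35B-A3B",
}


def resolve_path(var, env=None):
    """Prefer the environment override, then fall back to the legacy default."""
    env = os.environ if env is None else env
    value = env.get(var) or LEGACY_DEFAULTS.get(var)
    if value is None:
        raise KeyError(f"{var} is not set and has no fallback")
    return Path(value)


# An explicit override wins; an unset variable falls back to the default.
print(resolve_path("VLM_MODEL_PATH", env={"VLM_MODEL_PATH": "/models/vlm"}))
print(resolve_path("QWEN_IMAGE_MODEL_PATH", env={}))
```

Passing `env` explicitly keeps the lookup easy to test without mutating the real process environment.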
For real Hugging Face downloads, set HF_TOKEN in your shell only. The dry-run
script prints uvx --from huggingface_hub hf download ... commands first and
hf download ... fallback commands, but v0 never executes them.
After dependencies, model weights, and path overrides are ready:
# Place your .blend scene file in assets/scene/
# Place HDRI environment maps in assets/hdri/
# Configure configs/scene_template.json for blend_path and blender_binary.
# Set the Stage 1 LLM credential in your shell if you use text expansion.
bash pipeline/run_all.sh
The base pipeline prepares assets and render outputs. The scene-aware VLM
optimization loop is launched separately through the scene-agent workflow, for
example with scripts/run_scene_agent_monitor.py after model paths,
BLENDER_BIN, and configs/scene_template.json are configured for your
environment.
The public repository includes the code and schema needed to reproduce the
WebSearch-driven universal 3D dataset workflow. Large generated datasets,
private server paths, mesh assets, .blend scene files, model weights, API
keys, and paper source are intentionally not committed.
Core entry points:
- Stage0 WebSearch prior: scripts/stage0_web_research.py
- Universal dataset schema: configs/schemas/universal_3d_layout_dataset_schema.json
- Universal rendering scripts: scripts/universal_3d_layout/
- Workflow notes and validation summary: docs/UNIVERSAL_3D_LAYOUT_DATASET.md
Start a multi-paper universal session:
python scripts/stage0_web_research.py init \
--query "SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation" \
--output-dir runtime/research_priors/demo_universal \
--dataset-mode universal \
--tag universal-3d-layout
Run a small Blender-backed smoke batch after configuring scene, mesh, runtime, and model paths:
python scripts/universal_3d_layout/run_dataevolver_universal_contract_batch.py \
  --dataevolver-root /path/to/DataEvolver \
  --runtime-root /path/to/runtime \
  --blender /path/to/blender \
  --out /path/to/universal_contract_smoke \
  --num-samples 1 \
  --resolution 512 \
  --engine EEVEE
The universal contract emits RGB targets, per-object masks, structure/OSCR
views, depth-order proxies, orientation proxies, camera metadata, mesh metadata,
3D boxes, scene graphs, spatial relations, validation traces, and
metadata/records.jsonl. The current depth and normal outputs are proxy
artifacts, not dense depth or dense surface-normal supervision.
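A quick way to inspect a smoke-batch output is to tally which top-level fields appear in `metadata/records.jsonl`. The sketch below assumes only that each non-empty line is a JSON object; it does not assume any particular field names, and `summarize_records` is a hypothetical helper rather than part of the repository.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_records(path):
    """Return (record_count, Counter of top-level keys) for a JSONL file."""
    key_counts = Counter()
    n_records = 0
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            n_records += 1
            key_counts.update(record.keys())
    return n_records, key_counts


if __name__ == "__main__":
    # Placeholder path: point this at the --out directory of your own run.
    records_path = Path("/path/to/universal_contract_smoke/metadata/records.jsonl")
    if records_path.exists():
        count, keys = summarize_records(records_path)
        print(count, dict(keys))
```

Fields that appear in fewer records than the total count are a cheap signal that some samples are missing part of the contract.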
DataEvolver/
├── pipeline/ # Core pipeline stages
│ ├── stage1_text_expansion.py # LLM prompt generation
│ ├── stage2_t2i_generate.py # Text-to-image (Qwen-Image-2512)
│ ├── stage2_5_sam2_segment.py # SAM3 foreground extraction
│ ├── stage3_image_to_3d.py # 3D mesh reconstruction (Hunyuan3D-2.1)
│ ├── stage4_scene_render.py # Blender scene-aware rendering
│ ├── stage5_5_vlm_review.py # VLM quality review (Qwen3.5-35B-A3B)
│ ├── stage5_6_feedback_apply.py # Action selection & anti-oscillation
│ ├── asset_lifecycle.py # Asset lifecycle management
│ └── rotation_geomodal_dataset.py # Training dataset loader
├── configs/
│ ├── scene_action_space.json # 24 atomic actions definition
│ ├── scene_template.json # Blender scene template config
│ ├── vlm_review_schema.json # VLM review output schema
│ ├── dataset_profiles/ # Dataset configuration profiles
│ └── seed_concepts/ # Seed object definitions (20/50 objects)
├── scripts/ # Utility & build scripts
│ ├── run_scene_agent_monitor.py # VLM loop agent monitor
│ ├── run_scene_agent_step.py # Single-step agent execution
│ ├── export_rotation8_from_best_object_state.py # Consistent rotation export
│ ├── build_rotation8_trainready_dataset.py # Build training pairs
│ ├── build_object_split_for_rotation_dataset.py # Object-disjoint split
│ ├── run_full_pipeline.py # Full pipeline orchestrator
│ ├── stage0_web_research.py # WebSearch prior and handoff builder
│ ├── universal_3d_layout/ # Universal 3D dataset workflow
│ ├── run_vlm_quality_gate_loop.py # VLM quality gate loop
│ ├── feedback_loop/ # Feedback loop utilities
│ └── ... # Additional build & eval scripts
├── assets/
│ ├── hdri/ # HDRI environment maps
│ └── scene/ # Blender scene files (.blend)
└── web/ # Project website (GitHub Pages)
The AI agent selects from 24 structured atomic actions organized in 5 groups:
| Group | Actions | Parameters |
|---|---|---|
| Lighting (4) | Key light intensity ↑↓, key light yaw ±15° | Multiplicative ×0.8 or additive, bounded |
| Object (6) | Elevation ±0.02, yaw ±15°, scale ×0.9 | Bounded within safe ranges |
| Scene (5) | Env rotation ±30°, env intensity ↑↓, contact shadow | HDRI and environment controls |
| Material (9) | Saturation, value/brightness, hue offset, roughness, specular/sheen | Fine-grained material tuning |
| Camera (0) | Reserved for future use | — |
Full action definitions: configs/scene_action_space.json
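As a rough illustration of what "bounded" means here, the sketch below applies one multiplicative and one additive action and clamps the result. The `AtomicAction` fields, the `apply_action` helper, and the numeric bounds are hypothetical; only the ×0.8 and ±15° step sizes come from the table, and the real schema is the one in configs/scene_action_space.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicAction:
    name: str
    param: str     # parameter the action edits
    mode: str      # "mul" (multiplicative) or "add" (additive)
    amount: float  # factor for "mul", offset for "add"
    lo: float      # lower bound (assumed for illustration)
    hi: float      # upper bound (assumed for illustration)


def apply_action(params, action, scale=1.0):
    """Return a new parameter dict with the action applied and clamped."""
    current = params[action.param]
    if action.mode == "mul":
        # Scale the distance from 1.0 so that scale=0 leaves the value unchanged.
        proposed = current * (1.0 + (action.amount - 1.0) * scale)
    elif action.mode == "add":
        proposed = current + action.amount * scale
    else:
        raise ValueError(f"unknown mode: {action.mode}")
    clamped = min(max(proposed, action.lo), action.hi)
    return {**params, action.param: clamped}


dim_key_light = AtomicAction("key_light_down", "key_light", "mul", 0.8, 0.1, 10.0)
yaw_plus = AtomicAction("object_yaw_plus", "object_yaw", "add", 15.0, -180.0, 180.0)

state = {"key_light": 1.0, "object_yaw": 170.0}
state = apply_action(state, dim_key_light)
state = apply_action(state, yaw_plus)
print(state)  # {'key_light': 0.8, 'object_yaw': 180.0}
```

Clamping after every step is what keeps repeated actions from drifting outside a safe range, and the `scale` argument is where a per-round step schedule would plug in.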
The first benchmark dataset produced by DataEvolver — for rotation-conditioned image editing.
| Metric | Value |
|---|---|
| Unique Objects | 50 |
| Rotation Angles | 8 (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) |
| Training Pairs | 350 (front → 7 target views) |
| Train / Val / Test | 245 / 49 / 56 pairs (object-disjoint, seed=42) |
| Modalities | RGB, mask, depth, normal |
Each object uses a single canonical yaw-0° best state as the base. The object is then rotated while the scene, camera, lighting, and material remain fixed — ensuring cross-angle consistency.
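The pair counts in the table follow from simple arithmetic: 50 objects × 7 non-frontal targets = 350 pairs, and 245 / 49 / 56 pairs correspond to 35 / 7 / 8 objects. The sketch below reproduces those counts with an object-level shuffle. The object counts per split are inferred from the table, and the shuffle procedure and helper names are assumptions, so this will not reproduce the released split membership.

```python
import random

ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]


def build_pairs(object_ids):
    """One (object, source_angle, target_angle) pair per non-frontal view."""
    return [(obj, 0, angle) for obj in object_ids for angle in ANGLES if angle != 0]


def object_disjoint_split(object_ids, n_val, n_test, seed=42):
    """Shuffle objects once, then carve out test, val, and train by object."""
    ids = sorted(object_ids)
    random.Random(seed).shuffle(ids)
    return {
        "train": ids[n_test + n_val:],
        "val": ids[n_test:n_test + n_val],
        "test": ids[:n_test],
    }


objects = [f"obj_{i:03d}" for i in range(50)]  # hypothetical object IDs
split = object_disjoint_split(objects, n_val=7, n_test=8)
pair_counts = {name: len(build_pairs(ids)) for name, ids in split.items()}
print(pair_counts)  # {'train': 245, 'val': 49, 'test': 56}
```

Splitting by object rather than by pair guarantees that no object seen in training appears in validation or test at a different angle.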
import json
from pathlib import Path

from PIL import Image

root = Path("path/to/dataset_split")

# Load every training pair record (one JSON object per line).
with (root / "pairs" / "train_pairs.jsonl").open("r") as f:
    rows = [json.loads(line) for line in f]

row = rows[0]
source = Image.open(root / row["source_image"]).convert("RGB")
target = Image.open(root / row["target_image"]).convert("RGB")
instruction = row["instruction"]  # e.g., "Rotate the object 45 degrees clockwise"
DataEvolver is designed to work with Claude Code as an AI-powered development and operations assistant. Claude Code reads a project-level CLAUDE.md file to understand your environment, then helps you run pipelines, analyze results, and build datasets through natural language.
npm install -g @anthropic-ai/claude-code
Create a CLAUDE.md in the project root (it's gitignored — each user maintains their own). This file tells Claude Code about your specific environment:
# CLAUDE.md

## Remote Server
- SSH alias: `my-server`
- GPU: 3x A800 80GB (or your setup)
- Python: `/path/to/python3` (3.10+, with PyTorch)
- Blender: `/path/to/blender` (4.2+)
- Code directory: `/path/to/DataEvolver`

## Model Paths
- Qwen-Image-2512: `/path/to/Qwen-Image-2512`
- SAM3 checkpoint: `/path/to/sam3/sam3.pt`
- Hunyuan3D-2.1 repo: `/path/to/Hunyuan3D-2.1`
- Hunyuan3D-2.1 weights: `/path/to/model_hub/Hunyuan3D-2.1`
- Qwen3.5-35B-A3B: `/path/to/Qwen3.5-35B-A3B`

## Scene Config
- Blender scene file: `/path/to/scene.blend`
- HDRI directory: `/path/to/hdri/`
- Render engine: CYCLES, 512 samples, 1024x1024

## Key Configs
- Action space: `configs/scene_action_space.json` (24 atomic actions)
- Scene template: `configs/scene_template.json`
- Dataset profiles: `configs/dataset_profiles/`

## Pipeline
Stage 1 (Text Expansion) → Stage 2 (T2I) → Stage 2.5 (SAM3) → Stage 3 (3D Reconstruction) → Stage 4 (Blender Render) → Stage 5 (VLM Loop)

## Working Rules
- Always test on the remote server, not locally
- Use tmux for long-running tasks (screen not available on all servers)
- Read trace.json free-form text for VLM results, not just agg.json scores
- Only stop VLM loop when reviewer explicitly says "keep"
- Check GPU usage before launching new jobs
You can create reusable skills in a skills/ directory for common workflows:
mkdir -p skills/scene-agent-loop
# skills/scene-agent-loop/SKILL.md
---
name: scene-agent-loop
description: Manage the VLM review loop for scene rendering
---
# Scene Agent Loop
Monitor and continue the VLM review → render → agent decision loop.
Read trace.json to understand current state, then decide next action.
cd DataEvolver
claude
Claude Code will read your CLAUDE.md and understand the full project context. Example commands:
> Check GPU usage on the server
> Run the full pipeline for 10 new furniture objects
> Export rotation8 dataset from the latest best states
> Build train-ready dataset with object-disjoint split
@misc{zhang2026dataevolverletdatabuild,
  title={DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents},
  author={Qisong Zhang and Wenzhuo Wu and Zhuangzhuang Jia and Yunhao Yang and Huayu Zhang and Xianghao Zang and Zhixiang He and Zhongjiang He and Kongming Liang and Zhanyu Ma},
  year={2026},
  eprint={2605.01789},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2605.01789},
}
- 🏆 AIDataSci 2026, KDD 2026 Workshop — DataEvolver was accepted as an Oral Presentation. View on OpenReview.
- 🎖️ 2026 BAAI Conference Agent for Science Competition — DataEvolver received a Certificate of Poster Presentation and was selected for on-site poster presentation.
[Image: DataEvolver BAAI Agent for Science poster presentation certificate]
- WebSearch-integrated AI Agent — Stage0 WebSearch supports both single-paper reproduction and multi-paper universal dataset modes. It records paper evidence, exports handoff documents, and has been validated on the remote rendering environment.
- Multi-paper universal 3D dataset synthesis — the workflow starts from an anchor paper, expands to related 3D-control papers, and extracts shared dataset requirements for target images, masks, OSCR/structure views, camera metadata, object layout, occlusion relations, scene graphs, and validation traces.
- Universal 3D layout dataset contract — DataEvolver now tracks a reusable schema for Blender-backed target renders, per-object masks, structure views, depth-order proxies, orientation proxies, camera pose/intrinsics/viewpoint tokens, mesh metadata, 3D boxes, scene graphs, spatial relations, and promotion decisions.
- Blender-backed dual-object scenes — the current universal 3D workflow can generate real-scene, dual-object records with DataEvolver scene and mesh assets. General N-object compositional scenes remain a separate extension.
- VLM/CV self-evolution loop — selected universal 3D records can be reviewed with calibrated hybrid VLM and CV geometry scores, bounded repair actions, and explicit accept/reject promotion traces.
- Paper-specific and universal export modes — the WebSearch and universal layout scripts support both single-paper dataset reproduction and multi-paper universal schema export.
- Structured VLM knowledge base — current scoring already uses structured VLM/CV criteria, mask coverage checks, depth-order proxy checks, geometry-review metadata, and a VGGT-Omega proxy hook. A persistent prior-knowledge store for reusable VLM assessment criteria is still under development.
- Real-world dataset ingestion — WebSearch can discover public datasets and paper resources, and public artifacts such as SeeThrough3D are used as references. A full crawl-clean-process-ingest pipeline for arbitrary real-world datasets is still pending.
- Auto-generated reasoning-evaluation datasets — current records include validation logs, hybrid scores, and promotion traces. Fully automated benchmark generation after downstream model training remains a future release target.
- Video object datasets — extend image-level rendering to temporally consistent object removal, translation, rotation, insertion, and attribute editing across frames.
- Temporal review and filtering — add sequence-level checks for flicker, trajectory smoothness, mask stability, depth continuity, and action consistency.
- General N-object compositional scenes — beyond the current dual-object workflow, this requires relation-aware sampling, collision handling, occlusion planning, and stronger multi-object VLM/CV review.
- Large-scale public dataset ingestion — scale WebSearch-discovered dataset ingestion into repeatable crawling, cleaning, licensing, provenance, and structured export workflows.