DataEvolver

Autonomous Synthetic Data Construction via VLM-Guided Iterative Rendering
Build photorealistic, scene-aware training data with a closed loop of 3D rendering, VLM review, and targeted parameter repair.

Paper: arXiv • Project website • License: Apache-2.0 • Python 3.10+ • Blender 4.2+

🇨🇳 中文 • 🌐 Website • 📄 Paper • 🏆 Recognition • 🧩 Dataset: DataEvolver-Rotate

🧠 VLM free-form feedback • 🎬 3D render-review-fix loop • 📦 Training-ready multimodal data

Why DataEvolver?

DataEvolver turns synthetic data construction into a goal-driven optimization loop: a VLM reviews rendered scenes in natural language, an agent diagnoses concrete rendering issues, and the scene is re-rendered with targeted parameter updates until the result is worth keeping.

Perceive beyond rigid scores — free-form VLM feedback catches scene context, object placement, lighting, and material issues that fixed rules miss.
Repair with structured actions — 24 bounded atomic actions adjust lighting, object pose, scene environment, and material appearance without uncontrolled drift.
Export training-ready data — RGB, masks, depth, normals, geometry metadata, and object-disjoint splits are produced for downstream model training.

Key Features

Goal-Driven Loop Agents: the VLM reviewer provides semantic feedback ("flat lighting", "floating object") → the AI agent selects targeted actions → re-render → repeat until quality goals are met
24 Atomic Actions: a structured action space across 5 groups (lighting, object placement, scene environment, material properties, and a camera group reserved for future use), with anti-oscillation control and step-scale scheduling
Scene-Aware Rendering: objects are placed in real Blender scenes with HDRI environments, raycast ground detection, and preserved scene lighting
Multi-Modal Output: RGB, mask, depth, normal maps, and geometry metadata
End-to-End Automation: from a natural language seed concept to a training-ready dataset, with zero human intervention

Pipeline Overview

Stage	What it does	Model / Tool
1. Text Expansion	LLM expands seed concept into detailed T2I prompt	Claude API (Anthropic)
2. T2I Generation	Generate 1024×1024 object image	Qwen-Image-2512
2.5. Segmentation	Extract RGBA foreground, remove background	SAM3
3. 3D Reconstruction	Reconstruct textured mesh from single image	Hunyuan3D-2.1
4. Scene Rendering	Scene-aware Blender insertion with configurable EEVEE/Cycles rendering	Blender 4.2+
5. VLM Review Loop	Free-form review → agent action → re-render until keep	Qwen3.5-35B-A3B

The VLM Review Loop (Stage 5)

The core innovation: a goal-driven loop agent that iteratively improves rendering quality.

[Figure: VLM Review Loop]

Anti-oscillation control prevents parameter thrashing:

Sign-flip tracking: freeze a parameter after 3 direction reversals
Step-scale scheduling: Round 0 → 100%, Round 1 → 70%, Round 2 → 50%, Round 3+ → 40%
Score-adaptive boost: ×1.2 when hybrid_score < 0.65
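
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the code in pipeline/stage5_6_feedback_apply.py: the names `step_scale` and `SignFlipTracker` are hypothetical, and composing the score-adaptive boost multiplicatively with the round schedule is an assumption.

```python
from dataclasses import dataclass, field

# Values taken from the list above.
STEP_SCALES = (1.0, 0.7, 0.5, 0.4)  # round 0, 1, 2, 3+
MAX_SIGN_FLIPS = 3                  # freeze after 3 direction reversals
BOOST_FACTOR = 1.2                  # score-adaptive boost
BOOST_THRESHOLD = 0.65              # boost applies when hybrid_score < 0.65


def step_scale(round_idx: int, hybrid_score: float) -> float:
    """Return the step multiplier for a review round and its hybrid score."""
    scale = STEP_SCALES[min(round_idx, len(STEP_SCALES) - 1)]
    if hybrid_score < BOOST_THRESHOLD:
        # Assumption: the boost multiplies the scheduled scale.
        scale *= BOOST_FACTOR
    return scale


@dataclass
class SignFlipTracker:
    """Hypothetical helper that counts direction reversals per parameter."""
    last_sign: dict = field(default_factory=dict)
    flips: dict = field(default_factory=dict)

    def record(self, param: str, delta: float) -> None:
        sign = (delta > 0) - (delta < 0)
        if sign == 0:
            return  # a zero update carries no direction
        prev = self.last_sign.get(param)
        if prev is not None and prev != sign:
            self.flips[param] = self.flips.get(param, 0) + 1
        self.last_sign[param] = sign

    def is_frozen(self, param: str) -> bool:
        return self.flips.get(param, 0) >= MAX_SIGN_FLIPS
```

A caller would check `is_frozen` before proposing an update to a parameter and multiply the proposed step by `step_scale(round_idx, hybrid_score)`.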

Prerequisites

OS: Linux (tested on Ubuntu 20.04+)
GPU: NVIDIA GPU with ≥24 GB VRAM for rendering; ≥80 GB for VLM inference
Python: 3.10+
Blender: 4.2+
CUDA: Compatible with your PyTorch version

Required Models

Model	Purpose	Approx. Size
Qwen-Image-2512	T2I generation	~56 GB
SAM3	Foreground segmentation	~2 GB
Hunyuan3D-2.1	Image-to-3D reconstruction	~20 GB
Qwen3.5-35B-A3B	VLM quality reviewer	~35 GB
Blender 4.2+	3D rendering engine	~300 MB

Quick Start

Start with the lightweight onboarding dry-run. It checks the shape of your environment, prints the setup and model download plan, and writes local non-sensitive config files. It does not install dependencies, download model weights, write tokens, or launch GPU jobs.

1. Clone the repository

git clone https://github.com/PRIS-CV/DataEvolver.git
cd DataEvolver

2. Run the safe onboarding dry-run

For the fastest first pass, use the quick profile:

bash scripts/bootstrap_dataevolver_default.sh \
 --profile quick \
 --dry-run \
 --write-local-config

For the default DataEvolver pipeline plan, choose a model root and run:

bash scripts/bootstrap_dataevolver_default.sh \
 --profile default \
 --model-root /path/to/dataevolver-models \
 --workspace-root "$PWD" \
 --python-backend auto \
 --dry-run \
 --write-local-config

The script prints four sections:

Section	What it tells you
`preflight`	Linux, GPU, Python, `uv`/`conda`, Blender, and Hugging Face CLI availability
`env plan`	The preferred `uv` environment plan and `conda` fallback
`model plan`	Hugging Face repos, target paths, gated-access notes, and printed download commands
`config plan`	Non-sensitive environment variables that the pipeline can read

Available profiles:

Profile	Use it when
`quick`	You only want environment discovery and a dry-run route
`default`	You want the current core pipeline plan: Qwen-Image-2512, SAM3, Hunyuan3D-2.1, DINOv2 Giant, Qwen3.5-35B-A3B, and Blender
`full`	You also want the default Edit/T2V plan: Qwen-Image-Edit-2511 and Wan2.1-T2V
`custom`	You already have replacement models and want them recorded for later compatibility review

When --write-local-config is used, the generated local files are:

.dataevolver/local/ENVIRONMENT.md — human-readable setup summary
.dataevolver/local/env.config.json — structured config for agents and scripts
.dataevolver/local/env.sh.example — sourceable non-sensitive path variables

.dataevolver/local/ is ignored by Git. Do not put Hugging Face, Anthropic, OpenAI, SSH, cookie, or API tokens in these files.

If you are working with an agent environment that supports skills, ask it to use the dataevolver-onboarding skill. It should interview you for only five setup areas: target route, runtime location, install policy, model strategy, and workspace/output paths, then run the dry-run script above.

3. Review paths and source environment overrides

Before a real run, review the generated path variables:

sed -n '1,160p' .dataevolver/local/env.sh.example
source .dataevolver/local/env.sh.example

The pipeline now reads these environment overrides while preserving the legacy fallback paths:

Variable	Used by
`QWEN_IMAGE_MODEL_PATH`	Stage 2 text-to-image model path
`SAM3_CKPT`, `SAM3_DIR`	Stage 2.5 SAM3 checkpoint and package/source path
`HUNYUAN3D_REPO`, `MODEL_HUB`, `PAINT_MODEL_HUB`, `DINO_MODEL_PATH`, `REALESRGAN_CKPT`	Stage 3 Hunyuan3D, paint, DINO, and RealESRGAN paths
`VLM_MODEL_PATH`	Stage 5.5 VLM reviewer model path
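
As a rough sketch of the override-with-fallback pattern described above, a stage could resolve each path as follows. The helper `resolve_path` and the values in `LEGACY_DEFAULTS` are hypothetical placeholders; the real fallback paths live in the pipeline scripts and are not reproduced here.

```python
import os
from pathlib import Path

# Hypothetical legacy defaults, for illustration only.
LEGACY_DEFAULTS = {
    "QWEN_IMAGE_MODEL_PATH": "/path/to/Qwen-Image-2512",
    "SAM3_CKPT": "/path/to/sam3/sam3.pt",
    "VLM_MODEL_PATH": "/path/to/Qwen3.5-35B-A3B",
}


def resolve_path(var: str, env=None) -> Path:
    """Prefer the environment override; otherwise use the legacy default."""
    env = os.environ if env is None else env
    # An unset or empty variable falls through to the default.
    value = env.get(var) or LEGACY_DEFAULTS[var]
    return Path(value)


if __name__ == "__main__":
    for name in LEGACY_DEFAULTS:
        print(f"{name} -> {resolve_path(name)}")
```

Running the snippet after `source .dataevolver/local/env.sh.example` shows which variables are picked up from the shell and which still use a placeholder.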

For real Hugging Face downloads, set HF_TOKEN in your shell only. The dry-run script prints uvx --from huggingface_hub hf download ... commands first and hf download ... fallback commands, but v0 never executes them.

4. Prepare scene assets and run the pipeline

After dependencies, model weights, and path overrides are ready:

# Place your .blend scene file in assets/scene/
# Place HDRI environment maps in assets/hdri/
# Configure configs/scene_template.json for blend_path and blender_binary.
# Set the Stage 1 LLM credential in your shell if you use text expansion.
bash pipeline/run_all.sh

The base pipeline prepares assets and render outputs. The scene-aware VLM optimization loop is launched separately through the scene-agent workflow, for example with scripts/run_scene_agent_monitor.py after model paths, BLENDER_BIN, and configs/scene_template.json are configured for your environment.

Reproducible Universal 3D Workflow

The public repository includes the code and schema needed to reproduce the WebSearch-driven universal 3D dataset workflow. Large generated datasets, private server paths, mesh assets, .blend scene files, model weights, API keys, and paper source are intentionally not committed.

Core entry points:

Stage0 WebSearch prior: scripts/stage0_web_research.py
Universal dataset schema: configs/schemas/universal_3d_layout_dataset_schema.json
Universal rendering scripts: scripts/universal_3d_layout/
Workflow notes and validation summary: docs/UNIVERSAL_3D_LAYOUT_DATASET.md

Start a multi-paper universal session:

python scripts/stage0_web_research.py init \
 --query "SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation" \
 --output-dir runtime/research_priors/demo_universal \
 --dataset-mode universal \
 --tag universal-3d-layout

Run a small Blender-backed smoke batch after configuring scene, mesh, runtime, and model paths:

python scripts/universal_3d_layout/run_dataevolver_universal_contract_batch.py \
 --dataevolver-root /path/to/DataEvolver \
 --runtime-root /path/to/runtime \
 --blender /path/to/blender \
 --out /path/to/universal_contract_smoke \
 --num-samples 1 \
 --resolution 512 \
 --engine EEVEE

The universal contract emits RGB targets, per-object masks, structure/OSCR views, depth-order proxies, orientation proxies, camera metadata, mesh metadata, 3D boxes, scene graphs, spatial relations, validation traces, and metadata/records.jsonl. The current depth and normal outputs are proxy artifacts, not dense depth or dense surface-normal supervision.

Project Structure

DataEvolver/
├── pipeline/  # Core pipeline stages
│   ├── stage1_text_expansion.py  # LLM prompt generation
│   ├── stage2_t2i_generate.py  # Text-to-image (Qwen-Image-2512)
│   ├── stage2_5_sam2_segment.py  # SAM3 foreground extraction
│   ├── stage3_image_to_3d.py  # 3D mesh reconstruction (Hunyuan3D-2.1)
│   ├── stage4_scene_render.py  # Blender scene-aware rendering
│   ├── stage5_5_vlm_review.py  # VLM quality review (Qwen3.5-35B-A3B)
│   ├── stage5_6_feedback_apply.py  # Action selection & anti-oscillation
│   ├── asset_lifecycle.py  # Asset lifecycle management
│   └── rotation_geomodal_dataset.py  # Training dataset loader
├── configs/
│   ├── scene_action_space.json  # 24 atomic actions definition
│   ├── scene_template.json  # Blender scene template config
│   ├── vlm_review_schema.json  # VLM review output schema
│   ├── dataset_profiles/  # Dataset configuration profiles
│   └── seed_concepts/  # Seed object definitions (20/50 objects)
├── scripts/  # Utility & build scripts
│   ├── run_scene_agent_monitor.py  # VLM loop agent monitor
│   ├── run_scene_agent_step.py  # Single-step agent execution
│   ├── export_rotation8_from_best_object_state.py  # Consistent rotation export
│   ├── build_rotation8_trainready_dataset.py  # Build training pairs
│   ├── build_object_split_for_rotation_dataset.py  # Object-disjoint split
│   ├── run_full_pipeline.py  # Full pipeline orchestrator
│   ├── stage0_web_research.py  # WebSearch prior and handoff builder
│   ├── universal_3d_layout/  # Universal 3D dataset workflow
│   ├── run_vlm_quality_gate_loop.py  # VLM quality gate loop
│   ├── feedback_loop/  # Feedback loop utilities
│   └── ...  # Additional build & eval scripts
├── assets/
│   ├── hdri/  # HDRI environment maps
│   └── scene/  # Blender scene files (.blend)
└── web/  # Project website (GitHub Pages)

Action Space

The AI agent selects from 24 structured atomic actions organized in 5 groups:

Group	Actions	Parameters
Lighting (4)	Key light intensity ↑↓, key light yaw ±15°	Multiplicative ×0.8 or additive, bounded
Object (6)	Elevation ±0.02, yaw ±15°, scale ×0.9	Bounded within safe ranges
Scene (5)	Env rotation ±30°, env intensity ↑↓, contact shadow	HDRI and environment controls
Material (9)	Saturation, value/brightness, hue offset, roughness, specular/sheen	Fine-grained material tuning
Camera (0)	Reserved for future use	—

Full action definitions: configs/scene_action_space.json
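
To illustrate what "bounded" means for an atomic action, here is a minimal sketch of one multiplicative and one additive update with clamping. The `AtomicAction` fields, parameter names, and numeric bounds are hypothetical and do not mirror the schema in configs/scene_action_space.json; how a step scale shrinks a multiplicative action is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicAction:
    """Hypothetical action record; field names are illustrative."""
    param: str
    mode: str      # "mul" for multiplicative, "add" for additive
    amount: float  # e.g. 0.8 for a x0.8 step, 15.0 for +15 degrees
    lo: float      # lower bound of the safe range
    hi: float      # upper bound of the safe range


def apply_action(state: dict, action: AtomicAction, scale: float = 1.0) -> dict:
    """Return a new state with one bounded update applied."""
    current = state[action.param]
    if action.mode == "mul":
        # Assumption: a smaller scale pulls the factor toward 1.0 (no change).
        proposed = current * (1.0 + (action.amount - 1.0) * scale)
    elif action.mode == "add":
        proposed = current + action.amount * scale
    else:
        raise ValueError(f"unknown mode: {action.mode}")
    # Clamping keeps every update inside the action's safe range.
    clamped = min(max(proposed, action.lo), action.hi)
    return {**state, action.param: clamped}


# Example with made-up bounds: dim the key light, then rotate the object.
dim_key = AtomicAction("key_light_intensity", "mul", 0.8, 0.1, 10.0)
yaw_plus = AtomicAction("object_yaw_deg", "add", 15.0, -180.0, 180.0)
state = {"key_light_intensity": 2.0, "object_yaw_deg": 170.0}
state = apply_action(state, dim_key)
state = apply_action(state, yaw_plus)  # clamped at the upper bound
```

Returning a new dictionary instead of mutating the input keeps each round's state available for comparison or rollback.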

DataEvolver-Rotate

The first benchmark dataset produced by DataEvolver — for rotation-conditioned image editing.

Metric	Value
Unique Objects	50
Rotation Angles	8 (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°)
Training Pairs	350 (front → 7 target views)
Train / Val / Test	245 / 49 / 56 pairs (object-disjoint, seed=42)
Modalities	RGB, mask, depth, normal
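
The pair counts in the table are consistent with splitting the 50 objects 35 / 7 / 8 and keeping all 7 pairs of an object in the same split. The sketch below shows one way such an object-disjoint split could be produced. It is not the logic of scripts/build_object_split_for_rotation_dataset.py: the function name, the object IDs, and the 35 / 7 / 8 object counts (inferred by dividing the pair counts by 7) are assumptions.

```python
import random


def object_disjoint_split(object_ids, n_val, n_test, seed=42):
    """Shuffle object IDs once and cut them into disjoint train/val/test sets."""
    ids = sorted(object_ids)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(ids)
    test = set(ids[:n_test])
    val = set(ids[n_test:n_test + n_val])
    train = set(ids[n_test + n_val:])
    return train, val, test


# 50 hypothetical object IDs, each with 7 front-to-target pairs.
objects = [f"obj_{i:03d}" for i in range(50)]
train, val, test = object_disjoint_split(objects, n_val=7, n_test=8)
pairs_per_object = 7
counts = tuple(len(s) * pairs_per_object for s in (train, val, test))
print(counts)  # (245, 49, 56)
```

Splitting by object rather than by pair means no object seen during training appears in validation or test.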

Methodology

Each object uses a single canonical yaw-0° best state as the base. The object is then rotated while the scene, camera, lighting, and material remain fixed — ensuring cross-angle consistency.

Loading the Dataset

import json
from pathlib import Path
from PIL import Image
# Root of the exported split; it contains pairs/train_pairs.jsonl.
root = Path("path/to/dataset_split")
# Each JSONL line describes one source/target training pair.
rows = []
with (root / "pairs" / "train_pairs.jsonl").open("r", encoding="utf-8") as f:
    for line in f:
        rows.append(json.loads(line))
# Image paths in each row are relative to the split root.
row = rows[0]
source = Image.open(root / row["source_image"]).convert("RGB")
target = Image.open(root / row["target_image"]).convert("RGB")
instruction = row["instruction"]  # e.g., "Rotate the object 45 degrees clockwise"

Using with Claude Code

DataEvolver is designed to work with Claude Code as an AI-powered development and operations assistant. Claude Code reads a project-level CLAUDE.md file to understand your environment, then helps you run pipelines, analyze results, and build datasets through natural language.

Step 1: Install Claude Code

npm install -g @anthropic-ai/claude-code

Step 2: Create your `CLAUDE.md`

Create a CLAUDE.md in the project root (it's gitignored — each user maintains their own). This file tells Claude Code about your specific environment:

# CLAUDE.md
## Remote Server
- SSH alias: `my-server`
- GPU: 3x A800 80GB (or your setup)
- Python: `/path/to/python3` (3.10+, with PyTorch)
- Blender: `/path/to/blender` (4.2+)
- Code directory: `/path/to/DataEvolver`
## Model Paths
- Qwen-Image-2512: `/path/to/Qwen-Image-2512`
- SAM3 checkpoint: `/path/to/sam3/sam3.pt`
- Hunyuan3D-2.1 repo: `/path/to/Hunyuan3D-2.1`
- Hunyuan3D-2.1 weights: `/path/to/model_hub/Hunyuan3D-2.1`
- Qwen3.5-35B-A3B: `/path/to/Qwen3.5-35B-A3B`
## Scene Config
- Blender scene file: `/path/to/scene.blend`
- HDRI directory: `/path/to/hdri/`
- Render engine: CYCLES, 512 samples, 1024x1024
## Key Configs
- Action space: `configs/scene_action_space.json` (24 atomic actions)
- Scene template: `configs/scene_template.json`
- Dataset profiles: `configs/dataset_profiles/`
## Pipeline
Stage 1 (Text Expansion) → Stage 2 (T2I) → Stage 2.5 (SAM3)
→ Stage 3 (3D Reconstruction) → Stage 4 (Blender Render) → Stage 5 (VLM Loop)
## Working Rules
- Always test on the remote server, not locally
- Use tmux for long-running tasks (screen not available on all servers)
- Read trace.json free-form text for VLM results — not just agg.json scores
- Only stop VLM loop when reviewer explicitly says "keep"
- Check GPU usage before launching new jobs

Step 3: Create Claude Code Skills (Optional)

You can create reusable skills in a skills/ directory for common workflows:

mkdir -p skills/scene-agent-loop

# skills/scene-agent-loop/SKILL.md
---
name: scene-agent-loop
description: Manage the VLM review loop for scene rendering
---
# Scene Agent Loop
Monitor and continue the VLM review → render → agent decision loop.
Read trace.json to understand current state, then decide next action.

Step 4: Launch Claude Code

cd DataEvolver
claude

Claude Code will read your CLAUDE.md and understand the full project context. Example commands:

> Check GPU usage on the server
> Run the full pipeline for 10 new furniture objects
> Export rotation8 dataset from the latest best states
> Build train-ready dataset with object-disjoint split

Citation

@misc{zhang2026dataevolverletdatabuild,
 title={DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents},
 author={Qisong Zhang and Wenzhuo Wu and Zhuangzhuang Jia and Yunhao Yang and Huayu Zhang and Xianghao Zang and Zhixiang He and Zhongjiang He and Kongming Liang and Zhanyu Ma},
 year={2026},
 eprint={2605.01789},
 archivePrefix={arXiv},
 primaryClass={cs.AI},
 url={https://arxiv.org/abs/2605.01789},
}

Honors & Recognition

🏆 AIDataSci 2026, KDD 2026 Workshop — DataEvolver was accepted as an Oral Presentation. View on OpenReview.
🎖️ 2026 BAAI Conference Agent for Science Competition — DataEvolver received a Certificate of Poster Presentation and was selected for on-site poster presentation.

[Image: DataEvolver BAAI Agent for Science poster presentation certificate]

Roadmap

Completed

WebSearch-integrated AI Agent — Stage0 WebSearch supports both single-paper reproduction and multi-paper universal dataset modes. It records paper evidence, exports handoff documents, and has been validated on the remote rendering environment.
Multi-paper universal 3D dataset synthesis — the workflow starts from an anchor paper, expands to related 3D-control papers, and extracts shared dataset requirements for target images, masks, OSCR/structure views, camera metadata, object layout, occlusion relations, scene graphs, and validation traces.
Universal 3D layout dataset contract — DataEvolver now tracks a reusable schema for Blender-backed target renders, per-object masks, structure views, depth-order proxies, orientation proxies, camera pose/intrinsics/viewpoint tokens, mesh metadata, 3D boxes, scene graphs, spatial relations, and promotion decisions.
Blender-backed dual-object scenes — the current universal 3D workflow can generate real-scene, dual-object records with DataEvolver scene and mesh assets. General N-object compositional scenes remain a separate extension.
VLM/CV self-evolution loop — selected universal 3D records can be reviewed with calibrated hybrid VLM and CV geometry scores, bounded repair actions, and explicit accept/reject promotion traces.
Paper-specific and universal export modes — the WebSearch and universal layout scripts support both single-paper dataset reproduction and multi-paper universal schema export.

In Progress

Structured VLM knowledge base — current scoring already uses structured VLM/CV criteria, mask coverage checks, depth-order proxy checks, geometry-review metadata, and a VGGT-Omega proxy hook. A persistent prior-knowledge store for reusable VLM assessment criteria is still under development.
Real-world dataset ingestion — WebSearch can discover public datasets and paper resources, and public artifacts such as SeeThrough3D are used as references. A full crawl-clean-process-ingest pipeline for arbitrary real-world datasets is still pending.
Auto-generated reasoning-evaluation datasets — current records include validation logs, hybrid scores, and promotion traces. Fully automated benchmark generation after downstream model training remains a future release target.

Pending

Video object datasets — extend image-level rendering to temporally consistent object removal, translation, rotation, insertion, and attribute editing across frames.
Temporal review and filtering — add sequence-level checks for flicker, trajectory smoothness, mask stability, depth continuity, and action consistency.
General N-object compositional scenes — beyond the current dual-object workflow, this requires relation-aware sampling, collision handling, occlusion planning, and stronger multi-object VLM/CV review.
Large-scale public dataset ingestion — scale WebSearch-discovered dataset ingestion into repeatable crawling, cleaning, licensing, provenance, and structured export workflows.

License

Apache-2.0
