PRIS-CV/DataEvolver


DataEvolver

Autonomous Synthetic Data Construction via VLM-Guided Iterative Rendering
Build photorealistic, scene-aware training data with a closed loop of 3D rendering, VLM review, and targeted parameter repair.

Paper: arXiv · Project website · License: Apache-2.0 · Python 3.10+ · Blender 4.2+

🇨🇳 Chinese (中文) · 🌐 Website · 📄 Paper · 🏆 Recognition · 🧩 Dataset: DataEvolver-Rotate

🧠 VLM free-form feedback · 🎬 3D render-review-fix loop · 📦 Training-ready multimodal data


Why DataEvolver?

DataEvolver turns synthetic data construction into a goal-driven optimization loop: a VLM reviews rendered scenes in natural language, an agent diagnoses concrete rendering issues, and the scene is re-rendered with targeted parameter updates until the result is worth keeping.

  • Perceive beyond rigid scores — free-form VLM feedback catches scene context, object placement, lighting, and material issues that fixed rules miss.
  • Repair with structured actions — 24 bounded atomic actions adjust lighting, object pose, scene environment, and material appearance without uncontrolled drift.
  • Export training-ready data — RGB, masks, depth, normals, geometry metadata, and object-disjoint splits are produced for downstream model training.

Key Features

  • Goal-Driven Loop Agents — VLM reviewer provides semantic feedback ("flat lighting", "floating object") → AI agent selects targeted actions → re-render → repeat until quality goals are met
  • 24 Atomic Actions: a structured action space in 5 groups (lighting, object placement, scene environment, material properties, and a camera group reserved for future use), with anti-oscillation control and step-scale scheduling
  • Scene-Aware Rendering — Objects placed in real Blender scenes with HDRI environments, raycast ground detection, and preserved scene lighting
  • Multi-Modal Output — RGB, mask, depth, normal maps, and geometry metadata
  • End-to-End Automation — From natural language seed concept to training-ready dataset, zero human intervention

Pipeline Overview

| Stage | What it does | Model / Tool |
|---|---|---|
| 1. Text Expansion | LLM expands a seed concept into a detailed T2I prompt | Claude API (Anthropic) |
| 2. T2I Generation | Generate a 1024×1024 object image | Qwen-Image-2512 |
| 2.5. Segmentation | Extract the RGBA foreground and remove the background | SAM3 |
| 3. 3D Reconstruction | Reconstruct a textured mesh from a single image | Hunyuan3D-2.1 |
| 4. Scene Rendering | Scene-aware Blender insertion with configurable EEVEE/Cycles rendering | Blender 4.2+ |
| 5. VLM Review Loop | Free-form review → agent action → re-render until the reviewer says "keep" | Qwen3.5-35B-A3B |

The VLM Review Loop (Stage 5)

The core innovation: a goal-driven loop agent that iteratively improves rendering quality.
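The control flow of this loop can be sketched in a few lines of Python. This is a minimal illustration, not the code in scripts/run_scene_agent_monitor.py: render, review, and choose_actions are caller-supplied stand-ins for Blender rendering, the VLM reviewer, and the agent, and the Review fields and the max_rounds cap are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Review:
    """One reviewer turn (hypothetical fields)."""
    verdict: str   # "keep" ends the loop; anything else asks for repairs
    feedback: str  # free-form VLM critique, e.g. "flat lighting"


def review_loop(
    params: dict[str, float],
    render: Callable[[dict[str, float]], Any],
    review: Callable[[Any], Review],
    choose_actions: Callable[[Review, dict[str, float]], dict[str, float]],
    max_rounds: int = 6,  # hypothetical safety cap
) -> tuple[dict[str, float], list[Review]]:
    """Render, review, and repair until the reviewer says "keep" or rounds run out."""
    history: list[Review] = []
    for _ in range(max_rounds):
        image = render(params)          # re-render with the current parameters
        verdict = review(image)         # free-form VLM review of the render
        history.append(verdict)
        if verdict.verdict == "keep":   # stop only on an explicit "keep"
            break
        params = choose_actions(verdict, params)  # agent picks targeted repairs
    return params, history
```

Passing the stages in as callables keeps the stopping rule (an explicit "keep") separate from how each stage is implemented.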

Anti-oscillation control prevents parameter thrashing:

  • Sign-flip tracking: freeze a parameter after 3 direction reversals
  • Step-scale scheduling: Round 0 → 100%, Round 1 → 70%, Round 2 → 50%, Round 3+ → 40%
  • Score-adaptive boost: ×1.2 when hybrid_score < 0.65
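One way to combine these three controls is sketched below. The thresholds (3 reversals, the 100/70/50/40% schedule, ×1.2 below 0.65) come from the list above; the class name, the per-parameter bookkeeping, and the choice to drop the move that triggers the third reversal are assumptions, not the logic in pipeline/stage5_6_feedback_apply.py.

```python
STEP_SCALE = {0: 1.0, 1: 0.7, 2: 0.5}  # round 3 and later use 0.4


def step_scale(round_idx: int) -> float:
    """Step-scale schedule: 100%, 70%, 50%, then 40% from round 3 on."""
    return STEP_SCALE.get(round_idx, 0.4)


class AntiOscillation:
    """Sketch of sign-flip freezing, step scaling, and a low-score boost."""

    def __init__(self, max_reversals: int = 3, boost: float = 1.2, boost_below: float = 0.65):
        self.max_reversals = max_reversals
        self.boost = boost
        self.boost_below = boost_below
        self.last_sign: dict[str, int] = {}   # last move direction per parameter
        self.reversals: dict[str, int] = {}   # direction reversals per parameter
        self.frozen: set[str] = set()         # parameters no longer adjusted

    def adjust(self, param: str, delta: float, round_idx: int, hybrid_score: float) -> float:
        """Return the step to actually apply to `param` this round (0.0 if frozen)."""
        if param in self.frozen or delta == 0:
            return 0.0
        sign = 1 if delta > 0 else -1
        prev = self.last_sign.get(param)
        if prev is not None and sign != prev:
            self.reversals[param] = self.reversals.get(param, 0) + 1
            if self.reversals[param] >= self.max_reversals:
                self.frozen.add(param)  # third reversal: stop touching this parameter
                return 0.0
        self.last_sign[param] = sign
        step = delta * step_scale(round_idx)
        if hybrid_score < self.boost_below:
            step *= self.boost  # larger steps while quality is still low
        return step
```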

Prerequisites

  • OS: Linux (tested on Ubuntu 20.04+)
  • GPU: NVIDIA GPU with ≥24 GB VRAM for rendering; ≥80 GB for VLM inference
  • Python: 3.10+
  • Blender: 4.2+
  • CUDA: Compatible with your PyTorch version

Required Models

| Model | Purpose | Approx. size |
|---|---|---|
| Qwen-Image-2512 | T2I generation | ~56 GB |
| SAM3 | Foreground segmentation | ~2 GB |
| Hunyuan3D-2.1 | Image-to-3D reconstruction | ~20 GB |
| Qwen3.5-35B-A3B | VLM quality reviewer | ~35 GB |
| Blender 4.2+ | 3D rendering engine | ~300 MB |
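For disk planning, the approximate sizes in the Required Models table add up to roughly 113 GB, before caches, intermediate renders, or the DINOv2 Giant weights that the default profile also plans for. The helper and its headroom parameter are illustrative, not part of the repository.

```python
# Approximate sizes from the Required Models table, in GB (~300 MB Blender as 0.3).
MODEL_SIZES_GB = {
    "Qwen-Image-2512": 56,
    "SAM3": 2,
    "Hunyuan3D-2.1": 20,
    "Qwen3.5-35B-A3B": 35,
    "Blender 4.2+": 0.3,
}


def estimate_disk_gb(sizes: dict[str, float], headroom: float = 0.0) -> float:
    """Sum model sizes and optionally add fractional headroom (0.2 means +20%)."""
    return sum(sizes.values()) * (1.0 + headroom)


print(f"{estimate_disk_gb(MODEL_SIZES_GB):.1f} GB")  # 113.3 GB
```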

Quick Start

Start with the lightweight onboarding dry-run. It checks the shape of your environment, prints the setup and model download plan, and writes local non-sensitive config files. It does not install dependencies, download model weights, write tokens, or launch GPU jobs.

1. Clone the repository

git clone https://github.com/PRIS-CV/DataEvolver.git
cd DataEvolver

2. Run the safe onboarding dry-run

For the fastest first pass, use the quick profile:

bash scripts/bootstrap_dataevolver_default.sh \
 --profile quick \
 --dry-run \
 --write-local-config

For the default DataEvolver pipeline plan, choose a model root and run:

bash scripts/bootstrap_dataevolver_default.sh \
 --profile default \
 --model-root /path/to/dataevolver-models \
 --workspace-root "$PWD" \
 --python-backend auto \
 --dry-run \
 --write-local-config

The script prints four sections:

| Section | What it tells you |
|---|---|
| preflight | Availability of Linux, GPU, Python, uv/conda, Blender, and the Hugging Face CLI |
| env plan | The preferred uv environment plan and the conda fallback |
| model plan | Hugging Face repos, target paths, gated-access notes, and printed download commands |
| config plan | Non-sensitive environment variables that the pipeline can read |

Available profiles:

| Profile | Use it when |
|---|---|
| quick | You only want environment discovery and a dry-run route |
| default | You want the current core pipeline plan: Qwen-Image-2512, SAM3, Hunyuan3D-2.1, DINOv2 Giant, Qwen3.5-35B-A3B, and Blender |
| full | You also want the default Edit/T2V plan: Qwen-Image-Edit-2511 and Wan2.1-T2V |
| custom | You already have replacement models and want them recorded for later compatibility review |

When --write-local-config is used, the generated local files are:

  • .dataevolver/local/ENVIRONMENT.md — human-readable setup summary
  • .dataevolver/local/env.config.json — structured config for agents and scripts
  • .dataevolver/local/env.sh.example — sourceable non-sensitive path variables

.dataevolver/local/ is ignored by Git. Do not put Hugging Face, Anthropic, OpenAI, SSH, cookie, or API tokens in these files.
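A quick check can catch credentials that slipped into the structured config anyway. This sketch walks any JSON document and flags keys whose names look credential-like; the key pattern is a heuristic assumption (it will also flag names such as tokenizer_path), and nothing here reflects the actual schema of env.config.json.

```python
import json
import re
from pathlib import Path
from typing import Any

# Heuristic: key names that usually indicate credentials.
SECRET_KEY_PATTERN = re.compile(r"token|secret|api[_-]?key|password|cookie", re.IGNORECASE)


def find_secret_like_keys(obj: Any, prefix: str = "") -> list[str]:
    """Return dotted paths of keys whose names look like credentials."""
    hits: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if SECRET_KEY_PATTERN.search(str(key)):
                hits.append(path)
            hits.extend(find_secret_like_keys(value, path))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            hits.extend(find_secret_like_keys(value, f"{prefix}[{i}]"))
    return hits


def check_local_config(path: Path = Path(".dataevolver/local/env.config.json")) -> list[str]:
    """Load the generated JSON config and report suspicious key paths."""
    with path.open("r", encoding="utf-8") as f:
        return find_secret_like_keys(json.load(f))
```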

If you are working with an agent environment that supports skills, ask it to use the dataevolver-onboarding skill. The skill should ask about only five setup areas (target route, runtime location, install policy, model strategy, and workspace/output paths) and then run the dry-run script above.

3. Review paths and source environment overrides

Before a real run, review the generated path variables:

sed -n '1,160p' .dataevolver/local/env.sh.example
source .dataevolver/local/env.sh.example

The pipeline reads these environment overrides when they are set and otherwise falls back to its legacy default paths:

| Variable | Used by |
|---|---|
| QWEN_IMAGE_MODEL_PATH | Stage 2 text-to-image model path |
| SAM3_CKPT, SAM3_DIR | Stage 2.5 SAM3 checkpoint and package/source path |
| HUNYUAN3D_REPO, MODEL_HUB, PAINT_MODEL_HUB, DINO_MODEL_PATH, REALESRGAN_CKPT | Stage 3 Hunyuan3D, paint, DINO, and RealESRGAN paths |
| VLM_MODEL_PATH | Stage 5.5 VLM reviewer model path |
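The override-with-fallback behavior amounts to the pattern below. The LEGACY_DEFAULTS paths and the resolve_model_path helper are hypothetical placeholders; the pipeline's real fallback paths are defined in its own stage scripts.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical legacy defaults for illustration only.
LEGACY_DEFAULTS = {
    "QWEN_IMAGE_MODEL_PATH": "/legacy/models/Qwen-Image-2512",
    "SAM3_CKPT": "/legacy/models/sam3/sam3.pt",
    "VLM_MODEL_PATH": "/legacy/models/Qwen3.5-35B-A3B",
}


def resolve_model_path(var: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer a non-empty environment override; otherwise use the legacy default."""
    env = os.environ if env is None else env
    override = env.get(var, "").strip()
    return Path(override) if override else Path(LEGACY_DEFAULTS[var])
```

Treating an empty string as unset means a stray `export SAM3_CKPT=` does not silently point the stage at an empty path.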

For real Hugging Face downloads, set HF_TOKEN in your shell only. The dry-run script prints uvx --from huggingface_hub hf download ... commands first and plain hf download ... commands as a fallback, but the current v0 script never executes them.
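If you script downloads yourself, the same two command shapes can be assembled without running them. In this sketch org/model-name is a placeholder repo ID and the helper only builds argument lists. The hf CLI reads HF_TOKEN from the environment, so the token never needs to appear on the command line.

```python
import shlex


def hf_download_commands(repo_id: str, local_dir: str) -> list[list[str]]:
    """Build (but do not run) the preferred uvx command and the plain hf fallback."""
    primary = ["uvx", "--from", "huggingface_hub", "hf", "download", repo_id, "--local-dir", local_dir]
    fallback = ["hf", "download", repo_id, "--local-dir", local_dir]
    return [primary, fallback]


for cmd in hf_download_commands("org/model-name", "/path/to/dataevolver-models/model-name"):
    print(shlex.join(cmd))  # print only, mirroring the dry-run behavior
```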

4. Prepare scene assets and run the pipeline

After dependencies, model weights, and path overrides are ready:

# Place your .blend scene file in assets/scene/
# Place HDRI environment maps in assets/hdri/
# Configure configs/scene_template.json for blend_path and blender_binary.
# Set the Stage 1 LLM credential in your shell if you use text expansion.
bash pipeline/run_all.sh

The base pipeline prepares assets and render outputs. The scene-aware VLM optimization loop is launched separately through the scene-agent workflow, for example with scripts/run_scene_agent_monitor.py after model paths, BLENDER_BIN, and configs/scene_template.json are configured for your environment.


Reproducible Universal 3D Workflow

The public repository includes the code and schema needed to reproduce the WebSearch-driven universal 3D dataset workflow. Large generated datasets, private server paths, mesh assets, .blend scene files, model weights, API keys, and paper source are intentionally not committed.

Core entry points:

Start a multi-paper universal session:

python scripts/stage0_web_research.py init \
 --query "SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation" \
 --output-dir runtime/research_priors/demo_universal \
 --dataset-mode universal \
 --tag universal-3d-layout

Run a small Blender-backed smoke batch after configuring scene, mesh, runtime, and model paths:

python scripts/universal_3d_layout/run_dataevolver_universal_contract_batch.py \
 --dataevolver-root /path/to/DataEvolver \
 --runtime-root /path/to/runtime \
 --blender /path/to/blender \
 --out /path/to/universal_contract_smoke \
 --num-samples 1 \
 --resolution 512 \
 --engine EEVEE

The universal contract emits RGB targets, per-object masks, structure/OSCR views, depth-order proxies, orientation proxies, camera metadata, mesh metadata, 3D boxes, scene graphs, spatial relations, validation traces, and metadata/records.jsonl. The current depth and normal outputs are proxy artifacts, not dense depth or dense surface-normal supervision.
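Before training on a universal contract batch, it can help to sanity-check metadata/records.jsonl line by line. The required key names below are hypothetical stand-ins; substitute the fields defined by the repository's contract schema.

```python
import json
from pathlib import Path
from typing import Iterable

# Hypothetical key names for illustration only.
REQUIRED_KEYS = frozenset({"rgb", "masks", "camera", "scene_graph"})


def validate_record_lines(lines: Iterable[str], required=REQUIRED_KEYS) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for malformed or incomplete JSONL records."""
    problems: list[tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        missing = set(required) - set(record)
        if missing:
            problems.append((lineno, f"missing keys: {sorted(missing)}"))
    return problems


def validate_records_file(path: Path) -> list[tuple[int, str]]:
    """Validate a records.jsonl file on disk."""
    with path.open("r", encoding="utf-8") as f:
        return validate_record_lines(f)
```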


Project Structure

DataEvolver/
├── pipeline/ # Core pipeline stages
│ ├── stage1_text_expansion.py # LLM prompt generation
│ ├── stage2_t2i_generate.py # Text-to-image (Qwen-Image-2512)
│ ├── stage2_5_sam2_segment.py # SAM3 foreground extraction
│ ├── stage3_image_to_3d.py # 3D mesh reconstruction (Hunyuan3D-2.1)
│ ├── stage4_scene_render.py # Blender scene-aware rendering
│ ├── stage5_5_vlm_review.py # VLM quality review (Qwen3.5-35B-A3B)
│ ├── stage5_6_feedback_apply.py # Action selection & anti-oscillation
│ ├── asset_lifecycle.py # Asset lifecycle management
│ └── rotation_geomodal_dataset.py # Training dataset loader
├── configs/
│ ├── scene_action_space.json # 24 atomic actions definition
│ ├── scene_template.json # Blender scene template config
│ ├── vlm_review_schema.json # VLM review output schema
│ ├── dataset_profiles/ # Dataset configuration profiles
│ └── seed_concepts/ # Seed object definitions (20/50 objects)
├── scripts/ # Utility & build scripts
│ ├── run_scene_agent_monitor.py # VLM loop agent monitor
│ ├── run_scene_agent_step.py # Single-step agent execution
│ ├── export_rotation8_from_best_object_state.py # Consistent rotation export
│ ├── build_rotation8_trainready_dataset.py # Build training pairs
│ ├── build_object_split_for_rotation_dataset.py # Object-disjoint split
│ ├── run_full_pipeline.py # Full pipeline orchestrator
│ ├── stage0_web_research.py # WebSearch prior and handoff builder
│ ├── universal_3d_layout/ # Universal 3D dataset workflow
│ ├── run_vlm_quality_gate_loop.py # VLM quality gate loop
│ ├── feedback_loop/ # Feedback loop utilities
│ └── ... # Additional build & eval scripts
├── assets/
│ ├── hdri/ # HDRI environment maps
│ └── scene/ # Blender scene files (.blend)
└── web/ # Project website (GitHub Pages)

Action Space

The AI agent selects from 24 structured atomic actions organized in 5 groups:

| Group | Actions | Parameters |
|---|---|---|
| Lighting (4) | Key light intensity ↑↓, key light yaw ±15° | Multiplicative (×0.8) or additive, bounded |
| Object (6) | Elevation ±0.02, yaw ±15°, scale (×0.9) | Bounded within safe ranges |
| Scene (5) | Env rotation ±30°, env intensity ↑↓, contact shadow | HDRI and environment controls |
| Material (9) | Saturation, value/brightness, hue offset, roughness, specular/sheen | Fine-grained material tuning |
| Camera (0) | Reserved for future use | |

Full action definitions: configs/scene_action_space.json
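To make "bounded atomic action" concrete, here is a minimal sketch of applying one multiplicative or additive step and clamping the result. The AtomicAction fields, parameter names, bounds, and the exponent-based scaling of multiplicative steps are assumptions for illustration; the authoritative definitions are in configs/scene_action_space.json.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicAction:
    """One bounded atomic action (hypothetical field names)."""
    name: str
    param: str
    mode: str      # "mul" (multiplicative) or "add" (additive)
    amount: float  # factor for "mul", offset for "add"
    lo: float      # lower bound for the parameter
    hi: float      # upper bound for the parameter


def apply_action(params: dict[str, float], action: AtomicAction, scale: float = 1.0) -> dict[str, float]:
    """Return a new parameter dict with the action applied, scaled, and clamped."""
    current = params[action.param]
    if action.mode == "mul":
        value = current * action.amount ** scale  # partial step stays on the same side of 1.0
    elif action.mode == "add":
        value = current + action.amount * scale
    else:
        raise ValueError(f"unknown mode: {action.mode}")
    updated = dict(params)  # leave the caller's state untouched
    updated[action.param] = min(max(value, action.lo), action.hi)
    return updated
```

Scaling a multiplicative factor through an exponent means a half-scale ×0.8 step becomes about ×0.894, which pairs naturally with a step-scale schedule.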


DataEvolver-Rotate

The first benchmark dataset produced by DataEvolver — for rotation-conditioned image editing.

| Metric | Value |
|---|---|
| Unique objects | 50 |
| Rotation angles | 8 (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) |
| Total pairs | 350 (front → 7 target views per object) |
| Train / Val / Test | 245 / 49 / 56 pairs (object-disjoint, seed=42) |
| Modalities | RGB, mask, depth, normal |
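The split sizes follow from the pair count: each object contributes 7 pairs (the 0° source paired with each of the 7 other angles), so 245 / 49 / 56 pairs correspond to 35 / 7 / 8 objects. The sketch below builds an object-disjoint split with a seeded shuffle of objects. It illustrates the idea only and will not reproduce the released split membership, which is built by scripts/build_object_split_for_rotation_dataset.py.

```python
import random

TARGET_ANGLES = (45, 90, 135, 180, 225, 270, 315)  # the 0° front view is the source


def make_pairs(objects: list[str]) -> list[tuple[str, int, int]]:
    """One (object, source_angle, target_angle) pair per target angle."""
    return [(obj, 0, angle) for obj in objects for angle in TARGET_ANGLES]


def object_disjoint_split(objects: list[str], n_val: int, n_test: int, seed: int = 42) -> dict:
    """Shuffle objects (not pairs) so that no object appears in two splits."""
    pool = sorted(objects)             # deterministic starting order
    random.Random(seed).shuffle(pool)  # seeded shuffle at the object level
    test = pool[:n_test]
    val = pool[n_test:n_test + n_val]
    train = pool[n_test + n_val:]
    return {"train": make_pairs(train), "val": make_pairs(val), "test": make_pairs(test)}
```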

Methodology

Each object uses a single canonical yaw-0° best state as the base. The object is then rotated while the scene, camera, lighting, and material remain fixed — ensuring cross-angle consistency.
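This consistency rule (copy one canonical state, change only the object yaw) maps naturally onto an immutable state record. RenderState and its fields are hypothetical; the point is that dataclasses.replace changes one field and leaves camera, lighting, and material values identical across all eight views.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderState:
    """Snapshot of a best render state (hypothetical fields)."""
    object_yaw_deg: float
    object_elevation: float
    camera_location: tuple[float, float, float]
    key_light_intensity: float
    env_rotation_deg: float
    material_saturation: float


def rotation_states(best: RenderState, step_deg: float = 45.0, n_views: int = 8) -> list[RenderState]:
    """Copy the canonical state n_views times, changing only the object yaw."""
    return [
        replace(best, object_yaw_deg=(best.object_yaw_deg + k * step_deg) % 360.0)
        for k in range(n_views)
    ]
```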

Loading the Dataset

import json
from pathlib import Path
from PIL import Image

root = Path("path/to/dataset_split")

# Each non-empty line of train_pairs.jsonl is one JSON object describing a pair.
rows = []
with (root / "pairs" / "train_pairs.jsonl").open("r", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            rows.append(json.loads(line))

row = rows[0]
# Image paths in each row are relative to the split root.
source = Image.open(root / row["source_image"]).convert("RGB")
target = Image.open(root / row["target_image"]).convert("RGB")
instruction = row["instruction"]  # e.g., "Rotate the object 45 degrees clockwise"

Using with Claude Code

DataEvolver is designed to work with Claude Code as an AI-powered development and operations assistant. Claude Code reads a project-level CLAUDE.md file to understand your environment, then helps you run pipelines, analyze results, and build datasets through natural language.

Step 1: Install Claude Code

npm install -g @anthropic-ai/claude-code

Step 2: Create your CLAUDE.md

Create a CLAUDE.md in the project root (it's gitignored — each user maintains their own). This file tells Claude Code about your specific environment:

# CLAUDE.md
## Remote Server
- SSH alias: `my-server`
- GPU: 3x A800 80GB (or your setup)
- Python: `/path/to/python3` (3.10+, with PyTorch)
- Blender: `/path/to/blender` (4.2+)
- Code directory: `/path/to/DataEvolver`
## Model Paths
- Qwen-Image-2512: `/path/to/Qwen-Image-2512`
- SAM3 checkpoint: `/path/to/sam3/sam3.pt`
- Hunyuan3D-2.1 repo: `/path/to/Hunyuan3D-2.1`
- Hunyuan3D-2.1 weights: `/path/to/model_hub/Hunyuan3D-2.1`
- Qwen3.5-35B-A3B: `/path/to/Qwen3.5-35B-A3B`
## Scene Config
- Blender scene file: `/path/to/scene.blend`
- HDRI directory: `/path/to/hdri/`
- Render engine: CYCLES, 512 samples, 1024x1024
## Key Configs
- Action space: `configs/scene_action_space.json` (24 atomic actions)
- Scene template: `configs/scene_template.json`
- Dataset profiles: `configs/dataset_profiles/`
## Pipeline
Stage 1 (Text Expansion) → Stage 2 (T2I) → Stage 2.5 (SAM3)
→ Stage 3 (3D Reconstruction) → Stage 4 (Blender Render) → Stage 5 (VLM Loop)
## Working Rules
- Always test on the remote server, not locally
- Use tmux for long-running tasks (screen not available on all servers)
- Read trace.json free-form text for VLM results — not just agg.json scores
- Only stop VLM loop when reviewer explicitly says "keep"
- Check GPU usage before launching new jobs

Step 3: Create Claude Code Skills (Optional)

You can create reusable skills in a skills/ directory for common workflows:

mkdir -p skills/scene-agent-loop
# skills/scene-agent-loop/SKILL.md
---
name: scene-agent-loop
description: Manage the VLM review loop for scene rendering
---
# Scene Agent Loop
Monitor and continue the VLM review → render → agent decision loop.
Read trace.json to understand current state, then decide next action.

Step 4: Launch Claude Code

cd DataEvolver
claude

Claude Code will read your CLAUDE.md and understand the full project context. Example commands:

> Check GPU usage on the server
> Run the full pipeline for 10 new furniture objects
> Export rotation8 dataset from the latest best states
> Build train-ready dataset with object-disjoint split

Citation

@misc{zhang2026dataevolverletdatabuild,
 title={DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents},
 author={Qisong Zhang and Wenzhuo Wu and Zhuangzhuang Jia and Yunhao Yang and Huayu Zhang and Xianghao Zang and Zhixiang He and Zhongjiang He and Kongming Liang and Zhanyu Ma},
 year={2026},
 eprint={2605.01789},
 archivePrefix={arXiv},
 primaryClass={cs.AI},
 url={https://arxiv.org/abs/2605.01789},
}

Honors & Recognition

  • 🏆 AIDataSci 2026, KDD 2026 Workshop — DataEvolver was accepted as an Oral Presentation. View on OpenReview.
  • 🎖️ 2026 BAAI Conference Agent for Science Competition — DataEvolver received a Certificate of Poster Presentation and was selected for on-site poster presentation.

Roadmap

Completed

  • WebSearch-integrated AI Agent — Stage0 WebSearch supports both single-paper reproduction and multi-paper universal dataset modes. It records paper evidence, exports handoff documents, and has been validated on the remote rendering environment.
  • Multi-paper universal 3D dataset synthesis — the workflow starts from an anchor paper, expands to related 3D-control papers, and extracts shared dataset requirements for target images, masks, OSCR/structure views, camera metadata, object layout, occlusion relations, scene graphs, and validation traces.
  • Universal 3D layout dataset contract — DataEvolver now tracks a reusable schema for Blender-backed target renders, per-object masks, structure views, depth-order proxies, orientation proxies, camera pose/intrinsics/viewpoint tokens, mesh metadata, 3D boxes, scene graphs, spatial relations, and promotion decisions.
  • Blender-backed dual-object scenes — the current universal 3D workflow can generate real-scene, dual-object records with DataEvolver scene and mesh assets. General N-object compositional scenes remain a separate extension.
  • VLM/CV self-evolution loop — selected universal 3D records can be reviewed with calibrated hybrid VLM and CV geometry scores, bounded repair actions, and explicit accept/reject promotion traces.
  • Paper-specific and universal export modes — the WebSearch and universal layout scripts support both single-paper dataset reproduction and multi-paper universal schema export.

In Progress

  • Structured VLM knowledge base — current scoring already uses structured VLM/CV criteria, mask coverage checks, depth-order proxy checks, geometry-review metadata, and a VGGT-Omega proxy hook. A persistent prior-knowledge store for reusable VLM assessment criteria is still under development.
  • Real-world dataset ingestion — WebSearch can discover public datasets and paper resources, and public artifacts such as SeeThrough3D are used as references. A full crawl-clean-process-ingest pipeline for arbitrary real-world datasets is still pending.
  • Auto-generated reasoning-evaluation datasets — current records include validation logs, hybrid scores, and promotion traces. Fully automated benchmark generation after downstream model training remains a future release target.

Pending

  • Video object datasets — extend image-level rendering to temporally consistent object removal, translation, rotation, insertion, and attribute editing across frames.
  • Temporal review and filtering — add sequence-level checks for flicker, trajectory smoothness, mask stability, depth continuity, and action consistency.
  • General N-object compositional scenes — beyond the current dual-object workflow, this requires relation-aware sampling, collision handling, occlusion planning, and stronger multi-object VLM/CV review.
  • Large-scale public dataset ingestion — scale WebSearch-discovered dataset ingestion into repeatable crawling, cleaning, licensing, provenance, and structured export workflows.

License

Apache-2.0

About

Autonomous synthetic data construction via VLM-guided iterative rendering — let your data build and improve itself.
