Releases: aiming-lab/AutoResearchClaw

AutoResearchClaw v0.5.0

20 May 04:39
@Jiaaqiliu Jiaaqiliu

Highlights

  • Multi-Domain Architecture: Expanded beyond ML to support HEP Physics, Biology, Quantum Computing, and Statistics domains with profile-driven deployment
  • ARC-Bench Evaluation Framework: Standardized benchmark suite with 50+ topics across 5 domains (ML01-ML25, P01-P10, Q01-Q10, B01-B07, S01-S03), rubric-based judging, and baseline adapters for AIDE, Agent Laboratory, and AI-Scientist-v2
  • ColliderAgent Integration: Full HEP physics simulation pipeline support (MadGraph → Pythia → Delphes) with incremental experiment mode and Stage-12 re-entry
  • Biology-Agent Integration: Metabolic modeling with COBRApy/Biopython skills, FBA simulation, and GSMM validation
  • Quantum-Qiskit Skill: Qiskit-based quantum computing experiment support for quantum topics
  • Statistics Domain Agent: Statistical method design, experiment evaluation, and theory analysis
  • Requirements Gate: LLM capability validation before pipeline execution
  • Profile-Driven Deployment: Interactive CLI for domain profile creation and management
  • Incremental Experiment Mode: Resume experiments at Stage-12 with delta-prompt assembly
  • Expanded Test Suite: New tests for HEP prompt hygiene, incremental experiments, and domain integrations

Breaking Changes

  • Topic IDs renamed: T01-T25 → ML01-ML25 in ARC-Bench
  • researchclaw/prompts.py refactored into researchclaw/prompts/ package (domain-aware prompt banks)
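
Scripts or result files that still use the old ARC-Bench IDs can be updated mechanically. The following is a minimal sketch, assuming legacy IDs follow exactly the T01-T25 pattern named above; the helper `migrate_topic_id` is hypothetical and not part of AutoResearchClaw.

```python
import re

# Legacy ARC-Bench ML topic IDs T01..T25 (range taken from the release note).
_LEGACY_ID = re.compile(r"T(0[1-9]|1[0-9]|2[0-5])")


def migrate_topic_id(topic_id: str) -> str:
    """Map a legacy ID such as 'T07' to 'ML07'; return any other ID unchanged."""
    match = _LEGACY_ID.fullmatch(topic_id)
    return f"ML{match.group(1)}" if match else topic_id
```

IDs from the other domains (P, Q, B, S) and IDs that are already migrated pass through untouched, so the helper is safe to run more than once.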

Documentation

  • Domain Integration Guide for adding new scientific domains
  • Tester guides in English, Chinese, and Japanese
  • ARC-Bench experiment design docs and run guides
  • Showcase papers demonstrating pipeline outputs

Full Changelog: v0.4.0...v0.5.0

v0.4.0: Human-in-the-Loop Co-Pilot System

01 Apr 18:37
@Jiaaqiliu Jiaaqiliu

v0.4.0 — Human-in-the-Loop Co-Pilot System

AutoResearchClaw is no longer purely autonomous. The new HITL Co-Pilot system transforms the pipeline into a human-AI collaborative research engine.

Highlights

  • 7 Intervention Modes: full-auto, gate-only, checkpoint, step-by-step, co-pilot, custom, express
  • Idea Workshop: Brainstorm and refine hypotheses collaboratively (Stages 7-8)
  • Baseline Navigator: Review and customize experiment designs (Stage 9)
  • Paper Co-Writer: Section-by-section collaborative drafting (Stages 16-19)
  • SmartPause: Confidence-driven dynamic intervention
  • ALHF Intervention Learning: Learns from your review patterns
  • Claim Verification: Inline fact-checking against collected literature
  • Cost Guardrails: Budget monitoring with threshold alerts
  • Pipeline Branching: Fork to explore multiple research directions
  • CLI Commands: attach, status, approve, reject, guide
  • 3 Adapters: CLI, WebSocket, MCP

New Files

  • researchclaw/hitl/ — 34 modules (7,500+ lines)
  • tests/test_hitl_*.py — 9 test files (242 tests)
  • docs/HITL_GUIDE.md — 620-line guide
  • 3 new builtin skills

Testing

  • 2,753 tests passed, 0 failures

Full Changelog: v0.3.2...v0.4.0

v0.3.2: Cross-Platform Support + Major Stability

23 Mar 03:27
@Jiaaqiliu Jiaaqiliu

What's New

Cross-Platform Support

  • ACP-compatible agent backends: Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI
  • OpenClaw bridge: messaging platform integration (Discord, Telegram, Lark, WeChat)
  • CLI-agent code generation backend: delegates Stages 10 & 13 to external CLI agents with budget control and timeout management

Anti-Fabrication System

  • VerifiedRegistry: ground-truth whitelist from experiment results with tolerance matching
  • Experiment diagnosis & repair loop: 13 deficiency categories, auto-repair with best-result selection
  • Always-on sanitization: unverified numbers replaced in paper tables
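
The idea of a ground-truth whitelist with tolerance matching can be illustrated as follows. This is a sketch only: `ToleranceRegistry`, its methods, and the default tolerances are assumptions made for illustration and do not reflect the actual VerifiedRegistry API.

```python
import math


class ToleranceRegistry:
    """Hypothetical stand-in for a ground-truth whitelist of experiment numbers."""

    def __init__(self, rel_tol: float = 0.01, abs_tol: float = 1e-9) -> None:
        self._values: list[float] = []
        self._rel_tol = rel_tol
        self._abs_tol = abs_tol

    def register(self, value: float) -> None:
        """Record a number that an experiment actually produced."""
        self._values.append(value)

    def is_verified(self, claimed: float) -> bool:
        """True if the claimed number is within tolerance of any recorded value."""
        return any(
            math.isclose(claimed, v, rel_tol=self._rel_tol, abs_tol=self._abs_tol)
            for v in self._values
        )
```

A number that appears in a paper table but fails `is_verified` would be a candidate for sanitization; the tolerance allows for rounding between the raw result and the reported figure.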

Stability & Quality

  • 100+ bug fixes across 8 deep audit rounds
  • Modular executor refactoring (10K lines → 400-line facade)
  • --resume auto-detection for interrupted runs
  • LLM retry hardening with exponential backoff
  • Community-reported fixes (macOS M3, math/theoretical topics)
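
Retrying with exponential backoff, as mentioned in the list above, generally looks like the sketch below. The function name, parameters, and defaults are assumptions for illustration and are not taken from the AutoResearchClaw code.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the wait cap after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from concurrent callers.
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the behaviour can be tested without real waiting. A production version would typically catch only transient errors (timeouts, rate limits) rather than every exception.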

New Subsystems

  • Assessor (paper quality scoring + venue recommendation)
  • Calendar (conference deadline tracking)
  • Collaboration (multi-user research coordination)
  • Copilot (interactive steering modes)
  • Dashboard (real-time metrics broadcasting)
  • Knowledge Graph (entity extraction + visualization)
  • Memory (cross-run experiment/ideation/writing memory)
  • MCP (Model Context Protocol server)
  • Overleaf (live sync with conflict resolution)
  • Project Manager (multi-project scheduling)
  • Remote Servers (SSH/SLURM/cloud execution)
  • Skills Library (12 built-in domain/tooling skills)
  • Trends (daily arXiv digest + opportunity finder)
  • Voice (speech-to-text commands)
  • Wizard (guided project setup)

Testing

  • 1,935 tests passing

Full Changelog: v0.3.1...v0.3.2

v0.3.1 — OpenCode Beast Mode + Community Contributions

18 Mar 15:41
@Jiaaqiliu Jiaaqiliu

What's New

OpenCode Beast Mode

New "Beast Mode" routes complex code generation to OpenCode with automatic 6-signal complexity scoring and graceful fallback to CodeAgent.

Universal Cross-Domain Support

  • Domain detector for 7 research domains (ML, physics, chemistry, economics, math, biology, security)
  • 25+ domain-specific experiment profiles with tailored datasets, metrics, and evaluation protocols
  • Domain-aware prompt adapters and Docker images

Code Searcher Agent

GitHub-integrated code search for experiment design reference — query generation, pattern extraction, and result caching.

Web Integration Layer

Web search, crawling, PDF extraction, and Google Scholar support for enhanced literature discovery.

Community Contributions

  • Novita AI provider — added as built-in LLM provider preset (#80)
  • Thread-safety hardening — all module-level globals protected with locks (#77)
  • Figure agent config — fixed 6 missing config fields (#75)
  • Robust LLM output parsing — 4-strategy JSON extractor, ACP-aware YAML extraction (#69)
  • Test collection fix — safe skipif guard for Anthropic tests (#53)
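
A multi-strategy JSON extractor of the kind described in #69 can be sketched as below. The release note does not name its four strategies, so the three shown here (direct parse, fenced block, outermost brace span) and the function name `extract_json` are assumptions, not the project's implementation.

```python
import json
import re
from typing import Any, Optional

_FENCE = "`" * 3  # Markdown code-fence marker
_FENCED = re.compile(_FENCE + r"(?:json)?\s*(.*?)" + _FENCE, re.DOTALL)


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON value recoverable from LLM output, else None."""
    candidates = [text]  # strategy 1: the whole reply is JSON

    fenced = _FENCED.search(text)
    if fenced:
        candidates.append(fenced.group(1))  # strategy 2: a fenced block

    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start : end + 1])  # strategy 3: outermost braces

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except (json.JSONDecodeError, RecursionError):
            continue  # fall through to the next strategy
    return None
```

Ordering the strategies from strictest to loosest means well-formed replies are parsed exactly, and the looser heuristics only apply when the model wraps its JSON in prose.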

Bug Fixes

  • 20+ edge-case fixes from 3-round deep audit
  • Fixed sleep-inside-lock contention in Semantic Scholar batch API
  • Fixed false-positive regex on Markdown links in thinking-tag stripper
  • Caught RecursionError in JSON parser for deeply nested payloads
  • Added a convergence evaluator with early-stopping recommendations

Full Changelog: v0.3.0...v0.3.1

v0.3.0 — MetaClaw Integration, CodeAgent v2, 50+ Bug Fixes

17 Mar 13:50
@Jiaaqiliu Jiaaqiliu

What's New in v0.3.0

MetaClaw Cross-Run Learning Integration

  • New researchclaw/metaclaw_bridge/ module: skill injection, lesson-to-skill conversion, PRM quality gates, session lifecycle management
  • Pipeline failures → structured lessons → reusable skills, injected into all 23 stages
  • +18.3% pipeline robustness in controlled experiments
  • Opt-in via metaclaw_bridge.enabled: true, fully backward-compatible

CodeAgent v2 — Enhanced Code Generation

  • Enhanced Blueprint: deep implementation specs with per-file pseudocode, tensor shapes, generation order
  • Sequential File Generation: dependency-ordered with AST-based CodeMem
  • Hard Validation Gates: block identical ablations, hardcoded metrics, cross-file import errors
  • Targeted Error Repair: parse traceback to fix surgically instead of full regeneration

BenchmarkAgent & FigureAgent Improvements

  • BenchmarkAgent: domain-aware benchmarks, import validation, pretrained resize
  • FigureAgent: LLM output type safety, Paul Tol colorblind-safe palette, heatmap/ablation chart types
  • visualize.py full rewrite: academic styling, 300 DPI, 6 enhanced chart types

50+ Pipeline Bug Fixes (BUG-06 through BUG-51)

  • Metric direction, citation verify, CodeGen guard, condition drift, RL stability
  • BST ordering, raw metrics, bracket citations, arXiv categories
  • Docker HF permission, KD stability, ablation detection
  • references.bib fallback generation, ICML LaTeX template fix

Paper Quality Hardening (4 Rounds)

  • Post-compilation quality checks, weasel/duplicate word lint, NeurIPS checklist
  • LaTeX escaping, 7-dim AI-Scientist-style review, AI-slop detection (50+ phrase blocklist)
  • Related work depth checker, stats rigor validator, anti-boilerplate prompts
  • Cross-discipline support for 7 domains

Docker Sandbox & Infrastructure

  • Network-policy-aware code generation & sandbox execution
  • Rate-limit defense for literature search APIs (OpenAlex → Semantic Scholar → arXiv cascade)
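
A provider cascade such as OpenAlex → Semantic Scholar → arXiv amounts to trying each source in order and falling through on failure. The sketch below shows that control flow only; `cascade_search` and the provider callables are hypothetical, and no real API client is implied.

```python
from typing import Callable, Iterable

SearchFn = Callable[[str], list]


def cascade_search(
    query: str, providers: Iterable[tuple[str, SearchFn]]
) -> tuple[str, list]:
    """Try providers in order; return (name, results) from the first that succeeds."""
    errors = []
    for name, search in providers:
        try:
            return name, search(query)
        except Exception as exc:  # e.g. a rate-limit error raised by a client
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the results lets the caller record which source answered, which is useful when the sources differ in coverage.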

Full Changelog: v0.2.0...v0.3.0

v0.2.0 — Multi-Agent Pipeline, Docker Sandbox & Quality Hardening

16 Mar 16:27
@Jiaaqiliu Jiaaqiliu

Highlights

This release introduces three multi-agent subsystems, a hardened Docker sandbox, and 4 rounds of paper quality auditing — significantly improving the end-to-end quality of generated research papers.

New Multi-Agent Subsystems

CodeAgent (4-phase architecture)

  • LLM generates multi-file experiment code (main.py + setup.py + requirements.txt)
  • Static analysis & deep validation (AST-based class/method checks)
  • LLM-guided code review with structured JSON feedback
  • Iterative repair loop (up to 3 rounds) with automatic UnboundLocalError fix

BenchmarkAgent (4 sub-agents: Surveyor → Selector → Acquirer → Validator)

  • Domain-aware dataset and baseline selection from 13-domain knowledge base
  • Automatic benchmark acquisition with Docker compatibility validation
  • Integrated at Stage 9 (experiment_design), output injected into Stage 10

FigureAgent (5 sub-agents: Planner → CodeGen → Renderer → Critic → Integrator)

  • Academic-quality chart generation with SciencePlots, 300 DPI, colorblind-safe palette
  • 6 built-in chart templates + LLM fallback for custom visualizations
  • Tri-modal critic review (data accuracy, aesthetics, academic convention)

Docker Sandbox Enhancements

  • Network-policy-aware code generation: none | setup_only | pip_only | full
  • Dynamic dependency installation via requirements.txt
  • Pre-cached datasets: CIFAR-10/100, MNIST, FashionMNIST, STL-10, SVHN
  • Extended ML stack: torch, torchvision, timm, einops, transformers, etc.

Paper Quality Hardening (4-round audit)

  • Post-compilation quality checks, weasel/duplicate word lint
  • 7-dimension AI-Scientist-style review scoring
  • AI-slop detection (50+ phrases), statistical rigor validator
  • Cross-discipline support for 7 research domains (ML/physics/chem/econ/math/eng/bio)
  • NeurIPS checklist integration

Bug Fixes (15+)

  • Fix baselines dict-to-list crash in BenchmarkAgent
  • Fix Gymnasium environment versions (v4 → v5)
  • Fix experiment condition drift in iterative refinement (anchor to exp_plan.yaml)
  • Fix compute budget constraint for experiment design
  • Fix metric direction mismatch, citation verification batching
  • Fix LaTeX output sanitization, figure plan format handling
  • Add RL stability guidance (gradient clipping, NaN guard)
  • And more — see full commit message for details

Compatibility

All changes are backward-compatible with v0.1.0 configuration files.

Full Changelog: v0.1.0...v0.2.0

v0.1.0 — Initial Release

15 Mar 04:35
@huaxiuyao huaxiuyao

AutoResearchClaw v0.1.0

Fully autonomous research pipeline: one message in, full conference paper out. 🦞

Highlights

  • 23-stage pipeline: Research Scoping → Literature Discovery → Knowledge Synthesis → Hypothesis Generation → Experiment Design → Self-Healing Execution → Analysis & Decision → Paper Writing → Citation Verification
  • Multi-agent debate: 3 agents (Innovator, Pragmatist, Contrarian) argue over hypotheses; adversarial analysis panel reviews results
  • Self-healing executor: autonomous crash diagnosis, code repair, and Pivot/Refine decisions
  • Cross-run evolution: time-decayed lesson store that improves future runs
  • Citation verification: 4-layer pipeline (arXiv, DOI, Semantic Scholar, LLM relevance check)
  • OpenClaw integration: trigger full runs from a chat message
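
One common way to realise a time-decayed lesson store is to weight each lesson by an exponential decay of its age. The sketch below assumes a half-life scheme with a 30-day default; both the scheme and the function names are illustrative assumptions, since the release note does not specify the decay used.

```python
def lesson_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: a lesson loses half its weight every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def rank_lessons(
    lessons: list[tuple[str, float]], half_life_days: float = 30.0
) -> list[str]:
    """Order (text, age_days) pairs from highest to lowest weight."""
    ranked = sorted(
        lessons,
        key=lambda item: lesson_weight(item[1], half_life_days),
        reverse=True,
    )
    return [text for text, _age in ranked]
```

With this weighting, recent lessons dominate while older ones fade gradually instead of being dropped at a hard cutoff.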

Results (6 end-to-end runs)

  • 100% pipeline completion (124/124 steps)
  • 94.3% citation integrity
  • Mean quality 6.2/10 on conference review scale

Requirements

  • Python 3.9+
  • OpenAI-compatible LLM API