
[Dream Cycle 2026-06-10] performance: DeLM shared-context +10.5pp SWE-bench gap (−50% cost) + security,hive-mind scan #2343

Open

Description

Tonight's Rotation

| Field | Value |
|---|---|
| SLOT | 0 |
| DEEP surface | performance |
| SCAN surfaces | security, hive-mind |
| Session commit | 16a55f7a537c4a405e448e59859866eebbdd45a0 |
| Date | 2026-06-10 |

Drift Check


Deep Dive Findings — Performance SOTA 2026

SOTA Summary

| Finding | Source | Confidence |
|---|---|---|
| DeLM: shared verified context + async task queue → +10.5pp SWE-bench Verified, −50% cost | arXiv:2606.10662, Mao & Mirhoseini, Jun 9 2026 | A |
| DeLM: +5.7pp LongBench-v2 Multi-Doc QA across four frontier model families | arXiv:2606.10662 | A |
| 3SPO: step-wise state-score RL → +22.6% ALFWorld, +15.6pp WebShop vs GRPO; …× state exploration, …× faster convergence | arXiv:2606.09961, Han et al., Jun 8 2026 | A (code available) |
| DocTrace multi-agent RAG: +8.85% F1, −53.32% compute cost | arXiv:2606.10921, Jun 9 2026 | A |
| TRACE rollout-budget allocation: +2.8pp Multi-Hop QA at equal sampling cost (Qwen3-14B) | arXiv:2606.11119, Jun 9 2026 | B (single model) |
| LangGraph: $0.08/task, fastest latency; AutoGen …× costlier at high volume (2K-task benchmark) | tensoria.fr 2026 | B (independent, methodology public) |

Gap vs Current Ruflo

| Capability | Ruflo v3.6.10 | SOTA (DeLM unless noted) | Gap |
|---|---|---|---|
| Shared context at agent spawn | Agents read AgentDB after spawn (sequential) | Immutable snapshot passed at spawn | Critical |
| Decentralized task queue | Fixed hierarchical dispatch (maxAgents=8) | Async task claim, no central router | Significant |
| Step-wise RL credit assignment | SONA: trajectory-level patterns only | 3SPO: per-state bandit abstraction | Significant |
| Published SWE-bench score | None | DeLM best-in-class | Visibility gap |
| Per-task cost tracking | None | $0.08/task (LangGraph) | Visibility gap |
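The "async task claim, no central router" row can be illustrated with a small sketch. It assumes a single Node.js process, where a synchronous array shift cannot interleave with another agent's claim; Task, TaskQueue, runAgent and runSwarm are hypothetical names, not Ruflo or DeLM APIs, and a multi-process swarm would need an atomic claim in shared storage instead.

```typescript
// Hypothetical sketch: agents pull work from a shared queue instead of
// waiting for a central router to assign it. Not a Ruflo or DeLM API.

interface Task {
  id: string;
  run: () => Promise<string>;
}

class TaskQueue {
  private pending: Task[];

  constructor(tasks: Task[]) {
    this.pending = tasks.slice();
  }

  // shift() runs synchronously, so on Node's single-threaded event loop
  // no two agents can claim the same task.
  claim(): Task | undefined {
    return this.pending.shift();
  }
}

async function runAgent(agentId: string, queue: TaskQueue, results: Map<string, string>): Promise<void> {
  // Keep claiming until the queue is drained; a slow task only delays
  // the agent running it, never the others.
  while (true) {
    const task = queue.claim();
    if (task === undefined) return;
    const output = await task.run();
    results.set(task.id, `${agentId}:${output}`);
  }
}

async function runSwarm(tasks: Task[], agentCount: number): Promise<Map<string, string>> {
  const queue = new TaskQueue(tasks);
  const results = new Map<string, string>();
  const agents = Array.from({ length: agentCount }, (_, i) => runAgent(`agent-${i}`, queue, results));
  await Promise.all(agents);
  return results;
}
```

Because placement is decided by whichever agent is free, scaling out only changes agentCount; there is no dispatcher whose fan-out limit (such as maxAgents) caps throughput.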

DeLM's core insight: shared context eliminates the sequential SendMessage rounds that otherwise force agents to work serially. Ruflo's SONA adaptation (0.0043 ms per adaptation, measured) operates only at the trajectory level, whereas step-wise state scores are the prerequisite for 3SPO's +22.6% ALFWorld gain.
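To make the trajectory-level vs step-wise distinction concrete, here is a generic sketch of per-state scores used as a step-wise baseline. It is not 3SPO's published algorithm (the paper's per-state bandit abstraction is not reproduced here); Step, StateScores and the advantage functions are hypothetical, and states are assumed to be already abstracted into string keys.

```typescript
// Generic step-wise credit assignment with per-state scores.
// NOT 3SPO's algorithm; all names here are hypothetical.

interface Step {
  state: string;  // abstracted state key (assumption)
  reward: number; // reward received after acting in this state
}

class StateScores {
  private sums = new Map<string, number>();
  private counts = new Map<string, number>();

  // Mean return-to-go observed from this state so far (0 if unseen).
  score(state: string): number {
    const n = this.counts.get(state) ?? 0;
    return n === 0 ? 0 : (this.sums.get(state) ?? 0) / n;
  }

  update(state: string, returnToGo: number): void {
    this.sums.set(state, (this.sums.get(state) ?? 0) + returnToGo);
    this.counts.set(state, (this.counts.get(state) ?? 0) + 1);
  }
}

// Undiscounted return-to-go: sum of rewards from step t to the end.
function returnsToGo(traj: Step[]): number[] {
  const out = new Array<number>(traj.length).fill(0);
  let acc = 0;
  for (let t = traj.length - 1; t >= 0; t--) {
    acc += traj[t].reward;
    out[t] = acc;
  }
  return out;
}

// Trajectory-level credit: every step receives the same advantage.
function trajectoryAdvantages(traj: Step[], baseline: number): number[] {
  const total = traj.reduce((s, step) => s + step.reward, 0);
  return traj.map(() => total - baseline);
}

// Step-wise credit: each step is compared with its own state's score,
// then the scores are updated with what this trajectory observed.
function stepwiseAdvantages(traj: Step[], scores: StateScores): number[] {
  const g = returnsToGo(traj);
  const adv = traj.map((step, t) => g[t] - scores.score(step.state));
  traj.forEach((step, t) => scores.update(step.state, g[t]));
  return adv;
}
```

For example, after a successful trajectory through states "a" then "b", a failing trajectory through "a" then an unseen "c" gives the step from "a" an advantage of −1 and the step from "c" an advantage of 0, whereas trajectory-level credit would assign both steps the same value.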

Recommended Action

ADR-148 filed (see PR): add a snapshot_context: boolean option to swarm_init, and serialize the AgentDB namespace into an immutable SwarmContextSnapshot that is passed to each agent at spawn. Estimated at ~80 LOC across 3 files; the change is feature-flagged and will be benchmarked before the default is flipped.
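A minimal sketch of what snapshot-at-spawn could look like, assuming the AgentDB namespace can be read as a key/value map of JSON values. SwarmContextSnapshot and the snapshot_context flag come from the ADR description above; swarmInit, createSnapshot, deepFreeze and the Map-based store are hypothetical stand-ins, not the real Ruflo or AgentDB API.

```typescript
// Hypothetical sketch of ADR-148's idea; not the actual Ruflo/AgentDB API.

type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

interface SwarmContextSnapshot {
  readonly namespace: string;
  readonly takenAt: string; // ISO timestamp of the snapshot
  readonly entries: Readonly<Record<string, JsonValue>>;
}

// Recursively freeze so no agent can mutate the shared snapshot.
function deepFreeze(value: unknown): void {
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    Object.keys(obj).forEach((key) => deepFreeze(obj[key]));
    Object.freeze(obj);
  }
}

function createSnapshot(namespace: string, store: Map<string, JsonValue>): SwarmContextSnapshot {
  // Copy each value so later writes to the store do not leak into the snapshot.
  const entries: Record<string, JsonValue> = {};
  store.forEach((v, k) => {
    entries[k] = JSON.parse(JSON.stringify(v)) as JsonValue;
  });
  const snapshot: SwarmContextSnapshot = { namespace, takenAt: new Date().toISOString(), entries };
  deepFreeze(snapshot);
  return snapshot;
}

interface AgentHandle {
  id: string;
  context?: SwarmContextSnapshot;
}

interface SwarmInitOptions {
  agentCount: number;
  snapshot_context: boolean; // feature flag proposed in ADR-148
}

function swarmInit(opts: SwarmInitOptions, namespace: string, store: Map<string, JsonValue>): AgentHandle[] {
  // One snapshot, taken once and shared by reference, replaces
  // per-agent sequential reads after spawn.
  const snapshot = opts.snapshot_context ? createSnapshot(namespace, store) : undefined;
  return Array.from({ length: opts.agentCount }, (_, i) => ({ id: `agent-${i}`, context: snapshot }));
}
```

The trade-off of this design is staleness: agents see state as of spawn time, so any updates they produce would have to be written back through a separate channel.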


Scan Findings — Security


Scan Findings — Hive-Mind

  • Source: arXiv:2605.09076 (Robust Multi-Agent LLMs under Byzantine Faults, May 2026); IEEE DataPort AgentShield dataset (15K scenarios, 2026)
  • Competitive signal: CP-WBFT (confidence-probe-based weighted BFT) and SAC (Self-Anchored Consensus) are the 2026 SOTA upgrades. Neither is implemented in LangGraph, AutoGen, CrewAI, or the OpenAI Agents SDK, which makes this a first-mover opportunity.
  • Finding: Ruflo's Byzantine consensus mode uses standard BFT voting without agent confidence scores. CP-WBFT adds confidence-weighted votes; SAC adds decentralized iterative filtering. The same gap was identified from different papers in #2294 ([Dream Cycle 2026-06-05] performance: LAMaS 38-46% critical-path gap, which Ruflo's fixed hierarchy misses, plus the security, hive-mind scan), which gives cross-source validation. No ADR is needed: this is an implementation-level addition to quorum-manager.ts. (Grade A: peer-reviewed, reproducible)
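The confidence-weighted voting idea can be sketched as a weighted supermajority. This assumes confidences in [0, 1] and a 2/3 weight threshold borrowed from classic BFT quorum sizing; it is not the exact CP-WBFT rule, does not implement SAC's iterative filtering, and Vote, Decision and weightedQuorum are hypothetical names rather than anything in quorum-manager.ts.

```typescript
// Generic confidence-weighted quorum vote (hypothetical, not CP-WBFT's exact rule).

interface Vote {
  agentId: string;
  value: string;      // proposed decision
  confidence: number; // probe-derived or self-reported, expected in [0, 1]
}

interface Decision {
  value: string | null; // null when no value reaches the weighted quorum
  support: number;      // share of total weight behind the leading value
}

function weightedQuorum(votes: Vote[], threshold = 2 / 3): Decision {
  const weights = new Map<string, number>();
  let total = 0;
  for (const vote of votes) {
    // Clamp so a faulty agent cannot claim more than full weight.
    const w = Math.min(1, Math.max(0, vote.confidence));
    weights.set(vote.value, (weights.get(vote.value) ?? 0) + w);
    total += w;
  }
  if (total === 0) return { value: null, support: 0 };

  // Pick the value with the most accumulated weight (first one wins ties).
  let best: string | null = null;
  let bestWeight = 0;
  for (const vote of votes) {
    const w = weights.get(vote.value) ?? 0;
    if (w > bestWeight) {
      best = vote.value;
      bestWeight = w;
    }
  }
  const support = bestWeight / total;
  return { value: support >= threshold ? best : null, support };
}
```

Note that one agent voting "commit" at 0.9 outweighs two agents voting "abort" at 0.2 each, where plain count voting would pick "abort". That is why the confidences need to come from probes or calibration rather than being taken at face value from potentially Byzantine agents.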

Competitors Reviewed

| Framework | SWE-bench (2026) | Latency | Cost/task | Key 2026 Change |
|---|---|---|---|---|
| DeLM (arXiv research) | Best-in-class (+10.5pp) | | −50% vs baseline | Shared context + async task queue |
| LangGraph v0.4 | 76% (B) | Fastest (B) | $0.08 | Per-node timeouts, graceful shutdown |
| AutoGen AG2 1.0 | 68% (B) | Mid | $0.40–0.48 | Event-driven rearchitecture |
| CrewAI 0.105 | 71% (B) | Mid | …× tokens on simple tasks | Role-based v2, enterprise observability |
| OpenAI Agents SDK | Not disclosed | | | Sandbox + approval callbacks |
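Reporting a Cost/task figure like the ones compared above, and closing the per-task cost tracking gap, comes down to attributing token usage to tasks. A minimal sketch, assuming per-million-token input and output prices supplied by the caller (no real model pricing is assumed); CostTracker, TokenPrice and Usage are hypothetical names.

```typescript
// Hypothetical per-task cost tracker; prices are caller-supplied inputs.

interface TokenPrice {
  inputPerMillion: number;  // USD per 1M input tokens
  outputPerMillion: number; // USD per 1M output tokens
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

class CostTracker {
  private perTask = new Map<string, number>();

  constructor(private readonly price: TokenPrice) {}

  // Accumulate the cost of every model call made on behalf of a task.
  record(taskId: string, usage: Usage): void {
    const cost =
      (usage.inputTokens * this.price.inputPerMillion + usage.outputTokens * this.price.outputPerMillion) / 1e6;
    this.perTask.set(taskId, (this.perTask.get(taskId) ?? 0) + cost);
  }

  costOf(taskId: string): number {
    return this.perTask.get(taskId) ?? 0;
  }

  // Mean cost over all tasks that recorded usage.
  meanCostPerTask(): number {
    if (this.perTask.size === 0) return 0;
    let sum = 0;
    this.perTask.forEach((c) => {
      sum += c;
    });
    return sum / this.perTask.size;
  }
}
```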

Gist Link

The full research report is committed at v3/docs/dream/dream-gist-2026-06-10.md on branch dream/2026-06-10-performance. No standalone gist was created because no GitHub Gist MCP tool was available in this remote environment.


Witness

| Field | Value |
|---|---|
| Session commit | 16a55f7a537c4a405e448e59859866eebbdd45a0 |
| Report SHA-256 | 0fda968f312910561ada694b801ecf60f887c9a6f07c79e3604d895a749d103e |
| Witness stamp | 530b39939260058cfbe8d03c457a0aab8149ed7d77683a6f6a647be89f14f0e3 |

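Verifier: fetch the raw v3/docs/dream/dream-gist-2026-06-10.md from the branch and compute its SHA-256 (it should match the Report SHA-256 above). Then concatenate the session commit 16a55f7a537c4a405e448e59859866eebbdd45a0 with that digest and run sha256sum over the result; the output must equal the witness stamp.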
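The verification steps can be expressed as a short script using Node's built-in crypto and fs modules. It assumes the stamp is SHA-256 over the commit hash immediately followed by the report's lowercase hex digest, with no separator or trailing newline; if the stamping tool adds either, the check must do the same. sha256Hex, witnessStamp and verifyWitness are hypothetical helper names.

```typescript
// Witness check sketch. Assumption: stamp = sha256(commit + sha256hex(report)),
// plain string concatenation with no separator or newline.
import { createHash } from "crypto";
import { readFileSync } from "fs";

function sha256Hex(data: string | Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

function witnessStamp(sessionCommit: string, report: Buffer): string {
  const reportDigest = sha256Hex(report); // should equal the Report SHA-256 row
  return sha256Hex(sessionCommit + reportDigest);
}

function verifyWitness(reportPath: string, sessionCommit: string, expectedStamp: string): boolean {
  const report = readFileSync(reportPath); // raw bytes, no newline normalization
  return witnessStamp(sessionCommit, report) === expectedStamp.toLowerCase();
}
```

After downloading the raw report, calling verifyWitness(path, "16a55f7a537c4a405e448e59859866eebbdd45a0", "530b39939260058cfbe8d03c457a0aab8149ed7d77683a6f6a647be89f14f0e3") should return true under the stated concatenation assumption if the report is unchanged.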

ADR filed: ADR-148 (Shared-Context Parallel Dispatch). Branch: dream/2026-06-10-performance.
