Au2fut — CME Micro Futures Prop Edge Harness

Honest backtest — prove an edge out-of-sample or kill it

Research only. No live trading, no execution code. This phase answers one question honestly: does a tradeable edge survive real futures costs and a prop firm's drawdown rules? Built on the discipline of the au2 / Au2qwen edge investigation (see docs/METHODOLOGY.md).

Why

au2 proved on BTC that a prop challenge is −EV by construction when no edge clears costs. Au2fut re-asks the question on CME micro futures (MES/MNQ/MGC) and futures prop firms (Topstep/Apex), where costs are a few $ per contract and equity indices carry real trend — but it refuses to build any trading infra until the edge is proven net of costs, out-of-sample and forward.

Install

pip install -r requirements.txt
cp .env.example .env # research-only; no live keys

The pipeline

data/fetch.py bars — pluggable source (yahoo|databento|ibkr|csv), cached
 └─ core/instruments.py exact tick value + $ cost model (env-overridable)
 └─ diagnostics/edge_scan.py breakout net $/contract verdict
 └─ diagnostics/mr_session.py session mean-reversion (pre-registered, OOS)
 └─ diagnostics/oos_validate.py train/test + walk-forward OOS
 └─ core/prop_rules.py Topstep/Apex trailing-DD machine
 └─ diagnostics/prop_mc.py P(pass), EV (--strategy mr|breakout)
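
As an illustration of the "exact tick value + $ cost model" stage, here is a minimal sketch. It is not the contents of core/instruments.py: the names `Instrument`, `SPECS`, `round_trip_cost` and `net_pnl` are hypothetical, and the default commission is a placeholder to be calibrated. The tick sizes and tick values are the published CME micro contract specs; `FUT_RT_COMMISSION` is the env override named in the status list.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    symbol: str
    tick_size: float   # minimum price increment, in price units
    tick_value: float  # dollars per tick, per contract


# Published CME micro contract specs.
SPECS = {
    "MES": Instrument("MES", 0.25, 1.25),  # $5 x S&P 500 index
    "MNQ": Instrument("MNQ", 0.25, 0.50),  # $2 x Nasdaq-100 index
    "MGC": Instrument("MGC", 0.10, 1.00),  # 10 troy oz of gold
}


def round_trip_cost(symbol, slippage_ticks_per_side=1.0, commission=None):
    """Dollar cost of one round trip (entry + exit) for one contract."""
    inst = SPECS[symbol]
    if commission is None:
        # Placeholder default (assumption); calibrate to a real fee schedule.
        commission = float(os.environ.get("FUT_RT_COMMISSION", "1.50"))
    # Slippage is paid once on entry and once on exit.
    return commission + 2 * slippage_ticks_per_side * inst.tick_value


def net_pnl(symbol, entry, exit_, side, contracts=1, **cost_kwargs):
    """Net dollars for a trade; side is +1 for long, -1 for short."""
    inst = SPECS[symbol]
    ticks = (exit_ - entry) / inst.tick_size * side
    gross = ticks * inst.tick_value * contracts
    return gross - round_trip_cost(symbol, **cost_kwargs) * contracts
```

Under these assumptions, a 2-point MES winner (8 ticks, $10 gross) nets $6 after a $1.50 commission and one tick of slippage per side, which shows how quickly a few ticks of cost consume a small edge.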

Data sources (set `FUT_DATA_SOURCE`)

Source	Depth	Setup
`yahoo` (default)	intraday ~60d (tiny OOS)	none
`databento`	CME Globex minute/tick, multi-year	`pip install databento`, `DATABENTO_API_KEY`
`ibkr`	IBKR historical (pacing-limited)	`pip install ib_async`, TWS/Gateway running
`csv`	whatever you export	`data/csv/<SYM>_<interval>.csv` or `FUT_CSV_PATH`

Sub-hourly bars (5m/15m/30m) for databento/ibkr are aggregated from 1m, aligned to midnight UTC so RTH filtering stays correct. All diagnostics are source-agnostic — switching FUT_DATA_SOURCE changes nothing in the strategy code. The whole point: re-run the MR OOS verdict on years of multi-regime data the moment you plug in Databento/IBKR.
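
The 1m aggregation described above can be sketched with the standard library alone. This is an illustrative version, not the code in data/fetch.py; `aggregate_bars` and the tuple layout are assumptions. It relies on the Unix epoch falling on midnight UTC, so flooring timestamps to a bucket width that divides a day yields midnight-aligned bars.

```python
from datetime import datetime, timezone


def aggregate_bars(bars, minutes):
    """Aggregate 1m bars into `minutes`-wide bars aligned to midnight UTC.

    bars: time-sorted iterable of (ts, open, high, low, close, volume),
    where ts is a timezone-aware datetime.
    """
    if 1440 % minutes != 0:
        raise ValueError("interval must divide a 24h day to stay aligned")
    width = minutes * 60
    out = []
    for ts, o, h, l, c, v in bars:
        epoch = int(ts.timestamp())
        start = epoch - epoch % width  # floor to the bucket start
        if out and out[-1][0] == start:
            _, po, ph, pl, _, pv = out[-1]
            # Same bucket: keep first open, extend high/low, take last close.
            out[-1] = (start, po, max(ph, h), min(pl, l), c, pv + v)
        else:
            out.append((start, o, h, l, c, v))
    return [
        (datetime.fromtimestamp(s, tz=timezone.utc), o, h, l, c, v)
        for s, o, h, l, c, v in out
    ]
```

Because bucket boundaries do not depend on where the data starts, a 5m bar always covers the same wall-clock minutes, so a session filter applied after aggregation sees consistent bar edges.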

Run it

One command, any strategy, honest out-of-sample verdict (never an in-sample number):

python validate.py edge MES 5m 1y # Donchian breakout
python validate.py mr MES 5m 1y # session mean-reversion
python validate.py spread MES MNQ 5m 1y # cointegrated spread MR

For deep multi-regime data (the only way to trust the verdict), set the source:

# PowerShell: $env:FUT_DATA_SOURCE="databento"; $env:DATABENTO_API_KEY="db-..."
FUT_DATA_SOURCE=databento DATABENTO_API_KEY=db-... python validate.py mr MNQ 5m 1y
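
For orientation, source selection from the environment might look like the sketch below. The env variable names and the CSV path pattern come from the data-sources table; the function names `resolve_source` and `csv_path` and the error handling are assumptions, not the repository's actual API.

```python
import os
from pathlib import Path

KNOWN_SOURCES = ("yahoo", "databento", "ibkr", "csv")


def resolve_source(env=None):
    """Return the configured data source, defaulting to yahoo."""
    env = os.environ if env is None else env
    source = env.get("FUT_DATA_SOURCE", "yahoo").strip().lower()
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown FUT_DATA_SOURCE: {source!r}")
    if source == "databento" and not env.get("DATABENTO_API_KEY"):
        raise RuntimeError("databento selected but DATABENTO_API_KEY is not set")
    return source


def csv_path(symbol, interval, env=None):
    """Locate the CSV file: FUT_CSV_PATH wins, else data/csv/<SYM>_<interval>.csv."""
    env = os.environ if env is None else env
    override = env.get("FUT_CSV_PATH")
    if override:
        return Path(override)
    return Path("data") / "csv" / f"{symbol}_{interval}.csv"
```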

Lower-level tools the CLI wraps:

python -m data.fetch MES 1h 60d # sanity-check data
python -m diagnostics.edge_scan MES 1h 60d # in-sample sweep (context)
python -m diagnostics.prop_mc MES 5m 1y topstep_50k --strategy mr --contracts 2
python -m pytest tests/ -q # trust the rule engine

First read — and why it did NOT survive honest OOS

In-sample (edge_scan, whole window) looked encouraging:

Metric	In-sample result
MES 1h Donchian	positive across ~22/24 configs, best ~$48/trade PF 1.54
MNQ 1h Donchian	strongly positive (~$21k/yr/contract best)
Topstep 50K P(pass) @ 2 ct (`prop_mc`)	~59%

Then oos_validate.py (select params on train, trade held-out test) deflated it:

Test	OOS result	Verdict
MES 1h walk-forward	net_mean −$0.19, PF 1.00 (n=20)	edge gone; in-sample was a fit
MNQ 1h walk-forward	net_mean +$75, PF 1.39 (n=18)	weakly positive, too thin
MES/MNQ/MGC 1d 2y	n=4–12 per slice, signs flip	noise — inconclusive
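
The select-on-train, trade-on-test procedure can be sketched as a walk-forward loop. This is a minimal illustration of one common variant (anchored, meaning the training window always starts at bar 0 and grows), not the contents of oos_validate.py. `anchored_splits`, `walk_forward_oos` and the `backtest(bars, params)` callable, assumed to return a list of net $ per trade, are hypothetical names.

```python
from statistics import fmean


def anchored_splits(n_bars, n_folds, min_train):
    """Yield (train, test) index ranges; train is anchored at bar 0."""
    test_len = (n_bars - min_train) // n_folds
    if test_len <= 0:
        raise ValueError("not enough bars for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_len
        yield range(0, train_end), range(train_end, train_end + test_len)


def _score(trades):
    # Mean net $ per trade; a config with no trades can never be selected.
    return fmean(trades) if trades else float("-inf")


def walk_forward_oos(bars, param_grid, backtest, n_folds=4, min_train=None):
    """Pick the best params on each train slice, trade them on the next test slice."""
    min_train = len(bars) // 2 if min_train is None else min_train
    oos_trades = []
    for train, test in anchored_splits(len(bars), n_folds, min_train):
        train_bars = bars[train.start:train.stop]
        best = max(param_grid, key=lambda p: _score(backtest(train_bars, p)))
        # Only these held-out trades count toward the verdict.
        oos_trades.extend(backtest(bars[test.start:test.stop], best))
    return oos_trades
```

The key property is that parameter selection never sees the bars it is later judged on, so the pooled `oos_trades` cannot inherit the selection bias of an in-sample sweep.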

Pushing Yahoo to its limit (5m/60d ≈ 13k bars) gave a statistically real OOS sample — and it was conclusive:

Instrument	TF	OOS n	net_mean	PF	win%
MES	5m	162	−$7.88	0.73	31%
MNQ	5m	160	−$15.38	0.80	41%
MES	30m	33	−$6.61	0.93	30%
MNQ	30m	37	−$59.51	0.73	30%

Verdict (Donchian breakout, MES/MNQ intraday, this data): no edge. Every TF collapses OOS — TRAIN PF 1.4–19, TEST negative, the textbook overfit signature. The n=162 5m sample is large enough to trust. The cheap futures cost structure did NOT rescue it because the signal itself is non-predictive intraday — the same thing au2 found for the seconds-scale BTC signal. The in-sample 59% prop pass-rate was a complete mirage.
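
For readers unfamiliar with the table columns, the sketch below shows how n, net_mean, PF (profit factor) and win% are conventionally computed from a list of net $ per trade. The function name `summarize` is hypothetical, and the repository may compute these differently in detail.

```python
def summarize(trades):
    """Summary stats for a list of per-trade net dollar results."""
    if not trades:
        return {"n": 0, "net_mean": 0.0, "pf": float("nan"), "win_pct": 0.0}
    wins = [t for t in trades if t > 0]
    losses = [t for t in trades if t < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    # Profit factor: dollars won per dollar lost.
    pf = gross_profit / gross_loss if gross_loss > 0 else float("inf")
    return {
        "n": len(trades),
        "net_mean": sum(trades) / len(trades),
        "pf": pf,
        "win_pct": 100.0 * len(wins) / len(trades),
    }
```

A PF below 1 and a negative net_mean are the same statement in two forms: both say gross losses exceed gross profits after costs.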

Scope of this verdict: it kills the breakout hypothesis on these instruments on this data — not "no edge of any kind exists." Testing more strategy families is possible but must be done OOS-first / pre-registered to avoid data-mining a false winner (test enough strategies and one looks great in-sample by luck). Deeper minute data (Databento/IBKR) would also let mean-reversion / session strategies be judged on hundreds of OOS trades — see docs/METHODOLOGY.md.

Session mean-reversion — the first hypothesis to SURVIVE OOS

diagnostics/mr_session.py (pre-registered: fade Bollinger extremes inside RTH, flat at close — the inverse of breakout). Anchored walk-forward, OOS:

Instrument	TF	OOS n	net_mean	PF	survives 2-tick slip?	3-tick?
MES	5m	68	+$5.67 → +$4.42	1.22	yes (+$4.42)	no (−$2.14)
MNQ	5m	90	+$3.13 → +$2.63	1.05	yes (thin)	—
MES	15m	29	+$16.76	1.86	—	—
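
The pre-registered rule (fade Bollinger extremes inside RTH, flat at close) can be sketched as a position generator. This is an assumption-laden illustration, not the code in diagnostics/mr_session.py: the window of 20, the band width k=2, the exit at the middle band, and the name `mr_positions` are all hypothetical choices, and the caller is assumed to supply the per-bar `in_rth` flags.

```python
from statistics import fmean, pstdev


def mr_positions(closes, in_rth, window=20, k=2.0):
    """Target position after each bar: -1 short, +1 long, 0 flat."""
    pos = 0
    out = []
    for i, c in enumerate(closes):
        # The session ends when data runs out or the next bar is outside RTH.
        session_ending = i + 1 == len(closes) or not in_rth[i + 1]
        if not in_rth[i] or session_ending or i + 1 < window:
            pos = 0  # outside RTH, warming up, or flattening into the close
        else:
            hist = closes[i + 1 - window:i + 1]
            mid = fmean(hist)
            band = k * pstdev(hist)
            if pos == 0:
                if c > mid + band:
                    pos = -1  # fade an upside extreme
                elif c < mid - band:
                    pos = 1   # fade a downside extreme
            elif (pos == 1 and c >= mid) or (pos == -1 and c <= mid):
                pos = 0       # assumed exit: price reverted to the middle band
        out.append(pos)
    return out
```

Fixing `window`, `k` and the exit rule before looking at results is what "pre-registered" means here: the OOS test then judges one hypothesis rather than the best of many.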

On the shallow 60-71d Yahoo window this looked like the first thing across BTC and futures to survive a pre-registered OOS test — but deep data killed it.

Deep-data verdict (Databento, 1 full year, 5m) — MR EDGE IS DEAD

Instrument	OOS n	net_mean	PF	verdict
MES 5m 1y	184	−$5.76	0.84	rejected
MNQ 5m 1y	279	−$2.68	0.96	rejected

Both negative even in-sample (MES −$4.89, MNQ −$2.41 over 441/427 trades). The summer-2026 positive was a regime artifact that vanished on a multi-regime year. The few euros of Databento bought certainty and saved a real-capital deposit. This is the same verdict au2 reached on BTC: no edge clears costs. The line below about "thin but real" applied only to the shallow window and no longer holds.

(Historical note — the shallow-data reading that did NOT survive:) the edge was ~3-4 ticks gross and died at 3-tick slippage; fading the open is where slippage is worst.

Does it pass a prop challenge? No, not reliably (prop_mc --strategy mr, MES 5m, Topstep 50K, 2-tick slip): P(pass) ~0% at 1 ct, ~5% at 2 ct, ~28% at 4 ct — but 4 ct carries a 64% blow-up rate. The thin edge can't both hit a fixed-$ target and survive a fixed-$ trailing drawdown at the size required. Same structural wall au2's crypto prop_mc found.
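
The structural wall described above can be reproduced with a simplified trailing-drawdown state machine and a bootstrap Monte Carlo. This sketch is not core/prop_rules.py or diagnostics/prop_mc.py: `run_challenge` and `prop_mc` are hypothetical names, the $3,000 target and $2,000 trailing drawdown are assumed stand-ins for a 50K-style plan (check the firm's current rules), and the drawdown here trails per trade, with no daily-loss limit or other plan rules.

```python
import random


def run_challenge(pnl_sequence, target=3000.0, trailing_dd=2000.0):
    """Walk trade P&L through the rules; return 'passed', 'blown' or 'open'."""
    equity = 0.0
    peak = 0.0
    for pnl in pnl_sequence:
        equity += pnl
        peak = max(peak, equity)
        if equity <= peak - trailing_dd:
            return "blown"   # trailing drawdown hit
        if equity >= target:
            return "passed"  # profit target reached first
    return "open"            # ran out of trades before either barrier


def prop_mc(trades, contracts=1, n_paths=2000, max_trades=500, seed=0, **rules):
    """Bootstrap per-contract trade results into many challenge attempts."""
    rng = random.Random(seed)
    counts = {"passed": 0, "blown": 0, "open": 0}
    for _ in range(n_paths):
        # Resample historical trades with replacement, scaled by position size.
        path = (contracts * rng.choice(trades) for _ in range(max_trades))
        counts[run_challenge(path, **rules)] += 1
    return {outcome: n / n_paths for outcome, n in counts.items()}
```

Scaling `contracts` multiplies wins and losses alike, so under this model a larger size shortens the path to both the target and the drawdown limit, which is consistent with the pass rate and the blow-up rate rising together in the figures above.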

Where it could matter: a personal account (no deadline, no trailing DD, small size, slow compounding) — not "fast cash," and only if real execution slippage stays ≤ 2 ticks. Mandatory next steps before any money: deeper minute data across multiple regimes, and forward live-paper measuring actual fill slippage on the open fade.

Layout

Au2fut/
├── core/
│ ├── instruments.py CME micro specs + $ cost model
│ └── prop_rules.py Topstep/Apex trailing-DD / daily-loss / target engine
├── data/
│ └── fetch.py Yahoo bar fetcher (cached); swap for Databento/IBKR later
├── diagnostics/
│ ├── edge_scan.py vol-gated Donchian backtest, net $ verdict
│ └── prop_mc.py Monte Carlo P(pass)/EV through the prop rules
├── tests/ prop-rules engine tests (9, all green)
└── docs/METHODOLOGY.md the discipline, ported from au2

Status / next steps

Honest $ cost model + prop-rules engine (tested)
Edge-scan + prop Monte Carlo, runnable on real data
OOS validation (oos_validate.py) — in-sample edge did NOT survive on free data
Deeper data: minute bars w/ years of history (Databento/IBKR) → swap data/fetch.py
Re-run OOS for a statistically meaningful sample (target ≥ 100 OOS trades)
Calibrate FUT_RT_COMMISSION to a real prop plan fee schedule
Forward live-paper to confirm backtest net $/trade
Only then: executor + prop-risk guard (port from au2 prop_guard.py)

