For backtesters: what's missing from the historical Polymarket data that you'd actually use? #4

Unanswered
manja316 asked this question in Q&A

The screener is the surface, but the thing behind it is a snapshot pipeline — 10.8M+ rows across 13,963 markets, refreshed daily. I want to ask people who actually backtest prediction-market strategies what fields they keep wishing were there.

What the snapshots currently capture (per market per refresh):

  • price (yes-side mid)
  • volume (cumulative) and volume_24h
  • liquidity (Gamma's number, not orderbook-derived)
  • one_day_change, one_hour_change, one_week_change
  • closed, archived, active flags
  • end_date, category, tags
  • outcome_prices (full multi-outcome where applicable)

What I know is missing and have not (yet) added:

  1. Order-book depth at multiple levels. Currently zero. Would need to hit CLOB per market, expensive at 13k markets but maybe doable for a curated top-N by liquidity.
  2. Trade-by-trade tape. Not snapshotted — only aggregated volumes. Without the tape you can't reconstruct VWAP or detect single-fill spikes.
  3. Resolution outcome + timestamp for closed markets. We have closed=true, but the resolved outcome is not always cleanly joined back to the historical snapshots, so survivorship-bias-aware backtests are awkward.
  4. News/event tagging. Markets that moved 20% in an hour — was there a tweet, a court ruling, an earnings print? Currently zero linkage.
  5. Funding-rate / borrow analogues. Polymarket doesn't have these, but the cost-of-carry equivalent (capital lockup until resolution) is computable from end_date + price and we don't expose it as a field.
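
For concreteness, here is a rough sketch of the two derived quantities mentioned in (2) and (5). It is illustrative only and not part of the current pipeline: the function names, the `(price, size)` trade tuple shape, and the choice of a simple (non-compounded) annualization are all assumptions. The carry figure is the return on a YES share bought at `price` and held to `end_date`, conditional on it resolving YES.

```python
from datetime import datetime, timezone


def vwap(trades):
    """Volume-weighted average price from a trade tape.

    `trades` is an iterable of (price, size) pairs; this shape is a
    hypothetical one, since the snapshots do not carry a tape today.
    Returns None when there is no traded size.
    """
    trades = list(trades)
    total_size = sum(size for _, size in trades)
    if total_size == 0:
        return None
    return sum(price * size for price, size in trades) / total_size


def annualized_carry(price, end_date, as_of):
    """Simple annualized return of holding a YES share to resolution.

    Assumes the share pays 1.0 if it resolves YES, so the gross return
    is 1/price - 1, scaled linearly to a 365-day year. This is one
    possible definition of a carry field, not an established one.
    Returns None for prices outside (0, 1) or markets already past end_date.
    """
    days = (end_date - as_of).total_seconds() / 86400
    if not 0 < price < 1 or days <= 0:
        return None
    return (1.0 / price - 1.0) * 365.0 / days


if __name__ == "__main__":
    # Two fills: 100 shares at 0.40 and 300 shares at 0.50.
    print(vwap([(0.40, 100), (0.50, 300)]))

    # YES at 0.80 with 73 days of capital lockup remaining.
    as_of = datetime(2025, 1, 1, tzinfo=timezone.utc)
    end_date = datetime(2025, 3, 15, tzinfo=timezone.utc)
    print(annualized_carry(0.80, end_date, as_of))
```

The carry function needs only end_date and price, so it could be computed per snapshot row from fields that already exist. VWAP cannot, because it needs per-trade sizes rather than aggregated volume.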

Question for anyone running models on prediction-market data:

  • Which of (1)–(5) would change what you can backtest vs just being nice-to-have?
  • Is there a 6th thing I'm not listing that you've had to scrape yourself?
  • If you could only add one field per snapshot row, what would it be?

The full historical pull (SQLite + CSV) is on Gumroad: $9, freely redistributable for research. The screener stays free. Answers here genuinely shape what the next refresh adds, so be specific.

Methodology background on the existing crash-signal column lives in Discussion #2 if useful context.

Replies: 0 comments
