Status: IN PROGRESS. Started: April 6, 2026. Last updated: April 9, 2026.

| Source | What it provides | Database | Schedule |
|---|---|---|---|
| Kalshi game lines | ML, spread, total, F5, RFI, team total | kalshi-mlb-prices.db | :00 past hour |
| Pinnacle MLB odds | ML, run line, total (sharp anchor) | pinnacle-mlb.db | :02 past hour |
| Pinnacle F5 | First 5 innings ML + total (NOTE: scraper unreliable — F5 lines now derived from full-game + SP ratio) | pinnacle-mlb.db | :02 past hour |
| SBR Multi-Book | ML, spreads, totals from 6 US sportsbooks | sbr-mlb.db | :00 past hour |
| Pregame.com | Cash vs ticket %, line movement, RLM, steam | research-pipeline.db (filtered) | :00 past hour |

| Source | What it provides | Database | Schedule |
|---|---|---|---|
| Kalshi Futures | WS, teams in WS, AL/NL pennant, playoffs, best/worst record (7 series) | kalshi-mlb-futures.db | Weekly (Mon) through Jul 26; every 3 days Jul 27 to Aug 31; daily Sep 1 to Sep 27 (6:00 AM) |
| Kalshi Win Totals | Per-team season win over/unders (30 teams) | kalshi-mlb-win-totals.db | Same as above |
| Kalshi Divisions | AL/NL division winners (6 divisions, 30 teams) | kalshi-mlb-divisions.db | Same as above, at 6:03 AM |
| Pinnacle Futures | WS, AL/NL pennant (60 selections, sharp anchor) | pinnacle-mlb-futures.db | Same as above, at 6:05 AM |

| Source | What it provides | Database |
|---|---|---|
| MLB Stats API | Schedule, lineups, standings, transactions, pitchers | mlb-stats.db |
| MLB Stats API | SP game logs, SP baselines, bullpen usage/status/log, opener flags | mlb-pitching.db |
| MLB Stats API | Batter game logs, baselines, platoon splits, career stats, crosswalk | mlb-batting.db |
| Advanced computed metrics | xFIP, quality composite, bullpen index, ABS challenges | mlb-model.db |
| NWS API | Game-day weather + park factors | mlb-weather.db |
| Derived metrics | Team variance, fatigue, inning scoring, park factors, model daily | mlb-model.db |
| Baseball Reference | Team batting + pitching stats | mlb-batting.db / mlb-pitching.db |
| Baseball Savant | Pitcher stats, catcher framing | mlb-pitching.db |
| UmpScorecards | Umpire assignments + tendencies | mlb-stats.db |
| DRatings | Team power ratings + predictions | research-pipeline.db (filtered) |
| Dimers | Pythagorean win probabilities | research-pipeline.db (filtered) |
| GameSim | Monte Carlo simulation predictions | research-pipeline.db (filtered) |

| Source | What it provides | Database |
|---|---|---|
| Edge Scanner | Kalshi vs Pinnacle mispricings — ML, spread, total, F5, team total | mlb-edges.db |
| F5 Probability Curves | SP-scaled spread/total/ML curves for First 5 Innings | mlb-edges.db |
| Implied Curves | Alt-line probability curves from Kalshi ladders | mlb-edges.db |
| Edge Summary | Per-day edge counts and averages | mlb-edges.db |

| Source | What it provides | Database |
|---|---|---|
| Kalshi MLB props | HR, hits, K, TB, HRR, season stats | kalshi-mlb-props.db |
| FanDuel/DraftKings prop lines | Over/under lines via Odds API | mlb-prop-lines.db |
| Prop edge scanner | Book + model edges for props | mlb-prop-edges.db |
| Awards (future addition) | MVP, Cy Young, ROY, etc. — individual player futures | TBD |

| Time | What fires |
|---|---|
| :00 | Kalshi MLB game prices, SBR multi-book, Pregame sharp money, MLB props |
| :02 | Pinnacle MLB full game + F5 |
| :10 | Edge scanner (reads fresh data from all sources) |
| 1 min before game start | Closing snapshot (Kalshi + Pinnacle) |

| Phase | Dates | Frequency | Time |
|---|---|---|---|
| Phase 1 | Now → Jul 26, 2026 | Every Monday | 6:00 AM (Kalshi), 6:03 AM (Kalshi divisions), 6:05 AM (Pinnacle) |
| Phase 2 | Jul 27 → Aug 31 | Every 3 days | Same times |
| Phase 3 | Sep 1 → Sep 27 | Daily | Same times |
| Expired | After Sep 27 | Stops scanning | — |
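
The phase table can be turned into a simple date check. This is a minimal sketch, assuming Phase 2's "every 3 days" is counted from Jul 27 (Jul 27, Jul 30, Aug 2, ...) and that any date up to Jul 26 falls into Phase 1; futures_phase and should_scan_futures are hypothetical names.

```python
from datetime import date

# Hypothetical phase boundaries copied from the table above (2026 season).
PHASE1_END = date(2026, 7, 26)
PHASE2_START = date(2026, 7, 27)
PHASE2_END = date(2026, 8, 31)
PHASE3_END = date(2026, 9, 27)


def futures_phase(day: date) -> str:
    """Return which scan phase a calendar date falls into."""
    if day <= PHASE1_END:
        return "phase1"
    if day <= PHASE2_END:
        return "phase2"
    if day <= PHASE3_END:
        return "phase3"
    return "expired"


def should_scan_futures(day: date) -> bool:
    """True if the futures scrapers are due to fire on this date."""
    phase = futures_phase(day)
    if phase == "phase1":
        return day.weekday() == 0  # Monday
    if phase == "phase2":
        # Assumption: the 3-day cadence is anchored on Jul 27.
        return (day - PHASE2_START).days % 3 == 0
    if phase == "phase3":
        return True  # daily
    return False  # expired: stop scanning
```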

| Group | Time | What runs |
|---|---|---|
| Group 1 — Raw Data Pulls | 9:00 AM | External APIs + web scrapes: MLB Stats API, batter/SP game logs, bullpen, weather, Baseball Reference, Savant, umpires, ratings (Sagarin, DRatings, MoneyPuck, Dimers), NBA/NHL/NCAAB/Soccer/Golf/Motorsports/UFC scrapers, MLB props |
| Group 2 — Baselines & Splits | 9:05 AM | Light computation on Group 1 data: batter baselines, player crosswalk, opener flags, model daily stats, SP baselines, platoon splits |
| Group 3 — Derived Metrics | 9:10 AM | Heavy math needing Groups 1+2: NHL/NBA/MLB variance + fatigue + period scoring, MLB park factors + inning scoring + derived metrics, soccer metrics, player prop analytics, usage cascade, matchup adjustments, MLB props edge scanner + steam |

Jobs within a group run sequentially. Groups are staggered 5 minutes apart so that raw data has normally finished before computation starts; the stagger is time-based rather than a hard dependency, so if one group fails, the others still run.
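
A minimal sketch of this layout, with hypothetical function names. It assumes a failing job is recorded and the rest of its group keeps going, which the spec does not state; the stagger is a plain time offset, so Group 2 starts at 9:05 whether or not Group 1 has finished.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

Job = Callable[[], None]


def group_start_times(base: datetime, n_groups: int = 3, stagger_min: int = 5) -> List[datetime]:
    """Start time of each group, a fixed number of minutes apart (no dependency check)."""
    return [base + timedelta(minutes=stagger_min * i) for i in range(n_groups)]


def run_group(jobs: Dict[str, Job]) -> Dict[str, Optional[str]]:
    """Run jobs one after another; map job name -> None on success, else the error text."""
    results: Dict[str, Optional[str]] = {}
    for name, job in jobs.items():
        try:
            job()
            results[name] = None
        except Exception as exc:  # assumption: isolate failures so later jobs still run
            results[name] = f"{type(exc).__name__}: {exc}"
    return results
```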

The edge scanner compares Kalshi alt-line prices against Pinnacle's sharp odds across ML, spread, total, F5, and team total markets.
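
A sketch of a single comparison, assuming Pinnacle prices arrive as American odds, Kalshi prices as YES asks in cents, and a proportional (multiplicative) de-vig; the spec does not name its de-vig method, and Kalshi fees are ignored here. All function names are hypothetical.

```python
def american_to_implied(odds: int) -> float:
    """Raw implied probability (vig included) from American odds."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def devig_two_way(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Proportional de-vig: scale both sides so they sum to 1."""
    p_a, p_b = american_to_implied(odds_a), american_to_implied(odds_b)
    total = p_a + p_b
    return p_a / total, p_b / total


def edge_cents(fair_prob: float, kalshi_ask_cents: int) -> float:
    """Expected value in cents of buying one YES contract at the ask (fees ignored)."""
    return fair_prob * 100 - kalshi_ask_cents
```

For example, a -150 / +130 Pinnacle pair de-vigs to roughly 58% / 42%, so a Kalshi YES ask of 55 on the favorite shows about a 3-cent edge.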

For team totals, the edge scanner derives each team's implied total from Pinnacle's game total and spread.
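
One common way to split a game total, sketched under the assumption that the spread input is the expected home margin, with the usual sign convention (-1.5 means home favored by 1.5). Pinnacle's run line is usually fixed at ±1.5, and the spec does not say how the scanner turns it into an expected margin; implied_team_totals is a hypothetical name.

```python
def implied_team_totals(game_total: float, home_spread: float) -> tuple[float, float]:
    """Split a game total into (home, away) implied team totals.

    home + away = game_total and home - away = expected home margin,
    where the margin is the negated home spread.
    """
    home_margin = -home_spread
    home_total = (game_total + home_margin) / 2
    away_total = (game_total - home_margin) / 2
    return home_total, away_total
```

With a total of 8.5 and the home side at -1.5, this gives 5.0 runs for home and 3.5 for away.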

Implied curves are built from Kalshi's alt-line ladders. Each game gets a curve showing the market-implied probability at each threshold: per-team curves for spreads and game-level curves for totals. Pinned contracts (bid ≤ 5 and ask ≥ 90) are filtered out.
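
A sketch of building one curve from a ladder, assuming each rung carries a YES bid/ask in cents and that the bid/ask midpoint is used as the market-implied probability (the spec does not say which price the curve uses). LadderRung, is_pinned, and implied_curve are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LadderRung:
    threshold: float  # alt line, e.g. a total of 7.5 or a team spread of -1.5
    bid: int          # best YES bid, cents
    ask: int          # best YES ask, cents


def is_pinned(rung: LadderRung) -> bool:
    """Spec filter: bid <= 5 and ask >= 90 leaves no usable price."""
    return rung.bid <= 5 and rung.ask >= 90


def implied_curve(ladder: List[LadderRung]) -> List[Tuple[float, float]]:
    """(threshold, probability) points from bid/ask midpoints, pinned rungs dropped."""
    points = [(r.threshold, (r.bid + r.ask) / 200) for r in ladder if not is_pinned(r)]
    return sorted(points)
```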

F5 lines are derived from Pinnacle's full-game line rather than scraped directly, because Pinnacle's F5 scraper is unreliable. The model engine supplies per-game SP quality ratios that scale full-game lines to F5.

F5 spreads: normal distribution with sigma 2.7, tighter than the full-game 4.0 because 5 innings produce less run-differential variance. The per-game ratio model_f5Spread / model_fullSpread is applied to Pinnacle's spread. Fallback: a flat 0.55 scale when no SP data is available.
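
A sketch of the F5 spread step, assuming Pinnacle's spread has already been converted into an expected full-game home margin. The 2.7 sigma and 0.55 fallback come from the spec; the function names are hypothetical, and the normal approximation is used without a continuity correction (half-run lines cannot push).

```python
import math
from typing import Optional

F5_SIGMA = 2.7        # from the spec
FLAT_F5_SCALE = 0.55  # from the spec: fallback when no SP data


def f5_expected_margin(full_margin: float,
                       model_f5_spread: Optional[float],
                       model_full_spread: Optional[float]) -> float:
    """Scale a full-game home margin to F5 with the per-game SP ratio (or the flat fallback)."""
    if model_f5_spread is None or not model_full_spread:
        ratio = FLAT_F5_SCALE  # missing SP data (also avoids dividing by zero)
    else:
        ratio = model_f5_spread / model_full_spread
    return full_margin * ratio


def normal_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def p_home_wins_by_more_than(line: float, mu: float, sigma: float = F5_SIGMA) -> float:
    """P(home F5 margin > line) when the margin ~ Normal(mu, sigma)."""
    return 1 - normal_cdf((line - mu) / sigma)
```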

F5 totals: Poisson distribution. The per-game ratio model_f5Total / model_fullTotal is applied to Pinnacle's total, producing F5 totals of roughly 4-5 runs (vs 7-9 for the full game). SP quality dominates: an ace matchup gets a lower ratio than a game between two weak starters.
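
A sketch of the F5 total step: the F5 lambda is Pinnacle's total scaled by the model ratio, and the over probability is one minus the Poisson CDF at the largest integer not above the line. Function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def f5_lambda(pinnacle_total: float, model_f5_total: float, model_full_total: float) -> float:
    """Expected F5 runs: Pinnacle's full-game total scaled by the model ratio."""
    return pinnacle_total * (model_f5_total / model_full_total)


def p_over(line: float, lam: float) -> float:
    """P(F5 runs > line); for a half-point line like 4.5 this is P(runs >= 5)."""
    max_under = math.floor(line)
    return 1 - sum(poisson_pmf(k, lam) for k in range(max_under + 1))
```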

F5 moneyline (3-way): home win, away win, or tie. Ties use a Poisson model, P(tie) = Σ P(Home=k) × P(Away=k) summed over k = 0, 1, 2, ..., which treats home and away F5 runs as independent Poisson variables with SP-scaled team lambdas. Pinnacle's 2-way ML is de-vigged to conditional (no-tie) probabilities, which are then multiplied by (1 - P(tie)) to get unconditional probabilities. Each game has 3 rows in the database. The F5 ML edge scanner is deferred until historical tie-rate data validates the Poisson model (about 2 weeks of settlements). The tie rate is tracked passively through Kalshi F5 TIE contract settlements.
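
The tie formula and the conditional-to-unconditional step can be sketched as follows. The Poisson sum is truncated at 30 runs (an assumption that loses negligible mass at F5-sized lambdas), and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def p_tie(lam_home: float, lam_away: float, max_runs: int = 30) -> float:
    """P(Home = Away) = sum over k of P(Home=k) * P(Away=k), assuming independence."""
    return sum(poisson_pmf(k, lam_home) * poisson_pmf(k, lam_away) for k in range(max_runs + 1))


def three_way_f5(p_home_cond: float, lam_home: float, lam_away: float) -> dict:
    """Unconditional home/away/tie from a de-vigged (no-tie) 2-way home probability."""
    tie = p_tie(lam_home, lam_away)
    return {
        "home": p_home_cond * (1 - tie),
        "away": (1 - p_home_cond) * (1 - tie),
        "tie": tie,
    }
```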

All F5 curves are stored in the mlb_probability_curves table in mlb-edges.db, with market_type = f5_spread, f5_total, or f5_moneyline.
Dashboard views: /data-status/view/mlb-f5-spread-curves, /data-status/view/mlb-f5-total-curves, /data-status/view/mlb-f5-ml-curves.

F5 spread and total edges are live. The edge scanner detects F5 tickers (any Kalshi ticker containing "F5"), looks up the precomputed SP-scaled probabilities in the mlb_probability_curves table, and compares them against Kalshi prices. If no precomputed curve is available, it falls back to building one on the fly with sigma 2.7. F5 edges are tagged with sub_market = 'f5' in sports_edges.
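
A sketch of the lookup-then-fallback flow. The dictionary keyed by (game_id, market_type, threshold) is a hypothetical stand-in for mlb_probability_curves rows, f5_mu is assumed to be the SP-scaled expected F5 home margin, and the function names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

F5_SIGMA = 2.7  # from the spec

# Hypothetical stand-in for mlb_probability_curves: (game_id, market_type, threshold) -> prob.
CurveKey = Tuple[str, str, float]


def is_f5_ticker(ticker: str) -> bool:
    """Spec rule: a Kalshi ticker is F5 if it contains "F5"."""
    return "F5" in ticker


def f5_spread_prob(game_id: str, line: float, f5_mu: float,
                   curves: Dict[CurveKey, float]) -> Tuple[float, str]:
    """Use the precomputed probability if present, else build it on the fly with sigma 2.7."""
    cached: Optional[float] = curves.get((game_id, "f5_spread", line))
    if cached is not None:
        return cached, "precomputed"
    z = (line - f5_mu) / F5_SIGMA
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2))), "fallback"
```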
Dashboard views: /data-status/view/edge-scanner-mlb-f5-spreads, /data-status/view/edge-scanner-mlb-f5-totals.

The futures edge compares Kalshi futures prices against Pinnacle's sharp odds, using Pinnacle's World Series, pennant, and division odds as the anchor. The gap between the Kalshi price and the Pinnacle-implied probability is the edge.
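
For futures, the whole market (for example, all 30 WS selections) has to be de-vigged at once rather than side by side. A minimal sketch using proportional normalization, which is an assumed method (the spec does not name one, and proportional scaling does not correct favorite-longshot bias); odds are assumed to be American and Kalshi prices YES asks in cents. Function names are hypothetical.

```python
from typing import Dict


def american_to_implied(odds: int) -> float:
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def devig_futures(odds: Dict[str, int]) -> Dict[str, float]:
    """Normalize every selection's implied probability so the market sums to 1."""
    raw = {team: american_to_implied(o) for team, o in odds.items()}
    total = sum(raw.values())
    return {team: p / total for team, p in raw.items()}


def futures_edges(fair: Dict[str, float], kalshi_yes_ask: Dict[str, int]) -> Dict[str, float]:
    """Edge in cents (fair prob * 100 - ask) for teams priced on both venues."""
    return {team: fair[team] * 100 - ask for team, ask in kalshi_yes_ask.items() if team in fair}
```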

Freshness is monitored by a passive poller that runs every 5 minutes and checks MAX(timestamp) for each table across all databases, so scrapers don't need to report their own status.

| Source type | Yellow (stale) | Red (alert) |
|---|---|---|
| Game-day odds (6 AM / 8 AM / 10 AM / 2 PM / 6 PM windows) | 30 min after window | 90 min after window |
| Daily stats (9 AM) | 60 min after expected | 180 min after expected |
| Futures (weekly) | 2 days after expected | 7 days after expected |
| Edge scanner | 30 min after window | 90 min after window |

A Telegram alert fires after 2 consecutive misses (10 minutes of staleness at the 5-minute poll interval) and sends one consolidated, color-coded status message. The poller itself is covered by a dead-man's switch.
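
A sketch of the status logic, with hypothetical names. Thresholds are copied from the table; a "miss" is assumed to be any poll that finds the source yellow or red, and the gate fires once per streak rather than on every poll.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    yellow_min: float  # minutes past the expected window before "stale"
    red_min: float     # minutes past the expected window before "alert"


GAME_DAY_ODDS = Thresholds(30, 90)
DAILY_STATS = Thresholds(60, 180)
FUTURES = Thresholds(2 * 24 * 60, 7 * 24 * 60)  # 2 days / 7 days


def classify(minutes_late: float, t: Thresholds) -> str:
    if minutes_late >= t.red_min:
        return "red"
    if minutes_late >= t.yellow_min:
        return "yellow"
    return "green"


class AlertGate:
    """Fire after N consecutive non-green polls (N=2 is about 10 min at 5-min polls)."""

    def __init__(self, required_misses: int = 2) -> None:
        self.required = required_misses
        self.misses = 0

    def observe(self, status: str) -> bool:
        # Count consecutive misses; any green poll resets the streak.
        self.misses = self.misses + 1 if status != "green" else 0
        return self.misses == self.required
```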

| # | Item | Status |
|---|---|---|
| 1 | Databases isolated | DONE — 16 .db files. Clean schemas: mlb_probability_curves (27 cols with settlement), mlb_edges (37 cols MLB-only). Cross-DB reads use ATTACH pattern. Pregame/DRatings/Dimers/GameSim still in shared db (non-critical). |
| 2 | Scraper queue independent | DONE — Split into 3 staggered groups at 9:00/9:05/9:10 AM ET |
| 3 | Recovery queue | DONE — MLB scrapers in recovery, ToDo desks removed |
| 4 | Freshness tracking | DONE — 24+ sources in passive poller |
| 5 | Dashboard views | DONE — All pointing to isolated .db files |
| 6 | Edge scanner | DONE — Reads Pinnacle from pinnacle-mlb.db, Kalshi from kalshi-mlb-prices.db, writes to mlb-edges.db. Curves built before each scan. |
| 7 | Scan windows | DONE — All tagged 6am/8am/10am/2pm/6pm/close |
| 8 | Data cleanliness | DONE — No live, no settled, same-day, pinned filter |
| 9 | Column formatting | DONE — Human-readable tickers, dates, exec prices |
| 10 | Filters | DONE — Column filters on all MLB views |
| 11 | No cross-desk dependencies | DONE — Stats split into 3 independent groups (9:00/9:05/9:10 AM) |
| 12 | Doubleheader handling | DONE — game_number, G1/G2 detection |
| 13 | Alerts | DONE — Freshness poller with Telegram alerts |
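
Checklist item 1 notes that cross-DB reads use the ATTACH pattern. A minimal sketch with Python's sqlite3 module; the file names and the demo_edges / demo_quotes tables are hypothetical, not the desk's real schema.

```python
import os
import sqlite3
import tempfile


def create_demo_table(path: str, ddl: str, insert: str, rows: list) -> None:
    """Hypothetical helper: create one table in its own .db file and fill it."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert, rows)
        conn.commit()
    finally:
        conn.close()


def join_across_dbs(main_path: str, other_path: str) -> list:
    """Join a table in main_path to a table in other_path on one connection."""
    conn = sqlite3.connect(main_path)
    try:
        # The second file's tables become visible as px.<table>.
        conn.execute("ATTACH DATABASE ? AS px", (other_path,))
        return conn.execute(
            """
            SELECT e.ticker, e.edge_cents, q.yes_ask
            FROM demo_edges AS e
            JOIN px.demo_quotes AS q ON q.ticker = e.ticker
            ORDER BY e.ticker
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        edges = os.path.join(tmp, "edges-demo.db")
        prices = os.path.join(tmp, "prices-demo.db")
        create_demo_table(edges, "CREATE TABLE demo_edges (ticker TEXT, edge_cents REAL)",
                          "INSERT INTO demo_edges VALUES (?, ?)", [("T1", 3.0), ("T2", -1.0)])
        create_demo_table(prices, "CREATE TABLE demo_quotes (ticker TEXT, yes_ask INTEGER)",
                          "INSERT INTO demo_quotes VALUES (?, ?)", [("T1", 55), ("T3", 40)])
        print(join_across_dbs(edges, prices))
```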

This spec will be updated as the desk is built. The final version will follow once all checklist items pass.