MLB Data Audit — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: gpt-oss (3 of 5 peer review votes)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. Starting pitcher is the #1 data input — SP quality metrics (ERA, FIP, xFIP, WHIP, K/9, BB/9) are foundational
  2. Bullpen availability tracking is #2 — pitch counts, days rest, multi-day workload all must be tracked daily
  3. Weather data from NWS API — critical for outdoor parks, affects totals significantly
  4. Park factors per stadium — Coors (extreme), Yankee Stadium (short porch), Oracle Park (marine layer)
  5. Platoon splits (L/R matchups) — 30-50 point wOBA swings at team level
  6. EWMA with stat-specific decay rates — different alphas for SP metrics vs team batting vs bullpen
  7. 4 separate edge scanners — Moneyline, Run Line, Totals, First 5 Innings
  8. Poisson/Negative Binomial for run distributions — vanilla Poisson insufficient due to overdispersion
  9. First-time-through-order (FTTO) splits essential for F5 market
  10. Early-season protocol — limited SP sample in April requires blending with projections
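
As an illustration of item 6, here is a minimal EWMA sketch with a separate decay rate per stat family. The alpha values and the `ALPHAS` keys are hypothetical placeholders, not values the council agreed on; real rates would need to be fit per stat.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average; later values weigh more."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not values:
        raise ValueError("values must be non-empty")
    avg = values[0]
    for x in values[1:]:
        avg = alpha * x + (1 - alpha) * avg
    return avg

# Hypothetical decay rates (assumptions, not from the ruling):
# bullpen form moves fastest, team batting slowest.
ALPHAS = {"sp": 0.30, "team_batting": 0.10, "bullpen": 0.50}

# FIP over a pitcher's last three starts, oldest first (made-up numbers).
sp_fip_by_start = [4.0, 3.0, 2.0]
sp_fip_ewma = ewma(sp_fip_by_start, ALPHAS["sp"])
```

A higher alpha makes the average react faster to the latest game, which is why a single shared alpha would either over-smooth bullpen data or under-smooth team batting.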

Where Advisors Disagreed

  1. Database engine: gpt-oss recommended PostgreSQL + TimescaleDB with 20+ tables. Opus used SQLite with 14 tables. Council verdict: SQLite WAL for current scale, design migration-ready.
  2. Distribution model: gpt-oss used Zero-Inflated Negative Binomial, Opus used Poisson with walk-off correction, Gemini used basic Poisson. Council verdict: Negative Binomial preferred for run scoring (overdispersion), with walk-off correction for run line.
  3. BvP (batter vs pitcher) data: Opus explicitly disqualified individual BvP due to small samples. Others included it with caveats. Council verdict: Drop individual BvP, use platoon splits at team level instead.
  4. Weather source: Grok recommended OpenWeatherMap, Opus specified NWS API only. Council verdict: NWS API only (free, reliable, already established policy).
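
The database verdict (SQLite in WAL mode, designed to be migration-ready) might look like the sketch below. The `demo_game` table and its columns are hypothetical and exist only to show the connection setup; they are not one of the Phase 1 tables. Keeping to portable SQL types (ISO-8601 dates stored as text, explicit primary keys) is one assumed way to keep a later PostgreSQL migration simple.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; an in-memory one reports "memory" instead.
db_path = os.path.join(tempfile.mkdtemp(), "mlb_audit_demo.db")
conn = sqlite3.connect(db_path)

# Switch to write-ahead logging so readers do not block the daily writer.
journal_mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
conn.execute("PRAGMA foreign_keys=ON;")

# Hypothetical table, for illustration only.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS demo_game (
        game_id   TEXT PRIMARY KEY,
        game_date TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO demo_game (game_id, game_date) VALUES (?, ?)",
    ("2026-04-01-AWAY-HOME", "2026-04-01"),
)
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM demo_game").fetchone()[0]
conn.close()
```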

Strongest Arguments (from peer review)

gpt-oss wins with the most complete data architecture design:

Opus runner-up with deepest baseball analytics knowledge:

Biggest Blind Spot

Gemini: Skeleton schema (4 tables, no indexes, no constraints), recommended wrong weather API (OpenWeatherMap instead of NWS), no formulas for distribution parameters, vague source references without API specs.

What Everyone Missed (from peer reviews)

  1. Real audit pipeline vs data feeds — All advisors designed data collection but none built proper data quality observability: freshness SLAs, source reconciliation, anomaly detection, data lineage, reproducibility.
  2. Market data integration layer — Real-time odds ingestion, line movement tracking, de-vigging architecture, steam/reverse-edge alerts.
  3. P&L attribution per data input — No way to measure whether SPQC, bullpen availability, or weather adjustments are actually profitable over time.
  4. Lineup delta detection — Star position player rest days happen daily; need automated parser comparing official vs projected lineup.
  5. Kalshi-specific liquidity constraints — Thin exchange, position sizing must account for market impact.

BUILD PLAN

Phase 1: Core MLB Data Tables

mlb_sp_game_logs:

mlb_sp_baselines:

mlb_team_batting:

mlb_bullpen_status:

mlb_game_weather:

mlb_park_factors:

mlb_umpire_data:

mlb_lineups:

Phase 2: Derived Metrics

Metric | Formula | Purpose
SP Quality Composite (SPQC) | Weighted: 0.3×xFIP + 0.3×FIP + 0.2×ERA + 0.2×EWMA_GS | Single SP quality number
Bullpen Availability Index (BAI) | Weighted avg of available arms × role importance | Team bullpen readiness score
Weather Run Factor (WRF) | wind_component × park_multiplier + temp_adj + humidity_adj + altitude_adj | Total weather impact on runs
Platoon Advantage Score | Team wOBA vs SP hand − team season wOBA | Measures platoon edge
FTTO Decay Rate | SP's innings 1-3 wOBA vs innings 4-5 wOBA | How much SP degrades through order
Day-Night Fatigue | Team batting stats in day-after-night games vs baseline | Quantified fatigue effect
Lineup Strength Delta | Actual lineup wRC+ − projected lineup wRC+ | Detects star rest days
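
Three of the formulas above are simple enough to sketch directly. This is a hedged illustration, not the agreed implementation: it assumes EWMA_GS is already expressed on the same runs-per-nine scale as the other SPQC inputs (the ruling does not define its scale), and it reads the matchup card's "delta > 15 wRC+" rule as a drop of more than 15 points. All function names are hypothetical.

```python
def spqc(xfip, fip, era, ewma_gs):
    """SP Quality Composite using the weights from the table above.

    Assumption: ewma_gs is on the same scale as xFIP/FIP/ERA.
    """
    return 0.3 * xfip + 0.3 * fip + 0.2 * era + 0.2 * ewma_gs

def platoon_advantage_pts(woba_vs_hand, woba_season):
    """Platoon edge in wOBA points (0.001 wOBA = 1 point)."""
    return round((woba_vs_hand - woba_season) * 1000)

def lineup_strength_delta(actual_wrc_plus, projected_wrc_plus):
    """Negative values mean the posted lineup is weaker than projected."""
    return actual_wrc_plus - projected_wrc_plus

def is_key_rest_day(delta, threshold=15):
    """Assumed reading of the card rule: flag drops larger than the threshold."""
    return delta < -threshold

# Made-up inputs for illustration.
example_spqc = spqc(xfip=3.5, fip=3.7, era=3.9, ewma_gs=3.6)
example_platoon = platoon_advantage_pts(0.340, 0.320)
example_delta = lineup_strength_delta(98, 115)
example_flag = is_key_rest_day(example_delta)
```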

Phase 3: Distribution Models Per Market

Market | Distribution | Parameters | Notes
Moneyline | Negative Binomial (each team's runs) | μ from SPQC × batting × park × weather; k from team variance | Win prob = P(runs_home > runs_away)
Run Line (-1.5) | Negative Binomial with walk-off correction | Same μ, k + home walk-off truncation | Home teams don't bat bottom 9th if leading → reduces home -1.5 cover prob
Totals | Negative Binomial (combined runs) | μ_total = μ_home + μ_away; adjusted for weather, park, bullpen | Over/under probability at each threshold
First 5 Innings | Modified NB (SP-only, no bullpen) | μ from FTTO splits × batting vs SP hand × park; NO bullpen component | Isolates SP: use innings 1-5 specific rates only
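
The moneyline and totals rows can be sketched with a Negative Binomial parameterized by mean μ and dispersion k, where the variance is μ + μ²/k and therefore always exceeds the Poisson variance μ. The sketch below makes several assumptions that the ruling does not specify: the two teams' runs are treated as independent, tied regulation scores are resolved by renormalizing over non-tied outcomes, the distribution is truncated at 40 runs, and no walk-off correction is applied. The μ and k inputs are made up.

```python
from math import exp, lgamma, log

def nb_pmf(runs, mu, k):
    """P(X = runs) for a Negative Binomial with mean mu and dispersion k.

    Variance is mu + mu**2 / k, so smaller k means more overdispersion.
    """
    p = k / (k + mu)
    log_pmf = (lgamma(runs + k) - lgamma(k) - lgamma(runs + 1)
               + k * log(p) + runs * log(1.0 - p))
    return exp(log_pmf)

def run_dist(mu, k, max_runs=40):
    """Truncated run distribution; the tail beyond max_runs is negligible."""
    return [nb_pmf(r, mu, k) for r in range(max_runs + 1)]

def home_win_prob(mu_home, k_home, mu_away, k_away):
    """P(runs_home > runs_away), renormalized to exclude ties (assumption)."""
    home, away = run_dist(mu_home, k_home), run_dist(mu_away, k_away)
    p_home = sum(h * sum(away[:i]) for i, h in enumerate(home))
    p_away = sum(a * sum(home[:i]) for i, a in enumerate(away))
    return p_home / (p_home + p_away)

def prob_over(line, mu_home, k_home, mu_away, k_away):
    """P(total runs > line) by convolving the two team distributions."""
    home, away = run_dist(mu_home, k_home), run_dist(mu_away, k_away)
    return sum(h * a
               for i, h in enumerate(home)
               for j, a in enumerate(away)
               if i + j > line)

# Made-up parameters for illustration.
p_ml = home_win_prob(4.8, 6.0, 4.2, 6.0)
p_over = prob_over(8.5, 4.8, 6.0, 4.2, 6.0)
```

The totals are convolved rather than modeled as a single NB with μ_total because the sum of two NB variables with different μ/k ratios is not itself NB; whether a single fitted NB is close enough is a modeling choice left open here.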

Phase 4: Edge Scanners (4 scanners)

Common engine:

  1. Ingest Pinnacle odds for all 4 markets
  2. De-vig using Shin + Power methods
  3. Build NB probability curves with all adjustments
  4. Compare to Kalshi contract prices
  5. Min edge: 4 cents after Kalshi 7% fee
  6. Min sample: SP must have 5+ starts this season (early-season gate)
  7. Output: {game_id, market_type, side, model_prob, kalshi_price, edge, confidence, sp_status, weather_flag}
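
Steps 2, 5, and 6 of the common engine could be sketched as follows. Only the Power method is shown; Shin is omitted. Two things here are assumptions rather than statements from the ruling: the fee is modeled as `fee_rate × price × (1 − price)` per contract, which is one possible reading of "Kalshi 7% fee", and all function names are hypothetical. Prices and probabilities are in dollars per $1 contract, so 4 cents is 0.04.

```python
def devig_power(decimal_odds, iterations=200):
    """Power-method de-vig: find exponent k with sum(q_i ** k) == 1.

    q_i are the raw implied probabilities 1/odds; all odds must exceed 1.0.
    The sum is decreasing in k, so plain bisection is enough.
    """
    raw = [1.0 / o for o in decimal_odds]
    lo, hi = 0.0, 100.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if sum(q ** mid for q in raw) > 1.0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2.0
    return [q ** k for q in raw]

def net_edge(model_prob, kalshi_price, fee_rate=0.07):
    """Edge per contract after the assumed fee model."""
    fee = fee_rate * kalshi_price * (1.0 - kalshi_price)
    return model_prob - kalshi_price - fee

def passes_gates(edge, sp_starts, min_edge=0.04, min_starts=5):
    """Step 5 (min edge) and step 6 (early-season SP sample gate)."""
    return edge >= min_edge and sp_starts >= min_starts

# Made-up two-way market for illustration.
fair = devig_power([1.87, 2.05])
edge = net_edge(model_prob=0.58, kalshi_price=0.50)
tradeable = passes_gates(edge, sp_starts=6)
```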

Per-market unique logic:

Scanner | Unique Logic
Moneyline | SP quality is primary driver, bullpen quality secondary, weather minimal impact
Run Line | Walk-off correction for home favorites, bullpen quality MORE important (late-game leverage)
Totals | Weather is PRIMARY driver (wind × park × temp × humidity), bullpen quality important, umpire zone
First 5 | SP-only: FTTO splits, umpire zone, NO bullpen factor, weather less impactful (fewer innings)

Phase 5: Matchup Card Format

GAME: [Away] @ [Home] | [Date] [Time ET] | [Park]
WEATHER: [Temp]°F | Wind: [Speed]mph [Direction] | Humidity: [%] | WRF: [+/-runs]
ROOF: [Status] | PARK: Runs [factor] | HR-L [factor] | HR-R [factor]
UMPIRE: [Name] | K+[adj] | BB+[adj] | R+[adj] | ABS Overturn: [rate]

HOME SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
  SPQC: [composite] | xFIP: [val] | FIP: [val] | ERA: [val] | WHIP: [val]
  K/9: [val] | BB/9: [val] | HR/9: [val] | GB%: [val]
  FTTO: wOBA [val] | K% [val] (innings 1-3 vs 4-5)
  Last 3 Starts: [date, opp, IP, ER, K, pitches] × 3
  Days Rest: [n] | Season IP: [total] | Trend: [up/stable/down]

AWAY SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
  [Same fields]

HOME BATTING vs [Away SP Hand]:
  Team wOBA: [season] | vs [L/R]HP: [platoon] | Platoon Advantage: [+/- pts]
  Last 7 wOBA: [val] | Barrel Rate: [val] | K Rate: [val]
  Lineup wRC+: [total] | Delta from Projected: [+/-]
  Key Rest Day: [player name if delta > 15 wRC+]

AWAY BATTING vs [Home SP Hand]:
  [Same fields]

HOME BULLPEN: [GREEN/YELLOW/RED/BLACK]
  BAI: [score] | Closer: [Name]-[status] | Setup: [Names]-[status]
  Team Pitches Last 3 Days: [total] | High-Leverage Available: [Y/N]

AWAY BULLPEN: [Status]
  [Same fields]

SCHEDULING:
  Day Game After Night Game: [Home Y/N] [Away Y/N]
  Series Game: [1/2/3/4] | Travel: [arrived yesterday/same city/off day]

INTELLIGENCE:
  [Findings tagged CRITICAL/MODERATE/CONTEXT]

Phase 6: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. Walk-off correction for Run Line: Opus identified that standard Poisson/NB overstates home -1.5 cover probability because home teams stop batting when leading. Should we implement a correction formula now, or build full inning-by-inning Markov simulation later?

  2. Early-season protocol: Recommended blending Steamer/ZiPS projections with actual stats in April-May (weighted 70/30 projections/actual with 3 starts, shifting to 30/70 by 10 starts). Confirm?

  3. Individual BvP data: Council says drop it (sample size too small to be reliable). Use platoon splits at team level instead. Confirm?

  4. Data history depth: How many seasons of SP game logs? 2 seasons? 3 seasons?

  5. Umpire zone impact: Track ABS Challenge System data starting 2026 but treat as CONTEXT only through May, actionable June+. Confirm?

  6. Opener/bullpen game detection: Should the system auto-detect when a team announces an "opener" (1-2 inning starter followed by bulk reliever) and treat it differently from a traditional start?
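
For question 2, the proposed blend could be sketched as below. The ruling only fixes two points (70/30 at 3 starts, 30/70 by 10 starts); the linear ramp between them and holding 70/30 below 3 starts are assumptions made here for illustration, and the function names are hypothetical.

```python
def projection_weight(starts):
    """Weight on Steamer/ZiPS projections given the SP's starts this season."""
    if starts <= 3:
        return 0.70  # assumption: hold at 70/30 below 3 starts
    if starts >= 10:
        return 0.30
    # Assumed linear ramp from 0.70 at 3 starts to 0.30 at 10 starts.
    return 0.70 - (starts - 3) * (0.40 / 7)

def blended_stat(projection, actual, starts):
    """Blend a projected stat with the observed season stat."""
    w = projection_weight(starts)
    return w * projection + (1.0 - w) * actual

# Made-up example: projected 4.00 FIP, observed 3.00 FIP after 3 starts.
blended_fip = blended_stat(projection=4.0, actual=3.0, starts=3)
```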


COUNCIL METADATA

Detail | Value
Council date | 2026-04-01
Advisory responses | 5 (all completed)
Peer reviews | 5 (all completed)
Strongest advisor | gpt-oss (3/5 votes)
Runner-up | Opus (2/5 votes)
Biggest blind spot | Gemini (2/5 votes)
Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/mlb-data-audit/
Source: ~/edgeclaw/results/panel-results/mlb-data-audit-ruling.md