MLB Research Pipeline — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Opus (2 of 5 peer review votes; gpt-oss got 2, Gemini got 1)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. Starting pitcher is THE #1 variable — controls 55-70% of game outcome variance, more extreme than NHL goalies
  2. 4 separate edge scanners required — Moneyline, Run Line (-1.5), Totals (O/U), First 5 Innings (F5)
  3. SP confirmation workflow with credibility hierarchy — team official > beat reporter > fantasy aggregator > social media
  4. Bullpen availability is the #2 daily variable — pitch counts, days rest, and multi-day workload all tracked
  5. Weather is a quantifiable model input, not just context — wind speed/direction, temp, humidity, park-specific multipliers
  6. Park factors per stadium — Coors (extreme), Yankee Stadium (short porch RF), Oracle Park (marine layer), etc.
  7. Platoon splits (L/R matchups) create 30-50 point wOBA swings at team level
  8. 3-pass research schedule — morning (SP confirmation + weather), afternoon (lineups + bullpen), pre-game (final confirmation)
  9. Poisson distribution for run scoring with park/weather adjustments
  10. Matchup cards must include SP stats, bullpen availability, weather, park factors, and umpire assignment
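
The credibility hierarchy in item 3 could be encoded as an ordered enum, so that conflicting starter reports resolve to the most credible source. This is a minimal sketch: `SourceTier`, `SPReport`, `resolve_starter`, and the recency tie-break are illustrative assumptions, not part of the ruling.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class SourceTier(IntEnum):
    """Credibility tiers in the order the council listed; higher value wins."""
    SOCIAL_MEDIA = 1
    FANTASY_AGGREGATOR = 2
    BEAT_REPORTER = 3
    TEAM_OFFICIAL = 4


@dataclass(frozen=True)
class SPReport:
    pitcher: str
    tier: SourceTier
    reported_at: float  # Unix timestamp (hypothetical field)


def resolve_starter(reports: list[SPReport]) -> Optional[SPReport]:
    """Return the report from the most credible tier.

    ASSUMPTION: ties within a tier are broken by the most recent report.
    """
    if not reports:
        return None
    return max(reports, key=lambda r: (r.tier, r.reported_at))
```

A newer social media post therefore never overrides an older team-official announcement; whether that is the desired behavior for late scratches is a design choice left open here.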

Where Advisors Disagreed

  1. F5 distribution model: Some advisors used the same Poisson model as the full game. Opus correctly identified that the first-time-through-order (FTTO) advantage makes innings 1-3 systematically different from innings 4-5, requiring separate parameters. Council verdict: F5 needs SP-specific FTTO splits applied to Poisson parameters.
  2. ABS Challenge System readiness: Some advisors treated it as an immediately actionable edge. Opus flagged that April-May 2026 is data collection only, because 200+ challenges are needed before patterns emerge. Council verdict: Flag as CONTEXT only through May; actionable June+ with minimum sample.
  3. Which models run research: Gemini proposed using blind analyst models (Sonnet, Gemini) for research, which creates a contamination risk. Council verdict: Research models must be SEPARATE from blind analyst models. Use Grok 4.1 Fast for search and DeepSeek R1 for extraction.
  4. Database engine: gpt-oss recommended PostgreSQL + TimescaleDB; the others assumed SQLite. Council verdict: SQLite WAL for current scale, with tables designed to be migration-ready.
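
One way to read the F5 verdict is to give innings 1-3 and innings 4-5 their own per-inning run rates and sum them into a single Poisson parameter. The sketch below assumes those two rates have already been derived from the SP's FTTO splits; that derivation is not specified in the ruling, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def f5_expected_runs(rate_innings_1_3: float, rate_innings_4_5: float) -> float:
    """Expected runs allowed over the first five innings.

    ASSUMPTION: inputs are expected runs per inning, with innings 1-3
    treated as the first time through the order.
    """
    return 3 * rate_innings_1_3 + 2 * rate_innings_4_5


def f5_run_distribution(lam: float, max_runs: int = 15) -> list[float]:
    """Truncated Poisson distribution of F5 runs, indexed by run count."""
    return [poisson_pmf(k, lam) for k in range(max_runs + 1)]
```

Because a sum of independent Poisson variables is itself Poisson, splitting the rate by inning block changes only the parameter, not the shape of the model.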

Strongest Arguments (from peer review)

Opus wins with the most production-ready and analytically deep design:

Biggest Blind Spot

Gemini: Proposed using blind analyst models for research (contamination risk), offered a thin database schema and generic search queries, and, most importantly, assigned the wrong models to research roles. Also weakest on edge scanner math specifics.

What Everyone Missed (from peer reviews)

  1. Intelligence calibration loop — No advisor designed a feedback system to measure which IntelAdjustment types are value-additive vs. noise. Need: retrospective tagging against outcomes, dynamic source credibility weights, A/B testing of research prompts.
  2. Daily lineup delta detection — SP scratches are rare; star position player rest days happen EVERY day. Need automated parser comparing official 9-man lineup vs projected lineup, flagging large wRC+ deltas.
  3. Weather void rules by market — Rain-shortened games: ML and F5 are graded, but Run Line and Totals may be voided. Pipeline should restrict RL/Totals exposure in high-rain environments.
  4. Kalshi exchange mechanics — Thin liquidity on MLB props, bid-ask spread cost, price staleness detection.
  5. Home team walk-off impact on Run Line — Home team doesn't bat bottom 9th if leading. This fundamentally alters -1.5 run line probability for home favorites vs away favorites.
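
The lineup delta detection in item 2 can be sketched as a comparison of two player-to-wRC+ mappings. The function name, the unweighted lineup average, and the 10-point flag threshold are assumptions for illustration; the ruling does not define what counts as a "large" delta.

```python
def lineup_wrc_delta(projected: dict[str, float],
                     confirmed: dict[str, float],
                     threshold: float = 10.0) -> dict:
    """Compare the confirmed lineup against the projected one.

    ASSUMPTION: lineup wRC+ is the simple mean of player wRC+ values,
    and `threshold` (hypothetical) is the absolute delta that raises a flag.
    """
    projected_avg = sum(projected.values()) / len(projected)
    confirmed_avg = sum(confirmed.values()) / len(confirmed)
    delta = confirmed_avg - projected_avg
    return {
        "delta": delta,
        "scratched": sorted(set(projected) - set(confirmed)),
        "added": sorted(set(confirmed) - set(projected)),
        "flag": abs(delta) >= threshold,
    }
```

A production version would likely weight by expected plate appearances per batting slot rather than averaging all nine hitters equally.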

BUILD PLAN

Phase 1: MLB Game Data Tables

mlb_starting_pitchers:

mlb_bullpen_availability:

mlb_game_environment:

mlb_team_game_logs:

mlb_lineups:

mlb_umpire_assignments:

mlb_park_factors:

Phase 2: Matchup Card Format

GAME: [Away] @ [Home] | [Date] [Time ET] | [Park]
WEATHER: [Temp]°F | Wind: [Speed]mph [Direction_Relative] | Humidity: [%] | Precip: [%]
ROOF: [Status] | PARK FACTOR: [Runs Factor] | UMPIRE: [Name] (K+[adj]/BB+[adj]/R+[adj])

HOME SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
  ERA: [season] | FIP: [season] | xFIP: [season] | WHIP: [season]
  K/9: [rate] | BB/9: [rate] | HR/9: [rate] | GB%: [rate]
  FTTO wOBA: [rate] | FTTO K%: [rate]
  Last Start: [date] vs [team] — [IP] IP, [ER] ER, [K] K, [Pitches] pitches
  Days Rest: [n] | Season IP: [total] | Pitch Count Trend: [up/stable/down]
  vs Opp Lineup (platoon): Team wOBA vs [L/R]HP: [rate]

AWAY SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
  [Same fields as above]

HOME LINEUP: [Confirmed/Projected]
  Lineup wRC+: [total] | vs SP Hand wRC+: [total]
  Key Hitters: [Top 3 by wRC+ with stats]
  Delta: [actual vs projected wRC+ — flags rest days]

AWAY LINEUP: [Confirmed/Projected]
  [Same fields as above]

HOME BULLPEN: [GREEN/YELLOW/RED]
  Closer: [Name] — [Status] | Setup: [Names] — [Status]
  Pitches Last 3 Days: [total team] | High-Leverage Available: [Y/N]

AWAY BULLPEN: [GREEN/YELLOW/RED]
  [Same fields as above]

INTELLIGENCE:
  [Research findings tagged CRITICAL/MODERATE/CONTEXT]
  [SP injury concerns, lineup changes, weather alerts, ABS data]

Phase 3: Edge Scanners (4 scanners)

Common engine:

  1. Ingest Pinnacle odds (ML, RL, Totals, F5)
  2. De-vig using Shin + Power methods
  3. Build Poisson probability curves with adjustments
  4. Compare to Kalshi contract prices
  5. Apply minimum edge (4 cents after Kalshi 7% fee) and minimum sample gates
  6. Output: {game_id, market_type, side, model_prob, kalshi_price, edge, confidence}
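
Steps 2 and 5 can be sketched as follows. Only the Power method is shown; Shin's method is omitted. The fee formula (`rate * price * (1 - price)` per contract, with no rounding) is an assumption about how the 7% fee is applied, and all function names are hypothetical.

```python
def devig_power(decimal_odds: list[float]) -> list[float]:
    """Power-method de-vig: find k such that sum((1/odds_i) ** k) == 1."""
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    raw = [1.0 / o for o in decimal_odds]

    def excess(k: float) -> float:
        return sum(q ** k for q in raw) - 1.0

    # excess(k) decreases as k grows, so bracket the root and bisect.
    lo, hi = 0.0, 1.0
    while excess(hi) > 0:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    return [q ** k for q in raw]


def kalshi_fee(price: float, rate: float = 0.07) -> float:
    """ASSUMPTION: fee per contract in dollars, ignoring cent rounding."""
    return rate * price * (1.0 - price)


def edge_after_fee(model_prob: float, kalshi_price: float) -> float:
    """Expected value per contract of buying at kalshi_price, net of fee."""
    return model_prob - kalshi_price - kalshi_fee(kalshi_price)


MIN_EDGE = 0.04  # 4 cents, per step 5


def passes_edge_gate(model_prob: float, kalshi_price: float) -> bool:
    return edge_after_fee(model_prob, kalshi_price) >= MIN_EDGE
```

For example, a model probability of 0.60 against a 50-cent contract leaves 8.25 cents of edge under this fee assumption, which clears the 4-cent gate.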

Per-market scanner differences:

| Scanner | Distribution | Key Adjustments | Unique Logic |
| --- | --- | --- | --- |
| Moneyline | Poisson (expected runs per team) | SP quality, lineup wRC+, park factor, weather, bullpen quality, platoon splits, umpire | Standard win probability from Poisson run differential |
| Run Line (-1.5) | Poisson with margin threshold | Same as ML + home team walk-off constraint (no bottom 9th if leading) | P(win by 2+); home favorites have structurally different RL probability than away favorites |
| Totals | Poisson (combined expected runs) | Weather is PRIMARY driver: wind direction/speed × park multiplier, temp adjustment (+0.15 runs per 10°F above 72°F), humidity | Combined run total distribution, over/under probability at each threshold |
| First 5 Innings | Modified Poisson (SP-only) | FTTO splits, SP K rate, SP walk rate, umpire zone, NO bullpen component | Isolates SP performance: use innings 1-5 specific rates, remove bullpen quality entirely |
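
As a baseline for the Moneyline, Run Line, and Totals scanners, a grid of independent Poisson scores can be summed into market probabilities. This is a sketch under strong assumptions: team run totals are independent, the grid is truncated, and extra innings and the home walk-off constraint are ignored, so the Run Line scanner would still need its walk-off logic layered on top. The function and key names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def regulation_market_probs(lam_home: float, lam_away: float,
                            total_line: float = 8.5,
                            max_runs: int = 25) -> dict[str, float]:
    """Sum an independent-Poisson score grid into market probabilities.

    ASSUMPTIONS: independent Poisson runs per team, grid truncated at
    max_runs, no extra innings, no walk-off rule.
    """
    home = [poisson_pmf(k, lam_home) for k in range(max_runs + 1)]
    away = [poisson_pmf(k, lam_away) for k in range(max_runs + 1)]
    out = {"home_lead": 0.0, "away_lead": 0.0, "tie": 0.0,
           "home_minus_1_5": 0.0, "away_minus_1_5": 0.0, "over": 0.0}
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            p = ph * pa
            if h > a:
                out["home_lead"] += p
            elif a > h:
                out["away_lead"] += p
            else:
                out["tie"] += p
            if h - a >= 2:
                out["home_minus_1_5"] += p
            if a - h >= 2:
                out["away_minus_1_5"] += p
            if h + a > total_line:
                out["over"] += p
    return out
```

The `tie` mass is the share of games that would go to extra innings; how it is redistributed between the two sides is exactly where home and away Run Line probabilities diverge.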

Specific adjustments:

| Factor | ML Impact | RL Impact | Totals Impact | F5 Impact |
| --- | --- | --- | --- | --- |
| SP change (starter→TBD) | Full recompute | Full recompute | Full recompute | Full recompute |
| SP change (starter→worse SP) | Adjust expected runs | Same + walk-off recalc | Adjust both sides | Full recompute |
| Weather: wind out to CF 15mph+ | Minimal | Minimal | +0.5 to +1.5 runs (park-dependent) | +0.2 to +0.5 runs |
| Weather: wind in from CF 15mph+ | Minimal | Minimal | -0.5 to -1.0 runs | -0.2 to -0.4 runs |
| Temperature >85°F | Slight boost offense | Slight | +0.3 to +0.5 runs | +0.1 to +0.2 runs |
| Star hitter scratched (150+ wRC+) | -2 to -5 cents | Similar | -0.1 to -0.3 runs | -0.05 to -0.15 runs |
| Bullpen RED status | -3 to -8 cents | Larger impact (late-game leverage) | +0.3 to +0.8 runs | NO IMPACT |
| Umpire: tight zone | Slight pitcher boost | Similar | -0.3 to -0.5 runs | -0.2 to -0.3 runs |
| Day game after night game | -1 to -3 cents offense | Similar | -0.1 to -0.3 runs | Minimal |
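
The Totals scanner's temperature rule (+0.15 runs per 10°F above 72°F) reduces to a one-line function. The sketch assumes no adjustment at or below the 72°F baseline, which the ruling does not state, and the function name is hypothetical.

```python
def temperature_run_adjustment(temp_f: float,
                               baseline_f: float = 72.0,
                               runs_per_10f: float = 0.15) -> float:
    """Runs added to the expected game total for heat above the baseline.

    ASSUMPTION: temperatures at or below the baseline contribute zero.
    """
    return max(0.0, temp_f - baseline_f) / 10.0 * runs_per_10f
```

With these defaults, a 92°F game adds 0.30 runs to the expected total and a 60°F game adds nothing.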

Phase 4: Research Pipeline

Research passes (3 per day):

Pass 1 — Morning (9-10 AM ET):

Pass 2 — Afternoon (1-2 PM ET):

Pass 3 — Pre-game (1 hour before first pitch):

SP Change Cascade (when SP changes from confirmed to scratched):

  1. CRITICAL alert fires immediately
  2. All 4 scanners recompute (ML, RL, Totals, F5)
  3. F5 gets COMPLETE recompute (SP is ~90% of F5 outcome)
  4. Bullpen availability rechecked (bullpen game = entire staff affected)
  5. Matchup card regenerated with new SP stats
  6. All edges recalculated and re-flagged
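
The cascade is an ordered sequence of side effects, so one simple way to express it is a fixed tuple of step names dispatched to handlers. The step names, the handler mapping, and `run_sp_change_cascade` are hypothetical; the sketch only fixes the ordering listed above.

```python
from typing import Callable

# Step names mirror the six cascade steps, in order.
CASCADE_STEPS = (
    "fire_critical_alert",
    "recompute_all_scanners",
    "recompute_f5_complete",
    "recheck_bullpen_availability",
    "regenerate_matchup_card",
    "recalculate_edges",
)


def run_sp_change_cascade(game_id: str,
                          handlers: dict[str, Callable[[str], None]]) -> list[str]:
    """Run every cascade step in order and return the completed step names.

    ASSUMPTION: each handler takes the game_id and raises on failure,
    which stops the cascade before later steps run.
    """
    completed = []
    for step in CASCADE_STEPS:
        handlers[step](game_id)
        completed.append(step)
    return completed
```

Keeping the order in one tuple makes it hard for a later change to recalculate edges before the matchup card and scanners have seen the new SP.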

Phase 5: Database Schema (additional tables)

mlb_research_findings:

mlb_edge_results:

mlb_game_results:

mlb_abs_tracking:

Phase 6: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. Early-season caution mode: Council recommends widening confidence intervals 20% and raising edge thresholds 1.5% in April (limited SP sample). Confirm?

  2. SP status handling: When SP is "probable" (not confirmed), should scanner run with discounted confidence or wait for confirmation?

  3. Weather void rules: Should we auto-restrict Run Line and Totals exposure when rain probability exceeds a threshold (e.g., >40%)? ML and F5 still grade if game reaches 5 innings.

  4. ABS Challenge System: Agree with phased approach (CONTEXT only through May, actionable June+ with 200+ challenge minimum)?

  5. Bullpen availability data source: MLB Stats API has pitch counts but not always day-of availability. Should we scrape RotoWire/RotoBaller bullpen reports, or build from raw pitch-count data?

  6. Intelligence calibration cadence: Weekly review of which IntelAdjustment types are value-additive vs. noise?


COUNCIL METADATA

| Detail | Value |
| --- | --- |
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Opus (2/5 votes) |
| Runner-up | gpt-oss (2/5 votes) |
| Biggest blind spot | Gemini (2/5 votes) |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/mlb-research/ |
Source: ~/edgeclaw/results/panel-results/mlb-research-ruling.md