MLB Research Pipeline — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Opus (2 of 5 peer review votes; gpt-oss got 2, Gemini got 1)
Status: PENDING BOSS RULING on open questions

COUNCIL SUMMARY

Where Advisors Agreed

Starting pitcher is THE #1 variable — controls 55-70% of game outcome variance, more extreme than NHL goalies
4 separate edge scanners required — Moneyline, Run Line (-1.5), Totals (O/U), First 5 Innings (F5)
SP confirmation workflow with credibility hierarchy — team official > beat reporter > fantasy aggregator > social media
Bullpen availability is the #2 daily variable — pitch counts, days rest, and multi-day workload all tracked
Weather is a quantifiable model input, not just context — wind speed/direction, temp, humidity, park-specific multipliers
Park factors per stadium — Coors (extreme), Yankee Stadium (short porch RF), Oracle Park (marine layer), etc.
Platoon splits (L/R matchups) create 30-50 point wOBA swings at team level
3-pass research schedule — morning (SP confirmation + weather), afternoon (lineups + bullpen), pre-game (final confirmation)
Poisson distribution for run scoring with park/weather adjustments
Matchup cards must include SP stats, bullpen availability, weather, park factors, and umpire assignment
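
The SP confirmation credibility hierarchy agreed above can be sketched as a small resolver. This is a minimal illustration, not the council's implementation: the `SourceTier` values, the `SPReport` fields, and the tie-break on recency are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class SourceTier(IntEnum):
    """Higher value = more credible, following the council's ordering."""
    SOCIAL_MEDIA = 1
    FANTASY_AGGREGATOR = 2
    BEAT_REPORTER = 3
    TEAM_OFFICIAL = 4


@dataclass(frozen=True)
class SPReport:
    pitcher_name: str
    status: str            # TBD / probable / confirmed / scratched
    source_tier: SourceTier
    timestamp: str         # ISO-8601 UTC string, so string order == time order


def resolve_sp_status(reports):
    """Pick the report from the most credible tier; break ties by recency."""
    if not reports:
        raise ValueError("no SP reports to resolve")
    return max(reports, key=lambda r: (r.source_tier, r.timestamp))
```

One consequence of this ordering is that a later social-media scratch report never overrides an earlier team-official confirmation. A production version would probably still surface that conflict as a finding rather than drop it silently.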

Where Advisors Disagreed

F5 distribution model: Some advisors used the same Poisson model as the full game. Opus correctly identified that the first-time-through-order (FTTO) advantage makes innings 1-3 systematically different from innings 4-5, which requires separate parameters. Council verdict: F5 needs SP-specific FTTO splits applied to the Poisson parameters.
ABS Challenge System readiness: Some advisors treated it as an immediately actionable edge. Opus flagged that April-May 2026 is data collection only, since 200+ challenges are needed before patterns emerge. Council verdict: Flag as CONTEXT only through May, actionable June+ with minimum sample.
Which models run research: Gemini proposed using blind analyst models (Sonnet, Gemini) for research, which would contaminate the blind analysis. Council verdict: Research models must be SEPARATE from blind analyst models. Use Grok 4.1 Fast for search and DeepSeek R1 for extraction.
Database engine: gpt-oss recommended PostgreSQL + TimescaleDB. Others assumed SQLite. Council verdict: SQLite in WAL mode for current scale, with tables designed to be migration-ready.

Strongest Arguments (from peer review)

Opus wins with the most production-ready and analytically deep design:

4-tier bullpen availability system (GREEN/YELLOW/RED/BLACK) with specific pitch-count thresholds
Park-specific wind sensitivity multipliers (Wrigley 1.5x, Coors 0.8x, Oracle Park 1.2x, domes 0.0x)
Correctly noted humid air is LESS dense than dry air (physics error most models make)
Early-season caution mode: widen confidence intervals 20%, raise edge thresholds 1.5% in April
Kill switch taxonomy (4 levels: per-game, per-market, weather-triggered, daily)
ABS Challenge phased rollout (collect → 200 sample → actionable)
Wind direction encoded as field-relative (OUT_TO_CF / IN_FROM_CF / CROSSWIND_LR)
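
The humid-air point can be checked numerically. The sketch below is an assumption-laden illustration rather than the pipeline's actual air_density_adjustment: it uses the ideal-gas mixture formula with the Tetens approximation for saturation vapor pressure, and defaults to standard sea-level pressure. The function names are hypothetical.

```python
R_DRY = 287.058     # specific gas constant of dry air, J/(kg*K)
R_VAPOR = 461.495   # specific gas constant of water vapor, J/(kg*K)


def saturation_vapor_pressure_pa(temp_c):
    """Tetens approximation for saturation vapor pressure over water, in Pa."""
    return 610.78 * 10 ** (7.5 * temp_c / (temp_c + 237.3))


def air_density(temp_f, humidity_pct, pressure_pa=101325.0):
    """Density of moist air in kg/m^3, treating it as dry air plus water vapor."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    temp_k = temp_c + 273.15
    vapor_pa = humidity_pct / 100.0 * saturation_vapor_pressure_pa(temp_c)
    dry_pa = pressure_pa - vapor_pa
    # Water vapor is lighter per mole than dry air, so replacing dry air
    # with vapor at the same total pressure lowers the density.
    return dry_pa / (R_DRY * temp_k) + vapor_pa / (R_VAPOR * temp_k)
```

At the same temperature and pressure, `air_density(85, 90)` comes out lower than `air_density(85, 10)`, which is the direction Opus noted.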

Biggest Blind Spot

Gemini: Proposed using blind analyst models for research (contamination risk), offered a thin database schema and generic search queries, and, most importantly, assigned the wrong models to research roles. Also the weakest on edge scanner math specifics.

What Everyone Missed (from peer reviews)

Intelligence calibration loop — No advisor designed a feedback system to measure which IntelAdjustment types are value-additive vs. noise. Need: retrospective tagging against outcomes, dynamic source credibility weights, A/B testing of research prompts.
Daily lineup delta detection — SP scratches are rare; star position player rest days happen EVERY day. Need automated parser comparing official 9-man lineup vs projected lineup, flagging large wRC+ deltas.
Weather void rules by market — Rain-shortened games: ML and F5 are graded, but Run Line and Totals may be voided. Pipeline should restrict RL/Totals exposure in high-rain environments.
Kalshi exchange mechanics — Thin liquidity on MLB props, bid-ask spread cost, price staleness detection.
Home team walk-off impact on Run Line — Home team doesn't bat bottom 9th if leading. This fundamentally alters -1.5 run line probability for home favorites vs away favorites.

BUILD PLAN

Phase 1: MLB Game Data Tables

mlb_starting_pitchers:

game_id, date, team, pitcher_id, pitcher_name
status (TBD/probable/confirmed/scratched)
status_source, status_timestamp, prev_status
hand (L/R), season_era, season_fip, season_xfip, season_whip
k_per_9, bb_per_9, hr_per_9, gb_rate
ftto_woba (first time through order), ftto_k_rate
last_start_date, last_start_pitches, days_rest
season_ip, pitch_count_trend
cascade_fired (boolean — prevents duplicate recomputations)
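
A minimal sketch of this table in SQLite with WAL mode enabled, following the council verdict on the database engine. The column types, the CHECK constraints, and the (game_id, team) primary key are assumptions chosen for illustration; `open_db` is a hypothetical helper.

```python
import sqlite3

STARTING_PITCHERS_DDL = """
CREATE TABLE IF NOT EXISTS mlb_starting_pitchers (
    game_id TEXT NOT NULL,
    date TEXT NOT NULL,
    team TEXT NOT NULL,
    pitcher_id INTEGER,
    pitcher_name TEXT,
    status TEXT NOT NULL
        CHECK (status IN ('TBD', 'probable', 'confirmed', 'scratched')),
    status_source TEXT,
    status_timestamp TEXT,
    prev_status TEXT,
    hand TEXT CHECK (hand IN ('L', 'R')),
    season_era REAL, season_fip REAL, season_xfip REAL, season_whip REAL,
    k_per_9 REAL, bb_per_9 REAL, hr_per_9 REAL, gb_rate REAL,
    ftto_woba REAL, ftto_k_rate REAL,
    last_start_date TEXT, last_start_pitches INTEGER, days_rest INTEGER,
    season_ip REAL, pitch_count_trend TEXT,
    -- 0/1 flag: prevents duplicate recomputations
    cascade_fired INTEGER NOT NULL DEFAULT 0 CHECK (cascade_fired IN (0, 1)),
    PRIMARY KEY (game_id, team)   -- assumed key, not specified by the council
);
"""


def open_db(path):
    """Open (or create) the database file, enable WAL, and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(STARTING_PITCHERS_DDL)
    return conn
```

Keeping the types to plain TEXT, INTEGER, and REAL is one way to stay migration-ready, since each maps directly onto a PostgreSQL type.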

mlb_bullpen_availability:

team, date, pitcher_id, pitcher_name, role (closer/setup/middle/long)
status (GREEN/YELLOW/RED/BLACK)
yesterday_pitches, two_days_ago_pitches, three_days_ago_pitches
appearances_last_7d, pitches_last_7d
high_leverage_available (boolean)
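
The ruling refers to specific pitch-count thresholds for the four tiers but does not list them. The sketch below therefore uses placeholder thresholds and an assumed meaning for each tier (BLACK = unavailable, RED = emergency only, YELLOW = limited, GREEN = fully available), purely to show how the workload columns could map to a status. Every number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RelieverWorkload:
    yesterday_pitches: int
    two_days_ago_pitches: int
    three_days_ago_pitches: int


def bullpen_status(w):
    """Map a reliever's last three days of pitch counts to a tier.

    All thresholds below are illustrative placeholders, not council values.
    """
    pitched = [w.yesterday_pitches > 0,
               w.two_days_ago_pitches > 0,
               w.three_days_ago_pitches > 0]
    total_3d = (w.yesterday_pitches + w.two_days_ago_pitches
                + w.three_days_ago_pitches)

    if all(pitched) or w.yesterday_pitches >= 35:
        return "BLACK"    # three straight days, or a very heavy outing yesterday
    if (pitched[0] and pitched[1]) or total_3d >= 50:
        return "RED"      # back-to-back days, or heavy three-day load
    if w.yesterday_pitches >= 15 or total_3d >= 30:
        return "YELLOW"   # moderate recent usage
    return "GREEN"
```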

mlb_game_environment:

game_id, date, park_id, park_name
roof_status (open/closed/retractable_open/retractable_closed/dome)
temperature_f, humidity_pct, wind_speed_mph
wind_direction_relative (OUT_TO_CF/IN_FROM_CF/CROSSWIND_LR/CROSSWIND_RL/CALM)
wind_sensitivity_multiplier (park-specific: Wrigley 1.5, Coors 0.8, etc.)
altitude_ft, air_density_adjustment
precip_probability, weather_source, weather_timestamp
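
One way to derive wind_direction_relative from a forecast is sketched below. It assumes the forecast gives the meteorological wind direction (the compass bearing the wind blows FROM) and that each park stores a bearing from home plate to center field. The `cf_bearing_deg` parameter, the 3 mph calm cutoff, the 45° sector width, and the reading of LR as "left to right as seen from home plate" are all assumptions.

```python
def relative_wind_direction(wind_from_deg, wind_speed_mph, cf_bearing_deg,
                            calm_below_mph=3.0):
    """Classify wind relative to the home-plate-to-center-field line."""
    if wind_speed_mph < calm_below_mph:
        return "CALM"

    # Convert "blowing from" to "blowing toward".
    wind_toward_deg = (wind_from_deg + 180.0) % 360.0

    # Signed angle in [-180, 180): 0 means straight out to CF,
    # positive means rotated clockwise (toward the right-field side).
    rel = (wind_toward_deg - cf_bearing_deg + 180.0) % 360.0 - 180.0

    if abs(rel) <= 45.0:
        return "OUT_TO_CF"
    if abs(rel) >= 135.0:
        return "IN_FROM_CF"
    return "CROSSWIND_LR" if rel > 0 else "CROSSWIND_RL"
```

For a hypothetical park whose center field lies to the northeast (`cf_bearing_deg=45`), a southwest wind (`wind_from_deg=225`) classifies as OUT_TO_CF.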

mlb_team_game_logs:

team, date, game_id, opponent, home_away
runs_scored, runs_allowed, hits, errors
team_woba, team_ops, team_wrc_plus
vs_lhp_woba, vs_rhp_woba (platoon splits)

mlb_lineups:

game_id, date, team, batting_order (1-9)
player_id, player_name, position
confirmed (boolean), source, timestamp
season_wrc_plus, vs_hand_wrc_plus (vs SP hand)
lineup_wrc_plus_total (sum of 9 hitters)
projected_wrc_plus_total (what was expected before lineup card)
delta_wrc_plus (actual - projected, flags rest days)
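
The three derived columns at the end of this table can be computed as below. This is a sketch only: the flag threshold of 40 summed wRC+ points is a placeholder, since the ruling says "large" deltas should be flagged without giving a number, and `lineup_delta` is a hypothetical function name.

```python
def lineup_delta(actual_wrc_plus, projected_wrc_plus, flag_threshold=40.0):
    """Compare the confirmed 9-man lineup with the projected one.

    Both arguments are lists of nine per-hitter wRC+ values.
    """
    if len(actual_wrc_plus) != 9 or len(projected_wrc_plus) != 9:
        raise ValueError("each lineup must contain exactly 9 hitters")

    actual_total = sum(actual_wrc_plus)
    projected_total = sum(projected_wrc_plus)
    delta = actual_total - projected_total   # negative = weaker than expected

    return {
        "lineup_wrc_plus_total": actual_total,
        "projected_wrc_plus_total": projected_total,
        "delta_wrc_plus": delta,
        "flagged": abs(delta) >= flag_threshold,
    }
```

Resting a 150 wRC+ hitter for an 85 wRC+ replacement produces a delta of -65, which clears the placeholder threshold and would be flagged.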

mlb_umpire_assignments:

game_id, date, umpire_id, umpire_name
career_k_per_game_above_avg, career_bb_per_game_above_avg
career_runs_per_game_above_avg
season_k_rate, season_bb_rate
abs_challenge_overturn_rate (2026 new)

mlb_park_factors:

park_id, park_name, team
runs_factor, hr_factor, hits_factor
lhb_hr_factor, rhb_hr_factor (asymmetric parks)
dimensions_lf, dimensions_cf, dimensions_rf
altitude_ft, roof_type

Phase 2: Matchup Card Format

GAME: [Away] @ [Home] | [Date] [Time ET] | [Park]
WEATHER: [Temp]°F | Wind: [Speed]mph [Direction_Relative] | Humidity: [%] | Precip: [%]
ROOF: [Status] | PARK FACTOR: [Runs Factor] | UMPIRE: [Name] (K+[adj]/BB+[adj]/R+[adj])

HOME SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
  ERA: [season] | FIP: [season] | xFIP: [season] | WHIP: [season]
  K/9: [rate] | BB/9: [rate] | HR/9: [rate] | GB%: [rate]
  FTTO wOBA: [rate] | FTTO K%: [rate]
  Last Start: [date] vs [team] — [IP] IP, [ER] ER, [K] K, [Pitches] pitches
  Days Rest: [n] | Season IP: [total] | Pitch Count Trend: [up/stable/down]
  vs Opp Lineup (platoon): Team wOBA vs [L/R]HP: [rate]

AWAY SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
  [Same fields as above]

HOME LINEUP: [Confirmed/Projected]
  Lineup wRC+: [total] | vs SP Hand wRC+: [total]
  Key Hitters: [Top 3 by wRC+ with stats]
  Delta: [actual vs projected wRC+ — flags rest days]

AWAY LINEUP: [Confirmed/Projected]
  [Same fields as above]

HOME BULLPEN: [GREEN/YELLOW/RED/BLACK]
  Closer: [Name] — [Status] | Setup: [Names] — [Status]
  Pitches Last 3 Days: [total team] | High-Leverage Available: [Y/N]

AWAY BULLPEN: [GREEN/YELLOW/RED/BLACK]
  [Same fields as above]

INTELLIGENCE:
  [Research findings tagged CRITICAL/MODERATE/CONTEXT]
  [SP injury concerns, lineup changes, weather alerts, ABS data]

Phase 3: Edge Scanners (4 scanners)

Common engine:

Ingest Pinnacle odds (ML, RL, Totals, F5)
De-vig using Shin + Power methods
Build Poisson probability curves with adjustments
Compare to Kalshi contract prices
Apply minimum edge (4 cents after Kalshi 7% fee) and minimum sample gates
Output: {game_id, market_type, side, model_prob, kalshi_price, edge, confidence}
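
The de-vig and edge-gate steps can be sketched as follows. This is a minimal illustration, not the council's implementation: it shows only the Power method (Shin is omitted), takes American odds as input, and assumes the 7% Kalshi fee is charged as 0.07 × price × (1 − price) per contract, which should be verified against the current fee schedule. All function names are hypothetical.

```python
def american_to_implied(odds):
    """Convert American odds to the raw (vig-inclusive) implied probability."""
    if odds < 0:
        return -odds / (-odds + 100.0)
    return 100.0 / (odds + 100.0)


def devig_power(implied):
    """Power method: find k so that sum(p_i ** k) == 1, then return p_i ** k."""
    lo, hi = 1.0, 2.0
    while sum(p ** hi for p in implied) > 1.0:
        hi *= 2.0                      # widen the bracket until the sum drops below 1
    for _ in range(200):               # bisection; the sum is decreasing in k
        mid = (lo + hi) / 2.0
        if sum(p ** mid for p in implied) > 1.0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2.0
    return [p ** k for p in implied]


def edge_cents(model_prob, kalshi_price_cents, fee_rate=0.07):
    """Expected edge per contract in cents, net of the assumed fee."""
    price = kalshi_price_cents / 100.0
    fee_cents = fee_rate * price * (1.0 - price) * 100.0
    return model_prob * 100.0 - kalshi_price_cents - fee_cents


def passes_edge_gate(model_prob, kalshi_price_cents, min_edge_cents=4.0):
    """Apply the 4-cent minimum edge after fees."""
    return edge_cents(model_prob, kalshi_price_cents) >= min_edge_cents
```

For a -110/-110 market the Power method returns 0.5 for each side. A 60% model probability against a 50-cent contract nets 8.25 cents under the assumed fee formula, which clears the gate.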

Per-market scanner differences:

Scanner	Distribution	Key Adjustments	Unique Logic
Moneyline	Poisson (expected runs per team)	SP quality, lineup wRC+, park factor, weather, bullpen quality, platoon splits, umpire	Standard win probability from Poisson run differential
Run Line (-1.5)	Poisson with margin threshold	Same as ML + home team walk-off constraint (no bottom 9th if leading)	P(win by 2+) — home favorites have structurally different RL probability than away favorites
Totals	Poisson (combined expected runs)	Weather is PRIMARY driver: wind direction/speed × park multiplier, temp adjustment (+0.15 runs per 10°F above 72°F), humidity	Combined run total distribution, over/under probability at each threshold
First 5 Innings	Modified Poisson (SP-only)	FTTO splits, SP K rate, SP walk rate, umpire zone, NO bullpen component	Isolates SP performance — use innings 1-5 specific rates, remove bullpen quality entirely
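
The Moneyline, Run Line, and Totals rows can be illustrated with two independent Poisson run distributions. This sketch makes several simplifying assumptions that the real scanners would need to replace: runs for the two teams are independent, tied regulation scores are split 50/50, and the home walk-off constraint is ignored for the run line. The expected-run inputs are taken as given, and `market_probs` is a hypothetical name.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def market_probs(lam_home, lam_away, total_line=8.5, max_runs=40):
    """Home ML, home -1.5, and over probabilities from expected runs per team."""
    home_pmf = [poisson_pmf(k, lam_home) for k in range(max_runs + 1)]
    away_pmf = [poisson_pmf(k, lam_away) for k in range(max_runs + 1)]

    home_win = tie = home_by_two = over = 0.0
    for h, ph in enumerate(home_pmf):
        for a, pa in enumerate(away_pmf):
            p = ph * pa
            if h > a:
                home_win += p
            elif h == a:
                tie += p
            if h - a >= 2:
                home_by_two += p
            if h + a > total_line:
                over += p

    return {
        "home_ml": home_win + tie / 2.0,     # assumed 50/50 split of ties
        "home_rl_minus_1_5": home_by_two,    # ignores walk-off truncation
        "over": over,
    }
```

An F5 version would use the same structure with five-inning expected runs built from FTTO splits and no bullpen component.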

Specific adjustments:

Factor	ML Impact	RL Impact	Totals Impact	F5 Impact
SP change (starter→TBD)	Full recompute	Full recompute	Full recompute	Full recompute
SP change (starter→worse SP)	Adjust expected runs	Same + walk-off recalc	Adjust both sides	Full recompute
Weather: wind out to CF 15mph+	Minimal	Minimal	+0.5 to +1.5 runs (park-dependent)	+0.2 to +0.5 runs
Weather: wind in from CF 15mph+	Minimal	Minimal	-0.5 to -1.0 runs	-0.2 to -0.4 runs
Temperature >85°F	Slight boost offense	Slight	+0.3 to +0.5 runs	+0.1 to +0.2 runs
Star hitter scratched (150+ wRC+)	-2 to -5 cents	Similar	-0.1 to -0.3 runs	-0.05 to -0.15 runs
Bullpen RED status	-3 to -8 cents	Larger impact (late-game leverage)	+0.3 to +0.8 runs	NO IMPACT
Umpire: tight zone	Slight pitcher boost	Similar	-0.3 to -0.5 runs	-0.2 to -0.3 runs
Day game after night game	-1 to -3 cents offense	Similar	-0.1 to -0.3 runs	Minimal
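
As a worked example of one adjustment, the Totals scanner's temperature term (+0.15 runs per 10°F above 72°F) can be written as a small function. The ruling does not say what happens below 72°F, so this sketch assumes no adjustment there; the function and parameter names are hypothetical.

```python
def temperature_run_adjustment(temp_f, baseline_f=72.0, runs_per_10f=0.15):
    """Additive adjustment to the expected combined run total.

    Assumes zero adjustment at or below the baseline temperature.
    """
    degrees_above = max(0.0, temp_f - baseline_f)
    return degrees_above / 10.0 * runs_per_10f
```

At 92°F this yields +0.30 runs on the combined total, which would then be fed into the Totals Poisson mean before computing over/under probabilities.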

Phase 4: Research Pipeline

Research passes (3 per day):

Pass 1 — Morning (9-10 AM ET):

"[Team] probable pitcher today [date]"
"[Team] starting lineup [date]"
"[Team] injury report [date]"
"[Park] weather forecast [date] game time"
MLB probable pitchers page (RotoWire, RotoBaller)
NWS API call for outdoor parks (wind at game time)

Pass 2 — Afternoon (1-2 PM ET):

"[Team] confirmed lineup [date]"
"[Team] bullpen availability [date]"
"[Player] injury update [date]" (for flagged players)
"[Team] morning stretch [date]" (lineup confirmation)
Compare actual lineup vs projected → flag delta

Pass 3 — Pre-game (1 hour before first pitch):

Final SP confirmation check
Final weather check (NWS API)
Late scratch monitoring
Umpire crew assignment verification
Kalshi price recheck for staleness detection

SP Change Cascade (when an SP changes from confirmed to scratched):

CRITICAL alert fires immediately
All 4 scanners recompute (ML, RL, Totals, F5)
F5 gets COMPLETE recompute (SP is ~90% of F5 outcome)
Bullpen availability rechecked (bullpen game = entire staff affected)
Matchup card regenerated with new SP stats
All edges recalculated and re-flagged
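
The cascade and its cascade_fired guard can be sketched as below. This is an illustration under assumptions: the step names are placeholder strings rather than real function calls, the F5 recompute is folded into the per-scanner list, and the question of when cascade_fired should reset (for example, once a replacement SP is confirmed) is left open.

```python
from dataclasses import dataclass
from typing import Optional

SCANNERS = ("ML", "RL", "TOT", "F5")


@dataclass
class SPRecord:
    game_id: str
    status: str = "confirmed"
    prev_status: Optional[str] = None
    cascade_fired: bool = False


def apply_status_change(record, new_status):
    """Update SP status and return the ordered cascade steps (empty if none)."""
    if new_status == record.status:
        return []                      # duplicate report, nothing changed

    record.prev_status, record.status = record.status, new_status

    is_scratch = record.prev_status == "confirmed" and new_status == "scratched"
    if not is_scratch or record.cascade_fired:
        return []                      # guard against duplicate recomputations

    record.cascade_fired = True
    steps = ["alert:CRITICAL"]
    steps += [f"recompute:{scanner}" for scanner in SCANNERS]
    steps += ["recheck:bullpen", "regenerate:matchup_card", "recalculate:edges"]
    return steps
```

Calling `apply_status_change` a second time with "scratched" returns an empty list, so a repeated scratch report from another source does not trigger a second full recompute.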

Phase 5: Database Schema (additional tables)

mlb_research_findings:

game_id, date, finding_id, finding_type
severity (CRITICAL/MODERATE/CONTEXT)
source, source_credibility (1-5)
raw_text, structured_adjustment (JSON)
affects_markets (array: ML/RL/TOT/F5)
timestamp, model_used

mlb_edge_results:

game_id, date, market_type (ML/RL/TOT/F5)
side (home/away/over/under)
model_prob, pinnacle_devigged_prob, kalshi_price
edge_cents, confidence
sp_status_at_calc, weather_at_calc
timestamp, staleness_flag

mlb_game_results:

game_id, date, home_team, away_team
final_score_home, final_score_away
f5_score_home, f5_score_away
rain_delay (boolean), rain_shortened (boolean)
actual_sp_home, actual_sp_away (for SP scratch tracking)

mlb_abs_tracking:

game_id, date, umpire_id
total_challenges, successful_challenges
challenge_by_inning, challenge_by_count
pitch_location_data (JSON)
leverage_index_at_challenge

Phase 6: Dashboard

Daily slate overview: All games with SP status, weather flags, edge counts per market type
Game drill-down: Full matchup card + all 4 market edges + research findings
SP status board: All 30 teams' confirmed/probable/TBD starters with color coding
Weather dashboard: Outdoor parks with wind/temp/precip impact estimates
Bullpen tracker: Team-by-team availability (GREEN/YELLOW/RED/BLACK)
Edge alerts: Sorted by magnitude, filterable by market type, with staleness timestamps
Lineup delta alerts: Games where actual lineup deviates significantly from projected
P&L tracker: Performance by market type, by edge bucket, Brier scores
ABS tracking: Challenge rates by umpire (data collection phase)
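
For the P&L tracker, the Brier score per market type can be computed as in this sketch. The input shape, (market_type, model_prob, outcome) tuples, is an assumption about how mlb_edge_results would be joined with mlb_game_results; the function names are hypothetical.

```python
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_by_market(rows):
    """Group (market_type, model_prob, outcome) rows and score each market."""
    grouped = defaultdict(lambda: ([], []))
    for market_type, model_prob, outcome in rows:
        grouped[market_type][0].append(model_prob)
        grouped[market_type][1].append(outcome)
    return {m: brier_score(p, o) for m, (p, o) in grouped.items()}
```

Lower is better: a constant 0.5 forecast scores 0.25, so a market whose score sits at or above 0.25 is adding no information over a coin flip.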

OPEN QUESTIONS FOR BOSS RULING

Early-season caution mode: Council recommends widening confidence intervals 20% and raising edge thresholds 1.5% in April (limited SP sample). Confirm?
SP status handling: When SP is "probable" (not confirmed), should scanner run with discounted confidence or wait for confirmation?
Weather void rules: Should we auto-restrict Run Line and Totals exposure when rain probability exceeds a threshold (e.g., >40%)? ML and F5 still grade if game reaches 5 innings.
ABS Challenge System: Agree with phased approach (CONTEXT only through May, actionable June+ with 200+ challenge minimum)?
Bullpen availability data source: MLB Stats API has pitch counts but not always day-of availability. Should we scrape RotoWire/RotoBaller bullpen reports, or build from raw pitch-count data?
Intelligence calibration cadence: Weekly review of which IntelAdjustment types are value-additive vs. noise?

COUNCIL METADATA

Detail	Value
Council date	2026-04-01
Advisory responses	5 (all completed)
Peer reviews	5 (all completed)
Strongest advisor	Opus (2/5 votes)
Runner-up	gpt-oss (2/5 votes)
Biggest blind spot	Gemini (2/5 votes)
Full council data	`/home/ubuntu/edgeclaw/data/councils/2026-04-01/mlb-research/`

Source: ~/edgeclaw/results/panel-results/mlb-research-ruling.md