WNBA Data Audit — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Gemini (2 of 5 genuine votes, from Grok and gpt-oss)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. Starting from zero — no WNBA data in DB at all, everything must be built
  2. 40-game season = small samples — Bayesian shrinkage mandatory, priors from prior seasons
  3. Player availability is dominant signal — smaller roster = higher individual impact per absence
  4. Four Factors model adapted for WNBA — eFG%, TOV%, ORB%, FT rate, with WNBA-specific weights
  5. Home court advantage is real but smaller — ~3-4 points (vs NBA's 3.5-4.5)
  6. Overseas performance data is intelligence source — offseason leagues inform preseason projections
  7. Commissioner's Cup changes team motivation — need to track as separate competition flag
  8. Her Hoop Stats is gold standard WNBA source — better than raw Basketball Reference for analytics
  9. WNBA Stats API mirrors NBA structure — similar endpoints, transferable scraping code
  10. Expansion teams need prior-based estimation — wide confidence intervals, roster-component approach
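
The shrinkage in item 2 can be sketched as a weighted average of the in-season rating and a prior-season rating. This is a minimal illustration, not the council's specified method: the function name and the `prior_games` pseudo-count of 15 are assumptions chosen for the example.

```python
def shrink_toward_prior(observed: float, games_played: int,
                        prior: float, prior_games: float = 15.0) -> float:
    """Blend an in-season rating with a prior-season rating.

    prior_games is a hypothetical pseudo-count: the prior is treated as if
    it were worth that many games of evidence.
    """
    if games_played < 0 or prior_games <= 0:
        raise ValueError("games_played must be >= 0 and prior_games > 0")
    weight = games_played / (games_played + prior_games)
    return weight * observed + (1.0 - weight) * prior


# Example: +12.0 net rating in-season, +2.0 prior-season rating.
early = shrink_toward_prior(12.0, 5, 2.0)    # weight 0.25 on observed data
late = shrink_toward_prior(12.0, 35, 2.0)    # weight 0.70 on observed data
```

Early in the season the estimate stays close to the prior. By game 35 the observed rating dominates. A fuller model would shrink each of the four factors separately.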

Where Advisors Disagreed

  1. Score distribution model: Some proposed normal, others Skellam (discrete), one proposed log-normal. Council verdict: Skellam for spread/totals (WNBA's lower possession count makes discrete distribution more appropriate).
  2. Overseas adjustment factors: Opus provided specific conversion rates (EuroLeague 0.85x, Turkish 0.75x, WNBL 0.70x). Others left vague. Council verdict: Use Opus's tiered conversion factors as starting point, refine with data.
  3. Per-minute normalization: Gemini proposed Per-40 (WNBA standard), others used Per-36 or raw. Council verdict: Per-40 minutes for all WNBA stats.
  4. Charter flight impact: Gemini identified 2024 charter transition as structural break invalidating pre-2024 rest/travel data. Council verdict: Flag 2024+ as new era for travel models.
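
A Skellam distribution describes the difference of two independent Poisson counts, so the spread verdict in item 1 can be sketched by treating each team's score as Poisson. Under the same assumption the game total is Poisson with mean `mu_home + mu_away`. The sketch below uses only the standard library. The function names, the independence assumption, and the 250-point truncation are illustrative choices, not part of the ruling.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X == k) for X ~ Poisson(mu), computed in log space for stability."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def skellam_pmf(d: int, mu_home: float, mu_away: float,
                max_points: int = 250) -> float:
    """P(home_score - away_score == d) for independent Poisson scores."""
    total = 0.0
    for away in range(max(0, -d), max_points + 1):
        home = away + d
        if home > max_points:
            break
        total += poisson_pmf(home, mu_home) * poisson_pmf(away, mu_away)
    return total


def prob_home_covers(spread: float, mu_home: float, mu_away: float,
                     max_points: int = 250) -> float:
    """P(home margin + spread > 0); spread=-4.5 means home is favored by 4.5."""
    return sum(
        skellam_pmf(d, mu_home, mu_away, max_points)
        for d in range(-max_points, max_points + 1)
        if d + spread > 0
    )


# Example with hypothetical expected scores of 84 and 78.
p_cover = prob_home_covers(-4.5, 84.0, 78.0)
```

Real basketball scores arrive in steps of 1, 2, and 3 points, so the Poisson variance is only an approximation that should be checked against observed margins.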

Strongest Arguments (from peer review)

Gemini wins with the most structurally aware analysis:

Opus strong runner-up (endorsed by Sonnet):

Biggest Blind Spot

Player prop architecture completely absent — All advisors focused on game-level markets (ML, spread, totals) while ignoring player props, which are the most inefficient WNBA market. With 12-woman rosters, predictable 7-8 player rotations, and extreme offensive concentration in stars, player prop pricing is easier to beat than game-level markets. Need Minutes Projection Engine accounting for foul trouble (referee crew correlated) and blowout risk.

What Everyone Missed (from peer reviews)

  1. Hard Cap + Emergency Hardship Exception: Under the WNBA's hard cap, a team cannot sign a replacement player until it drops below 10 healthy players. Teams therefore routinely play with 9-10 players, forcing starters to 38-40 minutes. This creates predictable late-game fatigue that affects 4th-quarter pace, defense, and 2nd-half totals. No advisor modeled this roster constraint.
  2. Live in-game betting — WNBA in-game lines are the softest in North American sports. No advisor discussed real-time play-by-play modeling, foul trouble tracking, or live lineup detection.
  3. Arena-sharing conflicts — Teams sharing venues with NBA teams get suboptimal game times, court setups, or alternative venues during NBA playoff overlap. Affects practice access, shootarounds, and attendance.
  4. 2025 three-point line structural break — All pre-2025 three-point data is non-comparable (line moved to NBA distance).
  5. Charter flight 2024 structural break — Pre-2024 travel/rest data overvalues HCA.
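
The two structural breaks in items 4 and 5 can be sketched as per-season era flags that filter training data. The cutoff seasons come from the ruling. The class, function, and parameter names are hypothetical.

```python
from dataclasses import dataclass

CHARTER_ERA_START = 2024   # first season the ruling treats as charter era
THREE_PT_ERA_START = 2025  # first season after the line change cited in the ruling


@dataclass(frozen=True)
class EraFlags:
    post_charter: bool
    post_3pt_change: bool


def era_flags(season: int) -> EraFlags:
    """Derive both era flags from the season year."""
    return EraFlags(season >= CHARTER_ERA_START, season >= THREE_PT_ERA_START)


def comparable_seasons(target: int, candidates: list[int], *,
                       travel: bool = False, three_pt: bool = False) -> list[int]:
    """Keep seasons in the same era as target for the feature families in use."""
    target_flags = era_flags(target)
    keep = []
    for season in candidates:
        flags = era_flags(season)
        if travel and flags.post_charter != target_flags.post_charter:
            continue
        if three_pt and flags.post_3pt_change != target_flags.post_3pt_change:
            continue
        keep.append(season)
    return keep
```

A rest/travel model for 2026 would call `comparable_seasons(2026, seasons, travel=True)`. A model that uses neither feature family keeps every season.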

BUILD PLAN

Phase 1: Core Data Tables

wnba_teams: team_id, name, abbreviation, conference, arena, shares_arena_with, charter_flight, expansion_year, coach_id, cap_space, active
wnba_players: player_id, name, team_id, position, height, age, experience, salary, overseas_team, national_team, star_tier (1-3), active
wnba_games: game_id, season, date, time, home_team, away_team, home_score, away_score, attendance, commissioner_cup, era (pre/post_charter, pre/post_3pt_change)
wnba_player_game_stats: stat_id, game_id, player_id, minutes, pts, reb, ast, stl, blk, tov, fg_made, fg_att, 3p_made, 3p_att, ft_made, ft_att, plus_minus, usage_rate, per_40_pts, per_40_reb, per_40_ast
wnba_team_game_stats: stat_id, game_id, team_id, pace, off_rtg, def_rtg, net_rtg, efg_pct, tov_pct, orb_pct, ft_rate, active_roster_count
wnba_injuries: injury_id, player_id, status, reason, first_reported, last_updated, games_missed, hardship_eligible
wnba_schedule: game_id, rest_days_home, rest_days_away, travel_distance, timezone_change, arena_conflict_flag
wnba_rosters: roster_id, team_id, player_id, joined_date, left_date, transaction_type, hardship_exception
wnba_referees: ref_id, game_id, referee_name, home_cover_pct, over_pct, avg_fouls_called
wnba_overseas: overseas_id, player_id, league, team, season, games, stats_json, wnba_equiv_factor
wnba_draft: draft_id, year, round, pick, player_name, college, mock_consensus_pick, workout_reports
wnba_awards: award_id, season, award_type, player_id, votes_or_shares, rank
wnba_odds: odds_id, game_id, market_type, book, selection, odds, timestamp, pinnacle_covered
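
A subset of the tables above can be sketched as SQLite DDL. The ruling does not name a database engine, so SQLite, the column types, the constraints, and the `create_schema` helper are all assumptions. Columns such as `3p_made` begin with a digit and must be quoted in SQL.

```python
import sqlite3

# Column types and constraints are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS wnba_teams (
    team_id      INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    abbreviation TEXT NOT NULL UNIQUE,
    conference   TEXT,
    arena        TEXT,
    active       INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE IF NOT EXISTS wnba_games (
    game_id          INTEGER PRIMARY KEY,
    season           INTEGER NOT NULL,
    date             TEXT NOT NULL,     -- ISO 8601 date string
    home_team        INTEGER NOT NULL REFERENCES wnba_teams(team_id),
    away_team        INTEGER NOT NULL REFERENCES wnba_teams(team_id),
    home_score       INTEGER,           -- NULL until the game is final
    away_score       INTEGER,
    commissioner_cup INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS wnba_player_game_stats (
    stat_id   INTEGER PRIMARY KEY,
    game_id   INTEGER NOT NULL REFERENCES wnba_games(game_id),
    player_id INTEGER NOT NULL,
    minutes   REAL,
    pts       INTEGER,
    "3p_made" INTEGER,                  -- quoted: name starts with a digit
    "3p_att"  INTEGER
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the illustrative subset of Phase 1 tables."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Renaming the digit-leading columns (for example to `fg3_made`) would avoid quoting in every query. That is a design choice left to the build.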

Phase 2: Custom Metrics

| Metric | Formula | Notes |
|---|---|---|
| Star Impact (On/Off) | Team net rating WITH - WITHOUT player | Per-40 minute basis |
| Bayesian Team Rating | Shrunk four-factors with prior-season data | WNBA-specific weights |
| Overseas Import Score | Stats × league_conversion_factor (0.65-0.85) | Preseason projections |
| Roster Depth Index | Sum of player impact ratings for players 6-12 | Hardship vulnerability |
| HCA (Charter Era) | 2024+ only data, venue-specific crowd factor | Ignore pre-2024 travel |
| 3PT Era Adjustment | Pre/post 2025 three-point line structural break | Separate models |
| Minutes Fatigue Model | f(active_roster_count, minutes_played, game_time) | Hard cap effect |
| Sample Confidence | Games / stability_threshold (15-20) | Scale edge thresholds |
| Pinnacle Coverage Flag | Binary: is Pinnacle covering this game? | Falls back to synthetic line |
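
Three of the simpler metrics above can be sketched directly from their formulas. The league factors are the starting values attributed to Opus in the ruling. The function names, the zero-minutes handling, the default threshold of 20, and the cap at 1.0 are assumptions.

```python
# Starting conversion factors quoted in the ruling; to be refined with data.
LEAGUE_FACTORS = {"EuroLeague": 0.85, "Turkish": 0.75, "WNBL": 0.70}


def per_40(stat_total: float, minutes: float) -> float:
    """Scale a box-score stat to a per-40-minute rate."""
    if minutes <= 0:
        return 0.0  # assumption: no minutes means no rate
    return 40.0 * stat_total / minutes


def overseas_import_score(per_40_stat: float, league: str) -> float:
    """Discount an overseas per-40 stat by its league conversion factor."""
    return per_40_stat * LEAGUE_FACTORS[league]


def sample_confidence(games: int, stability_threshold: int = 20) -> float:
    """Games / stability_threshold, capped at 1.0 (the cap is an assumption)."""
    return min(1.0, games / stability_threshold)


# Example: 18 points in 30 minutes in a EuroLeague game.
rate = per_40(18, 30)                                  # 24.0 points per 40
wnba_equiv = overseas_import_score(rate, "EuroLeague")
conf = sample_confidence(10)                           # 0.5 after 10 games
```

An unknown league raises `KeyError`, which forces a conscious decision about a factor before any projection is produced.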

Phase 3: 7 Edge Scanners

| Scanner | Min Edge | Unique Logic |
|---|---|---|
| Moneyline | 4% | Four-factors × availability × HCA (charter era); 5 cent min on Pinnacle |
| Spread | 5% | Star absence margin shift; Skellam distribution |
| Totals | 5% | Pace matchup × rest × roster count (hardship); Skellam |
| Series | 6% | MC simulation; best-of-3 or best-of-5; home court pattern |
| MVP | 8% | Stats + narrative + voting history; Dirichlet-multinomial |
| ROY | 10% | Draft position × usage opportunity × team context |
| Draft #1 | 12% | Mock consensus × GM signals × workout reports |
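
The minimum-edge thresholds above can be sketched as a lookup applied to a model probability and a quoted price. The ruling does not define "edge". This sketch assumes it means expected value per unit staked at decimal odds. The dictionary keys and function names are hypothetical.

```python
# Thresholds from the scanner table; the market keys are illustrative names.
MIN_EDGE = {
    "moneyline": 0.04, "spread": 0.05, "totals": 0.05, "series": 0.06,
    "mvp": 0.08, "roy": 0.10, "draft_1": 0.12,
}


def implied_prob(decimal_odds: float) -> float:
    """Break-even probability implied by decimal odds (vig not removed)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def expected_edge(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked, given the model's win probability."""
    return model_prob * decimal_odds - 1.0


def passes_scanner(market: str, model_prob: float, decimal_odds: float) -> bool:
    """True when the expected edge meets the market's minimum threshold."""
    return expected_edge(model_prob, decimal_odds) >= MIN_EDGE[market]


# Example: the model says 55% at even money (decimal odds 2.0).
ok_moneyline = passes_scanner("moneyline", 0.55, 2.0)
ok_draft = passes_scanner("draft_1", 0.55, 2.0)
```

The same 10% edge clears the moneyline bar but not the Draft #1 bar, which reflects the higher thresholds set for futures markets.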

Phase 4: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. Her Hoop Stats subscription: Required for best WNBA analytics. Cost?
  2. Player props: Build prop architecture or focus on game-level markets first?
  3. Live in-game betting: Build real-time model for WNBA in-game lines?
  4. Historical depth: How many seasons to backfill? (Note: pre-2024 charter and pre-2025 3PT breaks)
  5. Overseas league coverage: Which leagues to track (EuroLeague Women, Turkish, WNBL, Chinese)?
  6. Pinnacle gap strategy: When Pinnacle doesn't cover a game, use synthetic line or skip?
  7. Commissioner's Cup: Model as separate competition or flag only?

COUNCIL METADATA

| Detail | Value |
|---|---|
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Gemini (2/5 genuine votes, from Grok and gpt-oss) |
| Runner-up | Opus (1/5 genuine, from Sonnet; deepest domain specifics) |
| Biggest blind spot | Player prop architecture absent |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/wnba-data-audit/ |

Source: ~/edgeclaw/results/panel-results/wnba-data-audit-ruling.md