Soccer Data Audit — Council Ruling
Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Opus (5 of 5 peer review votes — UNANIMOUS)
Status: PENDING BOSS RULING on open questions
COUNCIL SUMMARY
Where Advisors Agreed
- xG (expected goals) is the #1 data requirement — from FBref (free StatsBomb data)
- Club Elo ratings for team strength measurement (free API)
- Bivariate Poisson with Dixon-Coles as the goal-scoring distribution model
- League-specific models — different goal rates, home advantage, and variance per league
- Multi-league coverage — Big 5 European leagues + Champions League minimum
- Injury/suspension data from Transfermarkt (free, comprehensive)
- Match-level weather and pitch data for outdoor fixtures
- Team form tracking with EWMA (exponentially weighted moving average), split by home and away
- 6 market scanners — 1X2, Asian Handicap, Totals, BTTS, Correct Score, Double Chance
- Referee data matters (penalty rates, card rates, stoppage time patterns)
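The home/away EWMA form tracking agreed above could look roughly like the following. This is a minimal sketch, not the council's implementation: the class name `SplitEwma` and the default smoothing factors are hypothetical, and the ruling does not state which decay values to use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SplitEwma:
    """EWMA of one team stat, tracked separately for home and away matches."""

    alpha_home: float = 0.30  # assumed smoothing factor, not from the ruling
    alpha_away: float = 0.30  # assumed smoothing factor, not from the ruling
    ewma_home: Optional[float] = None
    ewma_away: Optional[float] = None

    def update(self, value: float, is_home: bool) -> float:
        """Fold one match observation into the matching split and return it."""
        if is_home:
            prev, alpha = self.ewma_home, self.alpha_home
        else:
            prev, alpha = self.ewma_away, self.alpha_away
        # The first observation seeds the average; later ones are blended in.
        new = value if prev is None else alpha * value + (1.0 - alpha) * prev
        if is_home:
            self.ewma_home = new
        else:
            self.ewma_away = new
        return new
```

Feeding each match's xG into `update` with the right `is_home` flag would yield values suitable for the `ewma_home` and `ewma_away` columns of `soccer_team_baselines`.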
Where Advisors Disagreed
- xG source: gpt-oss recommended paid Opta/StatsBomb feeds. Opus specified FBref (free). Council verdict: FBref for current scale, evaluate paid feeds if scale demands.
- Database engine: gpt-oss recommended PostgreSQL; Opus used SQLite. Council verdict: SQLite in WAL (write-ahead logging) mode.
- Historical depth: Recommendations ranged from 2 to 5 seasons. Council verdict: 3 seasons for model training, current season for live use.
Strongest Arguments (from peer review)
Opus wins UNANIMOUSLY — every reviewer selected Opus as strongest:
- Most complete SQLite schema with exact column types and foreign keys
- Dixon-Coles implementation details (ρ parameter estimation, iterative optimization)
- League-specific home advantage coefficients (Bundesliga 0.40, EPL 0.30, Serie A 0.35)
- Asian Handicap quarter-line pricing (0.25/0.75 splits correctly handled)
- Correct Score grid with Dixon-Coles low-score correction
- Squad rotation detection for midweek European fixtures
- Manager tactical system classification
- EWMA with separate home/away decay rates
- Comprehensive matchup card with all relevant soccer data
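To illustrate the quarter-line handling credited above: a quarter line such as -0.25 or -0.75 is settled as two half-stake bets on the neighbouring lines. The sketch below shows that split and the resulting expected value for a home-side bet. The function names and the `diff_probs` input (probability of each home-minus-away goal difference) are hypothetical, chosen for illustration only.

```python
from typing import Dict, List


def split_quarter_line(line: float) -> List[float]:
    """Return the component lines: two for a quarter line, otherwise one."""
    # Times 4, a quarter line is an odd integer (e.g. -0.75 -> -3).
    if round(line * 4) % 2 == 1:
        return [line - 0.25, line + 0.25]
    return [line]


def home_ah_expected_value(
    diff_probs: Dict[int, float], line: float, odds: float
) -> float:
    """Expected profit per unit staked on the home side at decimal odds."""
    parts = split_quarter_line(line)
    share = 1.0 / len(parts)  # half the stake on each component line
    ev = 0.0
    for part in parts:
        for diff, p in diff_probs.items():
            adjusted = diff + part
            if adjusted > 0:
                ev += p * share * (odds - 1.0)  # component bet wins
            elif adjusted < 0:
                ev -= p * share  # component bet loses
            # adjusted == 0 is a push: stake returned, no profit or loss
    return ev
```

For example, with goal differences -1, 0, 1 and 2 each at probability 0.25, `home_ah_expected_value(..., line=-0.25, odds=2.0)` returns 0.125: the -0.5 half breaks even and the 0.0 half gains from the push on a draw.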
Biggest Blind Spot
Gemini: Thin schema, generic data sources, no Dixon-Coles correction, no Asian Handicap quarter-line handling, no league-specific parameterization.
What Everyone Missed (from peer reviews)
- Transfer window impact — January/summer windows create structural breaks in team quality. Need transfer window flag and model reset.
- Betting exchange data (Betfair) — Exchange volume and price movements are sharper signals than bookmaker odds for soccer.
- VAR implementation differences by league — Some leagues use VAR more aggressively. Need league-specific VAR impact parameters.
- Multi-club ownership regulations — UEFA and domestic rules affect team selection in certain matchups.
- Altitude and travel for non-European leagues — Copa Libertadores, MLS Western Conference, etc.
BUILD PLAN
Phase 1: Core Soccer Data Tables
soccer_match_logs: fixture_id, date, league, season, home_team, away_team, home_goals, away_goals, home_xg, away_xg, home_shots, away_shots, home_sot, away_sot, home_possession, away_possession, home_corners, away_corners, referee_id
soccer_team_baselines: team_id, league, date, stat_type, last_3, last_5, last_10, season_avg, ewma_home, ewma_away, xg_for_ewma, xg_against_ewma
soccer_club_elo: team_id, date, elo_rating, elo_change, league_rank
soccer_player_availability: team_id, date, player_id, player_name, position, status, injury_type, expected_return, importance_score
soccer_fixtures_context: fixture_id, date, league, weather, pitch_status, referee_id, match_importance_home, match_importance_away, days_rest_home, days_rest_away, midweek_european_home, midweek_european_away
soccer_referee_stats: referee_id, league, avg_fouls, avg_cards, penalty_rate, var_overturn_rate, avg_added_time
soccer_league_params: league, season, avg_goals_per_game, home_advantage_coefficient, draw_rate, btts_rate, dixon_coles_rho
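As a rough sketch of how two of these tables might be declared in SQLite with WAL mode and foreign keys enabled: the column names come from the lists above, but the column types, the composite primary key on `soccer_referee_stats`, and the `open_db` helper are assumptions and not the schema from the winning response.

```python
import sqlite3

# Column types and key choices below are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS soccer_referee_stats (
    referee_id        INTEGER NOT NULL,
    league            TEXT    NOT NULL,
    avg_fouls         REAL,
    avg_cards         REAL,
    penalty_rate      REAL,
    var_overturn_rate REAL,
    avg_added_time    REAL,
    PRIMARY KEY (referee_id, league)
);
CREATE TABLE IF NOT EXISTS soccer_match_logs (
    fixture_id      INTEGER PRIMARY KEY,
    date            TEXT    NOT NULL,  -- ISO 8601 date string
    league          TEXT    NOT NULL,
    season          TEXT    NOT NULL,
    home_team       TEXT    NOT NULL,
    away_team       TEXT    NOT NULL,
    home_goals      INTEGER,
    away_goals      INTEGER,
    home_xg         REAL,
    away_xg         REAL,
    home_shots      INTEGER,
    away_shots      INTEGER,
    home_sot        INTEGER,
    away_sot        INTEGER,
    home_possession REAL,
    away_possession REAL,
    home_corners    INTEGER,
    away_corners    INTEGER,
    referee_id      INTEGER,
    FOREIGN KEY (referee_id, league)
        REFERENCES soccer_referee_stats (referee_id, league)
);
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open the database, enable WAL and foreign keys, and create tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")  # no effect on ":memory:"
    conn.execute("PRAGMA foreign_keys = ON")   # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn
```

SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is set on each connection, so the pragma belongs in whatever helper opens the database.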
Phase 2: Distribution Models
- Bivariate Poisson with Dixon-Coles (ρ correction for 0-0, 1-0, 0-1, 1-1)
- League-specific λ parameters
- Home advantage multiplicative adjustment per league
- All 6 markets derived from same underlying goal distribution
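The Phase 2 items can be sketched as a single score-probability grid from which market prices are summed. The code below takes the two goal rates and ρ as given inputs (fitting them is out of scope here) and applies the Dixon-Coles adjustment to the four low-score cells. The function names, the `max_goals` truncation, and the renormalisation step are assumptions made for this illustration.

```python
import math
from typing import Dict, List


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(h: int, a: int, lam_home: float, lam_away: float, rho: float) -> float:
    """Dixon-Coles adjustment factor; differs from 1 only for low scores."""
    if h == 0 and a == 0:
        return 1.0 - lam_home * lam_away * rho
    if h == 0 and a == 1:
        return 1.0 + lam_home * rho
    if h == 1 and a == 0:
        return 1.0 + lam_away * rho
    if h == 1 and a == 1:
        return 1.0 - rho
    return 1.0


def score_grid(
    lam_home: float, lam_away: float, rho: float, max_goals: int = 10
) -> List[List[float]]:
    """grid[h][a] = P(home scores h, away scores a), truncated at max_goals."""
    grid = [
        [
            dc_tau(h, a, lam_home, lam_away, rho)
            * poisson_pmf(h, lam_home)
            * poisson_pmf(a, lam_away)
            for a in range(max_goals + 1)
        ]
        for h in range(max_goals + 1)
    ]
    total = sum(map(sum, grid))  # renormalise the truncated grid
    return [[p / total for p in row] for row in grid]


def market_probs(grid: List[List[float]], total_line: float = 2.5) -> Dict[str, float]:
    """Sum grid cells into 1X2, Totals (over) and BTTS probabilities."""
    n = len(grid)
    cells = [(h, a, grid[h][a]) for h in range(n) for a in range(n)]
    return {
        "home": sum(p for h, a, p in cells if h > a),
        "draw": sum(p for h, a, p in cells if h == a),
        "away": sum(p for h, a, p in cells if h < a),
        "over": sum(p for h, a, p in cells if h + a > total_line),
        "btts": sum(p for h, a, p in cells if h > 0 and a > 0),
    }
```

Correct Score prices are the grid cells themselves, Double Chance is the sum of two 1X2 outcomes, and Asian Handicap follows from the goal-difference distribution, so all six markets can share one grid. A negative ρ raises the 0-0 and 1-1 cells relative to independent Poisson, which increases the draw probability.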
Phase 3: 6 Edge Scanners
- 1X2, Asian Handicap (quarter-line), Totals, BTTS, Correct Score, Double Chance
- Pinnacle de-vig via Shin method (3-way)
- Min edge 4 cents, min sample 10 league matches
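A possible shape for the de-vig and filter step is shown below. It removes the bookmaker margin from a set of decimal odds with Shin's method, solving for the parameter z by bisection, then applies the two thresholds. The function names are hypothetical, and reading "4 cents" as a 0.04 gap between model probability and de-vigged probability is an assumption that the ruling does not confirm.

```python
import math
from typing import List, Sequence


def shin_probabilities(odds: Sequence[float], iterations: int = 60) -> List[float]:
    """De-vig decimal odds with Shin's method; z is found by bisection."""
    implied = [1.0 / o for o in odds]
    booksum = sum(implied)
    if booksum <= 1.0:
        # No overround to remove; fall back to plain normalisation.
        return [q / booksum for q in implied]

    def probs(z: float) -> List[float]:
        return [
            (math.sqrt(z * z + 4.0 * (1.0 - z) * q * q / booksum) - z)
            / (2.0 * (1.0 - z))
            for q in implied
        ]

    lo, hi = 0.0, 0.99
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        # The probabilities sum to more than 1 while z is still too small.
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2.0)


def passes_filters(
    model_prob: float,
    fair_prob: float,
    sample_size: int,
    min_edge: float = 0.04,  # assumed reading of "4 cents"
    min_sample: int = 10,    # league matches, per the plan above
) -> bool:
    """True when the edge and sample-size thresholds are both met."""
    return (model_prob - fair_prob) >= min_edge and sample_size >= min_sample
```

Compared with dividing each implied probability by the booksum, Shin's method moves relatively more of the margin onto longshots, so the favourite's de-vigged probability comes out slightly higher.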
Phase 4: Dashboard
- Multi-league fixture board with edge counts
- Match drill-down with matchup card + 6 market edges + score probability grid
- Injury board, referee assignments, form tables, xG rankings
OPEN QUESTIONS FOR BOSS RULING
- League scope: Big 5 + CL only, or expand to MLS, Eredivisie, Liga Portugal, etc.?
- Dixon-Coles fitting: Needs ~2 seasons of historical data per league. Confirm 3-season backfill?
- FBref scraping: Free but requires web scraping. Build scraper or find API wrapper?
- Betfair exchange data: Worth integrating as sharpness signal?
- Transfer window model resets: Auto-adjust team baselines after January/summer windows?
- Correct Score scanner: High variance — build now or defer?
COUNCIL METADATA
| Detail | Value |
|---|---|
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Opus (5/5 votes, UNANIMOUS) |
| Runner-up | N/A |
| Biggest blind spot | Gemini |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/soccer-data-audit/ |
Source: ~/edgeclaw/results/panel-results/soccer-data-audit-ruling.md