Soccer Research Pipeline — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Opus (3 of 5 peer review votes, strongest consensus)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. 6 edge scanners required — 1X2 (match result), Asian Handicap, Totals (over/under goals), BTTS (both teams to score), Correct Score, Double Chance
  2. Bivariate Poisson for goal modeling — correlated home/away goal distributions (not independent Poisson)
  3. xG (expected goals) is the #1 advanced metric — from FBref/StatsBomb/Opta
  4. League-specific models — different leagues have vastly different goal-scoring rates and home advantage
  5. Team form (last 5-10 matches) with EWMA — recency-weighted performance
  6. Injury/suspension tracking is critical — especially for key players (GK, strikers, CBs)
  7. Pinnacle as sharp anchor — de-vig via Shin method for true probabilities
  8. Multi-league coverage — EPL, La Liga, Bundesliga, Serie A, Ligue 1, Champions League minimum
  9. Motivation context matters — relegation battles, title races, dead rubbers, cup rotation
  10. Home advantage varies by league — Bundesliga ~0.4 goals, EPL ~0.3, some leagues higher
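
The recency weighting in item 5 can be sketched as a simple exponentially weighted moving average. This is a minimal illustration, assuming per-match xG values ordered oldest to newest; the function name `ewma_form`, the sample values, and the smoothing factor `alpha=0.3` are hypothetical and were not set by the council.

```python
def ewma_form(values, alpha=0.3):
    """Exponentially weighted moving average of per-match values.

    values: sequence ordered oldest to newest (e.g. xG per match).
    alpha:  weight given to each newer match (hypothetical default).
    """
    if not values:
        raise ValueError("need at least one match")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    avg = values[0]
    for value in values[1:]:
        # Newer matches pull the average toward themselves by a factor alpha.
        avg = alpha * value + (1 - alpha) * avg
    return avg


# Illustrative last-5 xG figures for one team (made-up numbers).
recent_xg = [1.2, 0.8, 2.1, 1.5, 1.9]
form_xg = ewma_form(recent_xg)
```

A larger `alpha` reacts faster to the latest matches at the cost of more noise; the right value would need to be fitted per league.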

Where Advisors Disagreed

  1. Correct Score distribution: Some advisors used raw bivariate Poisson probabilities; Opus applied the Dixon-Coles correction to the four low-scoring results (0-0, 1-0, 0-1, 1-1). Council verdict: Dixon-Coles correction is essential, because standard bivariate Poisson misprices low scores.
  2. Asian Handicap pricing: Some advisors treated it as a simple spread; Opus noted it is a continuous market requiring interpolation between quarter-lines. Council verdict: Must handle 0.25 and 0.75 handicap splits, where the stake is divided across the two adjacent lines (half the stake wins or loses, the other half pushes).
  3. Data source priority: Gemini relied on generic search; the others specified FBref + Club Elo + Understat + WhoScored. Council verdict: FBref for xG data (free StatsBomb), Club Elo for historical ratings, Understat for shot maps.
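
To make the Dixon-Coles verdict concrete, here is a minimal sketch of the low-score adjustment. It assumes independent Poisson marginals as the base model, as in the original Dixon-Coles formulation, rather than a full bivariate Poisson; the function names, the goal rates, and `rho = -0.1` are hypothetical placeholders, since the council left ρ to be fitted from historical data.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam_home, lam_away, rho):
    """Dixon-Coles adjustment factor; differs from 1 only for 0-0, 0-1, 1-0, 1-1."""
    if x == 0 and y == 0:
        return 1 - lam_home * lam_away * rho
    if x == 0 and y == 1:
        return 1 + lam_home * rho
    if x == 1 and y == 0:
        return 1 + lam_away * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def score_matrix(lam_home, lam_away, rho, max_goals=10):
    """m[x][y] = P(home scores x, away scores y), renormalized after truncation."""
    m = [[poisson_pmf(x, lam_home) * poisson_pmf(y, lam_away)
          * dc_tau(x, y, lam_home, lam_away, rho)
          for y in range(max_goals + 1)]
         for x in range(max_goals + 1)]
    total = sum(map(sum, m))
    return [[p / total for p in row] for row in m]


# Hypothetical inputs: home rate 1.5, away rate 1.1, rho = -0.1.
corrected = score_matrix(1.5, 1.1, rho=-0.1)
uncorrected = score_matrix(1.5, 1.1, rho=0.0)
```

With a negative ρ the 0-0 and 1-1 cells gain probability while 1-0 and 0-1 lose it, and the four adjustments cancel so the matrix still sums to 1. ρ must stay small enough that every adjustment factor remains positive.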

Strongest Arguments (from peer review)

Opus wins with the most complete and soccer-specific design.

Biggest Blind Spot

Gemini: Generic search queries, no soccer-specific data sources specified, no Dixon-Coles correction, no Asian Handicap quarter-line handling, thin database schema.

What Everyone Missed (from peer reviews)

  1. In-play market dynamics — Red cards, early goals, and injuries fundamentally change all markets mid-game. No advisor designed a live data integration layer.
  2. Betfair exchange data — Betfair exchange volume and price together are a stronger sharpness signal than Pinnacle for soccer. Not mentioned by most advisors.
  3. Multi-club ownership and loan conflicts — Players on loan can't play against parent club. Same-ownership clubs may have coordinated rotation.
  4. Weather and pitch conditions — Heavy rain, frozen pitches, and altitude (e.g., La Paz in Copa Libertadores) affect goal scoring rates.
  5. Referee assignment impact — Some referees systematically award more penalties, cards, and stoppages. Need ref-specific models.

BUILD PLAN

Phase 1: Soccer Data Tables

soccer_team_stats:

soccer_team_form:

soccer_club_elo:

soccer_player_availability:

soccer_fixtures:

soccer_referee_stats:

Phase 2: Distribution Models Per Market

Market | Distribution | Parameters | Notes
1X2 (Match Result) | Bivariate Poisson with Dixon-Coles | λ_home, λ_away from xG × form × home_adv × matchup; ρ correction for low scores | P(H) + P(D) + P(A) = 1
Asian Handicap | Derived from bivariate Poisson | Goal difference distribution; quarter-line splits (0.25/0.75 = half-win/half-push) | Most liquid soccer market; tightest Pinnacle margins
Totals (O/U) | Derived from bivariate Poisson | P(total goals > threshold); standard lines 2.5, 3.5 | Combined goal distribution from bivariate model
BTTS | Derived from bivariate Poisson | P(home ≥ 1 AND away ≥ 1) = 1 - P(0,any) - P(any,0) + P(0,0) | Highly correlated with defensive quality
Correct Score | Dixon-Coles bivariate Poisson | P(home=x, away=y) for all score combinations up to 5-5 | Low-scoring corrections essential (ρ parameter)
Double Chance | Derived from 1X2 | P(1X) = P(H) + P(D); P(X2) = P(D) + P(A); P(12) = P(H) + P(A) | Simple derivation; useful for lower-edge, higher-confidence bets
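
The derivations in the table above all reduce to sums over one score matrix. The sketch below shows this under stated assumptions: `example_matrix` is a stand-in built from independent Poisson rates purely so the block runs on its own (the pipeline would pass in its Dixon-Coles-adjusted matrix), and the function names and dictionary keys are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def example_matrix(lam_home, lam_away, max_goals=10):
    """Stand-in score matrix m[x][y] = P(home=x, away=y), renormalized."""
    m = [[poisson_pmf(x, lam_home) * poisson_pmf(y, lam_away)
          for y in range(max_goals + 1)]
         for x in range(max_goals + 1)]
    total = sum(map(sum, m))
    return [[p / total for p in row] for row in m]


def derive_markets(m, total_line=2.5):
    """Derive 1X2, totals, BTTS and double chance from a score matrix."""
    cells = [(x, y, p) for x, row in enumerate(m) for y, p in enumerate(row)]
    p_home = sum(p for x, y, p in cells if x > y)
    p_draw = sum(p for x, y, p in cells if x == y)
    p_away = sum(p for x, y, p in cells if x < y)
    p_over = sum(p for x, y, p in cells if x + y > total_line)
    p_home_zero = sum(m[0])                  # P(0, any)
    p_away_zero = sum(row[0] for row in m)   # P(any, 0)
    # Inclusion-exclusion: add back P(0, 0), which was subtracted twice.
    p_btts = 1 - p_home_zero - p_away_zero + m[0][0]
    return {
        "1": p_home, "X": p_draw, "2": p_away,
        "over": p_over, "under": 1 - p_over,
        "btts_yes": p_btts,
        "1X": p_home + p_draw, "X2": p_draw + p_away, "12": p_home + p_away,
    }


# Hypothetical goal rates for illustration only.
markets = derive_markets(example_matrix(1.5, 1.1))
```

Because every market is read from the same matrix, the prices stay mutually consistent: the three 1X2 probabilities sum to 1 and the three double-chance probabilities sum to 2.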

Phase 3: Edge Scanners (6 scanners)

Common engine:

  1. Ingest Pinnacle odds (all markets)
  2. De-vig using Shin method (3-way for 1X2, 2-way for AH/totals/BTTS)
  3. Build bivariate Poisson probability matrix with Dixon-Coles
  4. Compare to Kalshi contract prices
  5. Min edge: 4 cents after Kalshi 7% fee
  6. Min sample: 10+ league matches for both teams this season
  7. Output: {fixture_id, market, selection, model_prob, kalshi_price, edge, confidence}
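
Step 2 of the common engine can be sketched as follows. This is a minimal version of Shin's method that solves for the insider-trading proportion `z` by bisection, so that the adjusted probabilities sum to 1. The function name, the bisection bounds, and the sample odds are assumptions made for illustration; nothing here reflects a Pinnacle API.

```python
import math


def shin_devig(decimal_odds, iterations=100):
    """Remove the bookmaker margin with Shin's method.

    decimal_odds: decimal prices for every outcome of one market
                  (3 for 1X2, 2 for AH/totals/BTTS).
    Returns probabilities that sum to 1.
    """
    implied = [1.0 / o for o in decimal_odds]
    booksum = sum(implied)
    if booksum <= 1.0:
        # No overround to remove; just normalize.
        return [p / booksum for p in implied]

    def shin_probs(z):
        return [
            (math.sqrt(z * z + 4 * (1 - z) * p * p / booksum) - z) / (2 * (1 - z))
            for p in implied
        ]

    # The probability sum falls as z grows, so bisect for sum == 1.
    lo, hi = 0.0, 0.999
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(shin_probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    probs = shin_probs((lo + hi) / 2)
    total = sum(probs)
    return [p / total for p in probs]  # remove floating-point residue


# Hypothetical 1X2 prices: home, draw, away.
odds_1x2 = [2.10, 3.40, 3.60]
true_probs = shin_devig(odds_1x2)
```

Unlike proportional normalization, Shin's method removes relatively more margin from longshots than from favorites. The resulting probabilities are what step 4 would compare against Kalshi prices.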

Per-market unique logic:

Scanner | Unique Logic
1X2 | Full 3-way probabilities; draw is soccer-specific (most mispriced market in lower leagues)
Asian Handicap | Quarter-line interpolation; most liquid/sharp market
Totals | Weather adjustment (rain → fewer goals); team tempo (possession vs counter)
BTTS | Defensive injury impact amplified; clean sheet rate is key input
Correct Score | Dixon-Coles ρ correction critical; aggregate into score groups for calibration
Double Chance | Derived from 1X2 scanner; useful for heavy favorites where 1X2 edge is small
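
The quarter-line handling in the Asian Handicap scanner can be illustrated with a short settlement sketch. It assumes a home-side bet and a goal-difference distribution supplied as a dictionary; the function names and the sample distribution are hypothetical. A quarter line is treated as two half-stake bets on the adjacent lines, which is how such bets are settled.

```python
def _leg_result(goal_diff, handicap):
    """Home bet on a whole or half line: +1 win, 0 push, -1 loss."""
    adjusted = goal_diff + handicap
    if adjusted > 0:
        return 1
    if adjusted < 0:
        return -1
    return 0


def ah_outcome_probs(diff_probs, handicap):
    """Settlement probabilities for a home bet at the given handicap.

    diff_probs: {home_goals - away_goals: probability}.
    """
    is_quarter = (handicap * 4) % 2 == 1
    # A quarter line splits the stake across the two adjacent lines.
    legs = (handicap - 0.25, handicap + 0.25) if is_quarter else (handicap, handicap)
    labels = {1.0: "win", 0.5: "half_win", 0.0: "push", -0.5: "half_loss", -1.0: "loss"}
    out = {name: 0.0 for name in labels.values()}
    for diff, p in diff_probs.items():
        score = sum(_leg_result(diff, leg) for leg in legs) / 2
        out[labels[score]] += p
    return out


def fair_decimal_odds(outcome):
    """Break-even decimal odds given settlement probabilities."""
    gain = outcome["win"] + 0.5 * outcome["half_win"]
    risk = outcome["loss"] + 0.5 * outcome["half_loss"]
    return 1 + risk / gain


# Made-up goal-difference distribution for illustration.
diff_probs = {-2: 0.10, -1: 0.20, 0: 0.25, 1: 0.25, 2: 0.20}
home_minus_quarter = ah_outcome_probs(diff_probs, -0.25)
home_minus_three_quarter = ah_outcome_probs(diff_probs, -0.75)
```

On the -0.25 line a draw produces a half loss (one leg loses, the other pushes), while on the -0.75 line a one-goal home win produces a half win. In the pipeline, `diff_probs` would be obtained by summing the score matrix along its diagonals.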

Phase 4: Matchup Card Format

MATCH: [Home] vs [Away] | [League] | [Date] [Kickoff Time]
VENUE: [Stadium] | REFEREE: [Name] (Pen/G: [rate], Cards/G: [avg])
WEATHER: [Conditions] | PITCH: [Status]

HOME TEAM: [Name] | Elo: [rating] ([rank] in league)
  Form (Last 5): [W/D/L sequence] | Pts: [total/15]
  Season: GF [avg] | GA [avg] | xGF [avg] | xGA [avg]
  Home: GF [avg] | GA [avg] | xGF [avg] | xGA [avg]
  BTTS Rate: [%] | Clean Sheet Rate: [%]
  Shots/G: [avg] | SOT/G: [avg] | Possession: [avg%]
  Set Piece Goals: [% of total] | Penalty Rate: [per game]

AWAY TEAM: [Name] | Elo: [rating] ([rank] in league)
  Form (Last 5): [W/D/L sequence]
  Season: GF [avg] | GA [avg] | xGF [avg] | xGA [avg]
  Away: GF [avg] | GA [avg] | xGF [avg] | xGA [avg]
  [Same fields as home]

KEY ABSENCES:
  Home: [Player (position) — reason — importance: X/10]
  Away: [Player (position) — reason — importance: X/10]

MOTIVATION:
  Home: [Title race / Relegation / Mid-table / Cup rotation]
  Away: [Same]
  Days Since Last Match: Home [n] | Away [n]
  Midweek European: Home [Y/N] | Away [Y/N]

H2H (Last 5 meetings):
  [Date: Score, Date: Score, ...]
  Home Wins: [n] | Draws: [n] | Away Wins: [n]

INTELLIGENCE:
  [CRITICAL/MODERATE/CONTEXT findings]

Phase 5: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. League coverage scope: Start with Big 5 European leagues + Champions League? Or include MLS, Eredivisie, Liga Portugal, Scottish Premiership?

  2. Dixon-Coles ρ parameter: Council recommends this correction for correct score pricing. Requires fitting from historical data (~2 seasons). Confirm?

  3. Asian Handicap quarter-lines: These are complex (half-win/half-push on 0.25/0.75 lines). Build full quarter-line pricing now or start with whole/half lines only?

  4. FBref xG data: Free StatsBomb xG via FBref. Should we also consider paid Opta/StatsBomb feeds for real-time, or is FBref sufficient?

  5. In-play integration: Council identified this as a gap. Should we design pre-match only (like other desks) or include basic in-play triggers (red card, early goal)?

  6. Correct Score scanner: Very high variance, low hit rate. Worth building, or focus on 1X2/AH/totals/BTTS first?


COUNCIL METADATA

Detail | Value
Council date | 2026-04-01
Advisory responses | 5 (all completed)
Peer reviews | 5 (all completed)
Strongest advisor | Opus (3/5 votes, strongest consensus)
Runner-up | Grok (1/5), Gemini (1/5)
Biggest blind spot | Gemini (2/5 votes)
Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/soccer-research/
Source: ~/edgeclaw/results/panel-results/soccer-research-ruling.md