Golf Data Audit — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Grok (2 of 5 peer-review votes; the most split council, with 4 of 5 advisors voting for themselves)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. Strokes Gained decomposition is mandatory — SG:OTT, SG:APP, SG:ARG, SG:PUTT, SG:T2G, SG:Total
  2. DataGolf as primary SG data source (PGA Tour, DP World Tour, Korn Ferry)
  3. Course fit via regression — player SG vector × course demand vector
  4. EWMA for SG baselines — 24-round half-life for T2G, 40-round for putting (per Broadie)
  5. Weather wave advantage is #1 short-term edge — NWS API for US events
  6. Monte Carlo tournament simulation — 50K-100K iterations for outright/top-N probabilities
  7. 7 edge scanners — Outright, H2H, Make Cut, Top 5/10/20, Round Leader, 3-Ball, Hole in One
  8. Grass type affects putting dramatically — Bermuda/Bentgrass/Poa annua require separate SG:PUTT tracking
  9. Top 20 market is laziest-priced — known inefficiency worth exploiting
  10. LIV requires different modeling — 54 holes, no cut, shotgun starts, 48-player field
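Item 6 can be illustrated with a minimal Monte Carlo sketch. It makes several assumptions. Each player's strokes gained per round is an independent normal draw, centred on their SG:Total with a per-player sigma (the `player_sigma` idea from the schema). The player with the highest total SG finishes first. Cut, weather waves, course fit and dead heats are not modelled. The function name `simulate_finish_probs` is hypothetical.

```python
import numpy as np

def simulate_finish_probs(sg_mean, sg_sigma, top_n=10, n_rounds=4,
                          n_sims=50_000, seed=0):
    """Monte Carlo estimate of win and top-N probabilities (illustrative sketch).

    Assumption: SG per round ~ Normal(sg_mean, sg_sigma), independent across
    rounds and players; higher total SG means a better finish. No cut, weather
    waves or course fit are modelled, and ties (probability zero with
    continuous draws) are ignored.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(sg_mean, dtype=float)
    sigma = np.asarray(sg_sigma, dtype=float)
    # One draw per simulation, round and player: shape (n_sims, n_rounds, n_players).
    rounds = rng.normal(mean, sigma, size=(n_sims, n_rounds, mean.size))
    totals = rounds.sum(axis=1)  # (n_sims, n_players)
    # Double argsort converts totals into finishing positions (0 = winner).
    positions = np.argsort(np.argsort(-totals, axis=1), axis=1)
    return {
        "win": (positions == 0).mean(axis=0),
        "top_n": (positions < top_n).mean(axis=0),
    }

# Usage: four hypothetical players with equal volatility.
probs = simulate_finish_probs([2.0, 1.0, 0.0, -1.0], [2.8] * 4, top_n=2)
```

The full draw array holds n_sims × n_rounds × n_players floats. A 150-player field at 100K iterations would therefore need to be simulated in chunks.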

Where Advisors Disagreed

  1. API specificity: Opus provided actual API endpoints (statdata.pgatour.com, datagolf.com/api); the others cited sources generically. Council verdict: exact endpoints are needed for implementation.
  2. LPGA data: gpt-oss claimed an "LPGA ShotLink" data source exists (it does not, at least not in public form). Opus candidly noted that DataGolf doesn't cover the LPGA and provided proxy SG formulas. Council verdict: the LPGA needs proxy formulas, not phantom data sources.
  3. Course clustering approach: some advisors used k-means, others regression weights, and one used manual categories. Council verdict: regression-derived course DNA vectors (data-driven, not manual clustering).
  4. Edge thresholds: proposals ranged from 2% to 5% per market. Council verdict: market-specific, with 2% for outrights (highest variance), 3% for H2H, and 4% for 3-ball (most concentrated).
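The market-specific verdict reduces to a lookup and a comparison. The ruling does not define "edge", so this sketch assumes it means expected profit per unit staked at decimal odds: p_model × odds - 1. The thresholds come from the verdict above and the Phase 4 scanner table. The market keys and function names are hypothetical.

```python
# Minimum edge per market, from the council verdict and the Phase 4 scanner table.
# The market keys are hypothetical names.
MIN_EDGE = {
    "outright": 0.02,
    "h2h": 0.03,
    "make_cut": 0.03,
    "top_n": 0.03,
    "round_leader": 0.04,
    "three_ball": 0.04,
    "hole_in_one": 0.05,
}

def edge(model_prob: float, decimal_odds: float) -> float:
    """Expected profit per unit staked (assumed definition of 'edge')."""
    return model_prob * decimal_odds - 1.0

def passes_threshold(market: str, model_prob: float, decimal_odds: float) -> bool:
    """True if the bet clears the market-specific minimum edge."""
    return edge(model_prob, decimal_odds) >= MIN_EDGE[market]

# Usage: a 55% model price on an H2H at even money (2.0) has a 10% edge.
ok = passes_threshold("h2h", 0.55, 2.0)
```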

Strongest Arguments (from peer review)

Grok wins with the most focused, implementable design:

Opus runner-up with deepest operational specifics:

Biggest Blind Spot

No backtesting or calibration framework. All advisors build models, but none address how to validate them: there is no historical backtesting, no calibration curves, no Brier scores, no sample-size requirements, and no closing line value (CLV) tracking.
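A validation layer could start with the three measures named above. This sketch assumes NumPy and binary outcomes (1 = the priced event happened). The CLV definition used here is one common convention and an assumption, not a council decision: the expected value of the price taken, measured against the de-vigged closing probability.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    p = np.asarray(probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - y) ** 2))

def calibration_table(probs, outcomes, n_bins=10):
    """(mean forecast, observed frequency, count) for each non-empty bin."""
    p = np.asarray(probs, dtype=float)
    y = np.asarray(outcomes, dtype=float)
    # Equal-width bins on [0, 1]; p == 1.0 is folded into the top bin.
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((float(p[mask].mean()), float(y[mask].mean()), int(mask.sum())))
    return rows

def closing_line_value(odds_taken, closing_fair_prob):
    """Assumed CLV convention: EV of the odds taken at the de-vigged closing probability."""
    return odds_taken * closing_fair_prob - 1.0
```

A well-calibrated model puts each bin's observed frequency close to its mean forecast. Consistently positive CLV is a signal that usually stabilises with far fewer bets than realised profit does.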

What Everyone Missed (from peer reviews)

  1. Withdrawal risk pricing: golf fields have 10-25% WD rates between initial odds and Thursday tee-off. H2H and 3-ball bets are void on a WD, and outright markets shift when stars withdraw. "Dead money" from withdrawn players is an exploitable mispricing, so a WD probability model is needed.
  2. Information-timing framework: Monday/Tuesday practice-round reports, fitness tests, and late alternate additions create a window in which the model has a genuine information edge over static book lines. No advisor built a timing-specific intelligence cadence.
  3. Field composition uncertainty: MC simulations assume a fixed field, but the field changes daily from Sunday through Thursday. The simulation needs to model field uncertainty, not just player performance.
  4. Pin position daily impact: pin placements change hole difficulty by 0.5-1.0 strokes per hole, which none of the proposed lookbacks captures.
  5. Sponsor exemptions and Monday qualifiers: late field additions have different SG profiles, and the pipeline must handle them.
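Items 1 and 3 both point toward sampling the field itself inside each simulation. This minimal sketch makes several hypothetical assumptions:

- Each entrant withdraws independently with a given probability. The `wd_prob` input stands in for the proposed WD Probability model.
- Withdrawn players are not replaced by alternates.
- An H2H bet voids if either player withdraws.

The function names are illustrative.

```python
import numpy as np

def simulate_with_withdrawals(sg_mean, sg_sigma, wd_prob, n_rounds=4,
                              n_sims=20_000, seed=0):
    """Win probabilities when each entrant may withdraw before tee-off.

    Hypothetical assumptions: withdrawals are independent Bernoulli(wd_prob)
    draws, withdrawn players are removed without adding alternates, and each
    round's SG is Normal(sg_mean, sg_sigma). A withdrawn player counts as a
    non-winner (the "dead money" case). Simulations in which nobody tees off
    are discarded. Returns (p_win, p_tees_off) per player.
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(sg_mean, dtype=float)
    sigma = np.asarray(sg_sigma, dtype=float)
    wd = np.asarray(wd_prob, dtype=float)
    # True where the player is still in the field at tee-off.
    tees_off = rng.random((n_sims, mean.size)) >= wd
    totals = rng.normal(mean, sigma, size=(n_sims, n_rounds, mean.size)).sum(axis=1)
    totals = np.where(tees_off, totals, -np.inf)  # withdrawn players cannot win
    valid = tees_off.any(axis=1)
    if not valid.any():
        raise ValueError("every simulation lost its entire field")
    winners = totals[valid].argmax(axis=1)
    p_win = np.bincount(winners, minlength=mean.size) / valid.sum()
    return p_win, tees_off.mean(axis=0)

def h2h_void_prob(wd_a, wd_b):
    """P(an H2H bet voids), assuming either player's WD voids it, independently."""
    return 1.0 - (1.0 - wd_a) * (1.0 - wd_b)
```

Because the field is redrawn in every iteration, a star's withdrawal probability flows directly into everyone else's win probability. A fixed-field simulation cannot capture that.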

BUILD PLAN

Phase 1: Core Golf Data Tables

golf_players: player_id, full_name, tour (PGA/LIV/DPWT/KF/LPGA), nationality, age, owgr_rank, datagolf_rank, sg_total, sg_ott, sg_app, sg_arg, sg_putt, sg_t2g, player_sigma, active, updated_at

golf_player_sg_baselines: player_id, date, sg_component (OTT/APP/ARG/PUTT/T2G/Total), window (8rd/24rd/40rd/100rd), value, ewma_value, rounds_in_window

golf_courses: course_id, name, city, state_country, par, yardage, grass_fairway, grass_green (bermuda/bentgrass/poa), altitude_ft, roof_type, dna_ott, dna_app, dna_arg, dna_putt, avg_winning_score, avg_cut_line, course_difficulty_index

golf_tournaments: tournament_id, name, course_id, tour, start_date, purse, field_size, cut_rule (top-65/70/no-cut), format (stroke/shotgun), major (boolean), num_rounds (4/3, i.e. 72 or 54 holes)

golf_field_lists: tournament_id, player_id, entry_type (committed/alternate/MQ/sponsor), wd_status, wd_timestamp, wave_r1 (AM/PM), wave_r2 (PM/AM), tee_time_r1, tee_time_r2, made_cut, final_position, final_score

golf_player_course_history: player_id, course_id, appearances, rounds_played, avg_sg_total, best_finish, cuts_made, cuts_missed, wins, avg_score_to_par

golf_weather: tournament_id, round_number, wave (AM/PM), forecast_timestamp, temp_f, wind_speed_mph, wind_gust_mph, wind_direction, precip_prob, humidity_pct, wave_advantage_strokes

golf_round_scores: tournament_id, round_number, player_id, score_to_par, sg_total, sg_ott, sg_app, sg_arg, sg_putt, tee_time, wave, position_after_round, score_detail (JSON birdie/bogey/par per hole)

golf_putting_surface_splits: player_id, grass_type (bermuda/bentgrass/poa), rounds_played, sg_putt_avg, sg_putt_ewma, putt_make_pct_5to10ft, putt_make_pct_10to20ft
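Two of the Phase 1 tables can be sketched in SQLite using Python's standard `sqlite3` module. The column types, primary keys and CHECK constraints are assumptions inferred from the enumerations in the column lists above. `make_db` and `active_field` are hypothetical helpers. The `window` column is quoted because WINDOW is an SQLite keyword.

```python
import sqlite3

# Hypothetical SQLite DDL for two Phase 1 tables. Types, keys and CHECK
# constraints are assumptions inferred from the column lists, not a spec.
DDL = """
CREATE TABLE golf_player_sg_baselines (
    player_id        TEXT NOT NULL,
    date             TEXT NOT NULL,  -- ISO-8601 date of the snapshot
    sg_component     TEXT NOT NULL
        CHECK (sg_component IN ('OTT', 'APP', 'ARG', 'PUTT', 'T2G', 'Total')),
    "window"         TEXT NOT NULL
        CHECK ("window" IN ('8rd', '24rd', '40rd', '100rd')),  -- quoted keyword
    value            REAL,
    ewma_value       REAL,
    rounds_in_window INTEGER,
    PRIMARY KEY (player_id, date, sg_component, "window")
);

CREATE TABLE golf_field_lists (
    tournament_id  TEXT NOT NULL,
    player_id      TEXT NOT NULL,
    entry_type     TEXT CHECK (entry_type IN ('committed', 'alternate', 'MQ', 'sponsor')),
    wd_status      INTEGER NOT NULL DEFAULT 0 CHECK (wd_status IN (0, 1)),
    wd_timestamp   TEXT,
    wave_r1        TEXT CHECK (wave_r1 IN ('AM', 'PM')),
    wave_r2        TEXT CHECK (wave_r2 IN ('AM', 'PM')),
    tee_time_r1    TEXT,
    tee_time_r2    TEXT,
    made_cut       INTEGER CHECK (made_cut IN (0, 1)),
    final_position INTEGER,
    final_score    INTEGER,
    PRIMARY KEY (tournament_id, player_id)
);
"""

def make_db(path=":memory:"):
    """Create the sketch schema in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(DDL)
    return conn

def active_field(conn, tournament_id):
    """Players still in the field (not withdrawn) for one tournament."""
    rows = conn.execute(
        "SELECT player_id FROM golf_field_lists "
        "WHERE tournament_id = ? AND wd_status = 0 ORDER BY player_id",
        (tournament_id,),
    )
    return [r[0] for r in rows]
```

Encoding the enumerations as CHECK constraints makes a malformed feed row (for example an unknown window size) fail at insert time. It is not silently stored.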

Phase 2: Custom Metrics

Metric | Formula | Purpose
--- | --- | ---
SG Composite (EWMA) | T2G: 24-round half-life; putting: 40-round half-life (per Broadie) | Weighted player strength
Course Fit Score | dot(player_SG_vector, course_DNA_vector), z-normalized | Player-course compatibility
Course History Bonus | 3+ cuts made: +0.10; T10 finish: +0.15; win: +0.20; capped at +0.30 SG:Total | Venue familiarity
Weather Wave Advantage | PM_wind_penalty - AM_wind_penalty, in strokes | Short-term edge signal
Player Sigma | Std dev of SG:Total over the last 40 rounds | Consistency measure
Cut Probability | From MC: P(player in top N after 36 holes), using course-adjusted SG | Make-cut market pricing
Field Strength Index | Sum of the top 30 SG:Total values in the field / baseline | Tournament difficulty
Putting Surface Adjustment | SG:Putt_on_grass_type - SG:Putt_overall | Grass-type-specific correction
Form Trend | Slope of SG:Total over the last 8 rounds (positive = improving) | Momentum signal
WD Probability | Logistic model on age, injury flag, recent WD history, travel distance | Withdrawal risk pricing
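Several of these metrics are simple enough to sketch directly. The functions below are illustrative, not the council's implementation, and rest on these readings of the table:

- The EWMA treats "half-life" as the number of rounds after which a result's weight halves.
- Course fit is z-normalized across whatever field is passed in.
- The course-history bonus is additive over three conditions (3+ cuts made, any top-10 finish, any win), capped at +0.30.

All three readings are assumptions.

```python
import numpy as np

def ewma_half_life(values, half_life):
    """Exponentially weighted mean whose weights halve every `half_life` rounds.

    `values` are ordered oldest -> newest; the newest round gets weight 1.
    """
    v = np.asarray(values, dtype=float)
    age = np.arange(v.size)[::-1]  # 0 for the newest round
    w = 0.5 ** (age / half_life)
    return float(np.sum(w * v) / np.sum(w))

def course_fit_scores(player_sg, course_dna):
    """z-normalized dot(player SG vector, course DNA vector) across the field.

    player_sg: (n_players, 4) array of [OTT, APP, ARG, PUTT];
    course_dna: 4 weights in the same order (assumed layout).
    """
    raw = np.asarray(player_sg, dtype=float) @ np.asarray(course_dna, dtype=float)
    std = raw.std()
    return (raw - raw.mean()) / std if std > 0 else np.zeros_like(raw)

def course_history_bonus(cuts_made, best_finish, wins):
    """SG:Total bonus; assumed additive conditions, capped at +0.30."""
    bonus = 0.0
    if cuts_made >= 3:
        bonus += 0.10
    if best_finish is not None and best_finish <= 10:
        bonus += 0.15
    if wins > 0:
        bonus += 0.20
    return min(bonus, 0.30)

def form_trend(sg_total_rounds):
    """Least-squares slope of SG:Total over the last 8 rounds (oldest -> newest)."""
    y = np.asarray(sg_total_rounds, dtype=float)[-8:]
    x = np.arange(y.size)
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)

# Usage: T2G composite with the 24-round half-life from the table.
t2g = ewma_half_life([0.8, 1.1, 0.4, 1.5], half_life=24)
```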

Phase 3: Distribution Models

Phase 4: 7 Edge Scanners

Scanner | Min Edge | De-vig Method | Kelly Fraction | Unique Logic
--- | --- | --- | --- | ---
Outright (150-way) | 2% | Power method | 0.25x | Course fit + weather + form
H2H | 3% | Multiplicative (2-way) | 0.5x | Wave differential if the players are in different waves; WD void risk
Make the Cut | 3% | Multiplicative (2-way) | 0.4x | Consistency (low sigma) weighted above peak SG; course history weight
Top 5/10/20 | 3% | Power method (20-way) | 0.35x | Top 20 is the laziest-priced market
Round Leader | 4% | Power method | 0.3x | Wave weather is DOMINANT; single-round sim
3-Ball | 4% | Power method (3-way) | 0.4x | Same group shares conditions; highest correlation
Hole in One | 5% | Multiplicative (2-way) | 0.15x | Par-3 difficulty × field size × ace rate; high variance
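The two de-vig methods named in the table, and the fractional-Kelly sizing, can be sketched as follows. The sketch assumes:

- Decimal odds.
- The usual power-method formulation: each implied probability 1/odds is raised to a common exponent k, chosen so the results sum to the target (1 for a winner market, N for a top-N market).
- Full Kelly f* = (p × d - 1)/(d - 1), scaled by the per-scanner fractions above.

The market keys are hypothetical names.

```python
import numpy as np

def devig_multiplicative(decimal_odds):
    """Scale implied probabilities 1/odds so they sum to 1."""
    q = 1.0 / np.asarray(decimal_odds, dtype=float)
    return q / q.sum()

def devig_power(decimal_odds, target=1.0, tol=1e-12, max_iter=200):
    """Power de-vig: find k with sum((1/odds)**k) == target; return (1/odds)**k.

    Solved by bisection; sum(q**k) is strictly decreasing in k because every
    q lies in (0, 1), and at k = 0 the sum equals the number of runners.
    """
    q = 1.0 / np.asarray(decimal_odds, dtype=float)
    if np.any(q >= 1.0):
        raise ValueError("decimal odds must be > 1")
    if q.size <= target:
        raise ValueError("need more runners than the target sum")

    def excess(k):
        return float(np.sum(q ** k)) - target

    lo, hi = 0.0, 2.0
    while excess(hi) > 0:  # widen the bracket until the sum drops below target
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return q ** (0.5 * (lo + hi))

# Kelly fractions per scanner, from the table above.
KELLY_MULT = {"outright": 0.25, "h2h": 0.5, "make_cut": 0.4, "top_n": 0.35,
              "round_leader": 0.3, "three_ball": 0.4, "hole_in_one": 0.15}

def fractional_kelly(model_prob, decimal_odds, market):
    """Stake as a fraction of bankroll: multiplier * full Kelly, floored at 0."""
    full = (model_prob * decimal_odds - 1.0) / (decimal_odds - 1.0)
    return KELLY_MULT[market] * max(full, 0.0)
```

Unlike multiplicative scaling, the power method loads more of the margin onto longshots. This is one reason it is usually preferred for many-runner markets such as outrights.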

Phase 5: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. DataGolf subscription: Required for comprehensive SG data across PGA/DPWT/KF. Cost?
  2. LPGA coverage: No reliable SG source. Build proxy formulas or skip LPGA?
  3. LIV-specific model: 54 holes, no cut, shotgun starts, 48-player field. Is it worth a separate build?
  4. WD probability model: Build logistic regression for withdrawal risk pricing?
  5. Historical depth: How many seasons of SG data to backfill?
  6. Putting surface splits: Track SG:Putting separately by grass type (bermuda/bentgrass/poa)?
  7. Hole in One scanner: Very high variance novelty market. Build or skip?

COUNCIL METADATA

Detail | Value
--- | ---
Council date | 2026-04-01
Advisory responses | 5 (all completed)
Peer reviews | 5 (all completed)
Strongest advisor | Grok (2/5 votes, 1 of them a genuine cross-vote)
Runner-up | Opus (1/5 votes, a self-vote, but the strongest review insights)
Biggest blind spot | No backtesting/calibration framework
Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/golf-data-audit/
Source: ~/edgeclaw/results/panel-results/golf-data-audit-ruling.md