NHL Research Pipeline Council Ruling

=== COUNCIL RESULTS ===

QUESTION: Design the complete research and intelligence pipeline for the NHL betting desk.

DATE: 2026-04-01
ADVISORS: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
PROCESS: Full council (5 advisory responses, anonymized peer review, 5 independent reviews)
ANONYMIZATION MAP: A=Gemini, B=Opus, C=Grok, D=gpt-oss, E=Sonnet

PEER REVIEW VOTES

Strongest Response:

Advisor B (Opus): 4 votes (Opus, Sonnet, Gemini, Grok reviews)
Advisor E (Sonnet): 1 vote (gpt-oss review)

Biggest Blind Spot:

Advisor C (Grok): 2 votes — coach-vs-beat-reporter conflict resolution is backwards
Advisor D (gpt-oss): 2 votes — assumes human "Intelligence Analyst" staff that doesn't exist, backward credibility hierarchy
Advisor B (Opus): 1 vote — no explicit backup goalie quality tiers

CONSENSUS (All 5 Agreed)

Goalie confirmation is a first-class subsystem — not just another injury check. It needs its own cascade, conflict resolution, and dual-scenario protocol.
Beat reporter visual confirmation outweighs coach verbal statements — NHL coaches deliberately mislead about goalies. This is cultural, not accidental.
DailyFaceoff is the primary structured source but it LAGS beat reporters by 5-15 minutes.
Structured IntelAdjustment JSON objects for all findings — not free-text parsing.
Analysts stay completely blind — they see prose matchup cards only. No market data, no IntelAdjustment objects, no model outputs.
Kill switch via Telegram — freezes auto-adjustments but continues collecting findings.
Full audit trail — every adjustment stores before/after values with source attribution.
Late CRITICAL findings (post-analyst-submission) go directly to verdict layer (Opus), not back to analysts.

KEY DIVERGENCES

1. Additive vs Multiplicative Adjustments

Opus (WINNER): Multiplicative, i.e. final_xGF = base * goalie_factor * injury_factor * fatigue_factor, with a hard floor (60% of baseline xGF) and a hard ceiling (150% of baseline xGA).
Sonnet: Additive with separate caps per category.
Grok/gpt-oss: Additive or mixed.
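Peer review consensus (4/5): Multiplicative is mathematically correct for independent effects. Additive can produce absurd outputs when stacking: several large negative deltas summed together can push xGF to zero or below, whereas a product of positive factors shrinks it proportionally and never changes its sign.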

2. Coach vs Beat Reporter Credibility

Opus/Sonnet/Gemini: Beat reporter visual confirmation > coach verbal statement. "Coaches lie about goalies" is a documented NHL norm.
Grok: Coach quote wins unless contradicted — flagged as backwards by 2 reviewers.
gpt-oss: Coach ranked as "Tier S" above all others — flagged as wrong by 2 reviewers.

3. Backup Goalie Quality

Sonnet: 4-tier backup quality system (Tier A: strong backup, Tier B: modest downgrade, Tier C: significant, Tier D: emergency/AHL).
Opus: Mentioned backup delta but no explicit tier system.
Others: Superficial treatment.
Resolution: Adopt Sonnet's 4-tier system.

4. Research Model Stack

Opus: Grok 4.1 Fast (search) → Gemini Flash (extraction) → DeepSeek R1 (contradiction) → Sonnet (plausibility gate)
Sonnet: Sonar Deep Research (morning) + Grok 4.1 Fast (afternoon/pregame) → Gemini Flash/Haiku (extraction) → DeepSeek R1 (validation)
Resolution: Opus's 4-step validation pipeline is more thorough. Use Grok 4.1 Fast for all search passes (speed + cost), Gemini Flash for extraction, DeepSeek R1 for contradiction detection, and Sonnet as the final plausibility gate.

STRONGEST ARGUMENTS (From Peer Review)

Opus's multiplicative adjustment framework — 4/5 reviewers endorsed this. The worked example and floor/ceiling bounds make it production-ready.
Opus's post-mortem feedback loop — Only advisor to design weekly tracking of source reliability that updates credibility tiers. Beat reporters who are consistently wrong get downgraded.
Opus's dual-scenario GTD handling — Run two parallel models (Goalie A scenario + Goalie B scenario). Only bet when edge exists in BOTH. Most operationally sound.
Sonnet's backup goalie quality tiers — 4-level system with specific SV% thresholds. Critical for the most common NHL research event (B2B backup start).
Sonnet's T-15 minute hard cutoff — No new bet execution within 15 minutes of puck drop when CRITICAL finding fires. Stricter than NBA, appropriate for NHL.
Opus's "time-based credibility decay does NOT apply in NHL" — Subtle but correct. The entire intelligence window is same-day (9 AM to puck drop). Recency is a tiebreaker, not a decay function.

WHAT EVERYONE MISSED (Peer Review Findings)

1. Cost & Latency Budget (Opus review)

On a 15-game NHL night: 30 teams × 7-8 queries × 3-4 passes = roughly 630-960 web search API calls per day, plus extraction/validation LLM calls. No advisor calculated total API cost or whether queries can complete within pass windows. Need a priority system: focus research spend on games with active Kalshi markets.
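Multiplying the per-team figures out is a quick sanity check on daily search volume. The helper below is hypothetical (not part of any existing codebase) and counts web search calls only; extraction and validation LLM calls come on top. Passing only the games with active Kalshi markets as `games` is one way to apply the proposed priority system.

```typescript
// Hypothetical helper: bounds on daily web-search calls for one slate.
// Counts search calls only; extraction/validation LLM calls come on top.
interface CallRange {
  low: number;
  high: number;
}

function estimateSearchCalls(games: number, queriesPerTeam: CallRange, passes: CallRange): CallRange {
  const teams = games * 2; // two teams per game
  return {
    low: teams * queriesPerTeam.low * passes.low,
    high: teams * queriesPerTeam.high * passes.high,
  };
}

// 15-game night, 7-8 queries per team, 3-4 passes.
const budget = estimateSearchCalls(15, { low: 7, high: 8 }, { low: 3, high: 4 });
console.log(budget); // { low: 630, high: 960 }
```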

Also: the optional morning skate problem. Teams increasingly skip morning skate entirely (especially on B2Bs). The entire intelligence architecture treats morning skate as the primary window, but sometimes there IS no morning skate. Need a fallback protocol.

2. Kalshi-Specific Constraints (Sonnet review)

The pipeline assumes a standard sportsbook. Kalshi is a prediction market with:

Markets that open/close at unpredictable times
Bid-ask spreads that widen dramatically on late-breaking news
Position sizing constrained by contract liquidity

A CRITICAL finding at 5:45 PM may be worthless if the Kalshi contract has already repriced or liquidity has dried up.

3. Non-Contact Jersey Detection (Gemini review)

All 5 advisors built morning skate query extraction, but NONE instructed the extraction model to look for "non-contact jersey" or jersey colors. In the NHL, a player skating in a non-contact jersey (red/yellow) is 100% OUT. Without this explicit extraction rule, the AI reads "Matthews is skating with the team" and flags him as EXPECTED_IN, missing the critical visual cue.

4. Market as Intelligence Source (Grok review)

No advisor built a bidirectional loop where line movement confirms or disconfirms research. When a goalie rumor leaks, the market often moves BEFORE public confirmation. An unexplained 8-cent ML shift should increase confidence in a beat reporter's tweet. The pipeline treats research as upstream truth and the market as downstream only.

5. In-Play Research (gpt-oss review)

The entire pipeline is pre-game "set-and-forget." There is no live research for in-game events (star forward injured at the 10-minute mark, goalie pulled early, momentum swings). A modern desk should have a low-latency, event-driven in-play layer.

BUILD SPEC — NHL RESEARCH PIPELINE

Source Credibility Hierarchy

Rank	Source	Confidence	Notes
1	NHL Official API (confirmed starter field)	100%	Definitive. Overrides everything.
2	DailyFaceoff "Confirmed" tag	95%	Rarely wrong once marked.
3	Beat reporter with visual confirmation at rink	85%	"I saw X take starter's end" — physical evidence.
4	Team official PR/Twitter	80%	Reliable when they post, often silent until game time.
5	National journalist (Friedman, Seravalli, Dreger)	75%	Very reliable for trades, less so for daily lineups.
6	DailyFaceoff "Expected" tag	70%	Usually right but not guaranteed.
7	Team beat writer (no visual confirmation)	65%	"I'm hearing X will start" — less reliable than visual.
8	Coach presser quote	60%	Coaches actively mislead about goalies. Cultural norm.
9	Fan/aggregator accounts	20%	Context only, never auto-adjust.

Goalie Confirmation Workflow

Status escalation:

UNCONFIRMED (default)
  → EXPECTED (DailyFaceoff "Expected" OR 1+ beat reporter says likely)
    → CONFIRMED (DailyFaceoff "Confirmed" OR NHL API OR 2+ independent beat reporters confirm visual)
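The escalation ladder maps naturally onto a small pure function. This is a sketch under stated assumptions: the `GoalieEvidence` shape is hypothetical, and treating a single rink-side visual report as at least "likely" is an interpretation, not part of the ruling.

```typescript
type GoalieStatus = "UNCONFIRMED" | "EXPECTED" | "CONFIRMED";

// Hypothetical evidence summary for one team's starter on one night.
interface GoalieEvidence {
  nhlApiConfirmed: boolean; // NHL API confirmed-starter field is set
  dailyFaceoffTag: "none" | "expected" | "confirmed";
  beatReportersSayLikely: number; // reporters calling it "likely", no visual
  independentVisualConfirmations: number; // distinct reporters who saw it at the rink
}

function goalieStatus(e: GoalieEvidence): GoalieStatus {
  // CONFIRMED: DailyFaceoff "Confirmed", the NHL API, or 2+ independent visual confirmations.
  if (
    e.nhlApiConfirmed ||
    e.dailyFaceoffTag === "confirmed" ||
    e.independentVisualConfirmations >= 2
  ) {
    return "CONFIRMED";
  }
  // EXPECTED: DailyFaceoff "Expected" or 1+ beat reporter says likely.
  // Assumption: a single visual confirmation also counts as "likely".
  if (
    e.dailyFaceoffTag === "expected" ||
    e.beatReportersSayLikely >= 1 ||
    e.independentVisualConfirmations >= 1
  ) {
    return "EXPECTED";
  }
  return "UNCONFIRMED";
}

// One reporter saw the goalie take the starter's net; DailyFaceoff still says "Expected".
console.log(
  goalieStatus({
    nhlApiConfirmed: false,
    dailyFaceoffTag: "expected",
    beatReportersSayLikely: 0,
    independentVisualConfirmations: 1,
  }),
); // EXPECTED
```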

Math layer treatment by status:

UNCONFIRMED: Use season-average split. Context only for analysts. No math adjustment.
EXPECTED: Apply 50% confidence blend (halfway between starter and backup scenario).
CONFIRMED: Full recompute using confirmed goalie's stats. No blending.
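One reading of the status table: compute each scenario separately and let the status decide which result the math layer sees. The ruling does not say whether the 50% blend is taken on SV% inputs or on final outputs; this sketch blends a scalar output such as a win probability, and the function and parameter names are hypothetical.

```typescript
type GoalieStatus = "UNCONFIRMED" | "EXPECTED" | "CONFIRMED";

// Hypothetical: choose which scalar model output (e.g., home win probability)
// the math layer sees, given the goalie's confirmation status.
function outputForStatus(
  status: GoalieStatus,
  seasonAvgOutput: number, // computed with the team's season-average goalie split
  namedGoalieOutput: number, // computed assuming the reported goalie starts
  otherGoalieOutput: number, // computed assuming the other goalie starts
): number {
  switch (status) {
    case "UNCONFIRMED":
      return seasonAvgOutput; // no math adjustment
    case "EXPECTED":
      return 0.5 * namedGoalieOutput + 0.5 * otherGoalieOutput; // 50% confidence blend
    case "CONFIRMED":
      return namedGoalieOutput; // full recompute on the confirmed goalie, no blending
  }
}

console.log(outputForStatus("EXPECTED", 0.55, 0.58, 0.5)); // ≈ 0.54
```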

Conflict resolution:

Highest-tier source wins, unless contradicted by 2+ sources at next tier down.
Coach says X, beat reporter saw Y at morning skate → trust beat reporter visual.
DailyFaceoff vs beat reporter → flag as CONFLICT, hold at EXPECTED, Telegram alert.
Time recency is tiebreaker within same tier.
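A sketch of the conflict rules as code follows. The `SourceKind` labels, the report shape, and the choice to return CONFLICT (rather than flip the pick) when 2+ next-tier sources dissent are assumptions; tier numbers follow the credibility table, where 1 is most credible.

```typescript
type SourceKind = "nhl_api" | "dailyfaceoff" | "beat_reporter" | "other";

interface GoalieReport {
  goalie: string;
  tier: number; // 1 (most credible) .. 9 (least), per the hierarchy table
  source: SourceKind;
  timestamp: number; // epoch ms; larger = more recent
}

type Resolution =
  | { kind: "resolved"; goalie: string }
  | { kind: "conflict"; candidates: string[] };

function mostRecent(reports: GoalieReport[]): GoalieReport {
  return reports.reduce((a, b) => (b.timestamp > a.timestamp ? b : a));
}

function distinctGoalies(reports: GoalieReport[]): string[] {
  return reports.map((r) => r.goalie).filter((g, i, all) => all.indexOf(g) === i);
}

function resolveGoalie(reports: GoalieReport[]): Resolution | null {
  if (reports.length === 0) return null;
  const conflict: Resolution = { kind: "conflict", candidates: distinctGoalies(reports) };

  // The NHL API overrides everything.
  const api = reports.filter((r) => r.source === "nhl_api");
  if (api.length > 0) return { kind: "resolved", goalie: mostRecent(api).goalie };

  // DailyFaceoff vs beat reporter disagreement: CONFLICT (hold at EXPECTED, alert).
  const dfo = distinctGoalies(reports.filter((r) => r.source === "dailyfaceoff"));
  const beat = distinctGoalies(reports.filter((r) => r.source === "beat_reporter"));
  if (dfo.some((g) => beat.some((h) => h !== g))) return conflict;

  // Highest tier (lowest number) wins; recency breaks ties inside that tier.
  const bestTier = Math.min(...reports.map((r) => r.tier));
  const winner = mostRecent(reports.filter((r) => r.tier === bestTier));

  // ...unless 2+ sources one tier down name someone else.
  // Assumption: report that as CONFLICT rather than flipping the pick.
  const dissent = reports.filter((r) => r.tier === bestTier + 1 && r.goalie !== winner.goalie);
  return dissent.length >= 2 ? conflict : { kind: "resolved", goalie: winner.goalie };
}

// Coach (tier 8) names "A"; a beat reporter (tier 3) saw "B" take the starter's net.
console.log(
  resolveGoalie([
    { goalie: "A", tier: 8, source: "other", timestamp: 1 },
    { goalie: "B", tier: 3, source: "beat_reporter", timestamp: 2 },
  ]),
); // { kind: 'resolved', goalie: 'B' }
```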

Goalie change cascade (full recompute):

Replace goalie SV% splits (5v5, PP, PK, high-danger) — use last-30-game rolling average
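Recalculate xGA: xGA_adjusted = xGA_base * ((1 - new_goalie_sv%) / (1 - league_avg_sv%)). Goals against scale with the fraction of shots NOT saved, so a .880 goalie against a .905 league average raises xGA by about 26%.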
Re-derive Poisson lambda for both teams
Rerun full probability distribution
Re-de-vig against Pinnacle
Compare against Kalshi — all markets (ML, puckline, total) recompute
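The first cascade steps (swap in the new goalie's SV%, recompute xGA, re-derive both lambdas, rerun the distribution) can be sketched as below. Assumptions: goals against scale by the unsaved-shot fraction (1 - SV%), since goals against = shots on goal × (1 - SV%); each team's lambda is the mean of its own xGF and the opponent's xGA; all team numbers are illustrative. De-vigging against Pinnacle and the Kalshi comparison are left out.

```typescript
// Assumption: goals against scale with the unsaved-shot fraction (1 - SV%).
function adjustXga(xgaBase: number, leagueAvgSv: number, newGoalieSv: number): number {
  return xgaBase * ((1 - newGoalieSv) / (1 - leagueAvgSv));
}

// Assumption: a team's lambda is the mean of its own xGF and the opponent's xGA.
function lambdaFor(ownXgf: number, oppXga: number): number {
  return (ownXgf + oppXga) / 2;
}

// Poisson probability of exactly k goals given rate lambda.
function poissonPmf(k: number, lambda: number): number {
  let p = Math.exp(-lambda);
  for (let i = 1; i <= k; i++) p *= lambda / i;
  return p;
}

// Regulation outcome probabilities from two independent Poisson goal counts,
// truncated at maxGoals per side. Ties go to OT/shootout (not modeled here).
function regulationProbs(lambdaHome: number, lambdaAway: number, maxGoals = 12) {
  let home = 0;
  let away = 0;
  let tie = 0;
  for (let h = 0; h <= maxGoals; h++) {
    for (let a = 0; a <= maxGoals; a++) {
      const p = poissonPmf(h, lambdaHome) * poissonPmf(a, lambdaAway);
      if (h > a) home += p;
      else if (h < a) away += p;
      else tie += p;
    }
  }
  return { home, away, tie };
}

// Illustrative: the away team switches from a league-average starter to a .890 backup.
const leagueSv = 0.905;
const homeXgf = 3.1, homeXga = 2.9, awayXgf = 2.8, awayXgaBase = 3.0;
const awayXga = adjustXga(awayXgaBase, leagueSv, 0.89);
const before = regulationProbs(lambdaFor(homeXgf, awayXgaBase), lambdaFor(awayXgf, homeXga));
const after = regulationProbs(lambdaFor(homeXgf, awayXga), lambdaFor(awayXgf, homeXga));
console.log(before.home.toFixed(3), after.home.toFixed(3));
```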

Game-time decisions:

Run TWO parallel model scenarios (Goalie A and Goalie B)
Only bet when edge exists in BOTH scenarios
T-30 hard deadline: if unresolved, default to backup goalie scenario (conservative)
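The dual-scenario rule reduces to comparing two edges. In this sketch, edge is model probability minus the Kalshi price read as an implied probability, and the `minEdge` threshold of 0.03 is an assumed placeholder; the ruling does not set one.

```typescript
// Hypothetical view of one market side (e.g., home ML) under one goalie scenario.
interface ScenarioEdge {
  modelProb: number; // model win probability for the side in this scenario
  marketPrice: number; // Kalshi contract price in dollars (0-1), read as implied probability
}

function edge(s: ScenarioEdge): number {
  return s.modelProb - s.marketPrice;
}

// Dual-scenario rule for an unresolved game-time decision.
// minEdge = 0.03 is an assumed placeholder; the ruling does not set a threshold.
function gtdDecision(
  starter: ScenarioEdge,
  backup: ScenarioEdge,
  minutesToPuck: number,
  minEdge = 0.03,
): "BET" | "PASS" {
  if (minutesToPuck <= 30) {
    // T-30 hard deadline reached while unresolved: default to the backup scenario.
    return edge(backup) >= minEdge ? "BET" : "PASS";
  }
  // Before T-30: bet only when the edge exists in BOTH scenarios.
  return Math.min(edge(starter), edge(backup)) >= minEdge ? "BET" : "PASS";
}

// 6-cent edge with the starter, 1-cent edge with the backup: no bet.
console.log(
  gtdDecision({ modelProb: 0.58, marketPrice: 0.52 }, { modelProb: 0.53, marketPrice: 0.52 }, 120),
); // PASS
```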

Emergency goalie (warmup illness/injury):

Immediate CRITICAL Telegram alert
HOLD all pending bets for this game
Full recompute with backup stats
If backup is AHL emergency call-up with no NHL data: use .880 baseline SV%
Apply 1.15x sigma multiplier (increased uncertainty)
T-15 minute hard cutoff: no new positions opened

Backup Goalie Quality Tiers

Tier	Criteria	Treatment
A	20+ NHL starts this season, .905+ SV%	Minimal delta from starter. Small recompute.
B	5-19 starts, .895-.905 SV%	Modest downgrade. ~0.008 SV% reduction.
C	<5 starts or career backup, .880-.894 SV%	Significant downgrade. Full recompute.
D	AHL call-up, <5 NHL career games	Emergency. Use .880 baseline + max uncertainty.
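The tier table leaves some combinations uncovered (for example, 20+ starts at .890). The classifier below is a hypothetical sketch that resolves those by requiring both the starts and SV% criteria for a tier and otherwise falling through to the next tier down, with Tier C as the catch-all.

```typescript
type BackupTier = "A" | "B" | "C" | "D";

interface GoalieProfile {
  nhlStartsThisSeason: number;
  nhlCareerGames: number;
  svPct: number; // season SV% as a fraction, e.g. 0.905
}

const EMERGENCY_BASELINE_SV = 0.88;

function backupTier(g: GoalieProfile): BackupTier {
  // Tier D: AHL call-up with fewer than 5 NHL career games.
  if (g.nhlCareerGames < 5) return "D";
  // Assumption: a goalie needs BOTH the starts and the SV% criterion of a tier;
  // otherwise they fall through to the next tier down. C is the catch-all.
  if (g.nhlStartsThisSeason >= 20 && g.svPct >= 0.905) return "A";
  if (g.nhlStartsThisSeason >= 5 && g.svPct >= 0.895) return "B";
  return "C";
}

// SV% fed to the model: Tier D ignores its tiny sample and uses the .880 baseline.
function modelSvPct(g: GoalieProfile): number {
  return backupTier(g) === "D" ? EMERGENCY_BASELINE_SV : g.svPct;
}

// 22 starts at .898 misses Tier A on SV%, so it lands in Tier B.
console.log(backupTier({ nhlStartsThisSeason: 22, nhlCareerGames: 140, svPct: 0.898 })); // B
```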

Research Model Stack

Step	Model	Purpose	Cost
Web search (all passes)	Grok 4.1 Fast	Fast search, structured extraction	$0.20 input / $0.50 output per 1M tokens
Structured extraction	Gemini Flash	Parse raw results into IntelAdjustment JSON	Cheap
Contradiction detection	DeepSeek R1	Compare new findings vs existing data	Cheap
Plausibility gate	Sonnet 4.6	Final sanity check before math layer	Per-call

Validation Pipeline (sequential, per finding)

Rule-based checks (code): Player on correct roster? Game today? Timestamp recent?
Contradiction detection (DeepSeek R1): Conflicts with prior findings? Matchup card data?
Plausibility gate (Sonnet): Magnitude matches player importance? Source cited?
Source verification (code): Source in credibility hierarchy? Tier sufficient for severity?
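Because the four checks run in sequence and any failure should stop the finding, they fit a simple fail-fast runner. In this sketch the DeepSeek R1 and Sonnet steps are injected as plain synchronous functions so they can be stubbed (real model calls would be async), and the freshness window and per-severity tier thresholds are hypothetical parameters the ruling does not specify.

```typescript
type Severity = "CRITICAL" | "MODERATE" | "CONTEXT";

interface Finding {
  team: string; // 3-letter code
  player: string | null;
  gameDate: string; // YYYY-MM-DD
  timestampMs: number;
  severity: Severity;
  sourceTier: number; // 1-9 from the credibility hierarchy
}

interface CheckResult {
  ok: boolean;
  reason?: string;
}

interface Check {
  name: string;
  run: (f: Finding) => CheckResult;
}

// Runs the checks in order and stops at the first failure.
function validate(f: Finding, checks: Check[]): { passed: boolean; failedAt?: string; reason?: string } {
  for (const c of checks) {
    const r = c.run(f);
    if (!r.ok) return { passed: false, failedAt: c.name, reason: r.reason };
  }
  return { passed: true };
}

// Hypothetical wiring of the four steps. The model-backed steps are injected
// so they can be stubbed; maxAgeMs and maxTierFor are assumed parameters.
function buildChecks(opts: {
  rosters: Record<string, string[]>;
  today: string;
  nowMs: number;
  maxAgeMs: number;
  contradiction: (f: Finding) => CheckResult; // DeepSeek R1 step
  plausibility: (f: Finding) => CheckResult; // Sonnet step
  maxTierFor: Record<Severity, number>;
}): Check[] {
  return [
    {
      name: "rules",
      run: (f) => {
        const roster = opts.rosters[f.team] || [];
        if (f.player !== null && roster.indexOf(f.player) === -1) {
          return { ok: false, reason: "player not on roster" };
        }
        if (f.gameDate !== opts.today) return { ok: false, reason: "game is not today" };
        if (opts.nowMs - f.timestampMs > opts.maxAgeMs) return { ok: false, reason: "stale finding" };
        return { ok: true };
      },
    },
    { name: "contradiction", run: opts.contradiction },
    { name: "plausibility", run: opts.plausibility },
    {
      name: "source",
      run: (f) =>
        f.sourceTier >= 1 && f.sourceTier <= opts.maxTierFor[f.severity]
          ? { ok: true }
          : { ok: false, reason: "source tier too low for severity" },
    },
  ];
}
```

Ordering the cheap code check first means obviously bad findings never reach the paid model calls.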

Research Passes & Queries

Morning Skate (10:00-11:30 AM ET) — per game:

"[Team] morning skate [date] goalie starter"
"[Team] morning skate [date] lines combinations"
"[Team] injury update [date]"
"[Star Player] morning skate status [date]" (for any questionable players)
"[Team] referee crew [date]"
"DailyFaceoff [Team] starter [date]"
EXTRACTION RULE: Explicitly look for "non-contact jersey" or jersey color mentions — player in non-contact = 100% OUT
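A cheap deterministic pre-filter can back up this extraction rule so a non-contact mention is never lost even if the extraction model misses it. This is a hypothetical sketch, not the extraction prompt itself: it matches sentence by sentence on the player's last name, treats an explicit "non-contact" mention as OUT, and only flags a bare red/yellow jersey mention for REVIEW, since a color alone is more ambiguous.

```typescript
type JerseySignal = "OUT" | "REVIEW" | null;

const NON_CONTACT = /non[-\s]?contact/i;
const COLORED_JERSEY = /\b(red|yellow)\s+(jersey|sweater)\b/i;

// Hypothetical pre-filter over raw morning-skate text for one player.
// "OUT"    -> explicit non-contact mention in a sentence naming the player
// "REVIEW" -> only a red/yellow jersey mention (color alone is ambiguous)
// null     -> nothing found
function jerseySignal(text: string, playerLastName: string): JerseySignal {
  const lastName = playerLastName.toLowerCase();
  let signal: JerseySignal = null;
  for (const sentence of text.split(/[.!?]+\s*/)) {
    if (sentence.toLowerCase().indexOf(lastName) === -1) continue;
    if (NON_CONTACT.test(sentence)) return "OUT";
    if (COLORED_JERSEY.test(sentence)) signal = "REVIEW";
  }
  return signal;
}

const report =
  "Matthews skated in a non-contact jersey. Marner stayed on the top line with Tavares.";
console.log(jerseySignal(report, "Matthews"), jerseySignal(report, "Marner")); // OUT null
```

Negations ("is no longer in a non-contact jersey") are not handled, so the LLM extraction stays the primary reader and this check only adds a floor.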

Afternoon (2:00-3:00 PM ET):

"[Team] goalie confirmed [date]"
"[Team] lineup update [date]"
"[Team] recalled AHL [date]" / "[Team] roster move [date]"
"[Player] doubtful questionable [date]"
DailyFaceoff re-pull

Pre-Game (5:30-6:00 PM ET):

"[Team] starter confirmed warmup [date]"
"[Team] late scratch [date]"
"[Player] warmup status [date]"
NHL API official starter check

West Coast (8:00 PM ET): Same as pre-game for Pacific starts.

No Morning Skate Fallback: When a team skips morning skate (common on B2Bs):

Rely on DailyFaceoff Expected tag + coach quotes from previous day
Check historical starter patterns ("this team always starts backup on B2B night 2")
Hold at EXPECTED until afternoon pass confirms

Adjustment Math

Adjustments MULTIPLY (not add):

final_xGF = base_xGF * goalie_opp_factor * injury_factor * fatigue_xGF_factor
final_xGA = base_xGA * own_goalie_factor * def_injury_factor * fatigue_xGA_factor

Goalie change sets the new baseline — other adjustments apply to the new baseline.

Skater injury (top-6 F confirmed OUT):

player_xGF_share = player_xGF / team_xGF (from MoneyPuck, 5v5)
xGF_reduction = player_xGF_share * 0.40 (40% lost, 60% redistributed)
injury_factor = 1 - xGF_reduction
Cap: single player max 15%, cumulative max 25%

Defenseman OUT: Same formula but applied to xGA.
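The skater-injury formula and its caps can be sketched as follows. Combining several injured players by multiplying their individual factors (then applying the 25% cumulative cap) is an assumption consistent with the multiplicative framework; the ruling does not spell out how multiple players stack, and the names here are hypothetical.

```typescript
// Hypothetical input: one confirmed-OUT skater and his 5v5 xGF share (MoneyPuck).
interface InjuredSkater {
  name: string;
  xgfShare: number; // player_xGF / team_xGF
}

const LOST_FRACTION = 0.4; // 40% of the share is lost, 60% redistributed
const SINGLE_PLAYER_CAP = 0.15; // max reduction from one player
const CUMULATIVE_CAP = 0.25; // max total reduction

function injuryFactor(out: InjuredSkater[]): number {
  let factor = 1;
  for (const p of out) {
    const reduction = Math.min(p.xgfShare * LOST_FRACTION, SINGLE_PLAYER_CAP);
    factor *= 1 - reduction; // assumption: players stack multiplicatively
  }
  // Total reduction may not exceed 25%, i.e. the factor never drops below 0.75.
  return Math.max(factor, 1 - CUMULATIVE_CAP);
}

// A top-6 forward carrying 20% of team xGF: 0.20 * 0.40 = 0.08, so factor 0.92.
console.log(injuryFactor([{ name: "Player X", xgfShare: 0.2 }]).toFixed(2)); // 0.92
```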

Fatigue penalties:

Scenario	xGF multiplier	xGA multiplier
Home B2B (no travel)	0.97	1.02
Road B2B (same city)	0.96	1.03
Road B2B (cross-timezone)	0.94	1.05
3rd game in 4 nights	0.95	1.04
Travel disruption (late arrival)	additional 0.98	additional 1.03
Cumulative fatigue cap	min 0.92	max 1.08
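The table translates into a lookup plus a capped product. Multiplying every scenario that applies (for example, a cross-timezone road B2B plus a late arrival) before capping is an assumed stacking rule, and the scenario keys are hypothetical names.

```typescript
type FatigueScenario =
  | "home_b2b"
  | "road_b2b_same_city"
  | "road_b2b_cross_timezone"
  | "third_in_four_nights"
  | "travel_disruption";

// Multipliers from the fatigue table.
const FATIGUE: Record<FatigueScenario, { xgf: number; xga: number }> = {
  home_b2b: { xgf: 0.97, xga: 1.02 },
  road_b2b_same_city: { xgf: 0.96, xga: 1.03 },
  road_b2b_cross_timezone: { xgf: 0.94, xga: 1.05 },
  third_in_four_nights: { xgf: 0.95, xga: 1.04 },
  travel_disruption: { xgf: 0.98, xga: 1.03 }, // applied on top of the others
};

// Assumption: every applicable scenario multiplies in, then the cumulative cap applies.
function fatigueFactors(scenarios: FatigueScenario[]): { xgf: number; xga: number } {
  let xgf = 1;
  let xga = 1;
  for (const s of scenarios) {
    xgf *= FATIGUE[s].xgf;
    xga *= FATIGUE[s].xga;
  }
  return { xgf: Math.max(xgf, 0.92), xga: Math.min(xga, 1.08) };
}

const f = fatigueFactors(["road_b2b_cross_timezone", "travel_disruption"]);
console.log(f.xgf.toFixed(4), f.xga.toFixed(2)); // 0.9212 1.08
```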

Referee adjustment:

ref_penalty_factor = ref_avg_penalties / league_avg_penalties
adjusted_PP_opps = base_PP_opps * ref_penalty_factor
MODERATE severity, auto-applied
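The referee adjustment is a single ratio applied to the power-play baseline. The function names below are hypothetical, and the penalty averages in the usage line are illustrative numbers, not real crew data.

```typescript
// ref_penalty_factor = ref_avg_penalties / league_avg_penalties
function refPenaltyFactor(refAvgPenalties: number, leagueAvgPenalties: number): number {
  if (leagueAvgPenalties <= 0) throw new Error("league average must be positive");
  return refAvgPenalties / leagueAvgPenalties;
}

// adjusted_PP_opps = base_PP_opps * ref_penalty_factor
function adjustedPpOpps(basePpOpps: number, refAvgPenalties: number, leagueAvgPenalties: number): number {
  return basePpOpps * refPenaltyFactor(refAvgPenalties, leagueAvgPenalties);
}

// Illustrative only: a crew calling 10% more penalties than league average.
console.log(adjustedPpOpps(3.0, 8.8, 8.0).toFixed(2)); // 3.30
```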

Bounds:

Floor: xGF cannot drop below 60% of baseline
Ceiling: xGA cannot exceed 150% of baseline
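Putting the multiplicative chain and the bounds together gives a short function. This is a hypothetical sketch: it assumes the floor and ceiling are measured against the baseline passed in, and it keeps separate fatigue multipliers for xGF and xGA to match the fatigue table.

```typescript
interface TeamLine {
  xgf: number;
  xga: number;
}

// Hypothetical factor bundle; 1.0 means "no adjustment".
interface Factors {
  goalieOpp: number; // opposing goalie, applied to xGF
  injury: number; // forward injuries, applied to xGF
  fatigueXgf: number;
  ownGoalie: number; // own goalie, applied to xGA
  defInjury: number; // defenseman injuries, applied to xGA
  fatigueXga: number;
}

const XGF_FLOOR = 0.6; // xGF never below 60% of baseline
const XGA_CEILING = 1.5; // xGA never above 150% of baseline

function applyAdjustments(base: TeamLine, f: Factors): TeamLine {
  const xgf = base.xgf * f.goalieOpp * f.injury * f.fatigueXgf;
  const xga = base.xga * f.ownGoalie * f.defInjury * f.fatigueXga;
  return {
    xgf: Math.max(xgf, XGF_FLOOR * base.xgf),
    xga: Math.min(xga, XGA_CEILING * base.xga),
  };
}

// Worst-case stacking: the floor and ceiling take over.
const stacked = applyAdjustments(
  { xgf: 3.0, xga: 3.0 },
  { goalieOpp: 0.7, injury: 0.75, fatigueXgf: 0.92, ownGoalie: 1.4, defInjury: 1.1, fatigueXga: 1.08 },
);
console.log(stacked.xgf.toFixed(2), stacked.xga.toFixed(2)); // 1.80 4.50
```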

IntelAdjustment Schema

interface NHLIntelAdjustment {
  id: string;                    // UUID
  game_id: string;               // NHL API game ID
  team: string;                  // 3-letter code
  pass: "morning" | "afternoon" | "pregame" | "westcoast" | "emergency";
  timestamp: string;             // ISO 8601

  finding_type: "goalie_confirmed" | "goalie_change" | "player_out" | "player_in" |
                "line_change" | "travel_disruption" | "callup" | "trade" |
                "referee_assignment" | "motivation_context";

  severity: "CRITICAL" | "MODERATE" | "CONTEXT";
  status: "CONFIRMED" | "EXPECTED" | "UNCONFIRMED" | "CONFLICT";

  player_name: string | null;
  player_position: "G" | "F" | "D" | null;
  player_xgf_share: number | null;

  adjustment_type: "xGF" | "xGA" | "SV%" | "PP%" | "full_recompute" | "sigma" | null;
  adjustment_magnitude: number | null;  // multiplier (e.g., 0.94 = 6% reduction)
  adjustment_confidence: "HIGH" | "MEDIUM" | "LOW";

  source: string;
  source_tier: number;           // 1-9 from hierarchy
  source_url: string | null;
  raw_text: string;

  auto_apply: boolean;
  supersedes: string | null;     // ID of finding this replaces
  invalidated: boolean;
  invalidated_by: string | null;

  // Audit fields
  pre_adjustment_value: number | null;
  post_adjustment_value: number | null;
  applied_to_model: boolean;
  applied_timestamp: string | null;
}

Severity Classification

CRITICAL: Goalie change, top-6 F/top-4 D confirmed OUT, trade involving starter. Full recompute + Telegram alert.
MODERATE: Bottom-6 scratch, line shuffle, AHL call-up replacing a regular, referee assignment. Updates card + applies adjustment.
CONTEXT: Coach quotes, motivation narratives, minor moves. Prose only. No math adjustment.

Kill Switch & Safety

Commands:

/freeze NHL — stops all auto-adjustments. Findings stored as QUEUED_FROZEN.
/unfreeze NHL — resumes. Queued findings require explicit /apply queue NHL.
/rollback [id] — reverts specific adjustment, re-applies remaining chain.
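The freeze/queue/rollback semantics can be modeled as an append-only ledger of multiplicative adjustments whose current value is always recomputed from the base, which makes "revert one, re-apply the rest" straightforward. The `AdjustmentLedger` class is a hypothetical in-memory sketch; persistence, Telegram command parsing, and the before/after audit fields are left out.

```typescript
type EntryState = "APPLIED" | "QUEUED_FROZEN" | "ROLLED_BACK";

interface LedgerEntry {
  id: string;
  multiplier: number; // multiplicative adjustment, e.g. 0.94
  state: EntryState;
}

// Hypothetical in-memory ledger for one modeled value (e.g., a team's xGF).
class AdjustmentLedger {
  private readonly base: number;
  private entries: LedgerEntry[] = [];
  private frozen = false;

  constructor(base: number) {
    this.base = base;
  }

  // Findings are still collected while frozen; they are just not applied.
  submit(id: string, multiplier: number): void {
    const state: EntryState = this.frozen ? "QUEUED_FROZEN" : "APPLIED";
    this.entries.push({ id, multiplier, state });
  }

  // /freeze NHL
  freeze(): void {
    this.frozen = true;
  }

  // /unfreeze NHL: queued findings stay queued until applyQueue().
  unfreeze(): void {
    this.frozen = false;
  }

  // /apply queue NHL: explicit step; ignored while still frozen.
  applyQueue(): void {
    if (this.frozen) return;
    for (const e of this.entries) {
      if (e.state === "QUEUED_FROZEN") e.state = "APPLIED";
    }
  }

  // /rollback [id]: revert one adjustment; current() re-applies the remaining chain.
  rollback(id: string): void {
    for (const e of this.entries) {
      if (e.id === id && e.state === "APPLIED") e.state = "ROLLED_BACK";
    }
  }

  // Recomputed from the base on every call, so a rollback never leaves stale state.
  current(): number {
    return this.entries
      .filter((e) => e.state === "APPLIED")
      .reduce((value, e) => value * e.multiplier, this.base);
  }
}
```

For instance, with a base of 3.0, a 0.9 adjustment submitted normally and a 0.8 adjustment submitted during a freeze leave `current()` at 2.7 until both `unfreeze()` and `applyQueue()` run (2.16); rolling back the first adjustment then leaves 2.4.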

Late-breaking news after analyst submission:

CONTEXT: Discard, log
MODERATE (<T-60): Feed to Opus verdict layer only as supplemental brief
CRITICAL (<T-30): Emergency path — Opus re-evaluates with late intel, Telegram alert
CRITICAL (<T-15): HARD STOP — no new positions opened
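The late-news table can be encoded as a routing function. This sketch reads "<T-N" as fewer than N minutes to puck drop, and it returns `UNSPECIFIED` for windows the table does not cover (a MODERATE finding more than an hour out, a CRITICAL one more than 30 minutes out) instead of guessing; the function and type names are hypothetical.

```typescript
type Severity = "CRITICAL" | "MODERATE" | "CONTEXT";

type LateRoute =
  | "DISCARD_AND_LOG"
  | "SUPPLEMENTAL_BRIEF" // to the Opus verdict layer only
  | "EMERGENCY_REEVALUATION" // Opus re-evaluates + Telegram alert
  | "HARD_STOP" // no new positions opened
  | "UNSPECIFIED"; // window not covered by the ruling

// Routes a finding that arrives AFTER analysts have submitted.
// "<T-N" is read as "fewer than N minutes to puck drop".
function routeLateFinding(severity: Severity, minutesToPuck: number): LateRoute {
  switch (severity) {
    case "CONTEXT":
      return "DISCARD_AND_LOG";
    case "MODERATE":
      return minutesToPuck < 60 ? "SUPPLEMENTAL_BRIEF" : "UNSPECIFIED";
    case "CRITICAL":
      if (minutesToPuck < 15) return "HARD_STOP";
      return minutesToPuck < 30 ? "EMERGENCY_REEVALUATION" : "UNSPECIFIED";
  }
}

console.log(routeLateFinding("CRITICAL", 22)); // EMERGENCY_REEVALUATION
```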

Post-mortem integration: Weekly review tracks which adjustments were correct, which sources were reliable. Source tiers updated from empirical results.

BOSS RULINGS (2026-04-01)

Cost budget: No fixed budget. Build it right, measure actual costs after running, then adjust from there.
Market as intelligence: Yellow light only. Log unexplained line movement as a CONFLICT flag on the matchup card. Alert boss on Telegram. Do NOT auto-adjust model numbers — let boss decide whether to bet or skip that game.
Non-contact jersey rule: Yes. Add explicit extraction instruction to look for "non-contact jersey" mentions in morning skate reports.
In-play research: Pre-game only for now. No live/in-game research layer.
Beat reporter list: Full 32-team list from day one.
Kalshi liquidity check: Deferred. Focus is on accurate prediction-making (data collection + research). Execution-layer concerns like Kalshi liquidity come later.

COUNCIL METADATA

Detail	Value
Council date	2026-04-01
Advisory responses	5 (all completed)
Peer reviews	5 (all completed)
Strongest advisor	Opus (4/5 votes)
Biggest blind spots	Grok (coach hierarchy backwards), gpt-oss (assumes human staff)
Full council data	`/home/ubuntu/edgeclaw/data/councils/2026-04-01/nhl-research/`

Source: ~/edgeclaw/results/panel-results/nhl-research-pipeline-ruling.md