=== COUNCIL RESULTS ===

QUESTION: Design the complete research and intelligence pipeline for the NHL betting desk.

DATE: 2026-04-01
ADVISORS: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
PROCESS: Full council (5 advisory responses, anonymized peer review, 5 independent reviews)
ANONYMIZATION MAP: A=Gemini, B=Opus, C=Grok, D=gpt-oss, E=Sonnet


PEER REVIEW VOTES

Strongest Response: Opus (4/5 votes)

Biggest Blind Spot: Grok (coach hierarchy backwards), gpt-oss (assumes human staff)


CONSENSUS (All 5 Agreed)

  1. Goalie confirmation is a first-class subsystem — not just another injury check. It needs its own cascade, conflict resolution, and dual-scenario protocol.
  2. Beat reporter visual confirmation outweighs coach verbal statements — NHL coaches deliberately mislead about goalies. This is cultural, not accidental.
  3. DailyFaceoff is the primary structured source, but it LAGS beat reporters by 5-15 minutes.
  4. Structured IntelAdjustment JSON objects for all findings — not free-text parsing.
  5. Analysts stay completely blind — they see prose matchup cards only. No market data, no IntelAdjustment objects, no model outputs.
  6. Kill switch via Telegram — freezes auto-adjustments but continues collecting findings.
  7. Full audit trail — every adjustment stores before/after values with source attribution.
  8. Late CRITICAL findings (post-analyst-submission) go directly to verdict layer (Opus), not back to analysts.
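The routing implied by items 5 and 8 can be sketched as a small pure function. This is a minimal sketch under assumptions: `routeFinding`, `Route`, and the `analystsSubmitted` flag are hypothetical names, and late non-critical findings are assumed to be logged only, since the council did not say where they go.

```typescript
type Severity = "CRITICAL" | "MODERATE" | "CONTEXT";
type Route = "math_layer" | "verdict_layer" | "log_only";

// Decide where a new finding goes. There is deliberately no "analyst" route:
// analysts only ever see prose matchup cards, never IntelAdjustment objects.
function routeFinding(severity: Severity, analystsSubmitted: boolean): Route {
  if (!analystsSubmitted) {
    return "math_layer"; // normal flow: validated findings feed the math layer
  }
  if (severity === "CRITICAL") {
    return "verdict_layer"; // late CRITICAL finding goes straight to the verdict layer (Opus)
  }
  return "log_only"; // assumption: late non-critical findings are only logged
}
```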

KEY DIVERGENCES

1. Additive vs Multiplicative Adjustments

2. Coach vs Beat Reporter Credibility

3. Backup Goalie Quality

4. Research Model Stack


STRONGEST ARGUMENTS (From Peer Review)

  1. Opus's multiplicative adjustment framework — 4/5 reviewers endorsed this. The worked example and floor/ceiling bounds make it production-ready.

  2. Opus's post-mortem feedback loop — Only advisor to design weekly tracking of source reliability that updates credibility tiers. Beat reporters who are consistently wrong get downgraded.

  3. Opus's dual-scenario GTD handling — Run two parallel models (Goalie A scenario + Goalie B scenario). Only bet when edge exists in BOTH. Most operationally sound.

  4. Sonnet's backup goalie quality tiers — 4-level system with specific SV% thresholds. Critical for the most common NHL research event (B2B backup start).

  5. Sonnet's T-15 minute hard cutoff — No new bet execution within 15 minutes of puck drop when CRITICAL finding fires. Stricter than NBA, appropriate for NHL.

  6. Opus's "time-based credibility decay does NOT apply in NHL" — Subtle but correct. The entire intelligence window is same-day (9 AM to puck drop). Recency is a tiebreaker, not a decay function.
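The T-15 cutoff in item 5 reduces to a time comparison. A minimal sketch, assuming the rule means "a CRITICAL finding that lands inside the final 15 minutes blocks new bets on that game" and that an earlier CRITICAL finding has already been processed; `newBetAllowed` is a hypothetical name.

```typescript
const CUTOFF_MINUTES = 15; // T-15 hard cutoff

// Returns false when the most recent CRITICAL finding fired inside the final
// 15 minutes before puck drop; otherwise new bet execution is allowed.
function newBetAllowed(puckDrop: Date, lastCriticalAt: Date | null): boolean {
  if (lastCriticalAt === null) return true;
  const cutoffMs = puckDrop.getTime() - CUTOFF_MINUTES * 60 * 1000;
  return lastCriticalAt.getTime() < cutoffMs;
}
```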


WHAT EVERYONE MISSED (Peer Review Findings)

1. Cost & Latency Budget (Opus review)

On a 15-game NHL night: 30 teams × 7-8 queries × 3-4 passes = 630-960 web search API calls per day, plus extraction/validation LLM calls. No advisor calculated total API cost or whether queries can complete within pass windows. Need a priority system: focus research spend on games with active Kalshi markets.
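The call-volume arithmetic and the Kalshi-market priority filter can be checked with two small helpers. A sketch only: `searchCallRange`, `researchTargets`, and the `Game` shape are hypothetical, and the count covers web search calls, not the extraction/validation LLM calls.

```typescript
interface Game {
  id: string;
  hasKalshiMarket: boolean;
}

// [min, max] web search calls: teams × queries per team × passes.
function searchCallRange(
  teams: number,
  queriesPerTeam: [number, number],
  passes: [number, number],
): [number, number] {
  return [teams * queriesPerTeam[0] * passes[0], teams * queriesPerTeam[1] * passes[1]];
}

// Priority filter: only research games that have an active Kalshi market.
function researchTargets(slate: Game[]): Game[] {
  return slate.filter((g) => g.hasKalshiMarket);
}
```

For a full 15-game slate, `searchCallRange(30, [7, 8], [3, 4])` returns `[630, 960]`; filtering with `researchTargets` first shrinks the team count before the multiplication.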

Also: the optional morning skate problem. Teams increasingly skip morning skate entirely (especially on B2Bs). The entire intelligence architecture treats morning skate as the primary window, but sometimes there IS no morning skate. Need fallback protocol.

2. Kalshi-Specific Constraints (Sonnet review)

Pipeline assumes a standard sportsbook. Kalshi is a prediction market with:

3. Non-Contact Jersey Detection (Gemini review)

All 5 advisors built morning skate query extraction but NONE instructed the extraction model to look for "non-contact jersey" or jersey colors. In NHL, a player skating in a non-contact jersey (red/yellow) is 100% OUT. Without this explicit extraction rule, the AI reads "Matthews is skating with the team" and flags him as EXPECTED_IN — missing the critical visual cue.
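A deterministic keyword rule can back up the LLM extractor so the non-contact cue is never dropped. A minimal sketch, assuming reports arrive as plain text and that a sentence naming the player plus a "non-contact" mention is enough to force OUT; `nonContactOverride` is hypothetical, and sentence-level matching can misattribute when two players share a sentence.

```typescript
// Matches "non-contact", "non contact", and "noncontact" (case-insensitive).
const NON_CONTACT = /\bnon[-\s]?contact\b/i;

// True when any sentence that names the player also mentions non-contact.
// A true result forces OUT regardless of what the LLM extractor concluded.
function nonContactOverride(report: string, player: string): boolean {
  const name = player.toLowerCase();
  return report
    .split(/[.!?]+\s*/)
    .some((sentence) => sentence.toLowerCase().includes(name) && NON_CONTACT.test(sentence));
}
```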

4. Market as Intelligence Source (Grok review)

None built a bidirectional loop where line movement confirms/disconfirms research. When a goalie rumor leaks, the market often moves BEFORE public confirmation. An unexplained 8-cent ML shift should increase confidence in a beat reporter's tweet. Pipeline treats research as upstream truth and market as downstream only.

5. In-Play Research (gpt-oss review)

Entire pipeline is pre-game "set-and-forget." No live research for in-game events (star forward injured at 10-minute mark, goalie pulled early, momentum swings). A modern desk should have a low-latency event-driven in-play layer.


BUILD SPEC — NHL RESEARCH PIPELINE

Source Credibility Hierarchy

Rank | Source                                            | Confidence | Notes
1    | NHL Official API (confirmed starter field)        | 100%       | Definitive. Overrides everything.
2    | DailyFaceoff "Confirmed" tag                      | 95%        | Rarely wrong once marked.
3    | Beat reporter with visual confirmation at rink    | 85%        | "I saw X take starter's end": physical evidence.
4    | Team official PR/Twitter                          | 80%        | Reliable when they post; often silent until game time.
5    | National journalist (Friedman, Seravalli, Dreger) | 75%        | Very reliable for trades, less so for daily lineups.
6    | DailyFaceoff "Expected" tag                       | 70%        | Usually right but not guaranteed.
7    | Team beat writer (no visual confirmation)         | 65%        | "I'm hearing X will start": less reliable than visual.
8    | Coach presser quote                               | 60%        | Coaches actively mislead about goalies. Cultural norm.
9    | Fan/aggregator accounts                           | 20%        | Context only; never auto-adjust.
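One way to apply the hierarchy when two sources disagree about a starter is to take the higher-confidence source and, following the note that recency is a tiebreaker rather than a decay function, break ties by timestamp. A minimal sketch: `resolveConflict` and the `Claim` shape are hypothetical, and tier-9 claims are excluded because fan accounts are context only.

```typescript
// Confidence by rank, copied from the hierarchy table.
const TIER_CONFIDENCE: Record<number, number> = {
  1: 1.0, 2: 0.95, 3: 0.85, 4: 0.8, 5: 0.75, 6: 0.7, 7: 0.65, 8: 0.6, 9: 0.2,
};

interface Claim {
  sourceTier: number; // 1-9
  timestamp: string;  // ISO 8601
  starter: string;    // goalie this source says will start
}

// Highest-confidence claim wins; recency only breaks ties between equal tiers.
function resolveConflict(claims: Claim[]): Claim | null {
  const usable = claims.filter((c) => c.sourceTier !== 9); // fan/aggregator: context only
  if (usable.length === 0) return null;
  return usable.reduce((best, c) => {
    const cBest = TIER_CONFIDENCE[best.sourceTier] ?? 0;
    const cNew = TIER_CONFIDENCE[c.sourceTier] ?? 0;
    if (cNew !== cBest) return cNew > cBest ? c : best;
    return Date.parse(c.timestamp) > Date.parse(best.timestamp) ? c : best;
  });
}
```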

Goalie Confirmation Workflow

Status escalation:

UNCONFIRMED (default)
  → EXPECTED (DailyFaceoff "Expected" OR 1+ beat reporter says likely)
    → CONFIRMED (DailyFaceoff "Confirmed" OR NHL API OR 2+ independent beat reporters confirm visual)
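The escalation rules above map directly onto a status function. A minimal sketch, assuming distinct reporter handles stand in for "independent" sources (a real independence check is out of scope); `goalieStatus` and `GoalieEvidence` are hypothetical names.

```typescript
type GoalieStatus = "UNCONFIRMED" | "EXPECTED" | "CONFIRMED";

interface GoalieEvidence {
  nhlApiConfirmed: boolean;
  dailyFaceoffTag: "Confirmed" | "Expected" | null;
  visualConfirmReporters: string[]; // beat reporters who physically saw the starter
  likelyReporters: string[];        // beat reporters saying the goalie is likely
}

function goalieStatus(e: GoalieEvidence): GoalieStatus {
  // Duplicate handles collapse, so one reporter tweeting twice still counts once.
  const visual = new Set(e.visualConfirmReporters).size;
  if (e.nhlApiConfirmed || e.dailyFaceoffTag === "Confirmed" || visual >= 2) {
    return "CONFIRMED";
  }
  const anyBeat = new Set([...e.visualConfirmReporters, ...e.likelyReporters]).size;
  if (e.dailyFaceoffTag === "Expected" || anyBeat >= 1) {
    return "EXPECTED";
  }
  return "UNCONFIRMED";
}
```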

Math layer treatment by status:

Conflict resolution:

Goalie change cascade (full recompute):

  1. Replace goalie SV% splits (5v5, PP, PK, high-danger) — use last-30-game rolling average
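  2. Recalculate xGA: xGA_adjusted = xGA_base * ((1 - new_goalie_sv%) / (1 - league_avg_sv%)). Goals allowed scale with the share of shots NOT saved, so a .880 goalie concedes about 26% more than a .905 league average, whereas the plain SV% ratio (.905/.880) would show under 3%.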
  3. Re-derive Poisson lambda for both teams
  4. Rerun full probability distribution
  5. Re-de-vig against Pinnacle
  6. Compare against Kalshi — all markets (ML, puckline, total) recompute
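Steps 3 through 6 can be sketched with an independent-Poisson score grid, a de-vig helper, and a Kalshi edge calculation. Assumptions, stated loudly: the lambdas are inputs because the cascade does not specify how they are derived from adjusted xGF/xGA; OT/shootout is treated as a coin flip (`OT_HOME_SHARE` is a hypothetical placeholder); proportional de-vig is one common method, not necessarily the one the desk uses; all function names are illustrative.

```typescript
// P(X = k) for a Poisson variable with rate lambda.
function poissonPmf(k: number, lambda: number): number {
  let p = Math.exp(-lambda);
  for (let i = 1; i <= k; i++) p *= lambda / i;
  return p;
}

interface RegulationProbs { home: number; away: number; tie: number; }

// Independent-Poisson score grid for regulation; scores above maxGoals are dropped.
function regulationProbs(lambdaHome: number, lambdaAway: number, maxGoals = 12): RegulationProbs {
  const r: RegulationProbs = { home: 0, away: 0, tie: 0 };
  for (let h = 0; h <= maxGoals; h++) {
    for (let a = 0; a <= maxGoals; a++) {
      const p = poissonPmf(h, lambdaHome) * poissonPmf(a, lambdaAway);
      if (h > a) r.home += p;
      else if (a > h) r.away += p;
      else r.tie += p;
    }
  }
  return r;
}

// Moneyline includes OT/shootout; hypothetical 50/50 split of regulation ties.
const OT_HOME_SHARE = 0.5;
function homeMoneylineProb(lambdaHome: number, lambdaAway: number): number {
  const r = regulationProbs(lambdaHome, lambdaAway);
  return r.home + r.tie * OT_HOME_SHARE;
}

// Proportional de-vig: decimal odds -> implied probabilities rescaled to sum to 1.
function devigProportional(decimalOdds: number[]): number[] {
  const implied = decimalOdds.map((o) => 1 / o);
  const total = implied.reduce((s, p) => s + p, 0);
  return implied.map((p) => p / total);
}

// Edge versus a Kalshi YES price quoted in cents (a 55-cent contract implies 0.55).
function kalshiEdge(modelProb: number, yesPriceCents: number): number {
  return modelProb - yesPriceCents / 100;
}
```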

Game-time decisions:

Emergency goalie (warmup illness/injury):

Backup Goalie Quality Tiers

Tier | Criteria                                  | Treatment
A    | 20+ NHL starts this season, .905+ SV%     | Minimal delta from starter. Small recompute.
B    | 5-19 starts, .895-.905 SV%                | Modest downgrade. ~0.008 SV% reduction.
C    | <5 starts or career backup, .880-.894 SV% | Significant downgrade. Full recompute.
D    | AHL call-up, <5 NHL career games          | Emergency. Use .880 baseline + max uncertainty.
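The tier table leaves gaps (for example, 25 starts at .898), so any classifier has to pick an order. A minimal sketch, assuming D is checked first, then A, then B, with everything else falling to C; `backupTier` and `cascadeSavePct` are hypothetical, and the "max uncertainty" part of tier D is not modeled.

```typescript
type BackupTier = "A" | "B" | "C" | "D";

interface BackupGoalie {
  seasonStarts: number;   // NHL starts this season
  careerNhlGames: number;
  savePct: number;        // season SV%, e.g. 0.902
  isAhlCallup: boolean;
}

const TIER_D_BASELINE_SV = 0.880; // emergency baseline from the table

function backupTier(g: BackupGoalie): BackupTier {
  if (g.isAhlCallup && g.careerNhlGames < 5) return "D";
  if (g.seasonStarts >= 20 && g.savePct >= 0.905) return "A";
  if (g.seasonStarts >= 5 && g.savePct >= 0.895) return "B";
  return "C"; // assumption: anything missing A/B/D is a significant downgrade
}

// SV% fed into the cascade: tier D ignores its tiny sample and uses the .880 baseline.
function cascadeSavePct(g: BackupGoalie): number {
  return backupTier(g) === "D" ? TIER_D_BASELINE_SV : g.savePct;
}
```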

Research Model Stack

Step                    | Model         | Purpose                                     | Cost
Web search (all passes) | Grok 4.1 Fast | Fast search, structured extraction          | $0.20/$0.50 per 1M tokens (input/output)
Structured extraction   | Gemini Flash  | Parse raw results into IntelAdjustment JSON | Cheap
Contradiction detection | DeepSeek R1   | Compare new findings vs existing data       | Cheap
Plausibility gate       | Sonnet 4.6    | Final sanity check before math layer        | Per-call

Validation Pipeline (sequential, per finding)

  1. Rule-based checks (code): Player on correct roster? Game today? Timestamp recent?
  2. Contradiction detection (DeepSeek R1): Conflicts with prior findings? Matchup card data?
  3. Plausibility gate (Sonnet): Magnitude matches player importance? Source cited?
  4. Source verification (code): Source in credibility hierarchy? Tier sufficient for severity?
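The sequential pipeline can be sketched as an ordered list of stages that stops at the first failure. Assumptions: stages are synchronous here for clarity (the DeepSeek R1 and Sonnet stages would be async LLM calls supplied by the caller and are omitted); `MAX_AGE_HOURS` and the per-severity `MAX_TIER` limits are hypothetical placeholders; all names are illustrative.

```typescript
interface Finding {
  player: string | null;
  team: string;       // 3-letter code
  gameDate: string;   // YYYY-MM-DD
  timestamp: string;  // ISO 8601
  severity: "CRITICAL" | "MODERATE" | "CONTEXT";
  sourceTier: number; // 1-9 from the credibility hierarchy
}

// A stage returns null on pass, or a failure reason.
interface Stage {
  name: string;
  run: (f: Finding) => string | null;
}

type Result = { ok: true } | { ok: false; stage: string; reason: string };

// Run stages in order and stop at the first failure.
function validate(f: Finding, stages: Stage[]): Result {
  for (const stage of stages) {
    const reason = stage.run(f);
    if (reason !== null) return { ok: false, stage: stage.name, reason };
  }
  return { ok: true };
}

const MAX_AGE_HOURS = 12; // hypothetical freshness limit

// Step 1: code-only rule checks (roster, game today, recent timestamp).
function ruleStage(rosters: Record<string, string[]>, today: string, now: Date): Stage {
  return {
    name: "rules",
    run: (f) => {
      if (f.player !== null && (rosters[f.team] ?? []).indexOf(f.player) === -1) {
        return `${f.player} not on ${f.team} roster`;
      }
      if (f.gameDate !== today) return "game is not today";
      const ageHours = (now.getTime() - Date.parse(f.timestamp)) / 3600000;
      if (!(ageHours >= 0 && ageHours <= MAX_AGE_HOURS)) return "timestamp not recent";
      return null;
    },
  };
}

// Step 4: source check. Per-severity tier limits are hypothetical placeholders.
const MAX_TIER: Record<Finding["severity"], number> = { CRITICAL: 7, MODERATE: 8, CONTEXT: 9 };
const sourceStage: Stage = {
  name: "source",
  run: (f) =>
    f.sourceTier >= 1 && f.sourceTier <= MAX_TIER[f.severity] ? null : `tier ${f.sourceTier} too weak for ${f.severity}`,
};
```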

Research Passes & Queries

Morning Skate (10:00-11:30 AM ET) — per game:

Afternoon (2:00-3:00 PM ET):

Pre-Game (5:30-6:00 PM ET):

West Coast (8:00 PM ET): Same as pre-game for Pacific starts.

No Morning Skate Fallback: When team skips morning skate (common on B2B):

Adjustment Math

Adjustments MULTIPLY (not add):

final_xGF = base_xGF * goalie_opp_factor * injury_factor * fatigue_xGF_factor
final_xGA = base_xGA * own_goalie_factor * def_injury_factor * fatigue_xGA_factor

Goalie change sets the new baseline — other adjustments apply to the new baseline.
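The composition order (goalie baseline first, then every other factor as a multiplier) can be sketched as below; an additive version is included only for comparison. Function names are hypothetical, and the floor/ceiling bounds are not applied here.

```typescript
// Multiplicative composition: each factor scales the running value.
function applyFactors(baseline: number, factors: number[]): number {
  return factors.reduce((acc, f) => acc * f, baseline);
}

// For comparison only: additive composition sums each factor's deviation from 1.
function addFactors(baseline: number, factors: number[]): number {
  return baseline * (1 + factors.reduce((acc, f) => acc + (f - 1), 0));
}

// A goalie change first resets the baseline; the remaining factors scale that new baseline.
function finalXGA(baseXGA: number, ownGoalieFactor: number, otherFactors: number[]): number {
  return applyFactors(baseXGA * ownGoalieFactor, otherFactors);
}
```

With factors 0.94 and 0.92 on a 3.0 baseline, `applyFactors` gives 2.5944 while `addFactors` gives 2.58; the gap is the cross term (0.06 × 0.08 here) and grows as more or larger factors stack.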

Skater injury (top-6 F confirmed OUT):

player_xGF_share = player_xGF / team_xGF (from MoneyPuck, 5v5)
xGF_reduction = player_xGF_share * 0.40 (40% lost, 60% redistributed)
injury_factor = 1 - xGF_reduction
Cap: single player max 15%, cumulative max 25%
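The skater formula and its caps can be sketched as follows. Assumptions: multiple OUT players combine multiplicatively (consistent with "adjustments MULTIPLY"), and the 25% cumulative cap is enforced as a floor of 0.75 on the combined factor; `injuryFactor` is a hypothetical name and the shares are MoneyPuck 5v5 xGF shares as the text specifies.

```typescript
const LOST_FRACTION = 0.40;     // 40% of the player's xGF share is lost, 60% redistributed
const SINGLE_PLAYER_CAP = 0.15; // max reduction from one player
const CUMULATIVE_CAP = 0.25;    // max combined reduction

// outPlayerShares: player_xGF / team_xGF for each confirmed-OUT top-6 forward.
function injuryFactor(outPlayerShares: number[]): number {
  const combined = outPlayerShares.reduce((acc, share) => {
    const reduction = Math.min(share * LOST_FRACTION, SINGLE_PLAYER_CAP);
    return acc * (1 - reduction);
  }, 1);
  return Math.max(combined, 1 - CUMULATIVE_CAP);
}
```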

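Defenseman OUT: Same formula using the player's share of team xGA, but applied as an increase: def_injury_factor = 1 + (player_xGA_share * 0.40), same caps. Losing a defenseman should raise xGA, not lower it.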

Fatigue penalties:

Scenario                         | xGF multiplier  | xGA multiplier
Home B2B (no travel)             | 0.97            | 1.02
Road B2B (same city)             | 0.96            | 1.03
Road B2B (cross-timezone)        | 0.94            | 1.05
3rd game in 4 nights             | 0.95            | 1.04
Travel disruption (late arrival) | additional 0.98 | additional 1.03
Cumulative fatigue cap           | min 0.92        | max 1.08
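The table can be applied as a lookup plus clamp. A minimal sketch, assuming every applicable scenario multiplies in (for example, a cross-timezone B2B that is also the third game in four nights) and the cumulative caps clamp the product; `fatigueFactors` and the scenario keys are hypothetical.

```typescript
type FatigueScenario = "home_b2b" | "road_b2b_same_city" | "road_b2b_cross_tz" | "third_in_four";

interface Multipliers { xgf: number; xga: number; }

// Values copied from the fatigue table.
const FATIGUE: Record<FatigueScenario, Multipliers> = {
  home_b2b: { xgf: 0.97, xga: 1.02 },
  road_b2b_same_city: { xgf: 0.96, xga: 1.03 },
  road_b2b_cross_tz: { xgf: 0.94, xga: 1.05 },
  third_in_four: { xgf: 0.95, xga: 1.04 },
};
const TRAVEL_DISRUPTION: Multipliers = { xgf: 0.98, xga: 1.03 };
const XGF_FLOOR = 0.92;   // cumulative cap: xGF multiplier never below this
const XGA_CEILING = 1.08; // cumulative cap: xGA multiplier never above this

// Assumption: every applicable scenario multiplies in, then the caps clamp the product.
function fatigueFactors(scenarios: FatigueScenario[], travelDisruption: boolean): Multipliers {
  let xgf = 1;
  let xga = 1;
  for (const s of scenarios) {
    xgf *= FATIGUE[s].xgf;
    xga *= FATIGUE[s].xga;
  }
  if (travelDisruption) {
    xgf *= TRAVEL_DISRUPTION.xgf;
    xga *= TRAVEL_DISRUPTION.xga;
  }
  return { xgf: Math.max(xgf, XGF_FLOOR), xga: Math.min(xga, XGA_CEILING) };
}
```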

Referee adjustment:

Bounds:

IntelAdjustment Schema

interface NHLIntelAdjustment {
  id: string;                    // UUID
  game_id: string;               // NHL API game ID
  team: string;                  // 3-letter code
  pass: "morning" | "afternoon" | "pregame" | "westcoast" | "emergency";
  timestamp: string;             // ISO 8601

  finding_type: "goalie_confirmed" | "goalie_change" | "player_out" | "player_in" |
                "line_change" | "travel_disruption" | "callup" | "trade" |
                "referee_assignment" | "motivation_context";

  severity: "CRITICAL" | "MODERATE" | "CONTEXT";
  status: "CONFIRMED" | "EXPECTED" | "UNCONFIRMED" | "CONFLICT";

  player_name: string | null;
  player_position: "G" | "F" | "D" | null;
  player_xgf_share: number | null;

  adjustment_type: "xGF" | "xGA" | "SV%" | "PP%" | "full_recompute" | "sigma" | null;
  adjustment_magnitude: number | null;  // multiplier (e.g., 0.94 = 6% reduction)
  adjustment_confidence: "HIGH" | "MEDIUM" | "LOW";

  source: string;
  source_tier: number;           // 1-9 from hierarchy
  source_url: string | null;
  raw_text: string;

  auto_apply: boolean;
  supersedes: string | null;     // ID of finding this replaces
  invalidated: boolean;
  invalidated_by: string | null;

  // Audit fields
  pre_adjustment_value: number | null;
  post_adjustment_value: number | null;
  applied_to_model: boolean;
  applied_timestamp: string | null;
}
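Applying a finding while filling the audit fields, and honoring the Telegram kill switch, can be sketched against a subset of the schema. Assumptions: `adjustment_magnitude` is always a multiplier, a frozen or blocked finding is returned unchanged (still collected, never applied), and `applyWithAudit` and `Applicable` are hypothetical names.

```typescript
// The subset of NHLIntelAdjustment fields this step reads or writes.
interface Applicable {
  id: string;
  adjustment_magnitude: number | null;
  auto_apply: boolean;
  invalidated: boolean;
  pre_adjustment_value: number | null;
  post_adjustment_value: number | null;
  applied_to_model: boolean;
  applied_timestamp: string | null;
}

// Applies the multiplier and records before/after values. With the kill switch on,
// the finding is returned unchanged: collected, but it never moves the model.
function applyWithAudit(
  finding: Applicable,
  currentValue: number,
  killSwitchOn: boolean,
  now: Date = new Date(),
): { finding: Applicable; value: number } {
  const magnitude = finding.adjustment_magnitude;
  if (killSwitchOn || !finding.auto_apply || finding.invalidated || magnitude === null) {
    return { finding, value: currentValue };
  }
  const value = currentValue * magnitude;
  return {
    finding: {
      ...finding,
      pre_adjustment_value: currentValue,
      post_adjustment_value: value,
      applied_to_model: true,
      applied_timestamp: now.toISOString(),
    },
    value,
  };
}
```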

Severity Classification

Kill Switch & Safety

Commands:

Late-breaking news after analyst submission:

Post-mortem integration: Weekly review tracks which adjustments were correct, which sources were reliable. Source tiers updated from empirical results.
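The weekly review can be sketched as a hit-rate check per source against its tier's stated confidence. Assumptions: `MIN_SAMPLES` and `DOWNGRADE_MARGIN` are hypothetical thresholds (the ruling sets no numbers), "correct" means the finding held up at puck drop, and only downgrades are modeled; `reviewSources` is a hypothetical name.

```typescript
interface Outcome {
  source: string;   // e.g. a beat reporter handle
  tier: number;     // tier at the time of the finding
  correct: boolean; // did the finding hold up at puck drop?
}

type Verdict = "downgrade" | "keep";

// Hypothetical thresholds: the ruling says tiers update from results but sets no numbers.
const MIN_SAMPLES = 10;
const DOWNGRADE_MARGIN = 0.1;

function reviewSources(outcomes: Outcome[], tierConfidence: Record<number, number>): Map<string, Verdict> {
  const stats = new Map<string, { tier: number; hits: number; n: number }>();
  for (const o of outcomes) {
    const s = stats.get(o.source) ?? { tier: o.tier, hits: 0, n: 0 };
    s.n += 1;
    if (o.correct) s.hits += 1;
    stats.set(o.source, s);
  }
  const verdicts = new Map<string, Verdict>();
  stats.forEach((s, source) => {
    const expected = tierConfidence[s.tier] ?? 0;
    const hitRate = s.hits / s.n;
    // Downgrade only with enough samples and a hit rate well below the tier's confidence.
    verdicts.set(source, s.n >= MIN_SAMPLES && hitRate < expected - DOWNGRADE_MARGIN ? "downgrade" : "keep");
  });
  return verdicts;
}
```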


BOSS RULINGS (2026-04-01)

  1. Cost budget: No fixed budget. Build it right, measure actual costs after running, then adjust from there.

  2. Market as intelligence: Yellow light only. Log unexplained line movement as a CONFLICT flag on the matchup card. Alert boss on Telegram. Do NOT auto-adjust model numbers — let boss decide whether to bet or skip that game.
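The yellow-light ruling can be sketched as a function that only ever produces a flag. Assumptions: prices are Kalshi YES prices in cents, `UNEXPLAINED_MOVE_CENTS` is a hypothetical threshold echoing the 8-cent example from the peer review, "explained" means a logged finding already accounts for the move, and the Telegram send itself is not shown; all names are illustrative.

```typescript
interface PriceMove {
  gameId: string;
  team: string;
  beforeCents: number; // YES price before the move
  afterCents: number;
}

interface ConflictFlag {
  gameId: string;
  team: string;
  moveCents: number;
  note: string;
}

// Hypothetical threshold, echoing the 8-cent example from the peer review.
const UNEXPLAINED_MOVE_CENTS = 8;

// Yellow light only: returns a CONFLICT flag for the matchup card and the Telegram
// alert, and never touches model numbers.
function flagUnexplainedMove(move: PriceMove, explainedByFinding: boolean): ConflictFlag | null {
  const delta = move.afterCents - move.beforeCents;
  if (explainedByFinding || Math.abs(delta) < UNEXPLAINED_MOVE_CENTS) return null;
  return {
    gameId: move.gameId,
    team: move.team,
    moveCents: delta,
    note: `CONFLICT: unexplained ${delta > 0 ? "+" : ""}${delta}c move on ${move.team}; boss decides bet or skip`,
  };
}
```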

  3. Non-contact jersey rule: Yes. Add explicit extraction instruction to look for "non-contact jersey" mentions in morning skate reports.

  4. In-play research: Pre-game only for now. No live/in-game research layer.

  5. Beat reporter list: Full 32-team list from day one.

  6. Kalshi liquidity check: Deferred. Focus is on accurate prediction-making (data collection + research). Execution-layer concerns like Kalshi liquidity come later.


COUNCIL METADATA

Detail              | Value
Council date        | 2026-04-01
Advisory responses  | 5 (all completed)
Peer reviews        | 5 (all completed)
Strongest advisor   | Opus (4/5 votes)
Biggest blind spots | Grok (coach hierarchy backwards), gpt-oss (assumes human staff)
Full council data   | /home/ubuntu/edgeclaw/data/councils/2026-04-01/nhl-research/
Source              | ~/edgeclaw/results/panel-results/nhl-research-pipeline-ruling.md