Analyst Pipeline Specification — Prediction Phase

Version: 1.0
Date: 2026-04-06
Status: AUTHORITATIVE — Council Ruling (5/5 unanimous, Option C)
Applies to: All active desks (MLB, MLB Player Props, and all future desks)


Overview

The analyst stage is where AI models independently review fundamental data and generate predictions. This spec defines the architecture, models, data flow, and rules.

Council ruling: All 5 advisors (Opus, Sonnet, Gemini Pro, Grok 4.20, gpt-oss-120b) unanimously rejected the full council approach (Option B) for daily predictions. They unanimously recommended Option C — independent parallel analysts with mechanical combination in code.


Architecture — Option C

Three models run in parallel. Each receives the same fundamentals-only briefing. Each makes predictions independently. A code function (not AI) combines them. No AI touches the synthesis step.

DATA COLLECTION (scrapers, crons)
       |
BRIEFING GENERATOR (strips all prices/edges, fundamentals only)
       |
  +---------+---------+
  |         |         |
OPUS    GEMINI    GPT-OSS
(native) (API)    (API)
  |         |         |
  +---------+---------+
       |
MECHANICAL COMBINATION (code, weighted average)
       |
STORE (analyst_picks table, tagged UNANIMOUS/MAJORITY/SPLIT)
       |
SETTLEMENT (3 AM daily, Brier scores)
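
For reference, the Brier score computed at settlement is the squared error between a stated probability and the binary outcome, as in this sketch (brierScore is an illustrative name, not an existing function):

// Brier score for one binary prediction: squared error between the stated
// probability and what happened. Lower is better; an always-0.5 model
// scores 0.25 on every pick.
function brierScore(probability: number, outcome: 0 | 1): number {
  return (probability - outcome) ** 2;
}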

Panel Roster (LOCKED)

Seat      | Model                         | Provider  | Route                                      | Notes
Analyst 1 | Claude Opus 4.6 Max Reasoning | Anthropic | Claude Code cloud scheduled agent          | Bills to subscription, NOT API. Max reasoning mode.
Analyst 2 | Gemini 3.1 Pro Preview        | Google    | OpenRouter (google/gemini-3.1-pro-preview) | API call from analyst-runner.ts
Analyst 3 | gpt-oss-120b                  | OpenAI    | OpenRouter (openai/gpt-oss-120b)           | API call from analyst-runner.ts

Rules:

  1. The roster is locked. Changes require a council ruling.
  2. One seat per provider; never two models from the same foundation-model family (see Peer Review Blind Spots).


Briefing — Fundamentals Only (Price Blackout)

Analysts NEVER see: betting lines, odds, prices, edge calculations, or any other market data. The briefing generator strips all of it before anything reaches a model.

Analysts DO see: fundamentals only (team and player stats, confirmed lineups, weather, bullpen status, matchup data).

Implementation: Build per-desk briefing generators (like NBA's analyst-briefing.ts) that pull from desk-specific data tables. Explicitly strip any price/edge data before sending to models.
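
A minimal sketch of the stripping step. The field names (teams, lineups, moneyline, and so on) are illustrative rather than an existing schema; the point is the whitelist-plus-scan shape:

// Whitelist, not blacklist: a new price field added upstream can never
// leak into a briefing unless someone deliberately whitelists it.
const FUNDAMENTAL_FIELDS = [
  "teams", "lineups", "weather", "bullpen", "matchups", "recent_stats",
] as const;

const PRICE_FIELDS = ["moneyline", "spread_price", "total_price", "edge"] as const;

function buildBriefing(raw: Record<string, unknown>): Record<string, unknown> {
  const briefing: Record<string, unknown> = {};
  for (const field of FUNDAMENTAL_FIELDS) {
    if (field in raw) briefing[field] = raw[field];
  }
  // Second line of defense: scan the serialized briefing for any known
  // price key that survived inside a nested object, and fail loudly.
  const serialized = JSON.stringify(briefing);
  for (const field of PRICE_FIELDS) {
    if (serialized.includes(`"${field}"`)) {
      throw new Error(`price blackout violation: ${field}`);
    }
  }
  return briefing;
}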


Analyst Output Format

Each model returns structured JSON:

{
  "game_id": "...",
  "sport": "mlb",
  "market_type": "spread|total|moneyline|pitcher_strikeouts|batter_hits|...",
  "prediction": {
    "side": "over|under|home|away",
    "line": 5.5,
    "probability": 0.62
  },
  "conviction": 4,
  "reasoning": "One sentence explaining the key factor driving this prediction"
}

For player props, the market_type specifies the prop (e.g., pitcher_strikeouts, batter_hits, batter_total_bases, batter_home_runs, batter_hits_runs_rbis).
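
As a TypeScript shape (illustrative; it mirrors the JSON above and is not an existing type in the repo, and the 1-to-5 conviction scale is assumed from the example value):

interface AnalystPrediction {
  game_id: string;
  sport: string;                    // "mlb", ...
  market_type: string;              // "total", "pitcher_strikeouts", ...
  prediction: {
    side: "over" | "under" | "home" | "away";
    line: number;                   // e.g. 5.5
    probability: number;            // 0..1, chance the chosen side hits
  };
  conviction: number;               // assumed 1-5 scale (example shows 4)
  reasoning: string;                // one sentence, the key driving factor
}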


Mechanical Combination

A code function (TypeScript, not AI) combines the 3 predictions:

Weights: per-model, per-market-type values read from the model_weights table (see Database Schema).

Formula:

final_prediction = (w1 * pred1 + w2 * pred2 + w3 * pred3) / (w1 + w2 + w3)
final_conviction = average of individual convictions (no rounding)

No AI touches this step. It is arithmetic.
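
A sketch of that arithmetic, assuming the AnalystPrediction shape above and weights already looked up from model_weights (combinePredictions and WeightedPick are illustrative names, not existing exports of analyst-runner.ts):

interface WeightedPick {
  probability: number;
  line: number;
  conviction: number;
  weight: number; // from model_weights for this model + market_type
}

function combinePredictions(picks: WeightedPick[]): {
  probability: number;
  line: number;
  conviction: number;
} {
  // Weighted average of probabilities and lines, exactly the formula above.
  const totalWeight = picks.reduce((sum, p) => sum + p.weight, 0);
  const probability =
    picks.reduce((sum, p) => sum + p.weight * p.probability, 0) / totalWeight;
  const line =
    picks.reduce((sum, p) => sum + p.weight * p.line, 0) / totalWeight;
  // Conviction is an unweighted mean, no rounding (per spec).
  const conviction =
    picks.reduce((sum, p) => sum + p.conviction, 0) / picks.length;
  return { probability, line, conviction };
}

A model dropped in degraded mode simply never enters the array; the division by totalWeight renormalizes over whoever responded.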


Consensus Tagging

After all 3 models respond, tag the prediction set:

Tag       | Definition                              | What it means
UNANIMOUS | All 3 models predict the same direction | Strong signal
MAJORITY  | 2 models agree, 1 dissents              | Moderate signal, dissenter logged
SPLIT     | All 3 disagree or no clear majority     | Weak/no signal

Stored fields: consensus_tag, dissenter_model, dissenter_prediction (see Database Schema).
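
The tag itself is a few lines of code. A sketch, assuming three normalized direction strings from the models (tagConsensus is an illustrative name):

function tagConsensus(sides: string[]): "UNANIMOUS" | "MAJORITY" | "SPLIT" {
  // Count how many models picked each direction ("over", "home", ...).
  const counts = new Map<string, number>();
  for (const side of sides) counts.set(side, (counts.get(side) ?? 0) + 1);
  const top = Math.max(...counts.values());
  if (top === sides.length) return "UNANIMOUS"; // all models agree
  if (top >= 2) return "MAJORITY"; // 2-of-3; caller logs the dissenter
  return "SPLIT";                  // no two models agree
}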


Data Collection Phase (CURRENT MODE)

Log everything. Act on nothing.

We are NOT betting on these predictions yet. Every prediction from every model gets stored regardless of consensus or conviction. The purpose is to accumulate data so the system teaches itself what's actionable.

After months of data, the patterns will show which models are accurate on which market types, which conviction levels actually carry signal, and which consensus tags are worth acting on.

The rules for what counts as actionable will be derived from the data, not guessed upfront.
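
When that time comes, the derivation is a plain aggregation over settled picks. A sketch of the kind of query, using column names from the analyst_picks schema below (the conviction bucketing is illustrative):

// Average Brier by consensus tag and conviction bucket: the raw material
// for deciding which combinations are actionable.
const ACTIONABILITY_SQL = `
  SELECT consensus_tag,
         ROUND(combined_conviction) AS conviction_bucket,
         COUNT(*)            AS n,
         AVG(combined_brier) AS avg_brier
  FROM analyst_picks
  WHERE actual_result IS NOT NULL
  GROUP BY consensus_tag, conviction_bucket
  ORDER BY avg_brier ASC
`;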


No Peer Review on Daily Predictions

The full council process (advisory → peer review → synthesis) is NEVER used for daily prediction runs. This is a hard rule.

Why (council unanimous): peer review lets analysts anchor on each other's reasoning, which destroys the independence that makes the consensus tags meaningful.


Monthly Calibration Council

On the 1st of each month, the full 5-model council reviews the last 30 days:

Input (generated automatically): the last 30 days of settled analyst_picks, with per-model Brier scores broken out by market type.

Output: updated per-model, per-market-type weights written to model_weights.

The monthly council ruling gets saved to the dashboard.


XGBoost Meta-Learner Tracker

A simple counter monitors settled predictions per model per market type. When thresholds are crossed, it sends a Telegram alert.

Thresholds:

Implementation: a daily cron queries analyst_picks WHERE actual_result IS NOT NULL, groups by model and market_type, and counts. It compares the counts against the thresholds and, the first time a threshold is crossed, sends a one-time Telegram alert.
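
A sketch of that check, written against the wide analyst_picks schema below. SAMPLE_THRESHOLD is a placeholder value, not a real threshold, and the alerted set stands in for whatever persistence the cron actually uses:

interface CountsRow {
  market_type: string;
  opus_n: number;
  gemini_n: number;
  gptoss_n: number;
}

// COUNT(column) counts non-null values, so each per-model column yields
// that model's settled-prediction count per market type.
const SETTLED_COUNTS_SQL = `
  SELECT market_type,
         COUNT(opus_prediction)   AS opus_n,
         COUNT(gemini_prediction) AS gemini_n,
         COUNT(gptoss_prediction) AS gptoss_n
  FROM analyst_picks
  WHERE actual_result IS NOT NULL
  GROUP BY market_type
`;

const SAMPLE_THRESHOLD = 200; // placeholder; real values go in the Thresholds list

function newlyCrossed(rows: CountsRow[], alerted: Set<string>): string[] {
  const hits: string[] = [];
  for (const row of rows) {
    for (const model of ["opus", "gemini", "gptoss"] as const) {
      const key = `${model}:${row.market_type}`;
      if (row[`${model}_n`] >= SAMPLE_THRESHOLD && !alerted.has(key)) {
        alerted.add(key); // persist this set so each alert fires exactly once
        hits.push(key);
      }
    }
  }
  return hits; // caller sends one Telegram message per entry
}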

When triggered: run a council to evaluate whether an XGBoost meta-learner should replace the simple weighted average. Do not auto-deploy — council reviews first.


Degraded Mode

If a model's API fails after 3 retries: the run proceeds with the remaining models, the weighted average renormalizes over whoever responded (the formula already divides by the sum of weights), and the models_used field records which models contributed.
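
A retry wrapper sketch, reusing the AnalystPrediction shape from the output-format section (callWithRetries and the backoff values are illustrative):

// Try a model up to 3 times; return null on total failure so the
// combiner simply proceeds without this seat.
async function callWithRetries(
  callModel: () => Promise<AnalystPrediction>,
  retries = 3,
): Promise<AnalystPrediction | null> {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await callModel();
    } catch {
      if (attempt < retries) {
        // simple linear backoff between attempts
        await new Promise((resolve) => setTimeout(resolve, attempt * 2_000));
      }
    }
  }
  return null; // caller drops this seat and records the gap in models_used
}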


Opus Scheduling (Cloud Agent)

Opus 4.6 Max Reasoning runs as a Claude Code cloud scheduled agent. It bills to the subscription rather than the API (see Panel Roster) and fires with the rest of the analyst pipeline at 12:15 PM ET (see MLB-Specific Schedule).


MLB-Specific Schedule

Time (ET)    | What runs
11:00 AM     | Data collection batch (stats, lineups, weather, bullpen, matchups)
12:00 PM     | Briefing generator builds today's briefings from collected data
12:15 PM     | Analyst pipeline fires — all 3 models run on all briefings
3:00 AM      | Settlement — resolves outcomes, calculates Brier scores
1st of month | Monthly calibration council

Schedule may be adjusted based on lineup confirmation timing and game start times.


Database Schema

analyst_picks (per-prediction row)

id, date, sport, desk, game_id, market_type, player_name,
opus_prediction, opus_line, opus_probability, opus_conviction,
gemini_prediction, gemini_line, gemini_probability, gemini_conviction,
gptoss_prediction, gptoss_line, gptoss_probability, gptoss_conviction,
combined_prediction, combined_line, combined_probability, combined_conviction,
consensus_tag, dissenter_model, dissenter_prediction,
models_used, actual_result, actual_value,
opus_brier, gemini_brier, gptoss_brier, combined_brier,
settled_at, created_at

model_weights (per model per market type)

id, model_name, sport, market_type, weight, brier_score,
sample_size, last_updated

Peer Review Blind Spots (Documented for Future Reference)

The council's peer review phase identified these risks. They are NOT solved in v1 but must be monitored:

  1. Correlated model failure on public narratives — When all models read the same injury news or media hype, their agreement might be fake consensus. Monitor for high-consensus predictions on heavily reported events.

  2. API reliability and silent model drift — Models may change behavior after provider updates without notice. Track prediction distribution shifts monthly.

  3. Model monoculture — Independence requires different foundation models. Current roster (Claude/Google/OpenAI) satisfies this. Do not add a second model from the same provider.

  4. A/B validation — Before acting on predictions (once the data-collection phase ends), run a paper-trading validation period. Do not go live without at least 30 days of paper results.


Council Record

Source: ~/edgeclaw/docs/analyst-pipeline-spec.md