Date: 2026-03-30
Panel: Opus 4.6 (native, max reasoning) · Sonnet 4.6 · Gemini 3.1 Pro Preview · Grok 4.2 Reasoning
Version: 2 (updated with no-pass constraint + boss clarifications)
Verdict: UNANIMOUS on core questions, boss override on thresholds + steam
The desk analyst pipeline uses two independent AI analysts (Sonnet 4.6 + Gemini 3.1 Pro) to produce independent lines, probabilities, and analysis for sports betting markets on Kalshi. The edge scanner (pure math engine) finds mathematical edges by de-vigging Pinnacle lines and comparing against Kalshi prices. The analysts act as independent oddsmakers — they set their own lines from scratch using fundamental data only. Opus sees everything at the end and makes the final call.
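As a rough illustration of the edge scanner step described above, the sketch below de-vigs a two-way market and compares the fair probability with a Kalshi price. The multiplicative normalization, the decimal-odds input, the cents-based price, and the function names are all assumptions made for the example; the document does not specify which de-vig method the scanner uses.

```python
def devig_two_way(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Strip the bookmaker margin from a two-way market quoted in decimal odds.

    Assumption: multiplicative (proportional) normalization. Other de-vig
    methods exist and the real scanner may use a different one.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    overround = raw_a + raw_b  # greater than 1.0 when the book carries a margin
    return raw_a / overround, raw_b / overround


def edge_vs_kalshi(fair_prob: float, kalshi_yes_cents: float) -> float:
    """Edge on the YES side: fair probability minus the price-implied probability."""
    return fair_prob - kalshi_yes_cents / 100.0


# Illustrative numbers only, not real quotes.
fair_bos, fair_phi = devig_two_way(1.40, 3.10)
print(round(fair_bos, 4), round(fair_phi, 4))
print(round(edge_vs_kalshi(fair_bos, 65), 4))
```

A positive edge means the de-vigged Pinnacle probability is higher than what the Kalshi price implies.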
We are in model-building mode. No real money is at risk. Every market gets a pick from every layer. We are tracking performance to measure accuracy, calibration, and model improvement over time.
1. DATA COLLECTION — free scrapers collect hard facts daily
2. RESEARCH — Grok + Sonar Pro (parallel, web search + deep research)
3. VALIDATION — 5 rule-based checks + Flash Lite contradiction detection
4. BLIND ANALYSIS — Sonnet 4.6 + Gemini 3.1 Pro (independent, cold, no memory)
   - Each sets own lines (spread, total, ML, player props) from scratch
   - Each provides implied probabilities + evidence + conviction
   - Neither sees market prices, each other, or prior sessions
   - System builds probability curves from each analyst output
5. COMPARISON LAYER — analyst curves vs edge scanner math vs Kalshi vs Pinnacle
6. OPUS VERDICT — sees EVERYTHING, makes final pick + position sizing on every market
7. POST-VERDICT — store predictions, track curves, settlement, calibration
UNANIMOUS + Boss confirmed: Hide ALL prices and lines. No exceptions.
Analysts see NONE of the following, from any source: lines, odds, probabilities, or contract thresholds.
The analysts do not see what the contract thresholds are. They are not told "price BOS -7.5" — they are told "set your own spread for Celtics vs 76ers." They build the line from scratch.
Rationale: Anchoring is not a risk — it is a certainty. AI models anchor on any number shown to them. The spread IS a price. Even showing "-6.5" without odds tells the analyst the market thinks Boston wins by ~7. The entire value of the analyst layer is independence. If they adjust from market prices, we are paying for a redundant signal.
Boss override: No sharp money, no steam, no line movement at the analyst level.
Sharp money, line movement direction, cash/ticket divergence, reverse line movement — all excluded from analyst briefings entirely. This data goes directly to the Opus layer where it serves as a confirming/disconfirming signal alongside the analyst reports.
The analyst briefing contains ONLY raw performance data and situational facts. Nothing market-derived.
Guiding principle: The analyst sees what an oddsmaker would have on day one — raw data, no market reference.
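One way to enforce the blackout mechanically is to strip market-derived fields from the briefing before it reaches an analyst. The sketch below is a minimal version of that idea. The key names in `MARKET_DERIVED_KEYS` and the briefing layout are hypothetical, since the real briefing schema is not given here.

```python
# Hypothetical field names; the actual briefing schema may differ.
MARKET_DERIVED_KEYS = {
    "spread", "total", "moneyline", "odds", "implied_probability",
    "contract_threshold", "sharp_money", "line_movement", "steam",
    "ticket_split",
}


def strip_market_fields(briefing):
    """Recursively drop any dict entry whose key is market-derived."""
    if isinstance(briefing, dict):
        return {
            key: strip_market_fields(value)
            for key, value in briefing.items()
            if key not in MARKET_DERIVED_KEYS
        }
    if isinstance(briefing, list):
        return [strip_market_fields(item) for item in briefing]
    return briefing  # scalars pass through unchanged


raw = {
    "game_id": "2026-03-30_BOS_PHI",
    "spread": -7.5,
    "team_stats": {"BOS": {"net_rating": 5.2, "moneyline": -300}},
    "injuries": [{"player": "Embiid", "status": "out"}],
}
print(strip_market_fields(raw))
```

A key-based filter like this does not catch prices mentioned inside free-text research notes, so it would complement, not replace, a review of the briefing text itself.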
Unanimous: Do not show Sagarin, DRatings, Dimers, ESPN BPI, Massey, GameSim, or Prediction Tracker to analysts.
These are other models' opinions. Showing them creates anchoring and turns analysts into aggregators rather than independent oddsmakers.
Valid use: Collect and track against outcomes over 2-3 months. If a model consistently beats Pinnacle implied probabilities, integrate that signal at the Opus layer — never at the analyst layer.
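To make "consistently beats Pinnacle" measurable, one option is to compare Brier scores over the tracking window. The sketch below assumes the Brier score as the metric and binary outcomes coded as 0 or 1; the ruling itself does not name a metric, so treat this as one possible choice.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_advantage(model_probs, pinnacle_probs, outcomes) -> float:
    """Positive when the third-party model scored better (lower) than Pinnacle."""
    return brier_score(pinnacle_probs, outcomes) - brier_score(model_probs, outcomes)


# Illustrative data only.
outcomes = [1, 0, 1, 1]
model = [0.70, 0.35, 0.60, 0.80]
pinnacle = [0.65, 0.40, 0.55, 0.75]
print(round(brier_advantage(model, pinnacle, outcomes), 4))
```

A small positive advantage over a handful of games means little; the 2-3 month window in the ruling is what gives the comparison enough samples to matter.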
Every analyst produces the following for EVERY game assigned:
{
"game_id": "2026-03-30_BOS_PHI",
"analyst": "Sonnet-4.6",
"timestamp": "2026-03-30T14:23:00Z",
"game_summary": "2-3 sentence matchup assessment from fundamentals",
"spread_analysis": {
"predicted_winner": "BOS",
"predicted_margin": 7.0,
"confidence_band": [4.5, 9.5],
"implied_probability_curve": {
"win_by_1_plus": 0.74,
"win_by_4_plus": 0.62,
"win_by_7_plus": 0.48,
"win_by_10_plus": 0.33,
"win_by_14_plus": 0.18,
"win_by_20_plus": 0.06
},
"conviction": 4,
"conviction_reasoning": "Boston's +5.2 net rating is amplified at home. PHI is missing Embiid; the team is at a -4.3 net rating without him this season.",
"evidence": [
"BOS net rating +5.2 vs PHI -1.3 = 6.5 point fundamental gap",
"BOS home court adds +3.1 to net rating historically",
"PHI 12-18 without Embiid, -4.3 net rating differential",
"Referee crew averages 2.1 more fouls/48 — deeper BOS roster benefits"
]
},
"total_analysis": {
"predicted_total": 219.0,
"confidence_band": [213.0, 225.0],
"implied_probability_curve": {
"over_205": 0.91,
"over_210": 0.80,
"over_215": 0.65,
"over_220": 0.45,
"over_225": 0.28,
"over_230": 0.13
},
"conviction": 3,
"conviction_reasoning": "Both teams near league average pace. No strong factors pushing over or under.",
"evidence": [
"BOS pace 99.2, PHI pace 98.8 — both middle of pack",
"BOS scores 114.2 PPG at home, allows 107.1",
"PHI scores 108.5 on road, allows 112.3"
]
},
"moneyline_analysis": {
"predicted_winner": "BOS",
"win_probability": 0.72,
"opponent_win_probability": 0.28,
"conviction": 4,
"conviction_reasoning": "Clear talent + situational advantage.",
"evidence": [
"Net rating differential + home court + injury advantage",
"BOS 28-8 at home this season"
]
},
"upset_scenario": "Philadelphia wins if Maxey scores 35+ and BOS shoots below 33% from three. PHI transition offense without Embiid is actually faster. BOS complacency in a seemingly easy matchup is the risk.",
"key_uncertainties": [
"If Embiid is upgraded to active, margin estimate drops to 3-4",
"BOS has been coasting in late season — effort level uncertain"
],
"data_gaps": [
"No recent PHI practice reports available",
"Unsure of BOS rotation plans with playoffs approaching"
]
}
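Because the curves above feed every later comparison, it may be worth sanity-checking each report before storing it. The sketch below checks three things under stated assumptions: curve probabilities lie in [0, 1], they never rise as the threshold rises, and the two moneyline probabilities sum to 1. The function names are hypothetical, and the threshold is parsed from the first number in each key.

```python
import re


def threshold_of(key: str) -> float:
    """Pull the numeric threshold out of a key like 'win_by_7_plus' or 'over_215'."""
    match = re.search(r"\d+(?:\.\d+)?", key)
    if match is None:
        raise ValueError(f"no numeric threshold in key {key!r}")
    return float(match.group())


def curve_problems(curve: dict) -> list:
    """Return human-readable problems found in one probability curve."""
    points = sorted((threshold_of(k), p) for k, p in curve.items())
    problems = [f"probability {p} at {t} is outside [0, 1]"
                for t, p in points if not 0.0 <= p <= 1.0]
    for (t_lo, p_lo), (t_hi, p_hi) in zip(points, points[1:]):
        if p_hi > p_lo:  # a harder threshold cannot be more likely
            problems.append(f"probability rises from {p_lo} at {t_lo} to {p_hi} at {t_hi}")
    return problems


def report_problems(report: dict) -> list:
    """Validate the curve, moneyline, and conviction fields of an analyst report."""
    problems = []
    for section in ("spread_analysis", "total_analysis"):
        curve = report[section]["implied_probability_curve"]
        problems += [f"{section}: {msg}" for msg in curve_problems(curve)]
    ml = report["moneyline_analysis"]
    if abs(ml["win_probability"] + ml["opponent_win_probability"] - 1.0) > 1e-6:
        problems.append("moneyline_analysis: win probabilities do not sum to 1")
    for section in ("spread_analysis", "total_analysis", "moneyline_analysis"):
        if report[section]["conviction"] not in (1, 2, 3, 4, 5):
            problems.append(f"{section}: conviction must be an integer from 1 to 5")
    return problems


example = {
    "spread_analysis": {"conviction": 4, "implied_probability_curve": {
        "win_by_1_plus": 0.74, "win_by_4_plus": 0.62, "win_by_7_plus": 0.48}},
    "total_analysis": {"conviction": 3, "implied_probability_curve": {
        "over_215": 0.65, "over_220": 0.45}},
    "moneyline_analysis": {"conviction": 4, "win_probability": 0.72,
                           "opponent_win_probability": 0.28},
}
print(report_problems(example))
```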
| Level | Label | Meaning |
|---|---|---|
| 1 | VERY LOW | Genuine uncertainty. Making a pick but minimal fundamental basis. Near-random but tracked. |
| 2 | LOW | Weak basis. One or two data points, no strong directional story. |
| 3 | MODERATE | Reasonable basis. Several data points align. Standard operating pick. |
| 4 | HIGH | Strong basis. Multiple independent factors converge. Clear matchup story. |
| 5 | VERY HIGH | Exceptional. Rare clarity — overwhelming convergence of factors. Use sparingly (<10% of picks). |
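The usage guidance in the scale can be monitored per analyst. The sketch below flags a session when level 5 reaches 10% of picks or when more than half the picks are rated 4 or higher. The 50% cutoff is an assumed reading of "most picks", and the flag names are hypothetical.

```python
from collections import Counter


def conviction_flags(convictions: list[int],
                     level5_limit: float = 0.10,
                     high_limit: float = 0.50) -> list[str]:
    """Flag conviction inflation in a batch of 1-5 ratings."""
    total = len(convictions)
    if total == 0:
        return []
    counts = Counter(convictions)
    flags = []
    if counts[5] / total >= level5_limit:  # scale says level 5 should be under 10%
        flags.append("level_5_overused")
    if (counts[4] + counts[5]) / total > high_limit:  # assumed meaning of "most"
        flags.append("most_picks_4_plus")
    return flags


print(conviction_flags([2, 3, 3, 2, 4, 3, 2, 3, 3, 2]))
print(conviction_flags([5, 5, 4, 4, 3]))
```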
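Each analyst produces an implied probability curve for spreads and totals. The system builds a curve from each analyst's output, tracks it against actual results, and compares it across analysts.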
This is the core model-building mechanism. Over time, we learn: is Sonnet better at spreads? Is Gemini better at totals? Does the edge scanner beat both on ML? The clean independence makes this measurement meaningful.
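Analysts report probabilities at fixed thresholds, while the contracts they are compared against may sit between those points. A minimal sketch of reading a curve at an arbitrary threshold follows. Linear interpolation and clamping outside the reported range are assumptions, not something the ruling prescribes.

```python
def interpolate_curve(points: dict[float, float], threshold: float) -> float:
    """Linearly interpolate a probability curve at the given threshold.

    `points` maps a threshold (for example a winning margin) to the
    probability of meeting or exceeding it. Values outside the reported
    range are clamped to the nearest endpoint (an assumption).
    """
    xs = sorted(points)
    if threshold <= xs[0]:
        return points[xs[0]]
    if threshold >= xs[-1]:
        return points[xs[-1]]
    for lo, hi in zip(xs, xs[1:]):
        if lo <= threshold <= hi:
            weight = (threshold - lo) / (hi - lo)
            return points[lo] + weight * (points[hi] - points[lo])
    raise AssertionError("unreachable: threshold lies inside the sorted range")


# Spread curve from the sample analyst output (margin -> probability).
spread_curve = {1: 0.74, 4: 0.62, 7: 0.48, 10: 0.33, 14: 0.18, 20: 0.06}

# A -7.5 line is covered by a win of 8 or more points.
print(round(interpolate_curve(spread_curve, 8), 4))  # 0.43
```

The interpolated value can then be compared with the edge scanner's probability and the market price for the same contract in the comparison layer.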
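Opus receives everything for every market: analyst reports, edge scanner math, Kalshi prices, Pinnacle lines, sharp money, and third-party model signals. It returns a verdict such as: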
{
"market": "BOS vs PHI — Spread",
"final_line": "BOS -7.5",
"final_probability": 0.51,
"final_conviction": 4,
"position_size": "FULL",
"convergence_level": "FULL",
"verdict_narrative": "Both analysts see BOS -7 to -8.5. Edge scanner math at -7.1. Pinnacle at -6.5. Analysts + math agree market is underpricing BOS margin. Sharp money confirms direction. Full position.",
"analyst_agreement": true,
"analyst_vs_market_divergence": 1.5,
"tracking_flags": {
"high_conviction_both": true,
"contrarian_signal": false,
"steam_aligned": true
}
}
| Signal Pattern | Size |
|---|---|
| Both analysts + math agree, market diverges | FULL |
| One analyst + math agree, other close | THREE_QUARTER |
| Analysts agree, math differs | HALF |
| Math only, analysts neutral | QUARTER |
| Everyone disagrees (still pick, edge scanner tiebreaks) | MINIMUM |
MINIMUM is still a position. No passing. Every market gets a pick and a size. Minimum positions generate calibration data on low-confidence scenarios.
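The sizing table can be expressed as a lookup that never returns "no position". In the sketch below the enum names are hypothetical labels for the five signal patterns, and classifying a market into a pattern is assumed to happen upstream. Falling back to MINIMUM for an unrecognized pattern is one assumed way to honor the no-pass rule.

```python
from enum import Enum


class SignalPattern(Enum):
    BOTH_ANALYSTS_AND_MATH = "both analysts + math agree, market diverges"
    ONE_ANALYST_AND_MATH = "one analyst + math agree, other close"
    ANALYSTS_ONLY = "analysts agree, math differs"
    MATH_ONLY = "math only, analysts neutral"
    NO_AGREEMENT = "everyone disagrees"


SIZE_BY_PATTERN = {
    SignalPattern.BOTH_ANALYSTS_AND_MATH: "FULL",
    SignalPattern.ONE_ANALYST_AND_MATH: "THREE_QUARTER",
    SignalPattern.ANALYSTS_ONLY: "HALF",
    SignalPattern.MATH_ONLY: "QUARTER",
    SignalPattern.NO_AGREEMENT: "MINIMUM",
}


def position_size(pattern) -> str:
    """Map a signal pattern to a size label; anything unknown still gets MINIMUM."""
    return SIZE_BY_PATTERN.get(pattern, "MINIMUM")


print(position_size(SignalPattern.ANALYSTS_ONLY))  # HALF
print(position_size(None))                         # MINIMUM
```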
The analyst system prompt MUST include:
Independence framing: "You are an independent oddsmaker. You have NO access to current market prices. Build your lines entirely from the fundamental data provided. It is valuable for your line to differ significantly from markets — that divergence is the entire point of your role."
No-pass rule: "You MUST set a line and probability for every market type assigned. Low conviction is fine — it is still a pick with a number. There is no passing, no holding, no skipping."
Temporal isolation: Never show previous session outputs. Each session is fully independent.
Anti-inflation: "Most conviction ratings should be 2s and 3s. If you rate most picks 4+, you are miscalibrated. Be honest about uncertainty."
Evidence requirement: "Every claim must be backed by specific data from the briefing. Do not make assertions without evidence."
Track the following over time:
After 60-90 sessions, use this data to:
| Decision | Ruling |
|---|---|
| Price hiding | Complete blackout — no lines, odds, probabilities, thresholds from any source |
| Sharp money / steam | Excluded entirely from analyst layer |
| Briefing contents | Raw fundamentals only — stats, rest, refs, injuries, matchup data |
| Third-party models | Hidden from analysts — tracked for correlation research only |
| Assignment format | Game + market types only — "set your own spread, total, ML" — no thresholds shown |
| Analyst output | Own lines + implied probability curves + evidence + conviction + upset scenario |
| Conviction | 1-5 scale, per-market-type, mandatory reasoning, base rate enforcement |
| Probability curves | Built from analyst output, tracked against reality, compared across analysts |
| Opus verdict | Sees everything — analyst reports + Kalshi + Pinnacle + sharp money + models. Picks every market. |
| Position sizing | FULL to MINIMUM based on convergence — no passing |
| No-pass rule | Every layer picks every market. Low conviction = still a pick. |
The architecture's value is independence at the analyst layer. Protect it aggressively. The probability curves built from blind analysis are the product — everything else is infrastructure to make those curves better over time.
Date added: 2026-03-30
Status: HARD RULE, no substitutions
The blind analyst panel consists of exactly these three models. No substitutions, no swaps, no additions without a new panel ruling.
| Seat | Model | Provider | API |
|---|---|---|---|
| Analyst 1 | Claude Sonnet 4.6 (claude-sonnet-4-6) | Anthropic | Direct API |
| Analyst 2 | Gemini 3.1 Pro Preview (google/gemini-3.1-pro-preview) | Google | OpenRouter |
| Analyst 3 | gpt-oss-120b (openai/gpt-oss-120b) | OpenAI (open-weight) | OpenRouter |
Verdict layer: Opus 4.6 via CLI bridge (free). NOT a blind analyst.
Why these three:
Rules:
Date added: 2026-03-31
Council: Full 5-member council, unanimous
Status: LOCKED
Opus goes FRESH every session. No resume. No memory of past verdicts.
Before each session, Opus receives a Verdict Calibration Digest — aggregate pipeline statistics only.
Resuming a prior session introduces anchoring, path dependence, and context pollution; every advisor flagged these as fatal. A purely fresh start discards real signal about pipeline performance. The hybrid (fresh session plus digest) gives Opus calibration without contamination.
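A digest that carries only aggregates could be built along the lines below. This is a sketch under assumptions: the digest's actual contents are not specified here, so grouping by `final_conviction` and reporting pick counts and hit rates is illustrative, and the `won` field on settled verdicts is hypothetical.

```python
from collections import defaultdict


def build_digest(settled_verdicts: list[dict]) -> dict:
    """Aggregate settled verdicts by conviction level.

    Only counts and hit rates are returned. No game IDs, narratives, or
    individual verdicts survive, so Opus cannot anchor on specific past calls.
    """
    buckets = defaultdict(lambda: {"picks": 0, "wins": 0})
    for verdict in settled_verdicts:
        bucket = buckets[verdict["final_conviction"]]
        bucket["picks"] += 1
        bucket["wins"] += int(verdict["won"])
    return {
        level: {"picks": b["picks"], "hit_rate": b["wins"] / b["picks"]}
        for level, b in sorted(buckets.items())
    }


# Illustrative records only.
settled = [
    {"market": "BOS vs PHI Spread", "final_conviction": 4, "won": True},
    {"market": "BOS vs PHI Total", "final_conviction": 3, "won": False},
    {"market": "BOS vs PHI ML", "final_conviction": 3, "won": True},
]
print(build_digest(settled))
```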