Panel: Opus, Sonnet, Grok 4.2 Reasoning, Gemini Pro 3.1

Judge: Opus (this document)

Date: 2026-03-26
Strong panel. All four delivered specific, implementable ideas with URLs, schemas, and formulas. No hand-waving. Minor deductions: Grok's cron schedule was too sparse (daily-only for jobs that need to run hourly), and Gemini's Flashpoint Index is creative, but its NOTAM/MARAD/ADS-B sources may be brittle in practice. Otherwise excellent.
Every panelist independently concluded this. Sports has Pinnacle. Crypto has Deribit. Politics has nothing equivalent. We must build a synthetic sharp line from multiple sources.
Free API, 1000 req/hr with free key from api.data.gov. Track bill stages (introduced → committee → floor → passed → enrolled → signed). All 4 proposed nearly identical bill pipeline trackers.
https://api.congress.gov/v3/bill?api_key=KEY&limit=250&sort=updateDate+desc
https://api.congress.gov/sign-up/ (free, instant)

Free, no auth needed. Tracks EOs, proclamations, memoranda.
https://www.federalregister.gov/api/v1/documents?conditions[presidential_document_type]=executive_order&conditions[publication_date][gte]=YYYY-MM-DD

Free 15-minute feed. CAMEO-coded events with Goldstein scale scores. Filter QuadClass 3-4 (verbal and material conflict) for target countries.
http://data.gdeltproject.org/gdeltv2/lastupdate.txt

These are the direct data anchors for the highest-volume recurring political contracts (KXAPRPOTUS, KX538APPROVE).
realclearpolling.com
projects.fivethirtyeight.com/polls/data/approval-averages.csv (if still available) or scrape

Monthly CSV from CBP. Predictable release schedule (mid-month for prior month).
https://www.cbp.gov/newsroom/stats/southwest-land-border-encounters

All 4 said this independently. The prediction market consensus already incorporates sentiment. ML models need GPU farms we don't have. Cron is sufficient: political data doesn't move at microsecond speed.
Recurring series have patterns, are backtestable, and compound over time. One-offs require manual context and don't repeat.
Sonnet & Opus: Polymarket IS the sharp book. It has higher volume, more sophisticated (international) traders, and covers most Kalshi political markets. The Polymarket-Kalshi delta is the primary edge signal.
Grok & Gemini: No single sharp book — use a Bayesian/weighted ensemble of Metaculus + Manifold + historical base rates + proprietary metrics.
RULING: Sonnet/Opus win. Polymarket is the political Pinnacle.
Polymarket is real-money with deep liquidity, especially on political contracts. It's the closest thing politics has to a sharp book. The ensemble approach is also correct — but Polymarket should be the heaviest-weighted component, not one-of-many. We already scrape Manifold and Metaculus; adding Polymarket as the primary anchor with the others as cross-validation is the right architecture.
Implementation:
https://clob.polymarket.com/markets and https://gamma-api.polymarket.com/markets?closed=false&limit=500

Opus: Velocity + projected Friday endpoint. Front-run pollster house effects (know which pollsters are about to publish and estimate their impact on the average).
Sonnet: Simpler — 7-day velocity, threshold crossing detection.
Grok: Second derivative (acceleration) + cross-market beta against PRESINDEXD.
Gemini: Time-series forecast with cosponsors and bipartisanship (wrong section — this was for bills, not approval).
RULING: Opus's approach is the most novel. Build velocity first (Sonnet), add house effects later (Opus).
The poll release timing exploitation (front-running which pollsters enter the RCP average using house effect tables) is genuinely alpha. But it's Phase 2 — Sonnet's simpler velocity model works immediately and can be enhanced. Grok's acceleration idea is interesting but harder to calibrate with limited data.
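The Phase 1 velocity model could be sketched roughly as follows. This is a minimal illustration, not the panel's implementation: the type and function names are hypothetical, the 3-day window follows the implementation note in this ruling (Sonnet proposed 7 days), and the straight-line projection to Friday is an assumption about how "project to Friday" would be computed.

```typescript
// Illustrative sketch only: names are hypothetical, not from the pipeline.
interface ApprovalPoint {
  date: string; // ISO date (YYYY-MM-DD), interpreted as UTC
  value: number; // approval average, in percentage points
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days from one ISO date to another.
function daysBetween(fromIso: string, toIso: string): number {
  return Math.round((Date.parse(toIso) - Date.parse(fromIso)) / MS_PER_DAY);
}

// Days until the next Friday in UTC (0 when the date is already a Friday).
function daysUntilFriday(asOfIso: string): number {
  const dayOfWeek = new Date(Date.parse(asOfIso)).getUTCDay(); // 0 = Sunday, 5 = Friday
  return (5 - dayOfWeek + 7) % 7;
}

// Most recent reading in the series, or null when the series is empty.
function latestPoint(series: ApprovalPoint[]): ApprovalPoint | null {
  let latest: ApprovalPoint | null = null;
  for (const p of series) {
    if (latest === null || p.date > latest.date) latest = p;
  }
  return latest;
}

// approval_velocity = (current - value windowDays ago) / windowDays
function approvalVelocity(series: ApprovalPoint[], windowDays: number = 3): number | null {
  const latest = latestPoint(series);
  if (latest === null) return null;
  for (const p of series) {
    if (daysBetween(p.date, latest.date) === windowDays) {
      return (latest.value - p.value) / windowDays;
    }
  }
  return null; // no reading exactly windowDays before the latest one
}

// Assumption: extend the latest reading linearly at the current velocity.
function projectToFriday(
  series: ApprovalPoint[],
  windowDays: number = 3
): { velocity: number; projected: number } | null {
  const latest = latestPoint(series);
  const velocity = approvalVelocity(series, windowDays);
  if (latest === null || velocity === null) return null;
  return { velocity, projected: latest.value + velocity * daysUntilFriday(latest.date) };
}
```

Returning `null` when the window cannot be filled keeps the job from emitting a projection on thin history; house-effect adjustments (Phase 2) would feed into `value` before this step.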
Implementation:
approval_velocity = (current - 3d_ago) / 3, project to Friday

Opus, Sonnet, Grok: GDELT events (CAMEO-coded, Goldstein scale)
Gemini: GDELT PLUS FAA NOTAMs, MARAD maritime alerts, and ADS-B/AIS military flight/ship tracking
RULING: Gemini's Flashpoint Index is the most creative idea in the entire panel. APPROVED for Phase 3.
NOTAMs, MARAD alerts, and military flight tracking are genuine alpha that virtually no prediction market trader uses. A new MARAD alert for the Persian Gulf combined with increased US tanker-aircraft activity IS a concrete signal for UNSC veto / conflict contracts. However:
Implementation (Phase 3):
https://www.maritime.dot.gov/msci/msci-alerts (scrape alert list)
https://notams.aim.faa.gov/ (scrape by region)

Gemini only: Track where campaign money is actually being spent (FEC filings) vs what Trump posts about. The money is the sharp signal.
RULING: Interesting but LOW PRIORITY. FEC data is quarterly and heavily lagged.
FEC filings are valuable for long-term election markets but update too slowly for our weekly/monthly recurring contracts. The idea is sound in principle — money reveals intent better than tweets — but the data cadence doesn't match our contract cadence. Defer to Phase 4 if ever.
Opus: Mastodon-compatible API or third-party trackers. Poisson model for post velocity.
Grok: LLM-classified post types (rage/policy/personal) for entropy metric.
Sonnet/Gemini: Acknowledge it's hard. Gemini suggests logged-in scraping.
RULING: Use third-party trackers first. Don't scrape Truth Social directly.
Truth Social's API is hostile and direct scraping risks bans. truthsocialarchive.com or similar third-party trackers that already aggregate post counts are lower-risk. The Poisson extrapolation model (Opus) is the right math for the KXTRUTHSOCIAL contract. Grok's LLM entropy classifier is over-engineered for a post-counting contract.
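A Poisson extrapolation for a post-counting contract might look like the sketch below. It assumes a constant posting rate over the contract window (estimated from the posts observed so far) and that the contract settles on the final count reaching a threshold; both are simplifying assumptions, and the function names are hypothetical.

```typescript
// Illustrative sketch only: names and the constant-rate assumption are ours.

// P(X <= k) for X ~ Poisson(lambda), accumulated term by term.
// Math.exp(-lambda) underflows for lambda above roughly 700, which is
// outside the range this sketch is meant for.
function poissonCdf(k: number, lambda: number): number {
  if (k < 0) return 0;
  let term = Math.exp(-lambda); // P(X = 0)
  let sum = term;
  for (let i = 1; i <= k; i++) {
    term *= lambda / i; // P(X = i) from P(X = i - 1)
    sum += term;
  }
  return Math.min(1, sum);
}

// Probability that the final post count reaches `threshold`, given the
// count so far and the time elapsed and remaining in the window.
function probReachThreshold(
  postsSoFar: number,
  hoursElapsed: number,
  hoursRemaining: number,
  threshold: number
): number {
  if (postsSoFar >= threshold) return 1; // already there
  if (hoursRemaining <= 0) return 0; // window closed short of the threshold
  if (hoursElapsed <= 0) throw new Error("need elapsed time to estimate a rate");
  const ratePerHour = postsSoFar / hoursElapsed; // observed posting rate
  const lambda = ratePerHour * hoursRemaining; // expected remaining posts
  const needed = threshold - postsSoFar;
  return 1 - poissonCdf(needed - 1, lambda); // P(remaining posts >= needed)
}
```

The resulting probability would feed the model component of the consensus for the KXTRUTHSOCIAL series; a time-of-day rate profile could replace the flat rate if the constant-rate assumption proves too crude.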
Sonnet: Include PredictIt as a source. API at predictit.org/api/marketdata/all/
Others: Don't mention PredictIt.
RULING: Include PredictIt but low weight.
PredictIt is less efficient than Polymarket but still free data. Include it as a cross-validation source at low weight (0.05-0.10). The API is simple and stable.
Grok: Separate politics.db
Others: Implied use of existing db
RULING: Use existing research-pipeline.db. No new database.
All our other desks use the same database. Splitting creates maintenance burden and breaks cross-desk queries. Add new tables to research-pipeline.db via getPipelineDb().
external_market_prices

abs(edge) > 0.08 on any recurring series

Tradeable output by Day 5.
-- Approval ratings (RCP, 538)
CREATE TABLE IF NOT EXISTS approval_ratings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source TEXT NOT NULL, -- 'rcp', '538', 'silver_bulletin'
metric TEXT NOT NULL, -- 'approve', 'disapprove', 'net', 'favorable'
value REAL NOT NULL,
captured_at TEXT NOT NULL
);
-- Individual polls (for house effect calculation)
CREATE TABLE IF NOT EXISTS polls_individual (
poll_id TEXT PRIMARY KEY,
pollster TEXT NOT NULL,
field_date_end TEXT,
sample_size INTEGER,
population TEXT, -- 'RV', 'LV', 'A'
question_type TEXT NOT NULL,
result_positive REAL,
result_negative REAL,
captured_at TEXT NOT NULL
);
-- Pollster house effects (computed weekly)
CREATE TABLE IF NOT EXISTS pollster_house_effects (
pollster TEXT NOT NULL,
question_type TEXT NOT NULL,
house_effect REAL NOT NULL,
sample_count INTEGER NOT NULL,
updated_at TEXT NOT NULL,
PRIMARY KEY(pollster, question_type)
);
-- Congressional bill pipeline
CREATE TABLE IF NOT EXISTS congress_bills (
bill_id TEXT PRIMARY KEY,
title TEXT,
bill_type TEXT,
congress INTEGER,
introduced_date TEXT,
last_action_date TEXT,
last_action TEXT,
status TEXT NOT NULL, -- 'introduced','committee','floor','passed_one','passed_both','enrolled','signed'
readiness_score INTEGER DEFAULT 0,
captured_at TEXT NOT NULL
);
-- Executive actions (EOs, proclamations, memoranda)
CREATE TABLE IF NOT EXISTS executive_actions (
document_number TEXT PRIMARY KEY,
action_type TEXT NOT NULL,
title TEXT NOT NULL,
signing_date TEXT,
publication_date TEXT,
eo_number INTEGER,
captured_at TEXT NOT NULL
);
-- WH schedule + lid calls
CREATE TABLE IF NOT EXISTS wh_schedule (
event_date TEXT NOT NULL,
lid_called_at TEXT,
lid_type TEXT,
event_count INTEGER,
is_travel INTEGER DEFAULT 0,
captured_at TEXT NOT NULL,
PRIMARY KEY(event_date, captured_at)
);
-- CBP border encounters (monthly)
CREATE TABLE IF NOT EXISTS cbp_encounters (
month TEXT PRIMARY KEY,
sw_encounters INTEGER NOT NULL,
source_url TEXT,
captured_at TEXT NOT NULL
);
-- External prediction market prices (Polymarket, PredictIt)
CREATE TABLE IF NOT EXISTS external_market_prices (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source TEXT NOT NULL, -- 'polymarket', 'predictit'
market_id TEXT NOT NULL,
question TEXT,
yes_price REAL,
no_price REAL,
volume REAL,
matched_kalshi_series TEXT,
captured_at TEXT NOT NULL
);
-- Kalshi-to-external market mapping (manually curated)
CREATE TABLE IF NOT EXISTS prediction_kalshi_map (
kalshi_series TEXT NOT NULL,
external_source TEXT NOT NULL,
external_id TEXT NOT NULL,
match_quality TEXT NOT NULL, -- 'exact', 'close', 'related'
last_verified TEXT NOT NULL,
PRIMARY KEY(kalshi_series, external_source, external_id)
);
-- GDELT conflict events
CREATE TABLE IF NOT EXISTS gdelt_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_date TEXT NOT NULL,
country_code TEXT NOT NULL,
cameo_code TEXT NOT NULL,
goldstein_scale REAL,
num_mentions INTEGER,
quad_class INTEGER,
captured_at TEXT NOT NULL
);
-- Computed politics consensus + edge
CREATE TABLE IF NOT EXISTS politics_consensus (
kalshi_series TEXT NOT NULL,
kalshi_price REAL,
consensus_prob REAL NOT NULL,
consensus_sources INTEGER,
edge REAL,
polymarket_prob REAL,
predictit_prob REAL,
manifold_prob REAL,
metaculus_prob REAL,
model_prob REAL,
computed_at TEXT NOT NULL,
PRIMARY KEY(kalshi_series, computed_at)
);
-- MARAD maritime alerts (Phase 3)
CREATE TABLE IF NOT EXISTS marad_alerts (
alert_id TEXT PRIMARY KEY,
region TEXT,
alert_date TEXT,
advisory_text TEXT,
severity INTEGER,
captured_at TEXT NOT NULL
);
-- Indexes
CREATE INDEX IF NOT EXISTS idx_approval_source_time ON approval_ratings(source, captured_at DESC);
CREATE INDEX IF NOT EXISTS idx_gdelt_country_date ON gdelt_events(country_code, event_date DESC);
CREATE INDEX IF NOT EXISTS idx_external_market_source ON external_market_prices(source, matched_kalshi_series, captured_at DESC);
CREATE INDEX IF NOT EXISTS idx_congress_status ON congress_bills(status, last_action_date DESC);
CREATE INDEX IF NOT EXISTS idx_politics_consensus_series ON politics_consensus(kalshi_series, computed_at DESC);
| Time | Job | Frequency | Targets |
|---|---|---|---|
| 6:00 AM | scrape-rcp-538 | Daily | approval_ratings |
| 6:15 AM | scrape-congress | Daily | congress_bills |
| 6:30 AM | scrape-federal-register | Daily | executive_actions |
| 6:45 AM | scrape-wh-schedule | Every 15min 7AM-11PM | wh_schedule |
| */30 | scrape-polymarket-politics | Every 30min | external_market_prices |
| */30 | scrape-predictit | Every 30min | external_market_prices |
| 7:15 AM, 1PM, 7PM | compute-politics-consensus | 3x daily | politics_consensus |
| 8:00 AM | scrape-cbp | 1st of month | cbp_encounters |
| */30 | scrape-gdelt | Every 30min | gdelt_events |
| 10:00 PM Sun | update-house-effects | Weekly | pollster_house_effects |
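For reference, the schedule above could be written as standard 5-field cron expressions along these lines. This is a sketch under assumptions: the `POLITICS_CRON` name is hypothetical, times are taken to be in the scheduler's local timezone, and the White House schedule job is read as "every 15 minutes during the 7 AM to 11 PM hours".

```typescript
// Illustrative mapping of job name -> cron expression(s); not the pipeline's config.
// Field order: minute hour day-of-month month day-of-week.
const POLITICS_CRON: Record<string, string[]> = {
  "scrape-rcp-538": ["0 6 * * *"], // daily 6:00 AM
  "scrape-congress": ["15 6 * * *"], // daily 6:15 AM
  "scrape-federal-register": ["30 6 * * *"], // daily 6:30 AM
  // Runs at :00, :15, :30, :45 of hours 7 through 23, so the last run is 11:45 PM.
  "scrape-wh-schedule": ["*/15 7-23 * * *"],
  "scrape-polymarket-politics": ["*/30 * * * *"], // every 30 minutes
  "scrape-predictit": ["*/30 * * * *"], // every 30 minutes
  // 7:15 AM needs its own entry because its minute field differs from 1 PM and 7 PM.
  "compute-politics-consensus": ["15 7 * * *", "0 13,19 * * *"],
  "scrape-cbp": ["0 8 1 * *"], // 8:00 AM on the 1st of each month
  "scrape-gdelt": ["*/30 * * * *"], // every 30 minutes
  "update-house-effects": ["0 22 * * 0"], // Sundays 10:00 PM
};
```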
consensus_prob = (0.45 * polymarket) + (0.25 * metaculus) + (0.20 * manifold) + (0.10 * model)
// For data-anchored contracts (bills, border, EOs):
// model weight increases to 0.40, others decrease proportionally
edge = consensus_prob - kalshi_implied_prob
trade_threshold = 0.08 (8 cents)
min_sources = 2 (at least 2 independent sources must agree)
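The consensus and edge rules above could be implemented roughly as follows. This is a sketch, not the pipeline's code: all names are hypothetical, it follows the four-component formula as written (PredictIt is not in that formula, so it is left out here), it renormalizes weights over whichever sources are present (an assumption), and it reads `min_sources` as "at least two sources available" (also an assumption).

```typescript
// Illustrative sketch only: names and the renormalization step are assumptions.
type Source = "polymarket" | "metaculus" | "manifold" | "model";
type SourceProbs = Partial<Record<Source, number>>;

const SOURCES: Source[] = ["polymarket", "metaculus", "manifold", "model"];
const BASE_WEIGHTS: Record<Source, number> = {
  polymarket: 0.45,
  metaculus: 0.25,
  manifold: 0.2,
  model: 0.1,
};
const TRADE_THRESHOLD = 0.08; // 8 cents
const MIN_SOURCES = 2;

// Data-anchored contracts (bills, border, EOs): model weight rises to 0.40
// and the other three shrink proportionally so the total stays at 1.
function weightsFor(dataAnchored: boolean): Record<Source, number> {
  if (!dataAnchored) return BASE_WEIGHTS;
  const modelWeight = 0.4;
  const scale = (1 - modelWeight) / (1 - BASE_WEIGHTS.model);
  return {
    polymarket: BASE_WEIGHTS.polymarket * scale,
    metaculus: BASE_WEIGHTS.metaculus * scale,
    manifold: BASE_WEIGHTS.manifold * scale,
    model: modelWeight,
  };
}

interface EdgeResult {
  consensusProb: number;
  edge: number; // consensus_prob - kalshi_implied_prob
  sourcesUsed: number;
  tradeable: boolean; // abs(edge) > TRADE_THRESHOLD
}

// Returns null when fewer than MIN_SOURCES probabilities are available.
function computeEdge(
  probs: SourceProbs,
  kalshiImpliedProb: number,
  dataAnchored: boolean = false
): EdgeResult | null {
  const weights = weightsFor(dataAnchored);
  let weightedSum = 0;
  let totalWeight = 0;
  let sourcesUsed = 0;
  for (const source of SOURCES) {
    const p = probs[source];
    if (p === undefined) continue; // missing source: skip and renormalize below
    weightedSum += weights[source] * p;
    totalWeight += weights[source];
    sourcesUsed++;
  }
  if (sourcesUsed < MIN_SOURCES || totalWeight === 0) return null;
  const consensusProb = weightedSum / totalWeight;
  const edge = consensusProb - kalshiImpliedProb;
  return { consensusProb, edge, sourcesUsed, tradeable: Math.abs(edge) > TRADE_THRESHOLD };
}
```

Renormalizing over present sources keeps `consensusProb` a valid probability when a scraper fails; the alternative of treating a missing source as zero would bias every consensus downward.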
| Source | URL | Auth | Cost |
|---|---|---|---|
| Polymarket CLOB | https://clob.polymarket.com/markets | None | Free |
| Polymarket Gamma | https://gamma-api.polymarket.com/markets | None | Free |
| PredictIt | https://www.predictit.org/api/marketdata/all/ | None | Free |
| Congress.gov | https://api.congress.gov/v3/bill | Free API key | Free |
| Federal Register | https://www.federalregister.gov/api/v1/documents | None | Free |
| CBP Encounters | https://www.cbp.gov/newsroom/stats/southwest-land-border-encounters | None | Free |
| GDELT 2.0 | http://data.gdeltproject.org/gdeltv2/lastupdate.txt | None | Free |
| RCP Polling | https://www.realclearpolling.com/polls/approval/donald-trump | None | Free |
| 538 Approval | https://projects.fivethirtyeight.com/polls/ | None | Free |
| MARAD Alerts | https://www.maritime.dot.gov/msci/msci-alerts | None | Free |
| WH Schedule | https://www.whitehouse.gov/schedule/ | None | Free |
| Panelist | Grade | Strengths | Weaknesses |
|---|---|---|---|
| Opus | A+ | Poll timing exploitation is genuine alpha. Phased plan starts producing in 3 days. 10 specific quantitative models with formulas. | Slightly over-detailed on some models |
| Sonnet | A | Polymarket-as-Pinnacle insight is the single best idea. Countable data contracts prioritized. Very practical. | Fewer novel metrics |
| Grok | A- | Legislative Friction Index and Executive Surprise are novel. Contrarian stance on not building ML is correct. | Cron schedule too sparse. Truth Social entropy over-engineered. |
| Gemini | A | Flashpoint Index (NOTAMs/MARAD) is the most creative single idea. Say-Do Gap is interesting. | FEC data too lagged for weekly contracts. Some sources may be brittle. |