Base: Claude Opus 4.6 (complete, unmodified) Panel additions cherry-picked from: Gemini 3.1 Pro, Grok 4.1, Grok 4 Fast, DeepSeek V3.1, GPT-4.1, Sonar Reasoning Pro Merged: Mar 15, 2026
This is the complete data collection specification for the Futures desk in the research pipeline. It covers commodity, energy, metals, agriculture, and financial futures traded on direct futures exchanges (CME/NYMEX/ICE) as the primary venue and prediction markets (Kalshi/Polymarket) as a secondary venue for commodity/economic binary contracts. An AI builder should be able to read this and know exactly what data to collect, from where, how often, and why.
Primary venue: Direct futures trading via a broker (CME/NYMEX/ICE/CBOE). Full liquidity, tight spreads, ladder order books with real depth. This is where size gets deployed.
Secondary venue: Kalshi and Polymarket offer binary contracts on commodity prices, economic data releases (CPI, NFP, GDP), Fed rate decisions, and Treasury yields. These platforms have their own ladder/CLOB structures but with thinner liquidity. The same data infrastructure powers both -- the difference is execution venue and position sizing.
The edge: Government agencies publish enormous amounts of free data that moves futures markets. Most retail traders watch headline numbers. We parse the sub-components, cross-reference positioning data (CFTC COT), track term structure shifts, and monitor alternative data (satellite, ship tracking) to identify mispricing hours to weeks before the market corrects.
IMPORTANT -- Prediction Market Arbitrage Caveat [PANEL: Sonar Reasoning Pro]: Cross-venue arb between Kalshi and Polymarket for the same underlying (e.g., BLS CPI) is NOT reliably tradeable. Spreads of 5-8 cents look attractive but after Kalshi trading fees (~0.5% one-way), Polymarket AMM slippage (1-2% in mid-cap markets), and funding costs, you are often underwater. Use Kalshi/Polymarket prices as market sentiment proxies and fundamental basis trade signals (our model says 60% probability, market prices 45% = edge to trade the underlying futures, not necessarily the binary). Cross-venue arb only when spreads exceed 10 cents on high-volume markets with demonstrable execution cost below the spread.
An AI builder must understand these structural differences:
Physical Delivery Matters -- Unlike stocks, most commodity futures can be settled by actual physical delivery of barrels of oil, bushels of corn, or bars of gold. This creates real-world supply bottlenecks (warehouse stocks, pipeline capacity, port congestion) that directly affect prices. Inventory data is the single most important input.
Term Structure Is the Edge -- Futures trade in monthly contracts stretching years into the future. The shape of the price curve (contango = future months more expensive; backwardation = future months cheaper) reveals the market's supply/demand expectations. Changes in curve shape often signal price moves before the front-month price reacts.
Government Data Drives Everything -- Unlike stocks where earnings matter, commodity futures are driven by government reports: EIA petroleum inventories (Wednesdays), EIA natural gas storage (Thursdays), USDA WASDE crop reports (monthly), CFTC positioning data (Fridays), BLS inflation data (monthly). These are all free. The release schedule is the trading calendar.
Seasonality Is Structural -- Natural gas demand peaks in winter (heating) and summer (cooling). Crop prices follow planting and harvest cycles. Gasoline demand peaks in summer driving season. These patterns repeat every year with measurable hit rates and are baked into the curve shape.
Positioning Data Is Public -- The CFTC publishes weekly data showing exactly how commercial hedgers, swap dealers, and speculative funds are positioned. When speculators hit extreme positions (>2 standard deviations), reversals happen. This data is free and has no equivalent in stock markets.
Cross-Commodity Relationships Are Tradeable -- Commodities exist in supply chains: crude oil becomes gasoline (crack spread), soybeans become meal and oil (crush spread), natural gas generates electricity (spark spread). These processing margins mean-revert. Ratio trades between related commodities (gold/silver, copper/gold, corn/wheat) have decades of statistical backing.
What: Every active Kalshi market related to commodities, economic data, and monetary policy.
Markets to track:
What to pull per market:
Source: Kalshi API (https://trading-api.kalshi.com/trade-api/v2) Auth: RSA-PSS signed requests (existing Kalshi Edge Bot code handles this). Use Kalshi testnet first to validate auth before connecting to production. Consider websocket connections for real-time data to reduce REST API call volume. [PANEL: Grok 4 Fast] Rate limits: Not publicly documented -- implement exponential backoff, alert if >1 silent failure per hour. [PANEL: DeepSeek V3.1] Collection frequency: Every 15 minutes during US market hours (9:30 AM - 4:00 PM ET) for high-conviction markets only. Hourly for all others. Snapshot at 8:00 AM ET (pre-market). Scale back to hourly across the board if rate limiting is detected. Fallback: Parse Kalshi web UI (slower) if API unavailable; skip Kalshi and use Polymarket only as last resort. [PANEL: Sonar Reasoning Pro]
What: Same categories as Kalshi. Different liquidity pool (crypto-native traders), so prices often diverge.
What to pull per market:
Source: Polymarket CLOB API (https://clob.polymarket.com/) -- free, no auth needed for read. Stability note: Crypto platforms change APIs frequently. Implement with exponential backoff; fallback to web scraping if API changes. [PANEL: Grok 4 Fast] Collection frequency: Same as Kalshi -- every 15 min during market hours for active markets, hourly otherwise. Fallback: Polymarket web UI or manual order book read every hour if CLOB API unavailable. [PANEL: Sonar Reasoning Pro]
Execution feasibility layer [PANEL: Sonar Reasoning Pro]: Before sizing any prediction market trade, calculate true edge after execution costs:
interface LiquidityAssessment {
contract: string;
venue: 'kalshi' | 'polymarket' | 'futures';
desiredSize: number;
estimatedSlippage: number; // % (Kalshi ~0.5-1%, Polymarket ~1-3%)
minConvictionThreshold: number; // only trade if conviction score > this
recommendedSize: number; // max position after slippage drag
}
// Examples:
// CL (WTI crude): $500K, 0.08% slippage, conviction threshold 50
// Kalshi CPI binary: $50K desired, 2.5% slippage, conviction threshold 80, recommended $15K
What: The single most important weekly energy data release. Crude oil inventories, production, imports, and refinery utilization at the PADD (regional) level. Moves crude and product prices instantly on publication.
Source: EIA API (https://api.eia.gov/v2/) API Key: Free -- register at https://www.eia.gov/opendata/register.php Rate limit: 1000 requests/hour.
What to pull:
Key API endpoints:
GET /v2/petroleum/stoc/wstk/data/ with facets for product (EPC0=crude, EPM0=gasoline, EPD0=distillate) and duoarea (R10-R50 for PADDs, R60 for SPR)GET /v2/petroleum/pnp/wiup/data/GET /v2/petroleum/crd/crpdn/data/GET /v2/petroleum/pri/spt/data/Collection frequency: Weekly. Report released Wednesdays 10:30 AM ET (delayed 1 day if Monday holiday). Pull immediately on release. Store historical for 5-year seasonal comparison. Fallback: EIA CSV download (weekly) if API unavailable; parse EIA website directly as last resort. [PANEL: Sonar Reasoning Pro]
Data quality rules [PANEL: Sonar Reasoning Pro]:
What: Weekly injection/withdrawal data for underground natural gas storage. The second most market-moving weekly report.
Source: EIA API -- GET /v2/natural-gas/stor/wkly/data/
What to pull:
Collection frequency: Weekly. Released Thursdays 10:30 AM ET. Fallback: EIA website HTML table scraper.
What: Monthly 2-year forecast of US and global energy supply, demand, prices, and production.
Source: EIA website (https://www.eia.gov/outlooks/steo/) -- downloadable tables in Excel/CSV. What to pull:
Collection frequency: Monthly, released ~10th of each month.
What: Monthly report on drilling efficiency and DUC (drilled but uncompleted) well count across 7 major US shale basins.
Source: EIA (https://www.eia.gov/petroleum/drilling/) -- downloadable spreadsheets. What to pull:
Collection frequency: Monthly.
What: Weekly count of active drilling rigs in the US and Canada by basin and target (oil vs gas).
Source: Baker Hughes (https://rigcount.bakerhughes.com/) -- free weekly PDF/Excel. Use EIA's incorporated copy of Baker Hughes data for API access (EIA incorporates it with short lag, cleaner than scraping BH directly). [PANEL: Opus review] What to pull:
Collection frequency: Weekly, released Fridays 1:00 PM ET.
What: OPEC's own assessment of global supply, demand, and production compliance.
Source: OPEC website (https://www.opec.org/opec_web/en/publications/338.htm) -- free PDF/tables. What to pull:
Collection frequency: Monthly, typically released mid-month.
What: Processing margins that measure refinery profitability. Calculated from freely available price data.
Spreads to calculate:
Source: Calculate from EIA spot prices or Polygon.io futures (preferred over Yahoo Finance). What to store: Absolute value, Z-score vs 1/3/5-year history, percentile rank, seasonal adjustment. Collection frequency: Daily at market close. Formula validation test [PANEL: Sonar Reasoning Pro]:
// Unit test to validate crack spread formula
test('crack_spread_calculation', () => {
const wti = 70; // $/barrel
const rbob = 2.10; // $/gallon
const ho = 2.00; // $/gallon
// 3-2-1 crack spread -- historically $10-40/bbl
const crackSpread = (2 * rbob * 42 + 1 * ho * 42) / 3 - wti;
// Must be positive (refineries profitable) and in $10-40 range
expect(crackSpread).toBeGreaterThan(0);
expect(crackSpread).toBeLessThan(60);
});
What: Daily settlement prices for the three benchmark energy contracts.
Source: Polygon.io (preferred -- $29/month, shared subscription, reliable). FRED for official daily spot fallback (DCOILWTICO, DHHNGSP, DCOILBRENTEU). Yahoo Finance (CL=F, NG=F, BZ=F) as last resort only -- ticker formats change monthly and data has 15-min delay. Collection frequency: Daily at market close. Store full history for seasonal analysis.
What: Exchange-monitored warehouse inventory levels for deliverable metals. The physical supply barometer.
What to pull per exchange:
Collection frequency: Daily for COMEX/LME. Weekly for SHFE.
What: How much gold central banks (especially China, India, Turkey, Poland) are buying or selling.
Source: World Gold Council (https://www.gold.org/goldhub/data/gold-reserves-by-country) -- quarterly report, free summary data. IMF IFS database (https://data.imf.org/) for monthly reserve changes. Collection frequency: Monthly (IMF data) and quarterly (WGC comprehensive).
What: Key ratios between related metals that reveal macro regime and mean-revert.
Ratios to calculate daily:
Source: Polygon.io preferred; FRED (GOLDAMGBD228NLBM, SLVPRUSD) as fallback. What to store: Ratio value, 20/60/200-day moving averages, Z-score vs 1/3/5-year history, percentile rank. Collection frequency: Daily at market close.
What: The price spread between Shanghai (SHFE) and London (LME) copper, adjusted for shipping, tariffs, and VAT.
Source: Calculate from SHFE and LME daily prices (use Investing.com/TradingView for SHFE data rather than scraping SHFE directly). What to track:
SHFE_price / (LME_price × exchange_rate × 1.13_VAT × 1.02_tariff + freight + 0.005_conversion_cost) [PANEL: DeepSeek V3.1 -- adds 0.5% RMB conversion cost often omitted]Collection frequency: Daily.
What: World Agricultural Supply and Demand Estimates. The single most important monthly report for grain, oilseed, cotton, and livestock markets. Updated the ~10th of each month.
Source: USDA PSD Online API (https://apps.fas.usda.gov/PSDOnlineV2/api/) -- free, no key needed, returns JSON. Rate limit: ~100 requests/hour; build retry logic with exponential backoff. Cache aggressively -- WASDE data only changes monthly, so once pulled, do not re-pull until the next release date. [PANEL: Opus review] Fallback: USDA PDF table parser if API has multi-day outages.
What to pull per commodity (corn, soybeans, wheat, cotton, rice, sugar):
Collection frequency: Monthly, released ~10th of each month at 12:00 PM ET. Pull immediately on release.
Revision handling [PANEL: Sonar Reasoning Pro]: When USDA revises WASDE backwards, update the stored data but do NOT fire a new signal. Flag as revision_type = 'backward_revision' in the database. Treat as noise unless it confirms a multi-month trend.
What: Weekly update on planting progress, crop condition ratings, and harvest progress. Published during the growing season (April-November).
Source: USDA NASS QuickStats API (https://quickstats.nass.usda.gov/api/api_GET/) -- free, requires API key. What to pull:
Collection frequency: Weekly, released Mondays 4:00 PM ET during growing season.
What: Weekly data on US agricultural export commitments (sales) and actual shipments (inspections).
Source:
What to pull:
Collection frequency: Weekly (Thursday for sales, Monday for inspections). FGIS daily during peak export season.
What: Survey of farmer planting intentions for the upcoming growing season. Released once a year at the end of March.
Source: USDA NASS and QuickStats API. What to pull:
Collection frequency: Annual (late March), with June Acreage update.
What: Monthly inventory reports for livestock. Published ~20th of each month.
Source: USDA NASS via QuickStats API. What to pull:
Collection frequency: Monthly.
What: Monthly frozen meat and poultry inventory in US cold storage warehouses.
Source: USDA NASS and QuickStats API. What to pull:
Collection frequency: Monthly, released ~22nd.
What: South American production estimates from local government agencies. Brazil and Argentina are the #1 and #3 global soybean exporters.
Source (simplified approach): [PANEL: Grok 4 Fast]
What to pull:
Collection frequency: Monthly for USDA FAS/Conab; weekly from Argentine exchanges during Nov-Apr growing season.
What: Results of US Treasury debt auctions. The bond market equivalent of earnings reports.
Source: Treasury Fiscal Data API (https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/auctions_query) -- free, returns JSON. What to pull:
Collection frequency: On auction days (schedule known months in advance). 2Y/5Y auctions typically Tuesday/Wednesday, 10Y/30Y on Wednesday/Thursday.
What: Daily yields across the full maturity curve.
Source: FRED API (https://api.stlouisfed.org/fred/series/observations) -- free, API key required. Rate limit: 120 requests/minute. FRED Series IDs:
Calculated metrics:
Collection frequency: Daily. Fallback: FRED CSV download (weekly) if API unavailable; Yahoo Finance for yields (delayed but functional).
What: The difference between nominal Treasury yields and TIPS (inflation-protected) yields. This IS the market's inflation expectation.
Source: FRED API -- T5YIE, T10YIE, T5YIFR. What to track:
Collection frequency: Daily.
What: The combined Treasury General Account balance and Reverse Repo Facility usage. Together, these measure how much cash is available to buy assets.
Source:
What to calculate:
Collection frequency: Daily for RRP. Weekly for TGA (WTREGEN). Weekly for balance sheet (WALCL).
What: Market-implied probabilities of Fed rate changes at each upcoming FOMC meeting, derived from Fed Funds futures pricing.
Source: CME Group FedWatch tool (https://www.cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html) -- free, scrapeable. Or calculate directly from 30-Day Fed Funds futures prices on FRED (FF1, FF2, etc.). What to track:
Collection frequency: Daily at market close. Extra snapshot after any major data release (CPI, NFP, GDP).
What: The Federal Reserve's System Open Market Account -- what the Fed actually owns on its balance sheet (Treasuries and MBS by maturity).
Source: NY Fed API -- free. Key metrics only: [PANEL: Opus review -- simplify from full maturity distribution]
Collection frequency: Weekly.
What: Secured Overnight Financing Rate. The benchmark short-term interest rate that replaced LIBOR.
Source: FRED API -- series SOFR (daily). Collection frequency: Daily.
What: Settlement prices for the front 6 contract months + the 12th month endpoint for each major futures contract. Front 6 + 12th month is sufficient for 95% of term structure analysis; contracts 13+ have <1% volume. [PANEL: Grok 4 Fast]
Contracts to track:
Source: Polygon.io ($29/month, shared with Options desk) -- MANDATORY from Week 1, not Month 2. Yahoo Finance scraping will break; ticker formats change monthly and break curve calculations within 3-6 months. [ALL PANEL MEMBERS]
What to store per contract per day:
Open Interest price matrix [PANEL: Opus review -- calculated from already-collected data, $0 additional cost]:
Collection frequency: Daily at market close (6:00 PM ET).
Physical delivery notices [PANEL: Grok 4 Fast]:
What: Daily classification of each commodity's term structure shape and how it is changing.
Calculate from curve data:
What: When each futures contract expires and rolls to the next month.
Source: CME Group contract specifications pages -- free. What to track:
Collection frequency: Static calendar, updated once per contract month. Volume crossover monitored daily during roll periods.
What: The price difference between specific contract months for the same commodity.
Key spreads to track:
What to store: Spread value, Z-score vs 1/3/5-year history, percentile rank, seasonal pattern. Collection frequency: Daily.
What: Weekly breakdown of futures positions by trader type across all major contracts. This is the most important positioning dataset in existence.
Trader categories:
Source: CFTC.gov -- Disaggregated Futures-Only report.
Also pull: Futures-and-Options Combined report [PANEL: Gemini 3.1 Pro]:
Also pull: ICE Commitments of Traders for Brent [PANEL: DeepSeek V3.1]:
What to pull per commodity (15 key contracts: CL, BZ, NG, RB, HO, GC, SI, HG, ZC, ZS, ZW, SB, KC, ZN, ES):
Collection frequency: Weekly. Released Friday 3:30 PM ET. Data is as of prior Tuesday close (3-day lag).
What: Statistical context for raw COT positions. Raw numbers mean nothing without historical context.
Calculate for each trader category, each commodity:
CRITICAL interpretation guidance [PANEL: Sonar Reasoning Pro]: Managed money extremes are NOT automatically reversal signals. Managed money are trend-followers and STAY extreme for weeks/months during strong trends. The correct conviction scoring approach:
Managed money extremes alone: 5 points max (downgraded from primary signal)
Dealer flip: 20 points (rare and real -- structural flow shift)
Commercial hedger flip: 15 points (physical supply/demand guys know the market)
Dealer-speculator divergence: 12 points (managed money short + dealers net long = fight)
Managed money extreme WITH dealer confirmation: 18 points (both agree = high conviction)
Managed money extreme WITHOUT dealer confirmation = crowded trade, high reversal RISK but not a standalone signal.
What: Monitor for swap dealer net position changing sign (from net long to net short, or vice versa).
Detection rule: Swap dealer net position changes sign from prior week AND the Z-score of the new position is >0.5 (not just noise around zero). Flag immediately in daily alert.
Backtest requirement [PANEL: Sonar Reasoning Pro]: Before treating dealer flips as Tier-1 signals, validate that they precede major moves (2-6 week lag) in at least 60% of historical cases per commodity. If backtesting shows <60% correlation, demote to Tier 2. Track every dealer flip with its subsequent 4-week return.
Persistence requirement: A flip that reverses within 1 week is noise. Require the flip to persist for 2+ consecutive weeks before upgrading to high-conviction signal.
What: Additional COT report formats.
Collection frequency: Weekly (same release schedule as disaggregated).
What: The Climate Prediction Center's probabilistic temperature and precipitation forecasts for the next 6-10 and 8-14 days.
Source: NOAA CPC (https://www.cpc.ncep.noaa.gov/products/predictions/) Collection frequency: Updated daily by CPC.
Supplement with global ensemble models [PANEL: Grok 4 Fast]:
High-frequency weather alerts [PANEL: GPT-4.1]:
What: Quantitative measure of energy demand driven by temperature deviation from 65F baseline.
Source: NOAA (https://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/degree_days/) -- free. Also EIA includes degree day data in their weekly reports. What to track:
Collection frequency: Weekly (published with EIA gas storage). Daily forecasts from NWS.
What: Weekly assessment of drought conditions across the continental US.
Source: https://droughtmonitor.unl.edu/DmData/DataDownload.aspx -- free CSV/Shapefile, updated weekly (Tuesdays). What to track:
Collection frequency: Weekly.
What: ENSO phase determines global weather patterns for 6-18 months.
Source: NOAA CPC ENSO page (https://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_advisory/) -- free. What to track:
Collection frequency: Monthly (CPC updates monthly, with weekly SST data).
What: Tropical storm and hurricane positions, forecasts, and intensity for the Atlantic basin.
Source: NOAA National Hurricane Center (https://www.nhc.noaa.gov/gis/) What to track:
Collection frequency: Every 6 hours during active storms. Daily scan during hurricane season (June 1 - November 30).
What: Historical seasonal price patterns for each commodity, with statistical validation.
Calculate from 10-20 years of price data:
IMPORTANT -- Regime-conditional seasonality [PANEL: Sonar Reasoning Pro]: Raw seasonal hit rates are insufficient. Crowded patterns (summer crude driving demand) are already in the curve. Use regime-conditional approach:
interface SeasonalPattern {
commodity: string;
season: string;
regimes: {
'supply_tight': { hitRate: number; avgReturn: number };
'supply_ample': { hitRate: number; avgReturn: number };
'demand_surge': { hitRate: number; avgReturn: number };
};
}
// Only trade seasonal if current regime historically supports it.
// Seasonal aligned with tight inventory + extreme managed money short = high conviction.
// Ignore seasonal if term structure contradicts it (natgas in contango during winter = demand destruction overrides seasonal).
Collection frequency: Calculate once, update annually.
What: Near-real-time fire/hotspot detection from satellite thermal sensors.
Source: NASA FIRMS (https://firms.modaps.eosdis.nasa.gov/) -- free. Data available within 3 hours of satellite pass; add 6-hour delay disclaimer for signals. [PANEL: DeepSeek V3.1] What to monitor:
Collection frequency: Daily during relevant seasons (May-Nov). Defer full implementation to v2; use Grok/Sonar web search weekly to query "Brazil fires soybean" as cheaper interim approach. [PANEL: Opus review]
What: Mississippi River water levels at key gauges.
Source: USGS real-time water data (https://waterdata.usgs.gov/nwis/rt) -- free. Key gauges: Memphis (primary alert gauge -- set alert for <0 feet), St. Louis, Vicksburg. Simplified approach [PANEL: Opus review]: Monitor Memphis gauge. Alert if below 0 feet or above flood stage. Check weekly during risk seasons (Sep-Nov for low water, Mar-May for flooding). Do not poll daily year-round -- this is a low-frequency signal.
Collection frequency: Weekly check. Event-driven (hourly) when Memphis gauge is within 20% of alert thresholds.
v1 (Week 1-2 build): [PANEL: Opus review, Grok 4 Fast, DeepSeek V3.1 all agree -- start with 6]
| # | Ratio | Calculation | What It Means | Mean-Reversion Signal |
|---|---|---|---|---|
| 1 | Gold/Silver | GC / SI | Risk sentiment. >80 = recession fear, <60 = growth. | Z-score > 2 in either direction |
| 2 | Oil/NatGas | CL / NG | Energy substitution. 10:1 = BTU parity. | >25:1 = gas too cheap |
| 3 | Copper/Gold | HG / GC | Global growth proxy. Tracks 10Y yields closely. | Divergence from 10Y yield > 2 Z-scores |
| 4 | Soybean/Corn | ZS / ZC | Planting decision driver. >2.5 = plant beans, <2.2 = plant corn. | Extreme readings in March signal acreage shift |
| 5 | Brent/WTI | BZ - CL | Logistics/geopolitics. | Z-score > 2 from 3-year mean |
| 6 | Crack Spread | (2xRBx42 + HOx42)/3 - CL | Refinery margin. Mean ~$20. | >$40 or <$10 = unsustainable |
v2 additions (add only after core 6 are battle-tested): | 7 | Platinum/Gold | PL / GC | At historic lows. | Ratio < 0.5 = extreme long platinum reversion | | 8 | Crush Spread | ZM + ZL value - ZS cost | Soybean processing margin. | Z-score > 2 | | 9 | Spark Spread | Electricity - (NG x heat rate) | Power plant profit. | Regional | | 10 | Corn/Wheat | ZC / ZW | Feed grain substitution. | Corn >90% wheat = feed switching | | 11 | Gold/Copper x DXY | (GC/HG) x DXY | Combined risk filter. | Divergence from VIX | | 12 | Natgas/Crude BTU | (NG x 5.8) / CL | Energy-equivalent pricing. Note: use 5.8 (precise BTU factor), not 6. [PANEL: Opus review] | >0.5 = gas expensive |
Source for all: Polygon.io preferred. FRED for DXY (DTWEXBGS). Yahoo Finance as emergency fallback only. What to store: Ratio value, 20/60/200-day MAs, Z-score vs 1/3/5-year history, percentile rank. Collection frequency: Daily at market close.
| Currency Pair | Commodity Link | Why |
|---|---|---|
| AUD/USD | Copper, Iron Ore | Australia is a major copper/iron ore exporter |
| USD/CAD | WTI Crude | Canada's economy is oil-dependent |
| BRL/USD | Soybeans, Coffee | Brazil is the #1 exporter of both |
| NOK/USD | Brent Crude | Norway's sovereign wealth fund is oil-funded |
| DXY (broad dollar) | Commodity basket | Inverse correlation: strong dollar = cheap commodities |
Source: FRED API for exchange rates (DEXUSAL, DEXCAUS, DEXBZUS, DEXNOUS, DTWEXBGS). What to calculate: 60-day rolling correlation between each pair. Flag when correlation breaks. Collection frequency: Daily.
Chains to monitor:
Collection frequency: Weekly assessment using already-collected data.
Source: National Bureau of Statistics (http://www.stats.gov.cn/english/). Also Trading Economics/Investing.com for parsed data. What to track:
Collection frequency: Monthly, last day of month.
Source: GACC (http://english.customs.gov.cn/) -- free, English. Also Trading Economics as backup. What to track:
Collection frequency: Monthly (~15th of following month).
Source: Investing.com or TradingView (more reliable than SHFE English website). [PANEL: Opus review] Collection frequency: Weekly (published each Friday).
What: South Korea publishes export data on the 1st of each month with a 20-day flash estimate. Source: Korea Customs Service (https://unipass.customs.go.kr/ets/index_eng.do) -- free, English. What to track:
Defer to v2 [PANEL: Opus review]: Korean chip exports lead copper/palladium demand through a long causal chain. The direct copper demand signals (SHFE stocks, LME stocks, China PMI, SHFE-LME arb) are stronger. Build Korean tracking only if copper signals prove insufficient.
What: One row per trading day with every relevant flag and event marker.
SQLite CREATE statement:
CREATE TABLE IF NOT EXISTS daily_state (
date TEXT PRIMARY KEY, -- YYYY-MM-DD
-- Energy calendar
is_eia_petroleum_day INTEGER DEFAULT 0, -- Wednesday
is_eia_natgas_day INTEGER DEFAULT 0, -- Thursday
is_opec_report_day INTEGER DEFAULT 0,
is_baker_hughes_day INTEGER DEFAULT 0, -- Friday 1PM
-- Agriculture calendar
is_wasde_day INTEGER DEFAULT 0,
is_crop_progress_day INTEGER DEFAULT 0, -- Monday, growing season
is_export_sales_day INTEGER DEFAULT 0, -- Thursday
is_prospective_plantings INTEGER DEFAULT 0, -- end of March
is_cattle_on_feed_day INTEGER DEFAULT 0,
-- CFTC
is_cot_release_day INTEGER DEFAULT 0, -- Friday 3:30 PM
-- Fed/Treasury
is_fomc_day INTEGER DEFAULT 0,
is_fomc_eve INTEGER DEFAULT 0,
days_to_next_fomc INTEGER,
is_treasury_auction INTEGER DEFAULT 0,
auction_tenor TEXT, -- '10Y', '30Y', etc.
-- Macro releases
is_cpi_day INTEGER DEFAULT 0,
is_ppi_day INTEGER DEFAULT 0,
is_nfp_day INTEGER DEFAULT 0, -- first Friday of month
is_gdp_day INTEGER DEFAULT 0,
is_ism_pmi_day INTEGER DEFAULT 0, -- 1st business day of month
is_pce_day INTEGER DEFAULT 0,
is_jobless_claims_day INTEGER DEFAULT 0, -- Thursday
is_adp_day INTEGER DEFAULT 0, -- Wednesday before NFP
econ_release_tier INTEGER, -- 1=CPI/NFP/FOMC, 2=PPI/GDP/ISM, 3=minor
-- Contract rolls
contracts_rolling TEXT, -- JSON list of symbols in roll window
is_goldman_roll INTEGER DEFAULT 0, -- 5-9 business days before month end
days_to_contract_expiry INTEGER, -- front-month futures days to expiry
-- Options/calendar events [PANEL: Opus review]
is_options_expiry INTEGER DEFAULT 0, -- triple/quad witching
is_end_of_quarter_rebalance INTEGER DEFAULT 0,
is_us_holiday INTEGER DEFAULT 0,
is_uk_holiday INTEGER DEFAULT 0, -- LME closed
is_china_holiday INTEGER DEFAULT 0, -- SHFE closed
vix_regime TEXT, -- 'low' (<20), 'elevated' (20-30), 'high' (>30)
-- Weather
hurricane_active INTEGER DEFAULT 0,
extreme_weather_flag TEXT, -- 'polar_vortex', 'heat_wave', etc.
-- China
is_china_pmi_day INTEGER DEFAULT 0,
is_china_customs_day INTEGER DEFAULT 0,
-- General
is_month_end INTEGER DEFAULT 0,
is_quarter_end INTEGER DEFAULT 0,
is_half_day INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_daily_state_date ON daily_state(date);
Collection frequency: Generated daily at 5:00 AM ET. Calendar data sourced from CME, USDA, EIA, BLS, Treasury release schedules.
Tier 1 (move futures 1%+ regularly):
Tier 2 (move futures 0.3-1%):
Tier 3 (minor, context only):
What: For every Tier 1 and Tier 2 release, store the median analyst forecast before release and calculate surprise = actual - consensus. Prediction markets price RELATIVE TO CONSENSUS, not the raw number. This is the single most important missing piece in the original spec.
Source:
What to store:
SQL:
CREATE TABLE IF NOT EXISTS economic_surprises (
id INTEGER PRIMARY KEY AUTOINCREMENT,
release_date TEXT NOT NULL, -- YYYY-MM-DD
release_time TEXT, -- HH:MM ET
indicator_name TEXT NOT NULL, -- 'CPI', 'NFP', 'EIA_crude', etc.
tier INTEGER NOT NULL, -- 1, 2, or 3
consensus_estimate REAL,
actual_value REAL,
prior_value REAL,
surprise REAL, -- actual - consensus
surprise_sigma REAL, -- surprise / historical_std
is_upward_revision INTEGER, -- for WASDE/STEO revisions
source TEXT, -- 'investing_com', 'tradingeconomics', etc.
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_surprises_date ON economic_surprises(release_date);
CREATE INDEX IF NOT EXISTS idx_surprises_indicator ON economic_surprises(indicator_name);
Collection frequency: Pull consensus pre-release (morning of release day). Pull actual at time of release. Calculate surprise immediately.
What: Released quarterly at FOMC meetings (March, June, September, December). The dot plot shows each Fed governor's rate forecast. The median dot IS the Fed's forward guidance.
Source: Federal Reserve website (federalreserve.gov/monetarypolicy/fomccalendars.htm). FRED: FEDTARMD for median target rate projections. Parse from PDF tables. What to track:
Collection frequency: Quarterly (4 times/year at FOMC meetings).
Initial and Continuing Jobless Claims (MUST-HAVE):
Atlanta Fed GDPNow (MUST-HAVE):
Cleveland Fed Inflation Nowcast (MUST-HAVE):
Collection frequency: Jobless claims weekly (Thursday). GDPNow and CPI Nowcast daily when updated.
What: VIX (equity volatility) and MOVE (bond volatility index). VIX >30 changes the correlation structure of ALL commodities (correlations go to 1 in a crisis). MOVE >150 makes Treasury auction results much more volatile.
Source:
Collection frequency: Daily.
What: Prediction markets are essentially binary options. Pricing Kalshi "WTI > $80" accurately requires knowing the market's implied volatility (OVX) and skew. Futures prices alone only give you the at-the-money expectation.
Source:
Collection frequency: Daily.
What: HY-IG (high yield minus investment grade) spreads signal risk appetite before commodities move. Wide spreads = demand destruction coming = commodity bearish.
Source: FRED:
Collection frequency: Daily.
What: Shelter CPI is 36% of CPI index and lags real-time rent data by 12-18 months. Tracking leading indicators of shelter allows 6+ month CPI forecasts. Supercore (core services ex-housing) is the Fed's current focus.
Source:
Collection frequency: Monthly at BLS releases. Weekly for ISM sub-components.
What: Nitrogen dioxide concentrations over refinery complexes. Drop in NO2 = unplanned outage, 12-48 hours before news.
Source: Copernicus Open Access Hub or Google Earth Engine -- free, requires registration. Refinery complexes: Houston Ship Channel, Rotterdam/Antwerp, Jamnagar, Ras Tanura, Singapore. Collection frequency: Daily satellite pass. Defer to v2 [PANEL: Opus review, Gemini 3.1 Pro, Grok 4.1 -- all agree]: Requires multi-GB NetCDF/HDF5 downloads, geospatial processing, cloud-cover corrections (~40% of passes blocked). Interim approach: EPA CAMD continuous emissions monitoring (free, near-real-time for US facilities) + industry newsletter monitoring for refinery alerts.
Source: GridStatus.io -- free tier with real-time locational marginal prices (LMPs). What to track: Real-time LMP by region (ERCOT, PJM, MISO, SPP, CAISO, NYISO). Day-ahead vs real-time price spread. Collection frequency: Hourly during business hours.
Source: MarineTraffic, VesselFinder, AISHub -- free tiers. Warning [PANEL: Gemini 3.1 Pro]: MarineTraffic and VesselFinder use aggressive Cloudflare/Datadome bot protection. Scraping from a VPS IP is unreliable. Practical approach [PANEL: Opus review]: Use Grok/Sonar web search weekly to query "floating storage VLCC count" and "China crude oil tanker arrivals." Industry analysts publish these counts for free on Twitter/LinkedIn. Build automated AIS pipeline only if manual monitoring proves consistently valuable over 3+ months. Collection frequency: Weekly manual scan. Automated pipeline deferred to v2.
Source: EPA EMTS -- free but significant lag. Defer to v2 [PANEL: Opus review]: Small marginal signal relative to WASDE surprises; data has significant collection lag.
Source: AAR (https://www.aar.org/data-research/rail-traffic-data/) -- free weekly summary. What to track: Total carloads by commodity group, YoY change, 4-week MA, grain carloads. Collection frequency: Weekly (released Wednesdays).
Source: Free delayed data via Yahoo Finance (^BDI); Hellenic Shipping News for decomposition. What to track: BDI composite, Capesize sub-index (iron ore/coal), Panamax sub-index (grain). Collection frequency: Daily.
What: Container shipping rates. Leading indicator of consumer goods inflation (affects Fed policy expectations). Source: Freightos.com -- free delayed data. Collection frequency: Weekly.
What: Normalized Difference Vegetation Index from Landsat/Sentinel satellites provides real-time crop health at county level, detecting yield issues 4-6 weeks before USDA reports. Source: NASA Earthdata API (free) -- https://earthdata.nasa.gov/. USGS CropScape -- https://www.nass.usda.gov/Research_and_Science/Cropland/data_products.php. Defer to v2: Requires significant geospatial processing. Interim: use USDA crop progress good/excellent ratings as proxy.
What: The same signal means different things in different market regimes. Classify regime daily before scoring any trade.
enum MarketRegime {
RISK_ON = 'equities up, VIX < 15, credit spreads tight, correlations low',
RISK_OFF = 'equities down, VIX > 20, credit spreads wide, correlations high',
SUPPLY_SHOCK = 'backwardation, commercial hedgers extreme, inventory draws',
DEMAND_SHOCK = 'contango, managed money extreme long, inventory builds',
NORMAL = 'everything else'
}
// Regime detection from:
// 1. VIX level + 5-day trend
// 2. HY-IG spread (Section L.59)
// 3. 2s10s curve shape
// 4. Cross-commodity correlation (if all moving together = risk-off)
// 5. Futures curve structure (backwardation vs contango)
// Regime-specific conviction weight adjustments:
const convictionAdjustment: Record<MarketRegime, Record<string, number>> = {
[MarketRegime.RISK_ON]: { fundamentals: 0.30, momentum: 0.35, positioning: 0.15, curve: 0.15, seasonal: 0.05 },
[MarketRegime.RISK_OFF]: { fundamentals: 0.20, momentum: 0.50, positioning: 0.10, curve: 0.10, seasonal: 0.10 },
[MarketRegime.SUPPLY_SHOCK]: { fundamentals: 0.50, positioning: 0.25, momentum: 0.10, curve: 0.10, seasonal: 0.05 },
[MarketRegime.NORMAL]: { fundamentals: 0.25, momentum: 0.20, positioning: 0.20, curve: 0.15, seasonal: 0.10 }
};
Store daily regime classification in daily_state table (add market_regime TEXT column).
What: A systematic scoring framework to separate real signals from noise and traps. Every trade idea gets scored before analysts see it.
Components (apply regime-adjusted weights):
| # | Component | Base Weight | Description |
|---|---|---|---|
| 1 | Fundamental Direction | 25 | Do government data sources (EIA, USDA, CFTC) agree? Unanimous (25), majority (15), mixed (5), contradictory (0). |
| 2 | Positioning Confirmation | 20 | See COT scoring table in Section 34. Dealer flip + commercial (20), dealer flip alone (15), commercial flip alone (12), managed money extreme + dealer confirmation (18), managed money extreme alone (5), all categories crowded (0). |
| 3 | Term Structure Signal | 15 | Backwardation supports bullish (15), neutral (8), curve contradicts (0). |
| 4 | Cross-Commodity Confirmation | 15 | 3+ correlated assets confirming (15), 1-2 (8), contradicting (0). |
| 5 | Seasonal Alignment | 10 | Strong seasonal in correct regime with >65% hit rate (10), moderate (5), fighting seasonal (0). |
| 6 | Freshness / Timeliness | 15 | Data <24h old (15), <1 week (10), >1 week (5), >2 weeks (0). COT data always 3 days stale -- COT freshness = max(10, 15 - days_since_release). [PANEL: Grok 4 Fast] |
Initial weights are estimates only [PANEL: Opus review, Sonar Reasoning Pro]: Start with equal weights (16.7% each) if preferred; the weights above are reasonable initial guesses. After 3 months of logged outcomes, run logistic regression to determine empirical weights. Label weights as 'v1_estimate' in config. Target: calibrate after 100 settled predictions.
Backtesting requirement [PANEL: Sonar Reasoning Pro]: Before deploying scoring, run a 3-year backtest on past commodity moves:
// Track outcome by conviction score decile
// Target: decile 9 (80-90 conviction) should be >60% accurate
// If any decile shows <50% accuracy = investigate immediately
// Weekly Brier score calibration after launch
Total possible: 100. Minimum threshold: 55 to pass to analysts. Liquidity-adjusted threshold: For prediction markets, raise minimum threshold to 70 due to execution costs. For liquid futures (CL, GC, ZN), 55 is appropriate.
interface ValidationResult {
source: string;
dataQuality: 'validated' | 'revised' | 'preliminary' | 'delayed' | 'failed';
revisionRisk: 'low' | 'medium' | 'high';
confidenceMultiplier: 0.5 | 0.7 | 1.0;
lastSuccessfulFetch: Date;
lagVsExpected: number; // days behind expected release
}
// Validation rules:
// EIA: if data age > 8 days, quality = 'delayed', confidence *= 0.7
// EIA: if marked 'preliminary', revisionRisk = 'high', confidence *= 0.8
// CFTC: verify COT release happened (don't assume Friday 3:30PM always occurs)
// WASDE backward revision: flag as noise, do not fire new signal
// Satellite: reject if cloud cover > 30%
// Any source: alert if > 48 hours stale vs expected release schedule
Fallback sources table [PANEL: Sonar Reasoning Pro]:
| Primary | Fallback 1 | Fallback 2 |
|---|---|---|
| Kalshi API | Parse Kalshi web UI | Polymarket only |
| Polymarket CLOB | Polymarket web UI | Manual hourly |
| FRED API | FRED CSV download | Yahoo Finance (yields) |
| Polygon (futures) | Alpha Vantage free tier | Yahoo Finance (emergency) |
| EIA API | EIA CSV download | Parse EIA website |
| USDA WASDE API | USDA PDF parser | News headlines + consensus |
| Investing.com calendar | TradingEconomics | MarketWatch calendar |
Operational SLA:
These rules reject a trade idea regardless of conviction score:
Circular Source Chain: If the bullish case for commodity X relies on data derived from commodity X's price, reject. Example: "gold should go up because gold ETF inflows are rising" -- the inflows ARE the price move.
Staleness Window Exceeded: Primary data source has not updated in >2x its normal frequency. Stale data = trading yesterday's signal.
Regime Change Override: If the 60-day rolling correlation between the signal source and the commodity has flipped sign (e.g., was +0.7, now -0.3), reject until new regime is understood.
Narrative vs Data Mismatch: Trade thesis relies on qualitative narrative but quantitative data contradicts it. Data wins.
Single-Source Dependency: Entire trade thesis rests on one data point with no corroboration. Require at least 2 independent data sources.
Data Quality Failure: Primary source has confidenceMultiplier < 0.7 AND no fallback source available.
These don't auto-reject but add a warning flag visible to analysts:
Every Friday after market close, calculate calibration metrics:
// Group predictions by conviction score decile (0-10, 10-20, ..., 90-100)
// For each decile where outcome is known:
// - Calculate Brier score
// - Calculate win rate
// - Track vs historical
// Alert conditions:
// - Any decile shows <50% win rate: investigate immediately
// - High conviction (80+) shows <60% win rate: weights are wrong
// - Any single data source correlates with unusually low accuracy: signal may be dead
// Output: weekly calibration report to EDGE TEAM group
Prediction tracking table:
CREATE TABLE IF NOT EXISTS prediction_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
prediction_date TEXT NOT NULL,
commodity TEXT NOT NULL,
direction TEXT NOT NULL, -- 'bullish', 'bearish'
venue TEXT NOT NULL, -- 'futures', 'kalshi', 'polymarket'
conviction_score REAL NOT NULL,
regime TEXT,
primary_signals TEXT, -- JSON array of signal names
entry_price REAL,
target_price REAL,
settlement_date TEXT,
settled_price REAL,
outcome TEXT, -- 'win', 'loss', 'open'
brier_score REAL,
return_pct REAL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_pred_date ON prediction_log(prediction_date);
CREATE INDEX IF NOT EXISTS idx_pred_outcome ON prediction_log(outcome);
-- Enable WAL mode for concurrent write performance
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
-- ============================================================
-- ENERGY DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS eia_petroleum_weekly (
id INTEGER PRIMARY KEY AUTOINCREMENT,
report_date TEXT NOT NULL, -- YYYY-MM-DD (Wednesday)
padd TEXT NOT NULL, -- 'US', 'PADD1'..'PADD5', 'SPR', 'Cushing'
crude_stocks_kb REAL, -- thousand barrels
gasoline_stocks_kb REAL,
distillate_stocks_kb REAL,
crude_imports_kbd REAL, -- thousand barrels/day
crude_production_kbd REAL,
refinery_util_pct REAL,
refinery_inputs_kbd REAL,
gasoline_supplied_kbd REAL, -- demand proxy
vs_5yr_avg_pct REAL, -- current vs 5-year seasonal average
wow_change_kb REAL, -- week-over-week change
consensus_crude REAL, -- pre-release consensus
surprise_crude REAL, -- actual - consensus
data_type TEXT DEFAULT 'final', -- 'advance', 'final'
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_eia_pet_date_padd ON eia_petroleum_weekly(report_date, padd);
CREATE TABLE IF NOT EXISTS eia_natgas_storage (
id INTEGER PRIMARY KEY AUTOINCREMENT,
report_date TEXT NOT NULL,
region TEXT NOT NULL, -- 'Total', 'East', 'Midwest', 'Mountain', 'Pacific', 'SouthCentral'
storage_bcf REAL,
net_change_bcf REAL,
vs_5yr_avg_pct REAL,
vs_prior_yr_pct REAL,
consensus_bcf REAL,
surprise_bcf REAL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_eia_gas_date_region ON eia_natgas_storage(report_date, region);
-- ============================================================
-- FUTURES PRICES
-- ============================================================
CREATE TABLE IF NOT EXISTS futures_prices (
id INTEGER PRIMARY KEY AUTOINCREMENT,
trade_date TEXT NOT NULL, -- YYYY-MM-DD
symbol TEXT NOT NULL, -- root symbol: CL, GC, ZC, etc.
contract_month TEXT NOT NULL, -- YYYY-MM (expiry month)
expiry_date TEXT,
months_to_expiry INTEGER,
open REAL,
high REAL,
low REAL,
settle REAL NOT NULL,
volume INTEGER,
open_interest INTEGER,
oi_change INTEGER, -- daily OI change
price_change REAL, -- daily settle change
oi_price_signal TEXT, -- 'new_longs','short_cover','new_shorts','long_liq'
spread_vs_front REAL, -- price difference vs front month
source TEXT DEFAULT 'polygon',
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_futures_date_sym_month ON futures_prices(trade_date, symbol, contract_month);
CREATE INDEX IF NOT EXISTS idx_futures_trade_date ON futures_prices(trade_date);
CREATE INDEX IF NOT EXISTS idx_futures_symbol ON futures_prices(symbol);
-- ============================================================
-- TERM STRUCTURE ANALYTICS
-- ============================================================
CREATE TABLE IF NOT EXISTS term_structure (
id INTEGER PRIMARY KEY AUTOINCREMENT,
calc_date TEXT NOT NULL,
symbol TEXT NOT NULL,
structure_type TEXT NOT NULL, -- 'contango', 'backwardation', 'mixed'
front_price REAL,
sixth_month_price REAL,
twelfth_month_price REAL,
front_back_spread REAL, -- front minus 6th month
annualized_roll_yield REAL, -- %
curve_slope_percentile REAL, -- vs 1-year history
curve_slope_pct_3yr REAL, -- vs 3-year history
structure_change TEXT, -- 'flattening', 'steepening', 'stable'
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_term_struct_date_sym ON term_structure(calc_date, symbol);
-- ============================================================
-- CFTC COT DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS cot_positions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
report_date TEXT NOT NULL, -- report as-of date (Tuesday)
release_date TEXT, -- actual release date (Friday)
symbol TEXT NOT NULL, -- CL, GC, ZC, etc.
report_type TEXT NOT NULL, -- 'disaggregated', 'tff', 'legacy', 'cit'
category TEXT NOT NULL, -- 'prod_merch','swap_dealer','managed_money','other','commercial','noncommercial','index_trader'
long_contracts INTEGER,
short_contracts INTEGER,
spreading_contracts INTEGER,
net_position INTEGER, -- long - short
long_pct_oi REAL,
short_pct_oi REAL,
wow_net_change INTEGER,
net_z52 REAL, -- z-score vs 52-week rolling
net_z3yr REAL, -- z-score vs 3-year rolling
net_percentile REAL, -- percentile rank vs 3-year
consecutive_direction_weeks INTEGER, -- weeks in same direction
is_extreme INTEGER DEFAULT 0, -- net_z52 > 2 or < -2
dealer_flip INTEGER DEFAULT 0, -- swap dealer net changed sign this week
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_cot_date_sym_type_cat ON cot_positions(report_date, symbol, report_type, category);
CREATE INDEX IF NOT EXISTS idx_cot_dealer_flip ON cot_positions(dealer_flip, report_date);
-- ============================================================
-- PREDICTION MARKET SNAPSHOTS
-- ============================================================
CREATE TABLE IF NOT EXISTS prediction_market_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_time TEXT NOT NULL, -- ISO8601
platform TEXT NOT NULL, -- 'kalshi', 'polymarket'
market_id TEXT NOT NULL,
title TEXT,
category TEXT, -- 'commodity', 'macro', 'rates', 'crypto'
expiry_date TEXT,
settlement_source TEXT, -- what data Kalshi uses to resolve
yes_bid REAL,
yes_ask REAL,
yes_last REAL,
no_bid REAL,
no_ask REAL,
volume REAL,
open_interest REAL,
book_depth_json TEXT, -- JSON: top 2 levels for all, top 5 for high-volume
is_high_volume INTEGER DEFAULT 0, -- volume > 100 contracts/day
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_pm_time ON prediction_market_snapshots(snapshot_time);
CREATE INDEX IF NOT EXISTS idx_pm_market ON prediction_market_snapshots(market_id, platform);
-- Store OHLCV daily summaries to manage storage
CREATE TABLE IF NOT EXISTS prediction_market_daily (
id INTEGER PRIMARY KEY AUTOINCREMENT,
trade_date TEXT NOT NULL,
platform TEXT NOT NULL,
market_id TEXT NOT NULL,
yes_open REAL,
yes_high REAL,
yes_low REAL,
yes_close REAL,
volume REAL,
eod_book_json TEXT, -- end-of-day order book snapshot only
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_pm_daily_date_mkt ON prediction_market_daily(trade_date, market_id, platform);
-- ============================================================
-- MACRO / FRED DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS macro_series (
id INTEGER PRIMARY KEY AUTOINCREMENT,
obs_date TEXT NOT NULL, -- YYYY-MM-DD
series_id TEXT NOT NULL, -- FRED series ID or custom
series_name TEXT,
value REAL,
wow_change REAL, -- week-over-week change
mom_change REAL, -- month-over-month change
z52 REAL, -- z-score vs 52-week
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_macro_date_series ON macro_series(obs_date, series_id);
-- Key series: DGS2, DGS10, DGS30, T10YIE, T5YIFR, DFII10, SOFR, RRPONTSYD, WALCL,
-- WTREGEN, WLRRAFORL, VIXCLS, BAMLH0A0HYM2, BAMLC0A0CM, DTWEXBGS,
-- DEXUSAL, DEXCAUS, DEXBZUS, ICSA, CCSA
-- ============================================================
-- CROSS-COMMODITY RATIOS
-- ============================================================
CREATE TABLE IF NOT EXISTS commodity_ratios (
id INTEGER PRIMARY KEY AUTOINCREMENT,
calc_date TEXT NOT NULL,
ratio_name TEXT NOT NULL, -- 'gold_silver', 'oil_natgas', 'copper_gold', etc.
raw_value REAL NOT NULL,
ma20 REAL,
ma60 REAL,
ma200 REAL,
z52 REAL, -- z-score vs 52-week
z3yr REAL, -- z-score vs 3-year
percentile_1yr REAL,
percentile_3yr REAL,
percentile_5yr REAL,
is_extreme INTEGER DEFAULT 0, -- percentile > 90 or < 10
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_ratios_date_name ON commodity_ratios(calc_date, ratio_name);
-- ============================================================
-- USDA DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS wasde_estimates (
id INTEGER PRIMARY KEY AUTOINCREMENT,
release_date TEXT NOT NULL,
commodity TEXT NOT NULL, -- 'corn', 'soybeans', 'wheat', etc.
region TEXT NOT NULL, -- 'US', 'World'
production_mbu REAL, -- million bushels (or metric tons for world)
ending_stocks_mbu REAL,
stocks_to_use_pct REAL,
exports_mbu REAL,
yield_bu_acre REAL,
area_harvested_mA REAL, -- million acres
mom_revision REAL, -- current minus prior month ending stocks
consensus_stocks REAL, -- pre-release trade estimate
surprise REAL, -- actual - consensus
revision_type TEXT, -- 'upward', 'downward', 'backward', 'no_change'
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_wasde_date_com_region ON wasde_estimates(release_date, commodity, region);
-- ============================================================
-- TREASURY AUCTIONS
-- ============================================================
CREATE TABLE IF NOT EXISTS treasury_auctions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
auction_date TEXT NOT NULL,
security_type TEXT NOT NULL, -- '2Y', '5Y', '10Y', '30Y', 'TIPS'
high_yield REAL, -- stop-out rate %
bid_to_cover REAL,
tail_bps REAL, -- high yield minus when-issued, basis points
indirect_pct REAL, -- % of award to indirect bidders
direct_pct REAL,
dealer_pct REAL,
btc_z1yr REAL, -- bid-to-cover z-score vs 1-year same tenor
tail_z1yr REAL, -- tail z-score vs 1-year same tenor
indirect_z1yr REAL,
is_weak INTEGER DEFAULT 0, -- tail > 1bps AND btc < 1yr avg AND indirect < 1yr avg
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_auctions_date_type ON treasury_auctions(auction_date, security_type);
-- ============================================================
-- METALS WAREHOUSE STOCKS
-- ============================================================
CREATE TABLE IF NOT EXISTS metals_warehouse (
id INTEGER PRIMARY KEY AUTOINCREMENT,
report_date TEXT NOT NULL,
exchange TEXT NOT NULL, -- 'COMEX', 'LME', 'SHFE'
metal TEXT NOT NULL, -- 'gold', 'silver', 'copper', 'aluminum', etc.
total_stocks REAL, -- in appropriate unit (troy oz, lots, tonnes)
registered_stocks REAL, -- COMEX: available for delivery
eligible_stocks REAL,
on_warrant REAL, -- LME: available
cancelled_warrants REAL, -- LME: being withdrawn (bullish demand signal)
cancelled_pct REAL, -- cancelled / total * 100
largest_location_pct REAL, -- squeeze risk indicator
wow_change REAL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_metals_wh_date_exch_metal ON metals_warehouse(report_date, exchange, metal);
-- ============================================================
-- CONVICTION SCORES
-- ============================================================
CREATE TABLE IF NOT EXISTS conviction_scores (
id INTEGER PRIMARY KEY AUTOINCREMENT,
scored_at TEXT NOT NULL,
commodity TEXT NOT NULL,
direction TEXT NOT NULL, -- 'bullish', 'bearish'
venue TEXT, -- 'futures', 'kalshi', 'polymarket'
market_regime TEXT,
-- Component scores
fundamental_score REAL,
positioning_score REAL,
curve_score REAL,
cross_commodity_score REAL,
seasonal_score REAL,
freshness_score REAL,
-- Totals
total_score REAL NOT NULL,
passes_threshold INTEGER DEFAULT 0,
-- Flags
trap_flags_json TEXT, -- JSON array of triggered trap flags
rejection_reason TEXT, -- if rejected, why
data_quality_json TEXT, -- JSON: ValidationResult for each source
passed_to_analysts INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_conviction_scored ON conviction_scores(scored_at);
CREATE INDEX IF NOT EXISTS idx_conviction_passes ON conviction_scores(passes_threshold);
-- ============================================================
-- ECONOMIC SURPRISES
-- ============================================================
CREATE TABLE IF NOT EXISTS economic_surprises (
id INTEGER PRIMARY KEY AUTOINCREMENT,
release_date TEXT NOT NULL,
release_time TEXT,
indicator_name TEXT NOT NULL,
tier INTEGER NOT NULL,
consensus_estimate REAL,
actual_value REAL,
prior_value REAL,
surprise REAL,
surprise_sigma REAL,
is_backward_revision INTEGER DEFAULT 0,
source TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_surprises_date_ind ON economic_surprises(release_date, indicator_name);
-- ============================================================
-- PREDICTION LOG (accountability)
-- ============================================================
CREATE TABLE IF NOT EXISTS prediction_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
prediction_date TEXT NOT NULL,
commodity TEXT NOT NULL,
direction TEXT NOT NULL,
venue TEXT NOT NULL,
conviction_score REAL NOT NULL,
regime TEXT,
primary_signals TEXT, -- JSON
entry_price REAL,
target_price REAL,
settlement_date TEXT,
settled_price REAL,
outcome TEXT, -- 'win', 'loss', 'open', 'hold'
brier_score REAL,
return_pct REAL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_pred_date ON prediction_log(prediction_date);
CREATE INDEX IF NOT EXISTS idx_pred_outcome ON prediction_log(outcome, settlement_date);
-- ============================================================
-- WEATHER DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS hdd_cdd (
id INTEGER PRIMARY KEY AUTOINCREMENT,
week_ending TEXT NOT NULL,
region TEXT NOT NULL,
hdd_actual REAL,
hdd_normal REAL,
hdd_vs_normal REAL,
cdd_actual REAL,
cdd_normal REAL,
cdd_vs_normal REAL,
ytd_hdd_vs_normal REAL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_hdd_week_region ON hdd_cdd(week_ending, region);
-- ============================================================
-- DAILY STATE TABLE (see Section 52 for full schema)
-- ============================================================
-- Already defined above with all fields including panel additions.
-- Daily signal dashboard: commodities with extreme COT positioning + supporting fundamentals
SELECT
c.symbol,
c.category,
c.net_z52,
c.net_percentile,
c.dealer_flip,
c.consecutive_direction_weeks,
t.structure_type,
t.annualized_roll_yield,
r.raw_value AS ratio_value,
r.z3yr AS ratio_z3yr
FROM cot_positions c
LEFT JOIN term_structure t ON t.symbol = c.symbol AND t.calc_date = (SELECT MAX(calc_date) FROM term_structure WHERE symbol = c.symbol)
LEFT JOIN commodity_ratios r ON r.calc_date = (SELECT MAX(calc_date) FROM commodity_ratios)
WHERE c.report_date = (SELECT MAX(report_date) FROM cot_positions)
AND c.category = 'swap_dealer'
AND (c.dealer_flip = 1 OR ABS(c.net_z52) > 2)
ORDER BY ABS(c.net_z52) DESC;
-- Prediction market mispricing vs Cleveland Fed CPI nowcast
SELECT
p.market_id,
p.title,
p.yes_last AS kalshi_yes_price,
m.value AS cleveland_fed_nowcast,
(p.yes_last - m.value) AS implied_edge
FROM prediction_market_snapshots p
CROSS JOIN macro_series m
WHERE p.platform = 'kalshi'
AND p.category = 'macro'
AND p.title LIKE '%CPI%'
AND m.series_id = 'ClevelandFedCPINowcast'
AND p.snapshot_time > datetime('now', '-1 hour')
AND m.obs_date = (SELECT MAX(obs_date) FROM macro_series WHERE series_id = 'ClevelandFedCPINowcast')
ORDER BY ABS(implied_edge) DESC;
-- Weekly Brier score calibration by conviction decile
SELECT
CAST(conviction_score / 10 AS INTEGER) * 10 AS decile_min,
COUNT(*) AS predictions,
AVG(brier_score) AS avg_brier,
SUM(CASE WHEN outcome = 'win' THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS win_rate
FROM prediction_log
WHERE outcome IN ('win', 'loss')
AND prediction_date > date('now', '-90 days')
GROUP BY 1
ORDER BY 1;
-- EIA report surprise history vs next-day futures price move (validate surprise signal)
SELECT
s.release_date,
s.indicator_name,
s.surprise,
s.surprise_sigma,
f.price_change AS next_day_cl_move
FROM economic_surprises s
LEFT JOIN futures_prices f ON f.trade_date = date(s.release_date, '+1 day')
AND f.symbol = 'CL' AND f.contract_month = (
SELECT MIN(contract_month) FROM futures_prices WHERE symbol = 'CL' AND trade_date = date(s.release_date, '+1 day'))
WHERE s.indicator_name = 'EIA_crude_inventories'
ORDER BY s.release_date DESC
LIMIT 52;
-- COT dealer flip tracker with subsequent returns
SELECT
c.report_date,
c.symbol,
c.net_position,
c.net_z52,
f4.settle AS price_4wk_later,
f0.settle AS price_at_flip,
((f4.settle - f0.settle) / f0.settle * 100) AS return_4wk_pct
FROM cot_positions c
LEFT JOIN futures_prices f0 ON f0.symbol = c.symbol
AND f0.trade_date = c.report_date
AND f0.months_to_expiry = 1
LEFT JOIN futures_prices f4 ON f4.symbol = c.symbol
AND f4.trade_date = date(c.report_date, '+28 days')
AND f4.months_to_expiry = 1
WHERE c.category = 'swap_dealer'
AND c.dealer_flip = 1
ORDER BY c.report_date DESC;
-- Cross-venue Kalshi vs Polymarket divergence alerts
SELECT
k.market_id AS kalshi_id,
p.market_id AS polymarket_id,
k.title,
k.yes_last AS kalshi_yes,
p.yes_last AS polymarket_yes,
(p.yes_last - k.yes_last) AS spread
FROM prediction_market_snapshots k
JOIN prediction_market_snapshots p ON p.title LIKE '%' || SUBSTR(k.title, 1, 20) || '%'
AND p.platform = 'polymarket'
AND p.snapshot_time > datetime('now', '-2 hours')
WHERE k.platform = 'kalshi'
AND k.snapshot_time > datetime('now', '-2 hours')
AND ABS(p.yes_last - k.yes_last) > 0.10
ORDER BY ABS(p.yes_last - k.yes_last) DESC;
| Data | Source | Endpoint | Auth | Rate Limit |
|---|---|---|---|---|
| EIA petroleum | EIA API | GET /v2/petroleum/stoc/wstk/data/ |
Free API key | 1000/hr |
| EIA natgas | EIA API | GET /v2/natural-gas/stor/wkly/data/ |
Free API key | 1000/hr |
| FRED series | FRED | GET /fred/series/observations?series_id={ID} |
Free API key | 120/min |
| Treasury auctions | Fiscal Data | GET /v1/accounting/od/auctions_query |
None | Generous |
| CFTC COT | CFTC.gov | Download: /dea/newcot/f_disagg.txt |
None | None |
| CFTC CIT | CFTC.gov | Download: /dea/newcot/deacit.txt |
None | None |
| USDA PSD | USDA FAS | GET /PSDOnlineV2/api/ |
None (~100/hr) | Low -- cache monthly |
| USDA QuickStats | USDA NASS | GET /quickstats/api/api_GET/ |
Free API key | Moderate |
| USDA exports | USDA FAS | https://apps.fas.usda.gov/export-sales/ | None | None |
| Kalshi markets | Kalshi | GET /trade-api/v2/markets |
RSA-PSS signed | Undocumented -- use backoff |
| Polymarket | Polymarket CLOB | GET https://clob.polymarket.com/markets |
None (read) | Unknown -- use backoff |
| Futures prices | Polygon.io | GET /v2/aggs/ticker/{ticker}/range/{mult}/{span}/{from}/{to} |
API key ($29/mo) | Tiered |
| NY Fed SOMA | NY Fed | GET /api/soma/summary.json |
None | Generous |
| NY Fed RRP | NY Fed | GET /api/rp/reverserepo/search.json |
None | Generous |
| CME FedWatch | CME | Scrape: cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html | None | Scraping |
| NOAA CPC | NOAA | https://www.cpc.ncep.noaa.gov/products/predictions/ | None | None |
| Drought Monitor | NDMC | https://droughtmonitor.unl.edu/DmData/DataDownload.aspx | None | None |
| NHC Hurricane | NOAA | GET https://api.weather.gov/alerts |
None | None |
| GACC Customs | China Customs | http://english.customs.gov.cn/ | None | Scraping |
| NBS PMI | China NBS | http://www.stats.gov.cn/english/ | None | Scraping |
| SHFE stocks | Investing.com/TV | Scraping | None | Scraping |
| BLS CPI subs | BLS API | POST /publicAPI/v2/timeseries/data/ |
Free API key | 500/day |
| GDPNow | Atlanta Fed | https://www.atlantafed.org/cqer/research/gdpnow | None | Scraping |
| CPI Nowcast | Cleveland Fed | https://www.clevelandfed.org/indicators-and-data/inflation-nowcasting | None | Scraping |
| Jobless claims | FRED | series ICSA, CCSA | Free API key | 120/min |
| ISM sub-indices | FRED | NAPMPI, NAPMNOI, NAPMEI | Free API key | 120/min |
| OVX (oil vol) | FRED or CBOE | CBOE OVX, GVZ | None | None |
| Credit spreads | FRED | BAMLH0A0HYM2, BAMLC0A0CM | Free API key | 120/min |
| ICE Brent COT | ICE | https://www.theice.com/marketdata/reports/82 | None | Weekly PDF |
| GridStatus | GridStatus.io | REST API | Free tier | 100/day free |
| AAR Rail | AAR | https://www.aar.org/data-research/rail-traffic-data/ | None | Scraping |
| BDI | Yahoo Finance | ^BDI | None | Scraping |
| FBX Freight | Freightos | https://fbx.freightos.com/api/ | Free delayed | None |
# Daily -- 5:00 AM ET: Build daily_state table for today
0 5 * * 1-5 node collectors/daily-state-builder.js
# Daily -- 6:00 PM ET: After futures market close -- prices, ratios, Z-scores
0 18 * * 1-5 node collectors/futures-daily.js
0 18 * * 1-5 node collectors/ratios-calculator.js
0 18 * * 1-5 node collectors/fred-daily.js
# Prediction market -- 15 min during market hours for active markets, hourly otherwise
*/15 9-16 * * 1-5 node collectors/prediction-markets-active.js
0 * * * * node collectors/prediction-markets-hourly.js
0 8 * * 1-5 node collectors/prediction-markets-premarket.js
# Wednesday 10:30 AM ET: EIA petroleum report
30 10 * * 3 node collectors/eia-petroleum.js
# Thursday 10:30 AM ET: EIA natgas storage
30 10 * * 4 node collectors/eia-natgas.js
# Thursday 8:30 AM ET: Jobless claims + ADP Wednesday release
30 8 * * 4 node collectors/jobless-claims.js
# ADP: Wednesday 8:15 AM ET (2 days before NFP)
15 8 * * 3 node collectors/adp-employment.js # run only on NFP week
# Friday 3:30 PM ET: CFTC COT release
30 15 * * 5 node collectors/cftc-cot.js
# Friday 1:00 PM ET: Baker Hughes rig count
0 13 * * 5 node collectors/baker-hughes.js
# Monday 4:00 PM ET: USDA crop progress (growing season Apr-Nov)
0 16 * * 1 node collectors/usda-crop-progress.js # seasonal
# Monthly ~10th: WASDE release (12:00 PM ET)
30 12 8-15 * * node collectors/usda-wasde.js # check daily state for is_wasde_day
# Monthly last day: China NBS PMI
0 9 28-31 * * node collectors/china-pmi.js # check daily state for is_china_pmi_day
# Weekly Sunday: Season pattern and Z-score recalculation
0 20 * * 0 node analytics/recalculate-zscores.js
# Weekly Friday: Brier score calibration
0 17 * * 5 node analytics/calibration-weekly.js
# On auction days: Treasury auction results (check daily state)
0 14 * * 1-5 node collectors/treasury-auction.js # runs, checks if auction day
# Monthly: EIA STEO, DPR, OPEC MOMR, USDA exports, central bank gold
0 12 1-20 * * node collectors/monthly-data.js # checks daily_state for each flag
| Data Category | Daily Size | Annual Size | Notes |
|---|---|---|---|
| Futures prices + curves (20 contracts, 7 months each) | ~150KB | ~40MB | |
| EIA petroleum + natgas + production | ~50KB/week | ~3MB | |
| USDA WASDE + crop progress + exports | ~100KB/week | ~5MB | |
| CFTC COT (15 contracts, 3 report types + CIT) | ~250KB/week | ~13MB | |
| Treasury yields + auctions + Fed data | ~50KB/day | ~12MB | |
| Spreads + ratios + Z-scores | ~100KB/day | ~25MB | |
| Kalshi + Polymarket (OHLCV daily + intraday for active markets only) | ~500KB/day | ~125MB | Revised down from 500MB -- store daily OHLCV for all, intraday only for high-conviction |
| Economic surprises + consensus | ~10KB/week | ~0.5MB | |
| Weather outlooks + HDD/CDD | ~20KB/day | ~5MB | |
| China data + Korean exports | ~50KB/month | ~1MB | |
| Daily state table | ~5KB/day | ~1MB | |
| Metals warehouse stocks | ~20KB/day | ~5MB | |
| Macro series (FRED, credit, vol) | ~50KB/day | ~12MB | |
| Alt data (AIS manual, rail, BDI, FBX) | ~20KB/day | ~5MB | |
| Conviction scores + prediction log | ~20KB/day | ~5MB | |
| Total | ~258MB/year | Comfortably within SQLite/VPS capacity |
SQLite settings for production:
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA cache_size=-64000; -- 64MB cache
PRAGMA temp_store=MEMORY;
| Period | Cost | What You Get |
|---|---|---|
| Weeks 1-3 | $0 | All government data (EIA, USDA, CFTC, FRED, BLS, Treasury, NOAA), prediction market scrapers, full analytical engine, conviction scoring, data quality validation |
| Month 2+ | $29/month | Polygon.io (shared with Options desk) -- MANDATORY from Week 1, not Month 2 |
| Month 3+ | $0-49/month | Quandl/Nasdaq Data Link for clean CFTC COT API if DIY parsing proves fragile |
| Year 1 Total | ~$290-500 | Complete Futures desk data infrastructure |
The key insight: 90%+ of data that moves commodity markets is published free by US government agencies. Start with government data, use Polygon for price reliability, use Grok/Sonar web search to fill gaps while automated pipelines are being built.
| Step | Component | Effort | Why First |
|---|---|---|---|
| 1 | SQLite schema + FRED API client | 2-3 days | Single API key unlocks yields, breakevens, SOFR, RRP, VIX, credit spreads, exchange rates, commodity spots, jobless claims. Highest-leverage single integration. Schema must be right on day 1. |
| 2 | Polygon.io futures price client | 2-3 days | Reliable prices are the foundation. Yahoo Finance WILL break within months. $29/month from day 1, not month 2. |
| 3 | EIA API client (petroleum + natgas) | 3-4 days | Wednesday petroleum and Thursday natgas are the two most market-moving weekly events. First edge signals. Add consensus collection to calculate surprise. |
| 4 | Kalshi + Polymarket API clients | 1-2 days | Execution venues. Reuse existing Kalshi Edge Bot code. Implement rate limiting and fallbacks from day 1. |
| 5 | Economic consensus collector | 2-3 days | MISSING from original spec. Consensus vs actual is the core signal for CPI/NFP/GDP Kalshi contracts. Scrape Investing.com calendar. |
| 6 | Daily state table + cron skeleton | 2-3 days | Traffic cop for the entire system. 5AM ET build, then triggers for Wed/Thu/Fri weekly events, monthly WASDE, etc. |
| 7 | CFTC COT parser + Z-score engine | 3-5 days | Positioning backbone. Use Quandl free tier for clean data. Calculate net positions, Z-scores, extreme flags, dealer flip detection for 15 contracts. |
| 8 | Cross-commodity ratios (core 6) | 2-3 days | Pure math from collected prices. Gold/silver, oil/natgas, copper/gold, soybean/corn, Brent-WTI, crack spread. Immediate signals. |
| 9 | Term structure builder | 3-4 days | Full curve for each commodity. Daily contango/backwardation classification, annualized roll yield, calendar spread percentile ranking. |
| 10 | USDA WASDE + crop progress parsers | 3-4 days | Agriculture equivalent of EIA weekly. Surprise calculator vs trade consensus. |
| 11 | Treasury auction scorer + Fed data | 3-4 days | Z-score auction results, FedWatch probability tracking, Atlanta Fed GDPNow, Cleveland CPI Nowcast, jobless claims, dot plot. |
| 12 | Data quality validation layer | 2-3 days | Add before deploying conviction scoring. ValidationResult for each source, fallback routing, staleness alerts. |
| 13 | Regime detection + conviction scoring | 3-5 days | Regime classifier first, then 6-component conviction scorer with regime-adjusted weights. Mandatory rejection rules. Trap flags. |
| 14 | China data collectors | 2-3 days | NBS PMI, GACC customs, SHFE stocks (via Investing.com/TradingView). High value for metals and energy demand signals. |
| 15 | Weather integration | 2-3 days | NOAA CPC, HDD/CDD, drought monitor, El Nino status, hurricane season tracking. |
| 16 | Calibration + backtesting setup | 2-3 days | Weekly Brier score calculation. Prediction log queries. Conviction score calibration against historical moves. |
Defer to v2 (after core pipeline is stable):
Status: NOT BUILT — add when this desk goes live for execution
Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.
Why WebSocket matters:
Note: This desk primarily trades on other exchanges, but Kalshi has event markets (rate decisions, index levels, crypto milestones) that overlap with this desk. WebSocket upgrade applies when trading any Kalshi markets.
Added 2026-03-29 — upgrade REST to WS when desk moves to live execution