OPTIONS DESK DATA COLLECTION SPEC — MERGED FINAL

Base: Claude Opus 4.6 (complete, unmodified) + Cherry-picked Panel Additions

Panel: Gemini 3.1 Pro | GPT-4.1 | Grok 4.1 Fast | Grok 4.1 | DeepSeek V3.1 | Sonar Reasoning Pro (unique additions tagged)

Trading venues: Options broker (TBD) is the PRIMARY venue. Kalshi and Polymarket are SECONDARY venues (prediction markets). Both prediction market platforms have ladder structures and are treated as equally viable targets throughout.

Purpose: Data collection spec for a coding bot. Includes SQLite CREATE statements, API endpoints, collection cadence, storage estimates, implementation checklist, and SQL queries.

Generated: 2026-03-15


OVERALL ASSESSMENT (Opus 4.6 — Lead Reviewer)

Grade: A-

Exceptionally thorough spec covering the full options data stack from raw chains to cross-market arbitrage. The main risks are operational — Yahoo Finance scraping fragility, SQLite write contention under 15-min polling for 30+ tickers, and the GEX calculation making a critical assumption about dealer directionality that is often wrong. The Form 424B2 NLP idea is genuinely novel but underestimates parsing difficulty by 5-10x.


SECTION A: OPTIONS CHAIN DATA

A.1 Primary Chain Source

Primary: Polygon.io (pay $29/month from Day 1 — do not attempt Yahoo Finance scraping for production use) Secondary/Fallback: OCC Daily Files (free, T+1, official) Tertiary: CBOE Delayed Quotes (15-min delay, more stable than Yahoo)

CRITICAL FLAW FIXED: Yahoo Finance has no official free API. The yfinance library and similar scrapers use Yahoo's internal endpoints which change without notice. Yahoo actively rate-limits and blocks scrapers. With 30+ tickers at 15-minute intervals, you will hit rate limits within days. Pay for Polygon.io from Week 1. The time spent debugging Yahoo scraping failures costs more than $29/month in lost productivity. Polygon's historical data also lets you backfill the IV rank database immediately rather than waiting 252 days.

A.2 Ticker Universe

Tier 1 — 15-minute intraday snapshots:

Tier 2 — Daily end-of-day snapshots:

[PANEL: DeepSeek V3.1] Sector rotation note: Options dealers hedge sector ETFs differently than the index. Knowing sector flows improves GEX accuracy for individual names. However, tracking full chains for XLF/XLE/XLK is redundant with SPY for most signals. MVP: pull sector ETF chains only for OI and P/C ratio, not full intraday snapshots.

A.3 Chain Fields Collected Per Contract

ticker, expiration_date, strike, option_type (C/P),
bid, ask, mid, last_price, volume, open_interest,
implied_volatility, delta, gamma, theta, vega, rho,
underlying_price, timestamp, spread_pct

Calculated field: spread_pct = (ask - bid) / mid

MUST-HAVE addition (Opus): Track bid-ask spread as percentage of mid. Options with >20% spread are not practically tradeable. Flag illiquid contracts before routing to any analysis. This is especially important on Robinhood where execution quality is worse than direct exchange access.

A.4 0DTE Separation

[PANEL: Opus — MUST-HAVE] 0DTE options now represent 40-50% of total SPY/QQQ volume on many days. Their gamma impact is disproportionate because gamma explodes as expiration approaches. The GEX calculation will be wildly wrong if it treats 0DTE and 30DTE gamma equally. Separate 0DTE gamma from total GEX and track the ratio over time.

Filter: WHERE DTE = 0 Aggregation: separate pass on chain data, stored in options_chain_0dte table.


SECTION B: GREEKS CALCULATIONS

B.1 DIY Greeks

Calculate locally using Black-Scholes (calls/puts with continuous dividend yield q):

d1 = (ln(S/K) + (r - q + 0.5 * σ²) * T) / (σ * sqrt(T))
d2 = d1 - σ * sqrt(T)

Delta (call) = N(d1)
Delta (put)  = N(d1) - 1
Gamma        = N'(d1) / (S * σ * sqrt(T))
Theta (call) = -(S * N'(d1) * σ) / (2 * sqrt(T)) - r * K * exp(-r*T) * N(d2) + q * S * exp(-q*T) * N(d1)
Vega         = S * exp(-q*T) * N'(d1) * sqrt(T)
Rho (call)   = K * T * exp(-r*T) * N(d2)

Dividend adjustment: Always include q (continuous dividend yield). SPY and QQQ pay dividends. Ignoring this makes your forward price wrong and your Kalshi/Polymarket probability bridge mathematically incorrect.

[PANEL: Gemini 3.1 Pro — MUST-HAVE] Collect dividend yield and ex-dividend dates for all tracked tickers. Source: Polygon.io Reference Data API or Financial Modeling Prep free tier.

[PANEL: Gemini 3.1 Pro — MUST-HAVE] Collect corporate actions (splits, mergers, special dividends). Options chains adjust for corporate actions — if a stock splits, historical GEX and strike tracking will break instantly if not adjusted. Source: Polygon.io Reference Data API or OCC Information Memos.

B.2 GEX (Gamma Exposure)

Standard formula:

GEX = Sum over all strikes and expirations:
  OI * Gamma * 100 * Spot * sign(option_type)

Where:
  Call options contribute: +OI * Gamma * 100 * Spot  (dealers short calls = short gamma on calls)
  Put options contribute:  -OI * Gamma * 100 * Spot   (dealers short puts = long gamma on puts)

Net GEX positive = dealers long gamma (they buy dips, sell rips — market stabilizing)
Net GEX negative = dealers short gamma (they amplify moves — market destabilizing)

CRITICAL FLAW (Opus): The assumption "Call OI = dealers sold calls" is the standard retail assumption but is frequently wrong. Dealers are not always the seller. When a hedge fund sells a call to another hedge fund, the dealer may not be involved. When retail buys puts on Robinhood, the market maker (Citadel/Wolverine) is indeed short those puts — but institutional flow is bilateral.

Fix:

  1. Use CBOE customer vs firm volume data (free, published daily) to estimate what fraction of OI is customer-originated (more likely dealer is counterparty) vs firm-originated
  2. Weight GEX by customer-origin probability
  3. Add a confidence band to GEX estimates rather than treating them as precise
  4. Backtest GEX levels against actual price behavior to calibrate empirically

[PANEL: Grok 4.1 Fast] Sign convention: Net GEX = (Put OI * Gamma_put * 100 * Spot) - (Call OI * Gamma_call * 100 * Spot) for standard dealer short-gamma assumption. Validate explicitly against SqueezeMetrics or SpotGamma free samples.

Gamma flip level: Strike where Net GEX crosses zero. Below this level, dealers are short gamma (amplifying). Above it, dealers are long gamma (stabilizing).

0DTE GEX: Track separately. Calculate 0DTE GEX / Total GEX ratio. Days where 0DTE > 40% of GEX behave differently — gamma pin effects are compressed to single-session timeframe.

B.3 Vanna and Charm

Vanna = dDelta/dVol = dVega/dS

Vanna = -N'(d1) * d2 / σ

Interpretation: When IV drops, delta changes. Vanna flow = dealer delta adjustments driven by vol changes. Positive vanna + falling VIX = dealers buy stock (supportive).

Charm = dDelta/dTime (delta decay)

Charm = -N'(d1) * (r - q + d1 * σ / (2 * sqrt(T))) / sqrt(T)

Interpretation: Delta changes as time passes even with no price move. Charm flow = systematic daily rehedging. End-of-day/end-of-week charm creates predictable directional pressure.

[PANEL: Sonar Reasoning Pro — MUST-HAVE] Calculate vega flip levels analogous to gamma flip levels. Vega flip = strike where aggregate dealer vega exposure crosses zero. When a dealer is long vega in front month and short vega in back months, their hedging creates a specific pattern. Track vega exposure across expirations, not just current IV surface.

Defer recommendation (Opus/Grok 4.1/DeepSeek): Vanna and Charm calculations are second-order signals. Build after core GEX and IV surface are stable and validated. Effort to build and maintain (daily recalculations, intraday triggers) outweighs predictive value for initial launch. However, include the table schema now so data is captured.

B.4 Greeks Validation

[PANEL: GPT-4.1 — MUST-HAVE] Implement a periodic validation routine to catch outliers or impossible values in chain data:

Run validation on every chain snapshot before inserting to database. Log all flagged records to data_quality_log table.


SECTION C: VOLATILITY DATA

C.1 VIX Suite (CBOE — free)

Index Description Cadence
VIX 30-day implied vol of S&P 500 Every 15 min during market hours
VVIX Vol of VIX (vol-of-vol) Daily
SKEW Tail risk (OTM put demand) Daily
VIX3M 3-month VIX Daily
VIX6M 6-month VIX Daily

VIX term structure: Track M1, M2, and the VIX futures month containing the next FOMC/CPI event. Do not track all 8-9 listed months (over-engineered, per Opus).

Contango/backwardation: M2 / M1 - 1. Positive = contango (normal, carry positive for short vol). Negative = backwardation (stress, hedging demand elevated).

Endpoints:

C.2 Realized Volatility (MUST-HAVE — Opus)

Critical gap in original spec: IV percentile alone is meaningless without knowing what vol actually realized. Build all three estimators:

Close-to-Close (simplest):

RV_cc = sqrt(252 / n * Sum(log(Ci / Ci-1)²))

Parkinson (High-Low):

RV_park = sqrt(252 / (4 * ln(2) * n) * Sum(log(Hi/Li)²))

Yang-Zhang (full OHLC, most accurate):

RV_yz = sqrt(252 * (σ_open² + k * σ_close² + (1-k) * σ_rs²))
Where k = 0.34 / (1.34 + (n+1)/(n-1)), using Rogers-Satchell variance

Windows: Calculate all three estimators at 5, 10, 20, 30, 60 trading day windows.

Source: Calculate from OHLC price data (Yahoo Finance historical is fine for daily OHLC — historical data is stable, intraday chain data is where Yahoo fails). Polygon.io when subscribed.

Variance Risk Premium (VRP):

VRP = VIX² / 252 - RV_20d²  (annualized, in vol units: VRP = VIX - RV_20d)

Positive VRP = selling vol is profitable on average. VRP regime shifts predict when vol selling stops working. This is the single most robust predictor of short-term option returns in academic literature.

C.3 Historical IV Database (MUST-HAVE — Opus)

Build incrementally from daily chain snapshots. Need at least 252 trading days of IV history per ticker to calculate:

Store daily ATM IV per ticker in iv_history table. Start collecting on Day 1 — this table self-populates over time.

Backfill: CBOE historical data (free for VIX, paid for individual stocks) or ivolatility.com (limited free). Polygon.io has historical options data — use to backfill on subscription.

C.4 IV Surface Construction

Simplified approach (Opus recommendation): For prediction market arbitrage, you only need:

  1. ATM IV per expiration (for VRP and vol regime)
  2. 25-delta skew (for tail risk)
  3. IV at specific strikes matching prediction market contract thresholds

Full IV surface at 7 delta points × 9 expirations is a quant desk project. Build the simplified version first.

IV Surface Storage: Store IV at 5-delta points (10d, 25d, 50d ATM, 75d, 90d) per expiration per ticker. Interpolate with cubic spline when needed for specific strikes.

C.5 Forward Vol and Event Vol Stripping

Purpose: Isolate the vol priced specifically for an upcoming event (earnings, FOMC, CPI) by comparing IV across expirations that bracket the event.

Formula:

Forward vol for event window [T1, T2]:
σ_fwd = sqrt((σ_T2² * T2 - σ_T1² * T1) / (T2 - T1))

Where T1 is the expiration just before the event and T2 is the first expiration after.

Event vol = σ_fwd minus expected non-event vol (estimated from non-event windows of similar length).

C.6 VSTOXX (European Vol)

Pull at 8 AM ET pre-market. Tracks European equity vol — leads VIX when European markets react to overnight news before US opens.

Caveat (Opus): VSTOXX leads VIX primarily due to time zone mechanics, not predictive alpha. By 8 AM ET, US futures have already incorporated the same overnight news. The actual lead time where VSTOXX contains information not yet in US futures is 0-15 minutes, not 30-90. Use VSTOXX/VIX spread instead: when European vol is unusually high relative to US vol, it signals a European-specific risk that hasn't fully transmitted.

Data source: Eurex delayed data (30-min delay on free feeds). Alternative: use VIX futures that trade 24/5 on CBOE — highly correlated to VSTOXX and available real-time for free.

Endpoint: https://www.stoxx.com/indices-data?isin=EU0009658152 (scrape) or Eurex delayed data feed.

C.7 Implied Correlation

[PANEL: Opus — NICE-TO-HAVE] CBOE implied correlation index (COR1M) for SPX constituent correlation. When implied correlation is high, single-stock options are cheap relative to index options (dispersion trade opportunity). Realized correlation calculated from price data already collected.

Source: https://www.cboe.com/tradable_products/vix/implied_correlation/ (free)


SECTION D: MACRO AND RATES DATA

D.1 FRED API Integration

Base URL: https://api.stlouisfed.org/fred/series/observations Auth: Free API key from fred.stlouisfed.org Cadence: Daily at 6 PM ET (most series update by then)

Series ID Description Frequency
SOFR Secured Overnight Financing Rate Daily
DGS2 2-Year Treasury Yield Daily
DGS10 10-Year Treasury Yield Daily
DGS30 30-Year Treasury Yield Daily
T10Y2Y 10Y-2Y Yield Curve Spread Daily
BAMLH0A0HYM2 HY Credit Spread (ICE BofA) Daily (T+1 lag)
BAMLC0A0CM IG Credit Spread (ICE BofA) Daily (T+1 lag)
RRPONTSYD Fed Reverse Repo (RRP) Daily
WTREGEN Treasury General Account (TGA) Weekly
WALCL Fed Balance Sheet Total Assets Weekly

FRED credit spread lag caveat (Sonar Reasoning Pro): FRED publishes at 5 PM ET the following day. Use bond ETF prices (HYG, JNK for HY; LQD for IG) for intraday real-time spread approximation. Bond ETFs update intraday and are free from Yahoo Finance historical or Polygon.io.

Real-time HY spread proxy:

HY_spread_intraday ≈ JNK_yield - DGS10_current
JNK_yield = (annual coupon / JNK_price) * 100 (simplified; use OAS when available)

D.2 CME FedWatch Implied Probabilities (MUST-HAVE — Opus)

Critical gap: The spec uses Kalshi for Fed rate probabilities but ignores the deepest, most liquid source: CME Fed Funds futures. Every institutional trader uses these as the benchmark. Comparing CME implied probabilities to Kalshi prices is a more robust arbitrage signal than using options-derived probabilities.

Source: CME FedWatch tool (free website scrape) at https://www.cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html Alternative: FRED Fed Funds futures data (series FFXXX where XXX is contract month) Cadence: Every 15 minutes during market hours (FedWatch updates continuously)

Calculation:

Implied prob of rate change = (100 - Fed Funds futures price) / 100
Implied prob of hold = 1 - prob of change

Arbitrage signal: CME_implied_hold_prob - Kalshi_hold_price. If >5c, Kalshi is mispriced. If Polymarket has the same contract, compare all three.

D.3 Treasury Auction Results

Source: https://www.treasurydirect.gov/TA_WS/securities/search (JSON API, free) Cadence: Pull daily, filter for auctions in past 7 days Fields: auction_date, security_type, term, bid_to_cover, indirect_bidders_pct, high_yield, when_issued_spread

Signal: Weak bid-to-cover (<2.3 for 10Y) or high tail (high_yield > when_issued by >1bp) signals Treasury demand concern → rates vol spikes → impacts rate-sensitive options.

D.4 MOVE Index (Bond Vol)

Source: ICE (paid) or scrape from financial data providers Cadence: Daily

Opus recommendation: Replace MOVE tracking with 2Y/10Y yield spread (already in FRED pull). Treasury vol (MOVE) leads equity options less strongly than yield curve slope does. Keep it simple: pull T10Y2Y from FRED, calculate daily slope change. Drop MOVE unless you find a free reliable source.

[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Treasury futures-to-bond basis (TYX spread). When Treasury futures and cash bonds disconnect, it reprices the entire SOFR curve, which cascades into rate-sensitive options. Calculate from 10Y futures (free CME delayed) vs 10Y cash yield from FRED. Signal: basis widening → rate vol about to spike.

D.5 CFTC COT Options-Only Report

Source: https://www.cftc.gov/dea/options/deaoiall.htm (free weekly CSV) Cadence: Every Friday after 3:30 PM ET (data as of Tuesday) Fields: market_name, noncomm_positions_long, noncomm_positions_short, comm_positions_long, comm_positions_short, nonrept_positions_long, nonrept_positions_short

Signal: Extreme non-commercial (speculative) positioning in VIX futures or equity index options predicts mean reversion.


SECTION E: FLOW AND POSITIONING DATA

E.1 OCC Daily Files

Source: https://www.theocc.com/market-data/market-data-reports/volume-and-open-interest/ (free daily CSV) Cadence: Pull at 8 AM ET (data from prior day's close, usually published 7-8 AM) Fields: exchange, underlying, expiration, strike, call_put, volume, open_interest

Timing note (Opus): OCC data is T+1. Use yesterday's OCC OI as the base. Estimate today's OI changes from intraday volume data (volume where volume > prior OI suggests new positions). Update GEX with confirmed OCC data next morning.

E.2 Put/Call Ratios

Calculate from chain data or OCC files:

SELECT
  underlying,
  date,
  SUM(CASE WHEN option_type = 'P' THEN volume ELSE 0 END) AS put_volume,
  SUM(CASE WHEN option_type = 'C' THEN volume ELSE 0 END) AS call_volume,
  CAST(SUM(CASE WHEN option_type = 'P' THEN volume ELSE 0 END) AS REAL) /
    NULLIF(SUM(CASE WHEN option_type = 'C' THEN volume ELSE 0 END), 0) AS put_call_ratio,
  SUM(CASE WHEN option_type = 'P' THEN open_interest ELSE 0 END) AS put_oi,
  SUM(CASE WHEN option_type = 'C' THEN open_interest ELSE 0 END) AS call_oi,
  CAST(SUM(CASE WHEN option_type = 'P' THEN open_interest ELSE 0 END) AS REAL) /
    NULLIF(SUM(CASE WHEN option_type = 'C' THEN open_interest ELSE 0 END), 0) AS put_call_oi_ratio
FROM options_chain_daily
GROUP BY underlying, date;

Signals:

E.3 Unusual Options Activity / Sweep Detection

Criteria for flagging a contract as unusual:

volume_to_oi_ratio > 5.0 AND volume > 1000 AND option_type in ('C', 'P')

For real-time sweep detection (requires Polygon.io):

Sweep fields to store:

ticker, strike, expiration, option_type, print_size_contracts, print_price,
ask_at_time, bid_at_time, aggressor (buyer/seller), timestamp, is_sweep (bool)

E.4 Phantom Liquidation Scanner

Purpose: Detect when large OI positions are closed all at once — signals forced liquidation or major position exit.

Detection rule (improved per Sonar Reasoning Pro): The original rule (|OI change| > 2x volume = liquidation) breaks during big moves. The improvement: track OI change separately for calls vs puts. Only flag as liquidation if BOTH calls and puts are dropping in the same direction. If calls drop but puts rise, that is a rehedge, not liquidation.

SELECT
  ticker, date,
  call_oi_change,
  put_oi_change,
  CASE
    WHEN call_oi_change < -0.15 AND put_oi_change < -0.15 THEN 'LIQUIDATION'
    WHEN call_oi_change < -0.15 AND put_oi_change > 0.05 THEN 'CALL_REHEDGE'
    WHEN put_oi_change < -0.15 AND call_oi_change > 0.05 THEN 'PUT_REHEDGE'
    ELSE 'NORMAL'
  END AS position_change_type
FROM (
  SELECT
    ticker,
    date,
    (today_call_oi - yesterday_call_oi) * 1.0 / NULLIF(yesterday_call_oi, 0) AS call_oi_change,
    (today_put_oi - yesterday_put_oi) * 1.0 / NULLIF(yesterday_put_oi, 0) AS put_oi_change
  FROM oi_daily_summary
);

E.5 FINRA Short Volume Daily Files (MUST-HAVE — Opus)

Source: https://www.finra.org/sites/default/files/short-sale-volume-files/ (free daily CSV) Cadence: Pull at 6 PM ET (data published T+1, so today's file has yesterday's data) Format: FINRA_<EXCHANGE>_<DATE>_ShortVolume.txt

Fields: Date, Symbol, ShortVolume, ShortExemptVolume, TotalVolume

Signal: When short volume exceeds 50% of total volume on a stock with rising call OI, it signals delta hedging by market makers (they short stock to hedge long delta from sold puts). This directly validates or contradicts the GEX model.

[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Listed Short Volume Reporting from SEC/exchange websites gives the same signal 1-2 days earlier than FINRA ATS (which lags 2-3 days for threshold list). Source: sec.gov or exchange daily files.

E.6 SEC EDGAR: Form 4 Insider Trades

Source: https://efts.sec.gov/LATEST/search-index?q=%22Form+4%22&dateRange=custom&startdt={date}&enddt={date}&forms=4 Cadence: Daily at 6 PM ET Parse fields: issuer_name, ticker, reporting_person, transaction_date, shares, price_per_share, transaction_code (P=purchase, S=sale), ownership_type

Signal: Cluster of insider buys at specific price levels can identify where smart money sees support. Form 4 is well-structured XML — easy to parse.

E.7 SEC EDGAR: Form 424B2 Structured Product Filings

Purpose: Reveals exactly where banks are hedged via structured notes (autocallables, barrier reverse convertibles). These barrier levels create dealer hedging flows that produce gamma walls.

Source: https://efts.sec.gov/LATEST/search-index?q=%22424B2%22&forms=424B2 Cadence: Daily — new filings appear continuously during business hours

MVP approach (Sonar Reasoning Pro recommendation):

  1. Build this in Week 1, NOT Month 2
  2. Manually track 5 live structured products from Goldman, JPMorgan, Morgan Stanley
  3. Hardcode the barriers into the system
  4. Monitor them daily
  5. Automate scraping in Month 2 Cost: 4 hours now vs 2 weeks later, but 8 weeks of asymmetric alpha in between.

Regex extraction for MVP (Opus guidance):

patterns = {
    'barrier': r'barrier.*?(\d+\.?\d*)\s*%',
    'notional': r'notional.*?\$(\d+(?:,\d+)*(?:\.\d+)?)\s*(million|billion)',
    'underlying': r'linked to.*?(SPY|SPX|QQQ|NDX|[A-Z]{1,4})',
    'maturity': r'maturity.*?(\d{1,2}/\d{1,2}/\d{4})',
    'issuer': r'(Goldman Sachs|JPMorgan|Morgan Stanley|Bank of America|Barclays)',
}

Long-term (Month 2-3): Full NLP parsing. These filings are 50-200 pages of dense legal text. Regex covers ~60% of cases. Full automation requires fine-tuned models or substantial engineering. Start with regex, iterate.

E.8 REG SHO Threshold List

Source: https://www.nasdaqtrader.com/trader.aspx?id=regsho (free daily) Cadence: Daily at close

Signal: Stocks appearing on REG SHO threshold list (failed to deliver for 5+ consecutive days) have short squeeze potential. Combine with put/call OI to identify squeeze setups.

E.9 [PANEL: Grok 4.1 Fast — NICE-TO-HAVE] Order Flow Toxicity (VPIN)

Volume-synchronized probability of informed trading. Detects informed vs uninformed trading in options/equities — can predict short-term vol spikes or reversals, enhancing phantom liquidation and sweep detection.

Calculation: Requires intraday volume and price bars (Polygon.io). VPIN = |Buy volume - Sell volume| / Total volume, measured in equal-volume buckets.

Defer to v2 until Polygon.io subscription is active and intraday bars are being collected.


SECTION F: PREDICTION MARKET DATA

F.1 Kalshi API Integration

Base URL: https://api.elections.kalshi.com/trade-api/v2 Auth: RSA-PSS signing with API key (existing code from weather desk — adapt) Cadence: Every 15 minutes during market hours; every 60 minutes overnight

Markets to track (options-relevant):

Fields per market:

market_ticker, event_ticker, title, status, yes_bid, yes_ask, no_bid, no_ask,
last_price, volume, open_interest, close_time, expiration_time, result, timestamp

[PANEL: Sonar Reasoning Pro — MUST-HAVE] Capture full order book depth, not just midpoint prices. A Kalshi market might trade at 45 cents, but if there's only $500 at 45, it is illiquid and the price is a lie. Kalshi provides full order book via API (free). Store: bid_price_1..5, bid_size_1..5, ask_price_1..5, ask_size_1..5 for the top 5 levels.

-- Kalshi liquidity calculation
SELECT
  market_ticker,
  timestamp,
  yes_ask - yes_bid AS spread_cents,
  bid_size_1 + bid_size_2 + bid_size_3 AS top3_bid_depth_usd,
  ask_size_1 + ask_size_2 + ask_size_3 AS top3_ask_depth_usd,
  CASE WHEN (bid_size_1 + bid_size_2 + bid_size_3) < 500 THEN 1 ELSE 0 END AS illiquid_flag
FROM kalshi_orderbook
WHERE timestamp > datetime('now', '-15 minutes');

F.2 Polymarket Integration

GraphQL endpoint: https://clob.polymarket.com/ (REST) and TheGraph subgraph Cadence: Every 15 minutes

Fields: condition_id, question, end_time, outcome_prices (yes/no), volume_24h, liquidity_usd

[PANEL: Sonar Reasoning Pro — MUST-HAVE] Polymarket uses an AMM (automated market maker) with USDC collateral. AMM slippage is not captured by price alone. Monitor:

USDC depeg risk (Opus fix): Better leading indicators than Deribit for Polymarket liquidity:

  1. USDC/USDT exchange rate on DEXes (Curve 3pool)
  2. Tether/USDC basis on centralized exchanges Source: CoinGecko API (free): https://api.coingecko.com/api/v3/simple/price?ids=usd-coin,tether&vs_currencies=usd

[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Cross-listing arbitrage: SPY vs SPX, QQQ vs NDX. When SPY calls are 2% cheaper than the synthetic SPX equivalent, that is detectable alpha. Calculate parity between chain data from both, flag deviations >1%.

F.3 Binary Probability Calculator — Prediction Market Bridge

Purpose: Convert options chain data to a probability of the underlying finishing above/below a specific strike by a specific date. Compare to Kalshi/Polymarket prices on the same event.

Standard approach (d2 from Black-Scholes):

P(S_T > K) = N(d2)
Where d2 = (ln(S/K) + (r - q - 0.5*σ²) * T) / (σ * sqrt(T))

CRITICAL FLAW (Opus + Sonar Reasoning Pro): Standard d2 assumes European options and no early exercise. US equity options are American (can exercise early). Use the following instead:

Improved approach — Breeden-Litzenberger (model-free):

Risk-neutral density: f(K) = exp(r*T) * d²C/dK²
Binary probability: P(S_T > K) = -exp(r*T) * dC/dK

Compute numerically from the options chain by taking finite differences between strikes.

Alternative — Bjerksund-Stensland (closed-form for American options, Sonar Reasoning Pro): Use Bjerksund-Stensland approximation for American exercise and dividend adjustment. The difference vs standard d2 can be 2-5% on your arbitrage signal — which is your entire edge vs Kalshi.

Test protocol: Pick any live SPY contract. Calculate probability using standard d2. Calculate using Bjerksund-Stensland. Compare both to actual Kalshi contract price. The one that better explains the Kalshi price is your truth test.

Use skew-adjusted IV (Opus): Do not use flat ATM IV in d2. Use the actual IV at each specific strike from the IV surface. This gives the risk-neutral probability that already incorporates skew.

Arbitrage threshold: Flag when |options_probability - kalshi_price| > 5c AND kalshi_liquidity_top3 > $300. Do the same comparison for Polymarket.


SECTION G: CALENDAR AND DAILY STATE TABLE

G.1 Daily State Table

Central lookup table queried by all downstream analytics. Build this first — everything else plugs into it.

CREATE TABLE IF NOT EXISTS daily_state (
  date TEXT PRIMARY KEY,
  -- Market structure
  is_trading_day INTEGER DEFAULT 1,
  is_monthly_opex INTEGER DEFAULT 0,  -- 3rd Friday of month
  is_quarterly_opex INTEGER DEFAULT 0, -- 3rd Friday of Mar/Jun/Sep/Dec
  is_weekly_opex INTEGER DEFAULT 0,
  is_0dte_day INTEGER DEFAULT 0,
  days_to_monthly_opex INTEGER,
  -- Fed calendar
  is_fomc_day INTEGER DEFAULT 0,
  is_fomc_week INTEGER DEFAULT 0,
  days_to_fomc INTEGER,
  fomc_expected_action TEXT,
  -- Economic calendar
  is_cpi_day INTEGER DEFAULT 0,
  is_ppi_day INTEGER DEFAULT 0,
  is_nfp_day INTEGER DEFAULT 0,
  is_gdp_day INTEGER DEFAULT 0,
  is_retail_sales_day INTEGER DEFAULT 0,
  is_pce_day INTEGER DEFAULT 0,
  days_to_cpi INTEGER,
  days_to_nfp INTEGER,
  -- Treasury calendar
  is_2y_auction INTEGER DEFAULT 0,
  is_10y_auction INTEGER DEFAULT 0,
  is_30y_auction INTEGER DEFAULT 0,
  is_quarter_end INTEGER DEFAULT 0,
  is_month_end INTEGER DEFAULT 0,
  -- Vol state (filled after market close)
  vix_close REAL,
  vix_regime TEXT,  -- 'low' <15, 'normal' 15-20, 'elevated' 20-30, 'stress' >30
  vix_pct_change_1d REAL,
  vvix_close REAL,
  skew_close REAL,
  vrp_20d REAL,
  -- Market state
  spy_close REAL,
  spy_pct_change_1d REAL,
  spy_pct_change_5d REAL,
  net_gex REAL,
  gex_regime TEXT,  -- 'positive' or 'negative'
  gamma_flip_level REAL,
  -- Macro state
  sofr REAL,
  yield_2y REAL,
  yield_10y REAL,
  yield_curve_2y10y REAL,
  hy_spread REAL,
  ig_spread REAL,
  rrp_balance_bn REAL,
  tga_balance_bn REAL,
  -- Prediction market state
  kalshi_fed_hold_prob REAL,
  cme_fedwatch_hold_prob REAL,
  kalshi_cme_divergence REAL,
  -- Derived flags
  convergence_pattern TEXT,  -- e.g., 'CPI+OPEX', 'FOMC+QUAD_WITCH'
  buyback_blackout_pct REAL,  -- estimated % of S&P500 in blackout window (±10% uncertainty)
  -- Updated timestamps
  created_at TEXT DEFAULT (datetime('now')),
  updated_at TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_daily_state_date ON daily_state(date);
CREATE INDEX IF NOT EXISTS idx_daily_state_opex ON daily_state(is_monthly_opex, is_quarterly_opex);
CREATE INDEX IF NOT EXISTS idx_daily_state_fomc ON daily_state(is_fomc_day, is_fomc_week);

Buyback blackout estimate (Opus): Assume all companies enter blackout 14 calendar days before their earnings date and exit 2 days after. Calculate market-cap-weighted percentage. Label as estimate with ±10% uncertainty.

[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Systematic rebalancer timing: Track known rebalance dates (FTSE quarterly, Russell annual in June, S&P 500 quarterly in March/June/September/December), factor rotation dates, and ETF creation/destruction. These create predictable volatility spikes 2-5 days before and 1-2 days after. Add flags to daily_state for these events.

G.2 Convergence Pattern Detector

MVP (Opus/Sonar Reasoning Pro): Hardcode alerts for the top 2 patterns that have the most documented precedent. Run Month 2 analysis on which patterns actually matter, then generalize.

Top 2 convergence patterns:

  1. CPI + OpEx same week: Vol crush from CPI relief combined with gamma pin at OpEx creates directional exhaustion. Historical: SPY moves <1% the day after.
  2. Quad witching + earnings cluster: Multiple large-cap earnings + quarterly OpEx creates unpredictable vol (GEX becomes unreliable due to massive OI rollovers).

Additional patterns to track (lower priority):

UPDATE daily_state
SET convergence_pattern = CASE
  WHEN is_cpi_day = 1 AND is_weekly_opex = 1 THEN 'CPI+OPEX'
  WHEN is_quarterly_opex = 1 AND (SELECT COUNT(*) FROM earnings_calendar WHERE date BETWEEN date('now', '-3 days') AND date('now', '+3 days')) > 20 THEN 'QUAD_WITCH+EARNINGS'
  WHEN is_fomc_week = 1 AND buyback_blackout_pct > 0.40 THEN 'FOMC+BLACKOUT'
  WHEN vix_regime = 'stress' AND gex_regime = 'negative' THEN 'STRESS+SHORT_GAMMA'
  ELSE NULL
END
WHERE date = date('now');

G.3 Earnings Calendar

Source: Yahoo Finance earnings calendar (scrape), or Nasdaq.com, or Polygon.io (paid) Cadence: Update weekly; refresh daily for any changes

CREATE TABLE IF NOT EXISTS earnings_calendar (
  ticker TEXT NOT NULL,
  earnings_date TEXT NOT NULL,
  time_of_day TEXT,  -- 'BMO' (before market open), 'AMC' (after market close), 'DMH' (during market hours)
  expected_eps REAL,
  expected_revenue_bn REAL,
  actual_eps REAL,
  actual_revenue_bn REAL,
  options_implied_move_pct REAL,  -- calculated from straddle price / stock price
  historical_avg_move_pct REAL,
  historical_move_beat_rate REAL,  -- % of times actual move > implied move
  post_earnings_direction TEXT,    -- 'up', 'down', 'flat' (filled after event)
  PRIMARY KEY (ticker, earnings_date)
);

CREATE INDEX IF NOT EXISTS idx_earnings_ticker ON earnings_calendar(ticker);
CREATE INDEX IF NOT EXISTS idx_earnings_date ON earnings_calendar(earnings_date);

SECTION H: SENTIMENT DATA

H.1 Reddit Options Sentiment

Source: Reddit PRAW API (free, rate-limited) Subreddits: r/options (primary), r/wallstreetbets (secondary, noisy) Cadence: Daily at market close — aggregate daily, not real-time

Simplified approach (Opus): Only r/options (more serious). Only flag when a single ticker's mention count exceeds 3 standard deviations from its 30-day average. Drop the YOLO frequency metric from r/wallstreetbets entirely.

[PANEL: Sonar Reasoning Pro] Reddit sentiment + options flow correlation matters more than sentiment alone. Build a confusion matrix:

Only use as contrarian when Reddit and options flow DIVERGE.

CREATE TABLE IF NOT EXISTS reddit_sentiment_daily (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  subreddit TEXT NOT NULL,
  mention_count INTEGER,
  bullish_count INTEGER,
  bearish_count INTEGER,
  sentiment_score REAL,  -- (bullish - bearish) / total
  is_3sigma_spike INTEGER DEFAULT 0,
  avg_30d_mentions REAL,
  std_30d_mentions REAL,
  PRIMARY KEY (date, ticker, subreddit)
);

H.2 X/Twitter Flow Account Aggregation

Source: X API (paid at meaningful scale) or Grok search Cadence: Once daily at market open — consensus themes only, not individual trade alerts

Simplified approach (Opus): 3x daily scans is over-engineered and these accounts post hundreds of alerts daily, mostly noise. The actual large institutional flows are better captured by volume/OI analysis. Simplify to: one daily scan for consensus themes (e.g., "everyone is talking about NVDA put buying"), not individual trade alerts.

[PANEL: Gemini 3.1 Pro] Remove X/Twitter flow account aggregation from v1. The signal is noisier than r/options, and the API cost at meaningful scale is prohibitive. Add back in v2 if there is evidence of signal.


SECTION I: INTERNATIONAL AND CROSS-ASSET SIGNALS

I.1 Pre-Market International Signals

Recommended (Opus): Remove SGX Nifty from tracking. India's Nifty has near-zero predictive value for US options markets. The correlation is driven entirely by shared global risk factors better captured by S&P 500 futures (which trade 23 hours/day) and VIX futures.

Keep:

I.2 Carry Trade Indicator

JPY and CHF as safe-haven flows:

Source: FRED (DEXJPUS — free daily), or forex data providers for intraday Cadence: Every 15 minutes during market hours for intraday; daily for historical

[PANEL: Sonar Reasoning Pro — cross-desk shared] Carry trade indicator is reusable for Forex desk (directly) and Crypto desk (BTC/ETH safe havens). Build as shared utility.

I.3 Deribit Crypto Options Feed

Source: Deribit WebSocket API (free, real-time) Endpoint: wss://www.deribit.com/ws/api/v2 Subscriptions: BTC and ETH options — ATM IV, 25-delta skew, term structure Cadence: Real-time WebSocket updates; store hourly snapshots

Purpose for options desk:

  1. Validate DIY Greeks math (validate on Deribit, then apply to equity chains)
  2. Early warning for Polymarket liquidity (crypto vol spikes often precede Polymarket drawdowns)
  3. USDC depeg risk (secondary — see F.2 for better leading indicators)

Validation approach: Compare your Black-Scholes Greeks calculation against Deribit's published Greeks for BTC/ETH options. If your delta is within 0.5% of Deribit's, the math is correct. Note (Sonar Reasoning Pro): Deribit pre-calculates Greeks using their own model assumptions — validate against multiple sources including academic formulas and community tools (OptionStrat), not Deribit alone.

[PANEL: Grok 4.1] Historical options chain data for backtesting is a critical gap. Without historical data, model validation is limited to forward testing, delaying learning cycles. Deribit provides free historical data download for BTC/ETH — use this to backtest GEX and binary probability calculations before applying to equities.


SECTION J: ETF FLOW DATA

J.1 ETF Creation/Redemption

Source: ETF.com (free scrape), ICI weekly flow reports (free), SPY/QQQ shares outstanding from Yahoo Finance Cadence: Weekly (ICI), daily for shares outstanding

[PANEL: Opus/DeepSeek/GPT-4.1 — NICE-TO-HAVE] ETF creation/redemption flows interact with options gamma. If SPY is near a gamma wall AND seeing large redemptions, the wall may not hold. ICI publishes weekly; ETF.com has more timely data.

Large creation = inflows = buying pressure. Large redemption = outflows.

Signal: SPY shares outstanding decrease > 1% in one day = significant redemption = potential gamma wall failure signal.

CREATE TABLE IF NOT EXISTS etf_flows (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  shares_outstanding INTEGER,
  shares_change INTEGER,
  flow_direction TEXT,  -- 'creation' or 'redemption'
  estimated_flow_usd_mm REAL,
  PRIMARY KEY (date, ticker)
);

J.2 Dividend Schedule

Source: Yahoo Finance (add dividend_date and dividend_amount fields), Nasdaq.com dividend calendar Cadence: Weekly update

[PANEL: Opus — MUST-HAVE] Early exercise of American calls is rational only when the dividend exceeds the remaining time value. GEX model assumes OI persists, but large OI in ITM calls will vanish the day before ex-div due to early exercise. Also, put-call parity requires dividend adjustment — without it, the binary probability calculator will be systematically biased for dividend-paying stocks.

CREATE TABLE IF NOT EXISTS dividend_calendar (
  ticker TEXT NOT NULL,
  declaration_date TEXT,
  ex_dividend_date TEXT NOT NULL,
  record_date TEXT,
  payment_date TEXT,
  dividend_amount REAL,
  frequency TEXT,  -- 'quarterly', 'monthly', 'annual', 'special'
  PRIMARY KEY (ticker, ex_dividend_date)
);

CREATE INDEX IF NOT EXISTS idx_div_ex_date ON dividend_calendar(ex_dividend_date);

Flag in daily_state: Add tickers_with_exdiv_tomorrow TEXT (comma-separated) — any tracked ticker going ex-div tomorrow needs GEX recalculation warning.


SECTION K: DATA QUALITY AND RELIABILITY

K.1 Data Quality Layer (GPT-4.1 — MUST-HAVE)

Implement explicit data validation and cleaning before inserting to any table. This is critical — bad data will silently poison all downstream analytics.

CREATE TABLE IF NOT EXISTS data_quality_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  source TEXT NOT NULL,           -- 'polygon', 'occ', 'kalshi', etc.
  table_name TEXT NOT NULL,
  record_key TEXT,                -- ticker + date + strike, etc.
  issue_type TEXT NOT NULL,       -- 'impossible_delta', 'negative_iv', 'stale_data', etc.
  issue_description TEXT,
  raw_value TEXT,
  flagged_at TEXT DEFAULT (datetime('now')),
  resolved INTEGER DEFAULT 0
);

CREATE INDEX IF NOT EXISTS idx_dq_source ON data_quality_log(source, flagged_at);
CREATE INDEX IF NOT EXISTS idx_dq_unresolved ON data_quality_log(resolved) WHERE resolved = 0;

Validation rules per source:

K.2 Staleness Monitoring

[PANEL: Sonar Reasoning Pro] Explicit data quality SLAs per source. Flag immediately when data is stale:

CREATE TABLE IF NOT EXISTS data_freshness (
  source TEXT PRIMARY KEY,
  last_successful_update TEXT,
  expected_update_cadence_minutes INTEGER,
  max_staleness_minutes INTEGER,
  is_stale INTEGER DEFAULT 0,
  consecutive_failures INTEGER DEFAULT 0
);

Staleness rules:

If SPY chain data is >30 minutes stale: Flag GEX calculation as unreliable rather than silently using stale data.

K.3 Circuit Breaker Pattern

Each data collector must implement:

  1. Exponential backoff on failure (1s, 2s, 4s, 8s, 16s, max 5 minutes)
  2. Circuit breaker: after 3 consecutive failures, stop retrying for 30 minutes
  3. Log failure to data_quality_log
  4. Alert via Telegram when circuit breaker trips
  5. Fall back to cached/last-known-good data for downstream analytics
  6. Auto-recover check every 30 minutes while breaker is open

K.4 Backup and Archival

[PANEL: GPT-4.1] If the VPS fails, data is lost. Add regular off-server backups.

Archival policy (Opus):

Backup:


SECTION L: SQLITE DATABASE SCHEMA

All tables use WAL mode. Multiple databases to eliminate cross-domain write contention (Opus recommendation). Stagger scraper schedules by 1-2 minutes to avoid simultaneous writes.

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -64000;  -- 64MB cache
PRAGMA temp_store = MEMORY;

L.1 options.db — Chain and Greeks Data

-- Intraday chain snapshots (15-minute)
CREATE TABLE IF NOT EXISTS options_chain_intraday (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  ticker TEXT NOT NULL,
  timestamp TEXT NOT NULL,
  expiration_date TEXT NOT NULL,
  strike REAL NOT NULL,
  option_type TEXT NOT NULL CHECK (option_type IN ('C', 'P')),
  bid REAL,
  ask REAL,
  mid REAL,
  last_price REAL,
  volume INTEGER DEFAULT 0,
  open_interest INTEGER DEFAULT 0,
  implied_volatility REAL,
  delta REAL,
  gamma REAL,
  theta REAL,
  vega REAL,
  rho REAL,
  underlying_price REAL,
  spread_pct REAL,  -- (ask - bid) / mid
  dte INTEGER,      -- days to expiration
  is_0dte INTEGER DEFAULT 0,
  data_source TEXT DEFAULT 'polygon'
);

CREATE INDEX IF NOT EXISTS idx_chain_intraday_ticker_ts ON options_chain_intraday(ticker, timestamp);
CREATE INDEX IF NOT EXISTS idx_chain_intraday_expiry ON options_chain_intraday(ticker, expiration_date);
CREATE INDEX IF NOT EXISTS idx_chain_intraday_0dte ON options_chain_intraday(is_0dte, timestamp);

-- Daily end-of-day chain snapshots (from OCC or Polygon EOD)
CREATE TABLE IF NOT EXISTS options_chain_daily (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  expiration_date TEXT NOT NULL,
  strike REAL NOT NULL,
  option_type TEXT NOT NULL CHECK (option_type IN ('C', 'P')),
  volume INTEGER DEFAULT 0,
  open_interest INTEGER DEFAULT 0,
  implied_volatility REAL,
  delta REAL,
  gamma REAL,
  theta REAL,
  vega REAL,
  rho REAL,
  bid_close REAL,
  ask_close REAL,
  last_price REAL,
  underlying_close REAL,
  spread_pct REAL,
  dte INTEGER,
  data_source TEXT DEFAULT 'occ'
);

CREATE UNIQUE INDEX IF NOT EXISTS idx_chain_daily_unique ON options_chain_daily(date, ticker, expiration_date, strike, option_type);
CREATE INDEX IF NOT EXISTS idx_chain_daily_ticker ON options_chain_daily(ticker, date);

-- GEX calculations (stored after each computation)
CREATE TABLE IF NOT EXISTS gex_snapshots (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  ticker TEXT NOT NULL,
  timestamp TEXT NOT NULL,
  net_gex REAL,
  call_gex REAL,
  put_gex REAL,
  gamma_flip_level REAL,
  gamma_wall_upper REAL,
  gamma_wall_lower REAL,
  zero_gamma_level REAL,
  gex_0dte REAL,
  gex_0dte_pct REAL,          -- 0DTE as % of total GEX
  customer_origin_confidence REAL,  -- 0-1, how confident we are in dealer directionality
  gex_regime TEXT,            -- 'positive', 'negative'
  underlying_price REAL
);

CREATE INDEX IF NOT EXISTS idx_gex_ticker_ts ON gex_snapshots(ticker, timestamp);

-- IV history for IV rank and percentile calculation
CREATE TABLE IF NOT EXISTS iv_history (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  atm_iv REAL,
  iv_25d_call REAL,  -- 25-delta call IV
  iv_25d_put REAL,   -- 25-delta put IV
  iv_10d_call REAL,
  iv_10d_put REAL,
  skew_25d REAL,     -- iv_25d_put - iv_25d_call
  iv_term_1m REAL,   -- front month ATM IV
  iv_term_2m REAL,
  iv_term_3m REAL,
  iv_term_6m REAL,
  iv_rank_52w REAL,  -- (current - 52w low) / (52w high - 52w low)
  iv_pct_252d REAL,  -- % of last 252 days with lower IV
  PRIMARY KEY (date, ticker)
);

-- Vanna and Charm aggregate flows
CREATE TABLE IF NOT EXISTS second_order_greeks (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  ticker TEXT NOT NULL,
  timestamp TEXT NOT NULL,
  aggregate_vanna REAL,        -- sum across all strikes/expirations
  aggregate_charm REAL,
  vanna_flip_level REAL,       -- strike where aggregate vanna crosses zero
  charm_daily_impact REAL,     -- estimated delta change from charm over next 24h
  vega_flip_level REAL,        -- [PANEL: Sonar] strike where aggregate vega crosses zero
  aggregate_vega_front REAL,   -- [PANEL: Sonar] front-month total vega exposure
  aggregate_vega_back REAL     -- [PANEL: Sonar] back-month total vega exposure
);

-- Put/call ratios
CREATE TABLE IF NOT EXISTS put_call_ratios (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  put_volume INTEGER,
  call_volume INTEGER,
  put_call_volume_ratio REAL,
  put_oi INTEGER,
  call_oi INTEGER,
  put_call_oi_ratio REAL,
  is_volume_extreme INTEGER DEFAULT 0,  -- 1 if ratio > 1.2 or < 0.6
  PRIMARY KEY (date, ticker)
);

-- Sweep/unusual activity log
CREATE TABLE IF NOT EXISTS options_sweeps (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp TEXT NOT NULL,
  ticker TEXT NOT NULL,
  expiration_date TEXT NOT NULL,
  strike REAL NOT NULL,
  option_type TEXT NOT NULL,
  print_size_contracts INTEGER,
  print_price REAL,
  ask_at_time REAL,
  bid_at_time REAL,
  aggressor TEXT,  -- 'buyer', 'seller', 'unknown'
  is_sweep INTEGER DEFAULT 0,
  volume_to_oi REAL,
  sentiment TEXT   -- 'bullish', 'bearish'
);

CREATE INDEX IF NOT EXISTS idx_sweeps_ticker_ts ON options_sweeps(ticker, timestamp);

-- Phantom liquidation events
CREATE TABLE IF NOT EXISTS phantom_liquidation_events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  call_oi_change_pct REAL,
  put_oi_change_pct REAL,
  event_type TEXT,   -- 'LIQUIDATION', 'CALL_REHEDGE', 'PUT_REHEDGE', 'NORMAL'
  underlying_move_pct REAL,
  notes TEXT
);

L.2 volatility.db — Vol and Macro Data

-- VIX family daily data
CREATE TABLE IF NOT EXISTS vix_daily (
  date TEXT PRIMARY KEY,
  vix_open REAL,
  vix_high REAL,
  vix_low REAL,
  vix_close REAL,
  vvix_close REAL,
  skew_close REAL,
  vix3m_close REAL,
  vix6m_close REAL,
  vix_m1_futures REAL,  -- front month VIX future
  vix_m2_futures REAL,  -- second month VIX future
  vix_event_month REAL, -- month containing next FOMC or CPI
  contango_m1_m2 REAL,  -- (m2/m1 - 1) * 100
  vix_regime TEXT
);

-- Realized volatility calculations
CREATE TABLE IF NOT EXISTS realized_vol (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  rv_cc_5d REAL,      -- close-to-close, 5 day
  rv_cc_10d REAL,
  rv_cc_20d REAL,
  rv_cc_30d REAL,
  rv_cc_60d REAL,
  rv_parkinson_20d REAL,   -- high-low estimator
  rv_yang_zhang_20d REAL,  -- OHLC estimator
  vrp_20d REAL,            -- VIX - rv_cc_20d (variance risk premium)
  iv_rv_spread_20d REAL,   -- atm_iv - rv_cc_20d
  PRIMARY KEY (date, ticker)
);

-- FRED macro series
CREATE TABLE IF NOT EXISTS fred_macro (
  date TEXT NOT NULL,
  series_id TEXT NOT NULL,
  value REAL,
  PRIMARY KEY (date, series_id)
);

CREATE INDEX IF NOT EXISTS idx_fred_series ON fred_macro(series_id, date);

-- Derived macro signals (calculated from fred_macro)
CREATE TABLE IF NOT EXISTS macro_signals (
  date TEXT PRIMARY KEY,
  sofr REAL,
  yield_2y REAL,
  yield_10y REAL,
  yield_curve_slope REAL,  -- 10y - 2y
  yield_curve_slope_chg_5d REAL,
  hy_spread REAL,
  ig_spread REAL,
  hy_spread_chg_5d REAL,
  rrp_balance_bn REAL,
  tga_balance_bn REAL,
  rrp_tga_combined_bn REAL,  -- proxy for system liquidity
  fed_balance_sheet_bn REAL,
  -- Real-time bond ETF proxies
  jnk_price REAL,   -- HY ETF
  lqd_price REAL,   -- IG ETF
  hy_spread_intraday REAL,  -- approximated from JNK
  -- CME FedWatch
  cme_fedwatch_hold_prob REAL,
  cme_fedwatch_hike_prob REAL,
  cme_fedwatch_cut_prob REAL,
  -- VSTOXX
  vstoxx_8am REAL,
  vstoxx_vix_spread REAL,
  -- Carry trade
  usdjpy REAL,
  usdchf REAL,
  jpy_chg_1d_pct REAL,
  carry_stress_flag INTEGER DEFAULT 0  -- 1 if JPY/CHF moves >0.5% intraday
);

-- CFTC COT options positions
CREATE TABLE IF NOT EXISTS cftc_cot (
  report_date TEXT NOT NULL,
  market_name TEXT NOT NULL,
  noncomm_long INTEGER,
  noncomm_short INTEGER,
  noncomm_net INTEGER,
  comm_long INTEGER,
  comm_short INTEGER,
  comm_net INTEGER,
  nonrept_long INTEGER,
  nonrept_short INTEGER,
  PRIMARY KEY (report_date, market_name)
);

L.3 prediction_markets.db — Kalshi and Polymarket

-- Kalshi market snapshots
CREATE TABLE IF NOT EXISTS kalshi_markets (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp TEXT NOT NULL,
  market_ticker TEXT NOT NULL,
  event_ticker TEXT,
  title TEXT,
  category TEXT,  -- 'fed', 'cpi', 'equity_threshold', 'macro'
  yes_bid REAL,
  yes_ask REAL,
  no_bid REAL,
  no_ask REAL,
  mid_price REAL,   -- (yes_bid + yes_ask) / 2
  volume_24h REAL,
  open_interest_usd REAL,
  close_time TEXT,
  expiration_time TEXT,
  days_to_expiry INTEGER,
  -- [PANEL: Sonar] Order book depth (top 5 levels)
  bid_price_1 REAL, bid_size_1 REAL,
  bid_price_2 REAL, bid_size_2 REAL,
  bid_price_3 REAL, bid_size_3 REAL,
  ask_price_1 REAL, ask_size_1 REAL,
  ask_price_2 REAL, ask_size_2 REAL,
  ask_price_3 REAL, ask_size_3 REAL,
  top3_bid_depth_usd REAL,
  top3_ask_depth_usd REAL,
  spread_cents REAL,
  illiquid_flag INTEGER DEFAULT 0  -- 1 if top3 depth < $500
);

CREATE INDEX IF NOT EXISTS idx_kalshi_ticker_ts ON kalshi_markets(market_ticker, timestamp);
CREATE INDEX IF NOT EXISTS idx_kalshi_expiry ON kalshi_markets(expiration_time);

-- Polymarket snapshots
CREATE TABLE IF NOT EXISTS polymarket_markets (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp TEXT NOT NULL,
  condition_id TEXT NOT NULL,
  question TEXT,
  category TEXT,
  yes_price REAL,
  no_price REAL,
  volume_24h_usd REAL,
  liquidity_usd REAL,
  end_time TEXT,
  amm_slippage_500 REAL,  -- estimated slippage for $500 position
  usdc_usdt_rate REAL,    -- [PANEL: Sonar] USDC/USDT peg health
  illiquid_flag INTEGER DEFAULT 0
);

CREATE INDEX IF NOT EXISTS idx_poly_condition_ts ON polymarket_markets(condition_id, timestamp);

-- Arbitrage signals — cross-venue and options bridge
CREATE TABLE IF NOT EXISTS arbitrage_signals (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  detected_at TEXT NOT NULL,
  signal_type TEXT NOT NULL,  -- 'kalshi_options_bridge', 'poly_options_bridge', 'cross_venue', 'fedwatch_kalshi'
  underlying TEXT,
  event_description TEXT,
  -- Prediction market side
  venue_1 TEXT,
  venue_1_price REAL,
  venue_1_side TEXT,
  venue_1_liquidity_usd REAL,
  -- Options or CME side
  venue_2 TEXT,
  venue_2_price REAL,  -- options-implied probability or CME FedWatch
  venue_2_method TEXT, -- 'breeden_litzenberger', 'bjerksund_stensland', 'fedwatch', 'kalshi'
  -- Edge
  edge_cents REAL,
  edge_after_fees_cents REAL,
  confidence_pct INTEGER,
  -- Context
  vix_at_detection REAL,
  gex_regime TEXT,
  -- Lifecycle
  expiration_time TEXT,
  status TEXT DEFAULT 'open',  -- 'open', 'traded', 'expired', 'invalidated'
  outcome TEXT,
  brier_score REAL,
  notes TEXT
);

CREATE INDEX IF NOT EXISTS idx_arb_signals_status ON arbitrage_signals(status, detected_at);
CREATE INDEX IF NOT EXISTS idx_arb_signals_type ON arbitrage_signals(signal_type, detected_at);

-- USDC peg monitoring (Polymarket liquidity proxy)
CREATE TABLE IF NOT EXISTS usdc_peg_monitor (
  timestamp TEXT PRIMARY KEY,
  usdc_usd_rate REAL,
  usdt_usd_rate REAL,
  usdc_usdt_spread_bps REAL,
  is_depeg_risk INTEGER DEFAULT 0  -- 1 if spread > 50 bps
);

L.4 flows.db — Institutional Flow and Positioning

-- OCC daily open interest
CREATE TABLE IF NOT EXISTS occ_daily_oi (
  date TEXT NOT NULL,
  exchange TEXT,
  underlying TEXT NOT NULL,
  expiration_date TEXT NOT NULL,
  strike REAL NOT NULL,
  option_type TEXT NOT NULL,
  volume INTEGER DEFAULT 0,
  open_interest INTEGER DEFAULT 0,
  PRIMARY KEY (date, underlying, expiration_date, strike, option_type)
);

CREATE INDEX IF NOT EXISTS idx_occ_underlying ON occ_daily_oi(underlying, date);

-- FINRA short volume
CREATE TABLE IF NOT EXISTS finra_short_volume (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  short_volume INTEGER,
  short_exempt_volume INTEGER,
  total_volume INTEGER,
  short_pct REAL,  -- short_volume / total_volume
  PRIMARY KEY (date, ticker)
);

-- SEC Form 4 insider trades
CREATE TABLE IF NOT EXISTS sec_form4 (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  filing_date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  reporting_person TEXT,
  transaction_date TEXT,
  shares INTEGER,
  price_per_share REAL,
  transaction_code TEXT,  -- 'P'=purchase, 'S'=sale, 'A'=award, etc.
  ownership_type TEXT,    -- 'D'=direct, 'I'=indirect
  is_buy INTEGER,
  is_sell INTEGER
);

CREATE INDEX IF NOT EXISTS idx_form4_ticker ON sec_form4(ticker, filing_date);

-- REG SHO threshold list
CREATE TABLE IF NOT EXISTS regsho_threshold (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  exchange TEXT,
  consecutive_days INTEGER,
  PRIMARY KEY (date, ticker)
);

-- SEC Form 424B2 structured products (barrier levels)
CREATE TABLE IF NOT EXISTS structured_products_424b2 (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  filing_date TEXT NOT NULL,
  issuer TEXT NOT NULL,
  underlying TEXT NOT NULL,
  barrier_pct REAL,           -- barrier as % of initial price
  barrier_price REAL,         -- barrier in absolute price terms
  initial_price REAL,         -- stock price at issuance
  notional_usd_mm REAL,
  maturity_date TEXT,
  product_type TEXT,          -- 'autocallable', 'barrier_reverse_convertible', 'digital'
  is_active INTEGER DEFAULT 1,
  source TEXT DEFAULT 'manual_mvp',  -- 'manual_mvp', 'regex_auto', 'nlp_auto'
  accession_number TEXT,
  notes TEXT
);

CREATE INDEX IF NOT EXISTS idx_424b2_underlying ON structured_products_424b2(underlying, is_active);

-- FINRA ATS dark pool data (weekly aggregate)
CREATE TABLE IF NOT EXISTS finra_ats_dark_pool (
  week_ending TEXT NOT NULL,
  ticker TEXT NOT NULL,
  ats_share_quantity INTEGER,
  total_share_quantity INTEGER,
  ats_pct REAL,
  PRIMARY KEY (week_ending, ticker)
);

-- ETF flows
CREATE TABLE IF NOT EXISTS etf_flows (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  shares_outstanding INTEGER,
  shares_change INTEGER,
  flow_direction TEXT,  -- 'creation' or 'redemption'
  estimated_flow_usd_mm REAL,
  PRIMARY KEY (date, ticker)
);

-- Dividend calendar
CREATE TABLE IF NOT EXISTS dividend_calendar (
  ticker TEXT NOT NULL,
  declaration_date TEXT,
  ex_dividend_date TEXT NOT NULL,
  record_date TEXT,
  payment_date TEXT,
  dividend_amount REAL,
  frequency TEXT,
  PRIMARY KEY (ticker, ex_dividend_date)
);

CREATE INDEX IF NOT EXISTS idx_div_ex_date ON dividend_calendar(ex_dividend_date);

L.5 calendar.db — Events and State

-- Daily state (from Section G.1 above — duplicated here for reference)
-- See Section G.1 for full CREATE TABLE statement

-- Earnings calendar (from Section G.3 above)
-- See Section G.3 for full CREATE TABLE statement

-- Treasury auction calendar
CREATE TABLE IF NOT EXISTS treasury_auctions (
  auction_date TEXT NOT NULL,
  security_type TEXT NOT NULL,  -- 'Bill', 'Note', 'Bond', 'TIPS', 'FRN'
  term TEXT,                    -- '2Y', '5Y', '10Y', '30Y'
  cusip TEXT,
  bid_to_cover REAL,
  indirect_bidders_pct REAL,
  high_yield REAL,
  when_issued_yield REAL,
  tail_bps REAL,  -- high_yield - when_issued_yield (positive = weak auction)
  is_weak_auction INTEGER DEFAULT 0,  -- tail > 1bp or btc < 2.3
  PRIMARY KEY (auction_date, security_type, term)
);

-- Implied correlation index
CREATE TABLE IF NOT EXISTS implied_correlation (
  date TEXT NOT NULL,
  index_name TEXT NOT NULL,  -- 'COR1M', 'COR3M'
  value REAL,
  PRIMARY KEY (date, index_name)
);

-- Data freshness tracking (from K.2)
CREATE TABLE IF NOT EXISTS data_freshness (
  source TEXT PRIMARY KEY,
  last_successful_update TEXT,
  expected_update_cadence_minutes INTEGER,
  max_staleness_minutes INTEGER,
  is_stale INTEGER DEFAULT 0,
  consecutive_failures INTEGER DEFAULT 0
);

-- Data quality log (from K.1)
CREATE TABLE IF NOT EXISTS data_quality_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  source TEXT NOT NULL,
  table_name TEXT NOT NULL,
  record_key TEXT,
  issue_type TEXT NOT NULL,
  issue_description TEXT,
  raw_value TEXT,
  flagged_at TEXT DEFAULT (datetime('now')),
  resolved INTEGER DEFAULT 0
);

-- Reddit sentiment
CREATE TABLE IF NOT EXISTS reddit_sentiment_daily (
  date TEXT NOT NULL,
  ticker TEXT NOT NULL,
  subreddit TEXT NOT NULL,
  mention_count INTEGER,
  bullish_count INTEGER,
  bearish_count INTEGER,
  sentiment_score REAL,
  is_3sigma_spike INTEGER DEFAULT 0,
  avg_30d_mentions REAL,
  std_30d_mentions REAL,
  PRIMARY KEY (date, ticker, subreddit)
);

SECTION M: API ENDPOINTS REFERENCE

Source Endpoint Auth Rate Limit Notes
Polygon.io chains https://api.polygon.io/v3/snapshot/options/{ticker} API key (header) 5 req/min (free), 100/min ($29) Use paid plan
Polygon.io trades https://api.polygon.io/v2/last/trade/{ticker} API key Same For intraday
OCC daily files https://www.theocc.com/market-data/market-data-reports/volume-and-open-interest/ None No stated limit Pull 1x/day
CBOE VIX data https://cdn.cboe.com/products/us/indices/data/ None Rate-limit conservatively Rotate user agents
CBOE VIX futures https://cdn.cboe.com/products/us/futures/data/VX/ None Rate-limit Daily settlement CSVs
CBOE SKEW https://www.cboe.com/tradable_products/vix/skew_historical_data/ None Rate-limit Daily
CBOE Implied Corr https://www.cboe.com/tradable_products/vix/implied_correlation/ None Rate-limit Daily
FRED API https://api.stlouisfed.org/fred/series/observations?series_id={id}&api_key={key}&file_type=json Free API key 120 req/min Generous limits
CME FedWatch https://www.cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html None (scrape) Rate-limit Parse JSON embedded in page
TreasuryDirect https://www.treasurydirect.gov/TA_WS/securities/search?type=Note&startDate={date}&endDate={date} None No stated limit Official gov API
Kalshi https://api.elections.kalshi.com/trade-api/v2/markets RSA-PSS auth 10 req/sec Existing code from weather desk
Kalshi orderbook https://api.elections.kalshi.com/trade-api/v2/markets/{ticker}/orderbook RSA-PSS auth 10 req/sec [PANEL: Sonar] Add this
Polymarket REST https://clob.polymarket.com/markets None Rate-limit Paginated
Polymarket TheGraph https://api.thegraph.com/subgraphs/name/polymarket/matic-markets None Rate-limit GraphQL
CoinGecko USDC https://api.coingecko.com/api/v3/simple/price?ids=usd-coin,tether&vs_currencies=usd None (50 req/min) 50/min free USDC peg monitoring
Deribit WebSocket wss://www.deribit.com/ws/api/v2 None for public No stated limit Subscribe to instruments
SEC EDGAR search https://efts.sec.gov/LATEST/search-index?forms=4&dateRange=custom&startdt={date} None Rate-limit (10 req/10s) Official free API
SEC EDGAR full-text https://efts.sec.gov/LATEST/search-index?forms=424B2 None Same For structured products
FINRA short volume https://www.finra.org/sites/default/files/short-sale-volume-files/ None None stated Daily CSV files
FINRA ATS https://www.finra.org/rules-guidance/transparency/finra-ats-data/ats-weekly-summary None None stated Weekly CSV
OCC CBOE customer/firm https://www.cboe.com/market-data/options-volume/ None Rate-limit Free daily data
CFTC COT options https://www.cftc.gov/dea/options/deaoiall.htm None None stated Weekly release
REG SHO list https://www.nasdaqtrader.com/trader.aspx?id=regsho None None Daily
Nasdaq earnings https://api.nasdaq.com/api/calendar/earnings?date={date} None Rate-limit Scrape
Yahoo Finance hist https://finance.yahoo.com/quote/{ticker}/history/ (via yfinance) None Rate-limit Historical OHLCV only — do NOT use for options chains
ETF.com flows https://www.etf.com/etfanalytics/etf-fund-flows-tool None (scrape) Rate-limit Weekly shares outstanding
Nasdaq dividends https://api.nasdaq.com/api/calendar/dividends?date={date} None Rate-limit Scrape
VSTOXX delayed https://www.stoxx.com/indices-data?isin=EU0009658152 None (scrape) Rate-limit 30-min delay on free
Eurex VIX futures https://www.eurex.com/ex-en/market-data/statistics/market-statistics-download None Rate-limit Alternative to VSTOXX scrape

SECTION N: COLLECTION CADENCE

Data Source Frequency Time DB Table
Polygon.io options chain (SPY, QQQ) Every 15 min 9:30 AM – 4:00 PM ET options_chain_intraday
Polygon.io options chain (earnings tickers) Every 15 min 9:30 AM – 4:00 PM ET options_chain_intraday
GEX calculation Every 15 min (after chain update) 9:30 AM – 4:00 PM ET gex_snapshots
VIX real-time Every 15 min 9:30 AM – 4:15 PM ET (used directly in daily_state)
Kalshi markets + orderbook Every 15 min 9:00 AM – 11:55 PM ET kalshi_markets
Polymarket markets Every 15 min 24/7 polymarket_markets
CME FedWatch Every 15 min 9:00 AM – 5:00 PM ET macro_signals
USDC/USDT peg Every 15 min 24/7 usdc_peg_monitor
JPY/CHF spot Every 15 min 24/7 macro_signals
Carry stress check Every 15 min 24/7 macro_signals
OCC daily files Daily 8:00 AM ET T+1 occ_daily_oi, options_chain_daily
FRED macro series Daily 6:00 PM ET After FRED publish fred_macro, macro_signals
VIX suite daily Daily 4:15 PM ET After close vix_daily
Realized volatility Daily 4:30 PM ET After OHLCV available realized_vol
IV history snapshot Daily 4:15 PM ET After close iv_history
Daily state table Daily 5:00 PM ET Compile from day's data daily_state
FINRA short volume Daily 6:00 PM ET T+1 data finra_short_volume
SEC Form 4 Daily 6:00 PM ET Filed throughout day sec_form4
SEC Form 424B2 Daily 6:00 PM ET Filed throughout day structured_products_424b2
REG SHO threshold Daily 6:00 PM ET After market close regsho_threshold
Earnings calendar Weekly (Friday) Update for next 14 days earnings_calendar
CFTC COT Weekly (Friday 3:30 PM ET) Data as of Tuesday cftc_cot
FINRA ATS dark pool Weekly (Friday) Week prior data finra_ats_dark_pool
VSTOXX Daily 8:00 AM ET Pre-market macro_signals
ETF flows (shares outstanding) Daily 5:00 PM ET From ETF issuer etf_flows
Dividend calendar Weekly (Monday) For next 14 days dividend_calendar
Treasury auctions Daily 1:00 PM ET Results published same day treasury_auctions
Implied correlation Daily 4:30 PM ET CBOE published EOD implied_correlation
Deribit BTC/ETH options Hourly snapshot + real-time WebSocket 24/7 (separate deribit tables)
Reddit sentiment Daily 4:30 PM ET After market close reddit_sentiment_daily

SECTION O: STORAGE ESTIMATES

Corrected estimates (Opus flaw fix):

SPY alone has ~5,000 active contracts across all strikes and expirations.

Realistic estimates:

Data Type Daily Volume Annual (raw)
SPY + QQQ intraday chains ~50 MB/day ~18 GB
Earnings tickers (average) ~5 MB/day ~1.8 GB
GEX snapshots ~1 MB/day ~365 MB
Kalshi/Polymarket ~5 MB/day ~1.8 GB
FRED/macro (tiny) <0.1 MB/day ~35 MB
OCC daily ~2 MB/day ~730 MB
Flows (Form 4, FINRA, etc.) ~0.5 MB/day ~180 MB
Total (raw) ~65 MB/day ~23 GB/year

After archival policy:

VPS headroom: 15 GB free on EdgeClaw VPS. Implement archival after 30 days to stay within budget. Consider separate storage for cold chain data.


SECTION P: KEY SQL QUERIES

P.1 Current GEX State

SELECT
  g.ticker,
  g.timestamp,
  g.net_gex,
  g.gex_regime,
  g.gamma_flip_level,
  g.gamma_wall_upper,
  g.gamma_wall_lower,
  g.gex_0dte_pct,
  ds.spy_close,
  ds.vix_close
FROM gex_snapshots g
JOIN daily_state ds ON date(g.timestamp) = ds.date
WHERE g.ticker = 'SPY'
  AND g.timestamp >= datetime('now', '-30 minutes')
ORDER BY g.timestamp DESC
LIMIT 1;

P.2 Active Arbitrage Signals

SELECT
  signal_type,
  underlying,
  event_description,
  venue_1, venue_1_price,
  venue_2, venue_2_price, venue_2_method,
  edge_cents,
  edge_after_fees_cents,
  confidence_pct,
  venue_1_liquidity_usd,
  expiration_time,
  detected_at
FROM arbitrage_signals
WHERE status = 'open'
  AND edge_after_fees_cents >= 3
  AND venue_1_liquidity_usd >= 300
  AND expiration_time > datetime('now')
ORDER BY edge_after_fees_cents DESC;

P.3 IV Rank and VRP Summary

SELECT
  iv.ticker,
  iv.date,
  iv.atm_iv,
  iv.iv_rank_52w,
  iv.iv_pct_252d,
  iv.skew_25d,
  rv.rv_cc_20d,
  rv.vrp_20d,
  rv.iv_rv_spread_20d,
  CASE
    WHEN rv.vrp_20d > 8 THEN 'EXPENSIVE_VOL_SELL'
    WHEN rv.vrp_20d < -3 THEN 'CHEAP_VOL_BUY'
    ELSE 'NEUTRAL'
  END AS vol_regime_signal
FROM iv_history iv
JOIN realized_vol rv ON iv.ticker = rv.ticker AND iv.date = rv.date
WHERE iv.date = date('now', '-1 day')
ORDER BY ABS(rv.vrp_20d) DESC;

P.4 Earnings Volatility Setup

SELECT
  ec.ticker,
  ec.earnings_date,
  ec.time_of_day,
  ec.options_implied_move_pct,
  ec.historical_avg_move_pct,
  (ec.options_implied_move_pct - ec.historical_avg_move_pct) AS vol_premium_pct,
  iv.atm_iv,
  iv.iv_rank_52w,
  CASE
    WHEN (ec.options_implied_move_pct - ec.historical_avg_move_pct) > 3.0
      AND iv.iv_rank_52w > 0.70 THEN 'IV_CRUSH_CANDIDATE'
    ELSE 'NORMAL'
  END AS setup_type
FROM earnings_calendar ec
JOIN iv_history iv ON ec.ticker = iv.ticker AND iv.date = date('now', '-1 day')
WHERE ec.earnings_date BETWEEN date('now') AND date('now', '+14 days')
  AND ec.actual_eps IS NULL  -- not yet reported
ORDER BY ec.earnings_date, vol_premium_pct DESC;

P.5 Unusual Options Activity

SELECT
  ticker,
  timestamp,
  strike,
  expiration_date,
  option_type,
  print_size_contracts,
  volume_to_oi,
  aggressor,
  sentiment,
  -- Join to get underlying context
  (SELECT spy_close FROM daily_state WHERE date = date(timestamp)) AS spy_context,
  (SELECT vix_close FROM daily_state WHERE date = date(timestamp)) AS vix_context
FROM options_sweeps
WHERE timestamp >= datetime('now', '-1 day')
  AND volume_to_oi > 10
  AND print_size_contracts > 500
ORDER BY print_size_contracts DESC
LIMIT 50;

P.6 Kalshi vs Options Bridge — Mispricing Scanner

SELECT
  a.signal_type,
  a.underlying,
  a.venue_1 AS prediction_market,
  a.venue_1_price AS pm_price,
  a.venue_2_price AS options_implied_prob,
  ROUND((a.venue_2_price - a.venue_1_price) * 100, 1) AS edge_cents_raw,
  a.edge_after_fees_cents,
  a.venue_1_liquidity_usd,
  k.spread_cents AS kalshi_spread,
  ds.vix_regime,
  ds.gex_regime,
  ds.convergence_pattern
FROM arbitrage_signals a
LEFT JOIN kalshi_markets k ON a.venue_1 = 'Kalshi'
  AND k.market_ticker LIKE '%' || a.underlying || '%'
  AND k.timestamp > datetime('now', '-20 minutes')
JOIN daily_state ds ON ds.date = date('now')
WHERE a.status = 'open'
  AND a.edge_after_fees_cents >= 4
  AND a.detected_at > datetime('now', '-4 hours')
ORDER BY a.edge_after_fees_cents DESC;

P.7 CME FedWatch vs Kalshi Divergence

SELECT
  ms.date,
  ms.cme_fedwatch_hold_prob,
  k.mid_price AS kalshi_hold_price,
  ROUND((ms.cme_fedwatch_hold_prob - k.mid_price) * 100, 1) AS divergence_cents,
  k.top3_bid_depth_usd,
  k.illiquid_flag
FROM macro_signals ms
JOIN kalshi_markets k ON date(k.timestamp) = ms.date
  AND k.market_ticker LIKE '%FED%HOLD%'
  AND k.timestamp = (
    SELECT MAX(timestamp) FROM kalshi_markets k2
    WHERE k2.market_ticker = k.market_ticker
      AND date(k2.timestamp) = ms.date
  )
WHERE ms.date = date('now')
  AND ABS(ms.cme_fedwatch_hold_prob - k.mid_price) > 0.05  -- >5 cent divergence
  AND k.illiquid_flag = 0;

P.8 GEX Intraday Change Tracker

SELECT
  g1.ticker,
  g1.timestamp AS current_time,
  g1.net_gex AS current_gex,
  g0.net_gex AS morning_gex,
  g1.net_gex - g0.net_gex AS gex_change,
  g1.gamma_flip_level,
  g1.gex_0dte_pct,
  CASE
    WHEN g0.gex_regime != g1.gex_regime THEN 'REGIME_CHANGE'
    WHEN ABS(g1.net_gex - g0.net_gex) > ABS(g0.net_gex) * 0.30 THEN 'LARGE_MOVE'
    ELSE 'STABLE'
  END AS gex_change_type
FROM gex_snapshots g1
JOIN gex_snapshots g0 ON g0.ticker = g1.ticker
  AND g0.timestamp = (
    SELECT MIN(timestamp) FROM gex_snapshots
    WHERE ticker = g1.ticker
      AND date(timestamp) = date(g1.timestamp)
  )
WHERE g1.ticker = 'SPY'
  AND g1.timestamp >= datetime('now', '-30 minutes')
ORDER BY g1.timestamp DESC
LIMIT 1;

SECTION Q: FLAW FIXES AND CRITICAL CORRECTIONS (Opus 4.6 Complete Review)

This section documents all identified flaws from the full panel review and their resolutions. Each is tracked for implementation.

# Flaw Severity Fix Status Implementation Note
1 GEX dealer directionality oversimplified CRITICAL Addressed in B.2 Use CBOE customer/firm volume; add confidence band
2 Yahoo Finance scraping for intraday chains CRITICAL Addressed in A.1 Polygon.io from Day 1, $29/month
3 SQLite write contention on 15-min polling HIGH Addressed in L (WAL mode) WAL mode + separate DBs + staggered schedules
4 Binary probability uses naive d2 (wrong for American options) HIGH Addressed in F.3 Use Bjerksund-Stensland; validate against Kalshi prices
5 VSTOXX lead time overstated (30-90 min claimed, 0-15 min actual) MEDIUM Addressed in C.6 Track VSTOXX/VIX spread instead; VIX futures as alternative
6 Storage estimate 4-5x too low MEDIUM Addressed in O Updated to ~23 GB/year raw, ~4-6 GB with archival
7 OCC data is T+1 — spec conflated timing MEDIUM Addressed in E.1 Use OCC as base; estimate intraday changes from volume
8 Polymarket USDC depeg — Deribit not a leading indicator MEDIUM Addressed in F.2 Use USDC/USDT on DEXes (CoinGecko) as actual indicator
9 Buyback blackout % hard to calculate MEDIUM Addressed in G.1 14-day pre-earnings heuristic; label as ±10% estimate
10 FRED credit spread has 24h lag MEDIUM Addressed in D.1 Bond ETF real-time proxies (JNK, LQD) for intraday
11 Phantom liquidation detection broken during big moves MEDIUM Addressed in E.4 Track calls and puts separately; only flag when both drop
12 Deribit validation creates false confidence MEDIUM Addressed in B.4 Validate against multiple sources, not Deribit alone
13 No explicit data quality layer CRITICAL (GPT-4.1) Addressed in K.1 data_quality_log table; validation rules per source
14 No backup/archival plan HIGH (GPT-4.1) Addressed in K.4 Daily SQLite dump; weekly off-server sync
15 Form 424B2 deferred to Month 2 CRITICAL (Sonar) Addressed in E.7 MVP in Week 1: manual tracking of top 5 products
16 Dividend data missing from Greeks calculation MUST-HAVE (Opus/Gemini) Addressed in B.1, J.2 Add q to all BS calculations; dividend_calendar table
17 Corporate actions not tracked MUST-HAVE (Gemini) Addressed in B.1 Pull from Polygon.io Reference Data API
18 Realized volatility not defined MUST-HAVE (Opus) Addressed in C.2 Three estimators at 5 windows each
19 IV rank database not built MUST-HAVE (Opus) Addressed in C.3 iv_history table; collect daily from Day 1
20 CME FedWatch not included MUST-HAVE (Opus) Addressed in D.2 Every 15 min during market hours
21 FINRA short volume not included MUST-HAVE (Opus) Addressed in E.5 Daily CSV pull at 6 PM ET
22 Bid-ask spread not tracked MUST-HAVE (Opus) Addressed in A.3 spread_pct = (ask-bid)/mid stored on every contract
23 Kalshi orderbook depth missing MUST-HAVE (Sonar) Addressed in F.1 Top 5 levels; illiquid_flag when depth < $500
24 Vega flip levels not calculated MUST-HAVE (Sonar) Addressed in B.3 Added to second_order_greeks table
25 Implied borrow rate not tracked MUST-HAVE (Sonar) Not yet in schema Future: track put/call parity deviations to back-calculate implied borrow

SECTION R: WHAT OPUS RECOMMENDS REMOVING OR DEFERRING

Item Action Reason
SGX Nifty as pre-market signal Remove Near-zero predictive value for US options; S&P futures cover the same information
CME CVOL for commodities/FX Defer (v2) Kalshi commodity markets have negligible volume; revisit in 6 months
Reddit r/wallstreetbets YOLO frequency Simplify SNR degraded since 2021; replace with r/options only + 3-sigma spike alerts
OCC Exercise and Assignment Stats (monthly) Defer Monthly granularity too coarse; only daily exercise data would be actionable, and OCC doesn't publish that for free
Full IV surface at 7 delta points × 9 expirations Simplify Use ATM IV, 25-delta skew, and IV at prediction market strikes. Covers 95% of use cases with 30% of complexity
All 8-9 VIX futures months Simplify M1, M2, and event month only
X/Twitter flow account aggregation 3x daily Simplify 1x daily for consensus themes only; actual flows captured better by volume/OI analysis
Full sector ETF chain tracking (XLF/XLE/XLK intraday) Simplify SPY already captures sector rotation; pull OI and P/C only for sector ETFs, not full intraday chains
MOVE Index tracking Replace Use T10Y2Y from FRED instead (already collected); stronger signal for options
CME CVOL for Kalshi commodity markets Defer Thin markets, low priority

SECTION S: CROSS-DESK SHARED INFRASTRUCTURE

The following components built for the Options Desk should be shared across all desks. Build as shared utilities, not desk-specific code.

Component Options Desk Use Other Desks
FRED macro pipeline Rates, credit spreads, liquidity Forex (yield differentials), Futures (Treasury data), Crypto (liquidity proxy)
Daily state table / calendar engine FOMC, CPI, OpEx, earnings flags ALL desks — this is the shared event calendar
Kalshi/Polymarket scraper framework Options-linked markets Weather (existing), Sports, Crypto — all prediction market desks
VIX suite and vol regime classification Core options signal Crypto (risk appetite), Forex (VIX/JPY correlation), Futures (roll costs)
Deribit crypto options feed Greeks validation + Poly liquidity Crypto desk (direct trading signal)
SEC EDGAR scraping infrastructure Form 4, 424B2 Stocks desk (insider signals), any desk needing corporate filing data
Binary probability calculator (Breeden-Litzenberger / Bjerksund-Stensland) Options-to-prediction-market bridge Any desk comparing model probabilities to prediction market prices (same math regardless of underlying event)
Credit spread monitoring (HY/IG from FRED + bond ETFs) Risk-off signal for options Forex (USD funding markets), Crypto (credit tightening leads crypto selloffs 1-5 days)
Carry trade indicator (JPY/CHF) Risk-off leading signal Forex desk (directly), Crypto desk
Data quality layer + circuit breaker All options data sources All desks — standardize the validation and alerting pattern

SECTION T: IMPLEMENTATION CHECKLIST

Phase 1 — Week 1: Core Foundation (Days 1-5)

Day 1-2: Prediction Markets (Execution Venues)

Day 2-3: Macro Foundation

Day 3-4: Vol Foundation

Day 4-5: Calendar and State

Day 5: Form 424B2 MVP (DO NOT DEFER)

Phase 2 — Week 2: Options Chain Engine

Day 6-7: Polygon.io Integration

Day 7-8: Greeks Engine

Day 8-9: IV Surface and RV

Day 9-10: Binary Probability Bridge

Phase 3 — Week 3: Flow and Positioning

Day 11: OCC Daily Files

Day 12: FINRA and REG SHO

Day 13: SEC EDGAR

Day 14: Sweep Detection and Flow Analytics

Day 15: CFTC COT

Phase 4 — Week 4: Validation and Refinements

Phase 5 — Month 2-3: Enhancements


SECTION U: BUILD ORDER (Opus 4.6 Recommended Sequence)

Step Component Effort Dependencies
1 Daily state table + calendar engine 2-3 days None
2 FRED API integration 1-2 days None
3 VIX suite from CBOE 2-3 days None
4 Kalshi scraper (adapt from weather desk) + orderbook depth 1-2 days Existing Kalshi code
5 Polymarket scraper + USDC monitoring 2-3 days None
6 Polygon.io subscription + chain data pipeline 3-5 days $29/month budget
7 Realized volatility calculators (close-to-close, Parkinson, Yang-Zhang) + VRP 1-2 days OHLCV data (Step 6)
8 Dividend calendar + corporate actions scraper 1 day None
9 GEX calculator (with CBOE customer/firm volume weighting) 3-5 days Chain data (Step 6)
10 Binary probability calculator (Bjerksund-Stensland + Breeden-Litzenberger) 2-3 days Chain + IV surface
11 Form 424B2 MVP (manual: 5 products × 3 issuers) 4 hours None — do first week
12 Earnings calendar + historical earnings database 2-3 days None
13 OCC daily files scraper + put/call ratios 1-2 days None
14 Phantom liquidation + unusual volume scanner + flow analytics 2-3 days Chain data (Step 6)
15 SEC EDGAR (Form 4 insider trades, REG SHO) 2-3 days None
16 CFTC COT options-only report parser 1 day None
17 FINRA short volume daily files 1 day None
18 CME FedWatch implied probabilities 1-2 days None
19 Convergence pattern detector 1 day Daily state (Step 1)
20 Reddit sentiment (simplified, r/options only) 1-2 days Low priority
21 Vanna/Charm calculator 1-2 days Chain data (Step 6)
22 Deribit crypto options feed 1 day Step 6 for validation context
23 Form 424B2 EDGAR full automation + regex 2-4 weeks Step 11 first
24 VSTOXX + international pre-market + JPY/CHF carry composite 2-3 days When core is stable

SECTION V: REALITY CHECK (Opus 4.6)

Realistic to build: Yes.

Biggest risk: Data pipeline reliability. You are pulling from 15+ sources on different schedules with different failure modes. CBOE might change their CSV format. OCC might delay publication. Polygon might have an outage. EDGAR might throttle you. Each individual source is manageable, but the combinatorial failure space is large. On any given day, at least one source will be broken. The system needs graceful degradation — every component must handle missing upstream data without crashing. Build circuit breakers and staleness alerts from Day 1. If SPY chain data is >30 minutes stale, the GEX calculation must flag itself as unreliable rather than silently using stale data.

First to break: Yahoo Finance scraping (if you do not use Polygon from Day 1). The second thing to break will be the 15-minute polling schedule — when multiple scrapers fire simultaneously, SQLite write contention will cause silent data loss (writes that timeout and get dropped). You will not notice until you see gaps in your time series days later.

Top change: Pay for Polygon.io ($29/month) from Week 1. Reliable options chain data is the FOUNDATION of this entire desk — GEX, vanna, charm, IV surface, binary probability calculator, flow detection, and earnings analysis all depend on it. Building on Yahoo Finance scraping is building on sand. $29/month is $0.97/day. One missed trading signal due to stale/missing chain data will cost more than a year of Polygon subscription. Additionally, Polygon's historical data lets you backfill the IV rank database immediately rather than waiting 252 days.

[PANEL: Sonar Reasoning Pro] What would make this A+:

  1. Explicit data quality SLA per source (e.g., "Kalshi data must update within 20 minutes or alert") — implemented in Section K.2
  2. Backtest plan for each signal before Month 1 launch: GEX flip predicts gamma needs 30 days of proof before you trust it in production
  3. Dependency graph (DAG) showing which calculations depend on which data — understand failure cascades before building
  4. Fallback strategy for each API/source: what happens if CBOE goes down, if Kalshi API times out
  5. Explicit signal routing: which signals go to which analyst AI on the desk (not currently defined in this spec)
  6. Cost-of-delay analysis on Form 424B2: manual MVP (4 hours) vs Month 2 automation — the MVP is almost free, the delay is 8 weeks of asymmetric alpha

Merged final complete. Opus 4.6 is the unmodified base. All panel additions are tagged with [PANEL: Model Name]. All flaw fixes are resolved inline. Sections K-V are implementation additions not in the original Opus review but required by the spec rules.


TODO: Upgrade Kalshi REST to WebSocket Feed

Status: NOT BUILT — add when this desk goes live for execution

Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.

Why WebSocket matters:

Note: This desk primarily trades on other exchanges, but Kalshi has event markets (rate decisions, index levels, crypto milestones) that overlap with this desk. WebSocket upgrade applies when trading any Kalshi markets.

Added 2026-03-29 — upgrade REST to WS when desk moves to live execution

Source: ~/edgeclaw/results/spec-panel/options-desk/spec-final.md