Weather Lock-In Desk — Merged Final Spec

Compiled: 2026-03-15

Base: Claude Opus 4.6 (complete, unmodified)

Additions: cherry-picked unique contributions from all panel models, tagged [PANEL: Model Name]

Rule: NWS API ONLY for weather data. NO web search for temperatures.


Panel Grades Summary

| Model | Grade | Cost |
| --- | --- | --- |
| Claude Opus 4.6 | B | $0.5154 |
| GPT-4.1 | A- | |
| Gemini 3.1 Pro | B | $0.0221 |
| Grok 4.1 | B | $0.0219 |
| DeepSeek V3.1 | B | $0.0011 |
| Sonar Reasoning Pro | D | |
| Grok 4 Fast | B | |

1. Strategy Overview

Core Signal: When a city's daily high temperature drops by 4°F or more after its typical peak time, the probability of recovery (returning to or exceeding the prior high) is extremely low, with a ~92-99% lock-in rate depending on threshold. This creates a mispricing in Kalshi's "will today's high exceed X°F?" contracts: contracts that should settle at 0 remain bid anywhere from 5 to 89 cents.

Edge Type: Reactive (confirmed) → Predictive (planned v2).

Data Rule: NWS API only for all temperature observation and forecast data. Never use web search for temperature data. NWS provides free, no-key-required access to official hourly observations and forecasts — the same source Kalshi uses for settlement.


2. Missing Data — MUST-HAVE

2.1 NWS Hourly Forecast Data Per City

Why: The next priority is forecast modeling to predict highs earlier, but the spec says nothing about ingesting actual weather forecasts. NWS provides free hourly forecasts with predicted highs, wind, cloud cover, and precipitation probability. Without this, you're building a "predictive" system on observational data alone — you can't predict the high without knowing what meteorologists think it will be. Comparing forecast vs. actual is also how you find mispricings: the market follows forecasts, so when the forecast is wrong, the contracts are mispriced.

Source: NWS API (api.weather.gov) — completely free, no API key needed, hourly resolution, includes forecast confidence. Endpoint: /gridpoints/{office}/{x},{y}/forecast/hourly
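A minimal collector sketch of this two-step flow, assuming Node 18+ (built-in fetch); the User-Agent contact string is a placeholder (NWS asks clients to send an identifying one):

// Resolve a city's forecast grid once, then poll its hourly forecast.
const NWS = "https://api.weather.gov";
const HEADERS = { "User-Agent": "weather-lockin-desk (ops@example.com)" }; // placeholder contact

// One-time setup per city: lat/lon -> forecastHourly URL.
async function getHourlyForecastUrl(lat: number, lon: number): Promise<string> {
  const res = await fetch(`${NWS}/points/${lat},${lon}`, { headers: HEADERS });
  if (!res.ok) throw new Error(`points lookup failed: ${res.status}`);
  const body = await res.json();
  return body.properties.forecastHourly; // e.g. .../gridpoints/OKX/33,37/forecast/hourly
}

// Hourly collection cycle: return the next 24 forecast hours.
async function fetchHourlyForecast(forecastUrl: string) {
  const res = await fetch(forecastUrl, { headers: HEADERS });
  if (!res.ok) throw new Error(`forecast fetch failed: ${res.status}`);
  const body = await res.json();
  // Each period carries startTime, temperature, windSpeed, shortForecast,
  // and probabilityOfPrecipitation.value.
  return body.properties.periods.slice(0, 24);
}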

Priority: MUST-HAVE

2.2 Multiple Forecast Model Ensemble (GFS, NAM, HRRR, ECMWF)

Why: When models disagree on the daily high, that's where the biggest pricing edge exists. If GFS says 82°F and HRRR says 78°F, the market will price somewhere in between, and whichever model is right creates a massive mispricing. Model disagreement = opportunity signal. HRRR (High-Resolution Rapid Refresh) updates hourly and is specifically designed for short-range forecasting — perfect for intraday temperature prediction.

Source: Open-Meteo API (free, provides GFS/ECMWF/ICON ensemble). ECMWF via Open-Meteo is the gold standard. NWS NDFD (National Digital Forecast Database) provides official NWS model output.
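A sketch of a multi-model pull through Open-Meteo; the models parameter is real, but the model identifiers below (gfs_seamless, ecmwf_ifs025, icon_seamless) are assumptions to verify against Open-Meteo's current documentation:

// Pull the same hourly temperature series from several models and measure
// disagreement on today's high. Larger spread = larger potential mispricing.
async function modelSpreadF(lat: number, lon: number): Promise<number> {
  const url =
    "https://api.open-meteo.com/v1/forecast" +
    `?latitude=${lat}&longitude=${lon}` +
    "&hourly=temperature_2m&temperature_unit=fahrenheit&forecast_days=1" +
    "&models=gfs_seamless,ecmwf_ifs025,icon_seamless";
  const body = await (await fetch(url)).json();
  // With multiple models requested, each series comes back suffixed with the
  // model name, e.g. hourly.temperature_2m_gfs_seamless.
  const dailyMaxes = Object.entries(body.hourly)
    .filter(([key]) => key.startsWith("temperature_2m_"))
    .map(([, series]) => Math.max(...(series as number[])));
  return Math.max(...dailyMaxes) - Math.min(...dailyMaxes); // °F of disagreement
}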

Priority: MUST-HAVE

2.3 Cloud Cover, Wind Speed/Direction, and Humidity Real-Time Data

Why: These are the primary physical drivers of whether a temperature will recover after a drop. A 4°F drop caused by a passing cloud bank is very different from a 4°F drop caused by a cold front arrival. Cloud cover increasing = sun blocked = high likely locked. Wind shifting to northerly = cold air advection = high locked. Without these, you can't distinguish recoverable drops from permanent ones, which is exactly what causes the 7.7% failure rate at the 4°F threshold.

Source: NWS API observations (free, unlimited). Endpoint: /stations/{stationId}/observations/latest — includes cloud cover (sky condition), wind speed, wind direction, relative humidity, all in a single call.

Priority: MUST-HAVE

2.4 Dew Point Data

Why: Kalshi weather markets in some cities resolve on "feels like" or heat index, not raw temperature. Even if they resolve on actual temp, dew point is the single best predictor of overnight/late-day cooling rate. High dew point = temperature drops slowly (humid air holds heat). Low dew point = temperature drops fast (dry air loses heat quickly). This directly affects how fast and how far the drop goes after peak.

Source: NWS API observations (same endpoint as 2.3 — dew point is a standard field).

Priority: MUST-HAVE

2.5 Exact Kalshi Market Resolution Rules Per City

Why: The spec never specifies which weather station Kalshi uses for each city, or the exact resolution criteria (e.g., does "NYC high temperature" use Central Park, LaGuardia, JFK?). Different stations can differ by 2-5°F on the same day. If you're monitoring the wrong station, your "lock-in" signal fires on data that doesn't match what the market settles on. This is a silent killer — you think you won but the settlement station recorded a different number.

Source: Kalshi market rules pages — each market has a "resolution source" field. Document and hardcode the NWS station ID for each city's Kalshi contract. Verify by cross-checking several historical settlements.

Priority: MUST-HAVE

2.6 Historical Temperature Data (Multi-Year) Per City

Why: 11 days of backtest data across one season is dangerously thin. March weather patterns are not representative of July patterns. Desert cities (Phoenix, Vegas) behave completely differently in summer vs. winter. You need at least 1-2 years of hourly temperature data to validate that the lock-in rates hold across seasons, weather regimes, and edge cases. Without this, you're curve-fitting to a tiny sample.

Source: NOAA ISD/LCD (free, hourly data going back decades). API: https://www.ncei.noaa.gov/access/services/data/v1/. Also Open-Meteo Historical API (free, hourly data back to 1940, zero key needed).

[PANEL: GPT-4.1] Meteostat is an additional free source with clean hourly data going back decades and a simple Python/JSON API — useful as a cross-validation source for NOAA ISD.

Priority: MUST-HAVE

2.7 Sunrise/Sunset and Solar Angle Data Per City

Why: The spec identifies that morning drops are unreliable and late drops are reliable, and that cities have different peak times — but doesn't track the underlying physical driver: solar position. Solar noon (when the sun is highest) is the primary determinant of when peak temperature occurs (typically 2-4 hours after solar noon due to thermal lag). This varies by latitude, longitude within timezone, and time of year. Hardcoding peak times (e.g., "Phoenix = 5PM") will break as seasons change. Solar angle data makes peak time prediction dynamic and accurate year-round.

Source: Calculable from latitude/longitude + date (simple astronomy formula). The suncalc npm package provides solar noon, sunrise, sunset, and solar angle with zero API cost.
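A sketch with the suncalc package named above; the 3-hour default thermal lag is a placeholder to be replaced by the learned peak_time_model values from section 6:

import SunCalc from "suncalc"; // npm install suncalc

// Dynamic peak-time estimate: solar noon for this city/date plus a thermal lag.
function estimatePeakTime(lat: number, lon: number, date: Date, lagHours = 3): Date {
  const { solarNoon } = SunCalc.getTimes(date, lat, lon);
  return new Date(solarNoon.getTime() + lagHours * 3600 * 1000);
}

// Solar altitude in degrees at observation time (stored as solar_angle_deg).
function solarAngleDeg(lat: number, lon: number, at: Date): number {
  return (SunCalc.getPosition(at, lat, lon).altitude * 180) / Math.PI;
}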

Priority: MUST-HAVE

2.8 Kalshi Contract Settlement History

Why: You need to know how every past weather contract actually settled to validate your backtest. The spec shows ROI calculations but doesn't specify whether these were validated against actual Kalshi settlements or just against temperature data. If there's any discrepancy between your temperature source and Kalshi's resolution source, your backtest is wrong.

Source: Kalshi API — /markets/{ticker}/history or settlement endpoints. Free with account.

Priority: MUST-HAVE

2.9 Exact Timestamped Dip and High Events Per City

[PANEL: GPT-4.1] Not just daily summaries — the system needs timestamped logs of every dip event and every peak event per city. This is needed for the predictive dip-timing analysis, and to build city-specific "peak window" models. The new data collector must explicitly log: timestamp of running high, timestamp of first drop past threshold, whether a recovery occurred and at what time.

Source: Derived from the real-time temperature collector — add explicit event logging, not just raw snapshots.

Priority: MUST-HAVE

2.10 Intra-Day Temperature Volatility Metrics Per City

[PANEL: Grok 4.1] Track historical intra-day volatility per city (rolling 30/60/90-day standard deviation of hourly temperature deltas). Phoenix in summer has wild temperature swings; NYC in winter is stable. You cannot use the same 4°F threshold everywhere. Some cities rarely move 4°F; others do it weekly. Volatility metrics help set city-specific drop thresholds and estimate signal frequency.

Source: Calculate from NOAA ISD historical data once downloaded. Store per-city, update rolling.
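A sketch of the rolling metric, assuming hourly temperatures are already loaded from the NOAA pull; the 2x multiplier in the threshold example is a guess to backtest, not a recommendation:

// Rolling stddev of hour-over-hour temperature deltas for one city.
// temps: hourly observations sorted ascending; windowDays: 30/60/90.
function intradayVolatility(temps: number[], windowDays: number): number {
  const deltas = temps.slice(1).map((t, i) => t - temps[i]);
  const window = deltas.slice(-windowDays * 24); // trailing N days of deltas
  const mean = window.reduce((a, b) => a + b, 0) / window.length;
  const variance = window.reduce((a, b) => a + (b - mean) ** 2, 0) / window.length;
  return Math.sqrt(variance); // same units as the input series
}

// City-specific drop threshold scaled off volatility, floored at 2°C (3.6°F).
const dropThresholdF = (vol: number) => Math.max(3.6, 2 * vol);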

Priority: MUST-HAVE

2.11 City Correlation Matrix

[PANEL: Sonar Reasoning Pro] Critical risk gap: if a weather system causes NO recovery in NYC, it likely simultaneously affects Boston, Philadelphia, and DC. The spec treats each city as independent, but weather patterns are regional. This creates correlated drawdown that will damage the strategy during regional weather events. You need a rolling correlation matrix: when City A locks in, what is the conditional probability that Cities B, C, D also lock in on the same day?

Source: Calculate from existing weather_monitor.json data across city pairs. Compute rolling 24-hour correlation coefficients. Update weekly as new data accumulates.

Priority: MUST-HAVE

2.12 NWS Forecast Accuracy Baseline

[PANEL: Sonar Reasoning Pro] The spec mentions forecasts trigger false positives but never measures how often NWS predicts the actual high correctly. You need a baseline: "NWS predicted 85°F, actual was 83°F — how often does this occur per city per season?" Without this, you cannot distinguish signal from noise, and cannot build the predictive layer. This also tells you the size of the exploitable edge: if NWS is wrong by 3°F+ on 15% of days, that 15% is where the largest mispricings will cluster.

Source: NWS API forecast vs. NWS API observations, compared daily. Free, same data source.

Priority: MUST-HAVE

2.13 False-Positive Tracking by Peak-Time Window

[PANEL: Sonar Reasoning Pro] The spec claims "late afternoon drops are very reliable" but provides zero systematic evidence. You need: for all 4°F+ drops broken out by time-of-day window (before noon, noon-2PM, 2-4PM, 4PM+) and by city, what fraction locked in vs. recovered? Currently this is anecdotal. This data already exists partially in weather_monitor.json — it just needs to be computed and stored.

Source: Analyze existing data. Add explicit time-window bucketing to all new event logs.

Priority: MUST-HAVE

2.14 Precipitation Radar / Active Weather Alerts

Why: Thunderstorms cause sudden, dramatic temperature drops (10-20°F in minutes) that can look like lock-in signals but are followed by rapid recovery once the storm passes. An "active storm" filter would eliminate false signals. Approaching frontal systems are the #1 cause of late-day temperature spikes that break lock-in assumptions.

Source: NWS alerts API (free) — https://api.weather.gov/alerts/active?point={lat},{lon}. RainViewer API (free tier) for radar.

Priority: NICE-TO-HAVE

2.15 Polymarket Order Book Data and Contract Mapping

Why: The spec mentions dual-platform execution (Kalshi + Polymarket) to double capacity, but provides no specification for Polymarket data collection, API integration, or contract-to-contract mapping between platforms. If this is part of the scaling plan, it needs a data spec.

Source: Polymarket CLOB API (free). Gamma Markets API for market discovery.

Priority: NICE-TO-HAVE (defer to v2)

2.16 Kalshi Trading Volume / Time-of-Day Liquidity Patterns

Why: If most liquidity shows up at market open and dries up by late afternoon, but your best signals fire at 5-7PM (late peakers), you'll have signals with no one to trade against. Understanding liquidity timing is critical for execution.

Source: Derivable from existing market_depth.json data if timestamps are included. Group by hour of day, track average depth.

Priority: NICE-TO-HAVE

2.17 Market-Maker Inventory Behavior (Bid Quantity Before vs. After Drop)

[PANEL: Sonar Reasoning Pro] When a 4°F drop signal fires, do market makers immediately pull their bids, or do they stay? If bid quantities vanish within 5 minutes of a drop, you're already too late to fill at the listed price. Track: bid quantity, ask quantity, and spread size in the 30 minutes before vs. after each historical drop signal in market_depth.json. This determines whether the strategy is trading against smart liquidity or exploitable retail flow.

Source: Existing market_depth.json — retrospective analysis. Add to real-time collector going forward.

Priority: NICE-TO-HAVE

2.18 Solar Radiation / Insolation Data

[PANEL: Grok 4 Fast / Gemini 3.1 Pro] Critical for late-peaking desert cities like Phoenix and Las Vegas — solar radiation directly predicts peak time and recovery likelihood. A drop occurring while solar radiation is still high means a recovery is more likely (sun still heating). A drop after peak solar radiation is a much more reliable lock-in signal. Reduces morning false signals by validating "sun past peak" before firing.

Source: NOAA Solar Data API (free). Alternatively, derive approximately from solar angle (via suncalc) combined with cloud cover percentage from NWS observations.

Priority: NICE-TO-HAVE


3. Critical Flaws and Fixes

3.1 Temperature Resolution ~1.8°F (1°C) — Undermines All Thresholds

Severity: CRITICAL

The spec notes this but doesn't address the implication: a "4°F drop" is actually either a 3.6°F drop or a 5.4°F drop — there is no actual 4°F drop in the data. This means the 4°F and 5°F thresholds in the backtest are actually measuring the same underlying event (a 2°C / 3.6°F drop), which explains why their stats are nearly identical (99.1% win rate, ~20.9% ROI for both). The entire threshold analysis is an artifact of Celsius-to-Fahrenheit conversion.

Fix: Redo all analysis in Celsius with integer thresholds (2°C drop, 3°C drop, etc.), matching the native resolution of the source data. Use the NWS API as the primary observation source; note that its observations endpoint reports temperature in Celsius (wmoUnit:degC), so ingest and threshold in Celsius and convert to Fahrenheit only when mapping to contract strikes. That makes the rounding step explicit and eliminates the conversion artifact.

[PANEL: DeepSeek V3.1] Verify raw sensor resolution before assuming 1°C rounding — many ASOS stations report to 0.1°F. The rounding may occur in the downstream data pipeline, not at the sensor. Check the raw METAR/ASOS data directly.
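To make the artifact concrete, a minimal sketch: with 1°C source resolution, observable drops quantize to multiples of 1.8°F, so no event can fall between 3.6°F and 5.4°F, and any pair of thresholds in that gap (such as 4°F and 5°F) selects the identical set of drops:

// Observable drop sizes under 1°C resolution, expressed in °F.
const observableDropsF = [1, 2, 3, 4].map((c) => c * 1.8); // [1.8, 3.6, 5.4, 7.2]

for (const thresholdF of [3, 4, 5, 6]) {
  const firing = observableDropsF.filter((d) => d >= thresholdF);
  console.log(`threshold ${thresholdF}°F fires on drops:`, firing);
}
// threshold 4°F fires on drops: [5.4, 7.2]
// threshold 5°F fires on drops: [5.4, 7.2]   <- identical event set
// (If converted values are rounded to whole °F first, 3.6 rounds to 4 and both
// thresholds instead fire on 2°C drops; either way, they are the same test.)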

3.2 11 Days of Backtest Data — Statistically Insufficient for Capital Deployment

Severity: CRITICAL

230 city-days sounds like a lot, but it's 11 calendar days × ~20 cities. The weather patterns in those 11 days are highly correlated (same synoptic weather regime affects multiple cities simultaneously). The effective sample size is closer to 11 independent observations, not 230. A single unusual weather week could produce results that look nothing like this backtest.

Fix: Before deploying real capital, backtest against at least 6-12 months of NOAA historical hourly data for target cities. The current data is a proof-of-concept, not a deployment-ready backtest.

[PANEL: Sonar Reasoning Pro] The sample size is even worse than it appears: with 4-5 independent decision points per city per day, you have approximately 4-5 observations per decision type across 11 days. You need 30+ per decision type for any statistical confidence. Currently at ~1/7 of what is needed. Do not trade real capital until 90+ days of clean continuous data from EdgeClaw is collected.

3.3 Kelly Criterion Projections Are Misleading and Dangerous

Severity: CRITICAL

The Kelly projections showing $1 billion from $5,000 in 180 days are not just theoretical — they're wrong. Kelly assumes: (1) accurate edge estimation (you have 11 days of data), (2) independent bets (weather across cities is correlated), (3) known probability distribution (you don't know the true recovery rate with this sample size). The confidence interval on a 92.3% win rate from 91 events includes values as low as ~85%, which would dramatically change optimal Kelly sizing.

Fix: Remove Kelly projections entirely, or present them with proper confidence intervals and correlation adjustments. Use flat sizing until you have 6+ months of live trading data to estimate true edge.

[PANEL: Sonar Reasoning Pro] Replace Kelly table with honest projections: "If strategy hits 20% ROI on $200/city across 15 cities = $600/day. Liquidity ceiling limits scale to ~$2,000-3,000/day." That's the real ceiling.
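To ground the confidence-interval claim above, a sketch using the standard 95% Wilson score interval for 84 wins in 91 events (92.3%):

// Wilson score interval for an observed win rate; reproduces the ~85% lower
// bound cited above for 92.3% over 91 events.
function wilsonInterval(wins: number, n: number, z = 1.96): [number, number] {
  const p = wins / n;
  const denom = 1 + (z * z) / n;
  const center = p + (z * z) / (2 * n);
  const margin = z * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n));
  return [(center - margin) / denom, (center + margin) / denom];
}

const [lower, upper] = wilsonInterval(84, 91);
console.log(lower.toFixed(3), upper.toFixed(3)); // ~0.850 ~0.962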

3.4 Hardcoded City Peak Times Will Break With Seasonal Changes

Severity: HIGH

The spec lists fixed peak times (e.g., "Phoenix = 5PM") based on 11 days in early March. Peak temperature time shifts significantly with seasons — in Phoenix, summer peaks can occur at 3-4PM while winter peaks are 2-3PM. The thermal lag after solar noon also varies with humidity, cloud patterns, and surface type.

Fix: Replace hardcoded peak times with a dynamic model: solar_noon + thermal_lag(city, season, humidity). Or at minimum, use a rolling 30-day average of observed peak times per city. Do not hardcode seasonal parameters into the codebase.

3.5 ROI by Bid Range Analysis Has Survivorship Bias

Severity: HIGH

The "51-89c bid range = 171.7% ROI" finding is based on contracts where the market was pricing a 51-89% chance of the temperature going higher, and the market was wrong every time. But in an 11-day sample, this could easily be 3-5 contracts total. The sample size per bid bucket is not reported, making the ROI figures unreliable.

Fix: Report sample sizes for every bid range bucket. Do not trade the 51-89c bucket without at least 50+ observations confirming the edge.

3.6 No Specification for Which Weather Station Maps to Each Kalshi Market

Severity: HIGH

If your temperature monitor uses one weather station and Kalshi settles on a different one, your entire system is measuring the wrong thing. A 2-3°F discrepancy between stations is common.

Fix: Document the exact resolution source (NWS station ID) for every Kalshi weather market and ensure your data collection uses that exact station.

3.7 The Mar 9 Data Gap Is Not Accounted for in Backtest Statistics

Severity: HIGH

With only 1-2 snapshots/day after March 9, most intraday drops and recoveries would have been missed. The Mar 9-13 data is unreliable for lock-in analysis. "No observed recovery" should not be treated as "confirmed lock-in" when the system wasn't watching. If this data is included in the 230 city-days, lock-in rates are artificially inflated.

Fix: Exclude Mar 9-13 data from all statistics, or clearly flag it. Recompute all rates using only Mar 1-8 data (~160 city-days).

3.8 No Accounting for Kalshi Fees, Slippage, or Spread Costs

Severity: MEDIUM

Kalshi charges fees on trades (typically 1-2c per contract on each side). The ROI calculations appear to be gross, not net. On a 10c contract, a 2c round-trip fee is a 20% cost. On the 11-30c bucket (26.4% gross ROI), fees could eat half the profit.

Fix: Include Kalshi fee schedule in all ROI calculations. Also model bid-ask spread impact — if you're selling NO contracts, you're hitting the bid, not the mid price.

[PANEL: Grok 4 Fast] Taxes also apply. Kalshi sends 1099-B forms. Include estimated tax drag in projections.
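A sketch of net-of-fee accounting. The formula below (7% of contracts × price × (1 − price), rounded up to the next cent) matches Kalshi's published general fee schedule at the time of writing, but treat it as an assumption and verify the current schedule per market:

// Estimated Kalshi trading fee per order, in dollars.
function kalshiFeeUsd(contracts: number, price: number): number {
  return Math.ceil(0.07 * contracts * price * (1 - price) * 100) / 100;
}

// Buying 100 NO contracts at 85c (equivalent to selling a 15c YES):
const contracts = 100;
const noPrice = 0.85;
const fee = kalshiFeeUsd(contracts, noPrice);  // $0.90
const grossProfit = contracts * (1 - noPrice); // $15.00 if NO settles at $1
console.log({ fee, grossProfit, netProfit: grossProfit - fee }); // net $14.10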

3.9 SQLite Locking Under High-Frequency Writes

Severity: MEDIUM

SQLite handles this volume fine for writes, but concurrent reads during analysis while writes are happening can cause locking issues. Order book depth data is nested/hierarchical (multiple price levels per market per snapshot), which is awkward in SQLite without careful schema design.

Fix: Enable WAL mode for SQLite. For order book data specifically, store snapshots as JSON blobs with indexed metadata columns (timestamp, city, market_id). See schema section below.

3.10 Timezone Blindness in Cron Job

[PANEL: Gemini 3.1 Pro] The cron runs 8AM-10PM ET. 10PM ET is 7PM in Phoenix/Las Vegas. During summer, desert highs can occur at 6-7PM local time. The cron might shut off exactly when the most profitable late-peaking cities are locking in. Run the cron 24/7 or define per-city shutdown times based on the latest possible peak time for each city by season.

Severity: MEDIUM

3.11 Adverse Selection Risk from Faster Market Participants

[PANEL: Gemini 3.1 Pro] Weather markets are potentially read by algorithmic traders monitoring raw METAR (aviation) data. A 10-minute cron job is slow compared to real-time feeds. By the time the script sees the drop, faster participants may have already repriced the contracts. This doesn't necessarily kill the edge — retail-dominated markets may still lag — but it needs to be monitored: track whether fills occur at the listed bid price or at a worse price.

Severity: MEDIUM (monitor in paper trading before escalating)

3.12 Correlated Drawdown Is Unmodeled

[PANEL: Sonar Reasoning Pro] Weather patterns are regional. When one city doesn't recover, 5-10 adjacent cities often don't either. A cold front moves through the Northeast and simultaneously prevents recovery in NYC, Boston, Philadelphia, and DC. On simultaneous-loss days, a 5-city blowout on $5k bankroll is a significant drawdown. The spec assumes independence between cities, which is false.

Fix: Build a correlation matrix (see 2.11). Run Monte Carlo simulation modeling a "regional weather event" scenario before deploying capital. Limit simultaneous exposure to correlated city clusters.

Severity: CRITICAL (for capital deployment)
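A toy Monte Carlo of the regional-event scenario in 3.12; the cluster size, stake, and probabilities below are illustrative assumptions, not calibrated values:

// On "regional event" days the whole correlated cluster loses together,
// instead of each city losing independently.
function worstDayLoss(
  days = 10_000,
  citiesInCluster = 5,
  stakePerCity = 200,      // dollars at risk per city per day
  pLossIndependent = 0.08, // per-city loss probability on normal days
  pRegionalEvent = 0.03    // probability the whole cluster loses at once
): number {
  let worst = 0;
  for (let d = 0; d < days; d++) {
    let loss = 0;
    if (Math.random() < pRegionalEvent) {
      loss = citiesInCluster * stakePerCity; // correlated blowout
    } else {
      for (let c = 0; c < citiesInCluster; c++) {
        if (Math.random() < pLossIndependent) loss += stakePerCity;
      }
    }
    worst = Math.max(worst, loss);
  }
  return worst; // with these inputs, a $1,000 single-day loss is near-certain
}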

3.13 Recovery Definition Is Imprecise

[PANEL: Sonar Reasoning Pro] The spec reports "recovery chance: 20.7%" but never precisely defines what constitutes a recovery. Is it: (a) temp reaches exact prior high, (b) temp exceeds prior high by any amount, (c) temp comes within 1°F of prior high?

Fix: Define precisely: "Recovery = temperature reaches or exceeds the running high by any amount before market close." Separately track: partial recovery (within 1°F), full recovery (meets or exceeds), and overshoot (exceeds prior high by 1°F+). Each has different implications for contract settlement.

Severity: MEDIUM

3.14 Vultr Server as Primary Collector

[PANEL: DeepSeek V3.1] The Oracle EdgeClaw server should be the primary data collection platform. Vultr (451MB RAM, unreliable cron history) should be a backup only. All new data collection should be built on EdgeClaw from day one, with a fallback to Vultr only if EdgeClaw is unreachable.

Severity: MEDIUM


4. Over-Engineered Elements

4.1 Kelly Criterion Projections and Bankroll Growth Modeling

Recommendation: Remove

With 11 days of data and known liquidity caps of ~$200-315/city, Kelly sizing is irrelevant. You're liquidity-capped almost immediately. The projections add no actionable information and create false confidence. Replace with simple flat-bet sizing based on liquidity constraints.

4.2 Tracking 20 Cities Simultaneously from Day One

Recommendation: Simplify

Start with 5-6 cities that have the best liquidity (Chicago, NYC, Phoenix, Houston, Dallas, Miami). These cover early/mid/late peakers and have the deepest order books. Adding 14 more cities with thin liquidity (SF at 658 avg depth, Seattle at 733) adds complexity without proportional profit. Scale to 20 cities in v2 after the core system is proven.

[PANEL: Sonar Reasoning Pro] Start with 3-5 cities for MVP. Prove the edge is real first. 15 extra cities = 15x data complexity + 15x execution complexity + 15x correlated risk to manage.

4.3 Dual-Platform Execution (Kalshi + Polymarket) at Launch

Recommendation: Defer to v2

Building and maintaining two platform integrations doubles the API work, doubles the edge cases, and requires contract mapping. Get one platform working reliably first.

4.4 Full Order Book Depth at Every 10-Minute Interval

Recommendation: Simplify

Collect full depth once per hour, collect only best bid/ask at the 10-minute interval. Reduces data volume by ~80% while preserving signal quality.

[PANEL: Sonar Reasoning Pro] For small-liquidity cities (SF, Seattle), collect only top-of-book (best bid/ask + quantity). Full depth only matters for high-liquidity cities where fill impact analysis is relevant.

4.5 Tracking Contracts Below 10c Bid

[PANEL: Grok 4.1] The spec already concludes these are not worth the risk (ROI 3.6%). Collecting and analyzing this data adds noise without value. Filter out sub-10c contracts from all analysis tables.

Recommendation: Remove from analysis (still collect for data completeness, but exclude from signals)

4.6 Predicting the Exact Minute of a Temperature Dip Using AI

[PANEL: Gemini 3.1 Pro] Weather is a chaotic system. Predicting the exact minute a cloud covers a sensor is not achievable reliably. Instead of predicting the exact time, use radar/wind data to predict the probability of an early peak, and react instantly to 1°C (1.8°F) threshold crossings. Probabilistic prediction, not point prediction.

Recommendation: Simplify the v2 predictive layer


5. Build Order

Step 1: Weather Data Collection (Observations + Forecasts) for 5-6 High-Liquidity Cities

Effort: 2-3 days

Everything depends on reliable, accurate temperature data from the correct stations. This is the foundation. Collect:

  1. Actual temperature every 10 min from NWS API observations (correct station per Kalshi contract)
  2. NWS hourly forecast (predicted high, wind, cloud cover, precipitation probability)
  3. Current conditions: cloud cover, wind speed/direction, dew point, humidity (all from NWS)
  4. Active NWS weather alerts for each city

Validate that your observation source matches Kalshi's resolution source for each city before writing any trading logic.

[PANEL: DeepSeek V3.1] Build on Oracle EdgeClaw as primary. Add Vultr as a passive backup that runs the same collector and writes to a secondary file. Cross-check for gaps.

[PANEL: GPT-4.1] Implement health checks from day one: if a city's data collector misses 2+ consecutive cycles, send a Telegram alert immediately. Log all collection attempts with timestamps and HTTP status codes.
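A sketch of that missed-cycle alert, assuming the collector_health table from section 6, the better-sqlite3 driver, a placeholder database path, and placeholder Telegram credentials:

import Database from "better-sqlite3";

const db = new Database("/home/ubuntu/edgeclaw/data/weather.db"); // placeholder path

// Cities with zero successful temperature collections in the last two cycles.
// (A city whose collector logged no rows at all needs a separate absence check.)
const stale = db
  .prepare(
    `SELECT city_id
       FROM collector_health
      WHERE collection_type = 'temperature'
        AND attempted_at >= datetime('now', '-21 minutes')
      GROUP BY city_id
     HAVING SUM(success) = 0`
  )
  .all() as { city_id: string }[];

// BOT_TOKEN and CHAT_ID are placeholders for real Telegram credentials.
async function alertTelegram(text: string): Promise<void> {
  await fetch(`https://api.telegram.org/bot${process.env.BOT_TOKEN}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: process.env.CHAT_ID, text }),
  });
}

for (const row of stale) {
  alertTelegram(`weather collector: ${row.city_id} missed 2+ consecutive cycles`);
}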

Step 2: Historical Backtest with NOAA Data (6-12 months)

Effort: 2-3 days

Before building any trading logic, validate the lock-in hypothesis against a statistically meaningful dataset. Pull hourly temperature data from NOAA ISD for 5-6 cities going back 6-12 months. Rerun the drop analysis in Celsius (not Fahrenheit) to eliminate the resolution artifact. This either confirms the edge or saves you from building a system around a mirage.

[PANEL: Sonar Reasoning Pro] Simultaneously build the city correlation matrix from this historical data. Running 30+ days of data, calculate how often cities lock in together on the same day. This is the most important risk metric.

Step 3: Kalshi Market Data Collection (Prices + Settlement Rules)

Effort: 1-2 days

Pull market listings, current bid/ask, and document resolution sources. Document the exact NWS station ID that Kalshi uses for each city. Validate by checking 5-10 historical settlements against your station's recorded values.

[PANEL: GPT-4.1] Also collect and store contract expiration time, settlement time, and contract rules text for each market. These are needed to prevent trading after the lock-in window closes.

Step 4: Core Lock-In Signal Engine with Dynamic Peak Time Filters

Effort: 2-3 days

This is the reactive system — detect when a 2°C (3.6°F) drop threshold is crossed after the city's estimated peak time. Use dynamic peak time estimation (solar_noon + thermal_lag) rather than hardcoded times. Start with the proven 2°C drop threshold, and add filters for active weather alerts and storm radar.

Step 5: Contract Pricing Analyzer (Identify Mispriced Contracts When Signal Fires)

Effort: 1-2 days

When the lock-in signal fires, scan all contracts above the current running high. Identify which ones have bids >10c (the profitable range). Calculate expected value after Kalshi fees. This turns the signal into actionable trade recommendations.

Step 6: Paper Trading / Shadow Mode (2-4 Weeks)

Effort: 2-4 weeks calendar time, minimal dev

Log every signal, every recommended trade, and every outcome. Compare against actual Kalshi settlements. Track: signal accuracy, timing, contract availability, theoretical P&L after fees. Also track whether fills would have been achievable at the listed bid price.

[PANEL: Sonar Reasoning Pro] During paper trading, run the execution impact simulation: take live order book snapshots, simulate placing your full position size into the book, calculate the weighted average execution price vs. the listed bid. This will reveal the true slippage before real capital is at risk.

Step 7: Forecast-Based Predictive Signal (v2)

Effort: 1-2 weeks

This is the aspiration: catch mispriced contracts at 30-50c instead of 5c by predicting the high earlier. Requires: comparing NWS forecast vs. ensemble models (GFS/HRRR via Open-Meteo), identifying when models disagree with market pricing, building confidence scores, and tracking NWS forecast accuracy over time as the baseline.

[PANEL: Sonar Reasoning Pro] The real edge for v2 may be: "NWS forecasted high 87°F, market priced in 87°F high, but NWS is wrong by 3°F+ on 15% of days." Identifying those days before the drop is 10x better than reacting after. Track forecast accuracy as the primary research task in parallel with Step 1.

Step 8: Scale to 20 Cities + Polymarket Integration (v3)

Effort: 3-5 days

Once the core system is profitable on 5-6 cities, expand to the full 20 and add Polymarket as a second execution venue.


6. SQLite Schema — CREATE Statements

-- Enable WAL mode immediately after opening database
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;

-- Cities configuration: maps each tracked city to its Kalshi resolution station
CREATE TABLE IF NOT EXISTS cities (
    city_id       TEXT PRIMARY KEY,          -- e.g., 'NYC', 'PHX', 'CHI'
    display_name  TEXT NOT NULL,             -- e.g., 'New York City'
    nws_station   TEXT NOT NULL,             -- e.g., 'KNYC' (NWS ASOS station ID)
    nws_office    TEXT NOT NULL,             -- e.g., 'OKX' (NWS forecast office)
    nws_grid_x    INTEGER NOT NULL,          -- NWS forecast grid coordinates
    nws_grid_y    INTEGER NOT NULL,
    latitude      REAL NOT NULL,
    longitude     REAL NOT NULL,
    timezone      TEXT NOT NULL,             -- e.g., 'America/New_York'
    kalshi_market_prefix TEXT,               -- e.g., 'HIGHNY' (Kalshi ticker prefix)
    kalshi_station_id TEXT,                  -- Station ID Kalshi uses for settlement
    peak_time_est TEXT,                      -- Initial static estimate, overridden by dynamic model
    is_active     INTEGER NOT NULL DEFAULT 1,
    created_at    TEXT NOT NULL DEFAULT (datetime('now'))
);

-- Raw temperature observations: one row per city per collection cycle
CREATE TABLE IF NOT EXISTS temperature_obs (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    observed_at     TEXT NOT NULL,           -- ISO8601 UTC timestamp
    temp_f          REAL NOT NULL,           -- Temperature in Fahrenheit (converted from NWS °C at ingest)
    temp_c          REAL GENERATED ALWAYS AS (ROUND((temp_f - 32) * 5.0 / 9.0, 2)) STORED,
    dew_point_f     REAL,
    humidity_pct    REAL,
    wind_speed_mph  REAL,
    wind_dir_deg    INTEGER,                 -- 0-360 degrees
    wind_dir_card   TEXT,                    -- e.g., 'NNW'
    sky_condition   TEXT,                    -- NWS sky condition code: CLR, FEW, SCT, BKN, OVC
    cloud_cover_pct REAL,                    -- Derived from sky condition
    solar_angle_deg REAL,                    -- Calculated via suncalc at observation time
    data_source     TEXT NOT NULL DEFAULT 'NWS',
    raw_json        TEXT,                    -- Full NWS API response blob
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(city_id, observed_at)
);

CREATE INDEX IF NOT EXISTS idx_temp_city_time ON temperature_obs(city_id, observed_at);
CREATE INDEX IF NOT EXISTS idx_temp_observed ON temperature_obs(observed_at);

-- Running daily high tracker: updated in real time as observations come in
CREATE TABLE IF NOT EXISTS daily_highs (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    date_local      TEXT NOT NULL,           -- YYYY-MM-DD in city's local timezone
    running_high_f  REAL NOT NULL,
    running_high_at TEXT NOT NULL,           -- Timestamp of current high observation
    solar_noon_utc  TEXT,                    -- Calculated solar noon for this date/city
    est_peak_time   TEXT,                    -- Dynamic peak time estimate (solar_noon + thermal lag)
    lock_in_at      TEXT,                    -- Timestamp when lock-in signal fired (NULL if not fired)
    lock_in_temp_f  REAL,                    -- Temperature that triggered lock-in signal
    drop_size_f     REAL,                    -- Size of drop that triggered signal (in F)
    drop_size_c     REAL,                    -- Size of drop in Celsius
    did_recover     INTEGER,                 -- 0=locked in, 1=recovered (NULL=pending)
    recovery_at     TEXT,                    -- Timestamp of recovery (if occurred)
    recovery_high_f REAL,                    -- High after recovery (if occurred)
    settlement_high_f REAL,                  -- Official Kalshi settlement value (filled post-close)
    updated_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(city_id, date_local)
);

CREATE INDEX IF NOT EXISTS idx_daily_city_date ON daily_highs(city_id, date_local);
CREATE INDEX IF NOT EXISTS idx_daily_lockin ON daily_highs(lock_in_at) WHERE lock_in_at IS NOT NULL;

-- Lock-in signal events: one row per fired signal
CREATE TABLE IF NOT EXISTS lockin_signals (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    date_local      TEXT NOT NULL,
    fired_at        TEXT NOT NULL,           -- UTC timestamp signal fired
    prior_high_f    REAL NOT NULL,           -- Running high before the drop
    current_temp_f  REAL NOT NULL,           -- Temperature that triggered signal
    drop_size_f     REAL NOT NULL,
    drop_size_c     REAL NOT NULL,
    time_since_solar_noon_min INTEGER,       -- Minutes after solar noon (negative = before)
    sky_condition   TEXT,                    -- Cloud cover at signal time
    wind_speed_mph  REAL,                    -- Wind speed at signal time
    wind_dir_deg    INTEGER,
    active_nws_alert INTEGER DEFAULT 0,      -- 1 if NWS alert was active at signal time
    alert_type      TEXT,                    -- Type of NWS alert (e.g., 'Thunderstorm Warning')
    signal_type     TEXT DEFAULT 'reactive', -- 'reactive' or 'predictive' (v2)
    outcome         TEXT,                    -- 'locked_in', 'recovered', 'pending'
    outcome_at      TEXT,                    -- When outcome determined
    created_at      TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_signal_city_date ON lockin_signals(city_id, date_local);
CREATE INDEX IF NOT EXISTS idx_signal_outcome ON lockin_signals(outcome);

-- NWS hourly forecasts: one row per city per forecast hour per collection time
CREATE TABLE IF NOT EXISTS nws_forecasts (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    collected_at    TEXT NOT NULL,           -- When this forecast was fetched
    forecast_for    TEXT NOT NULL,           -- The hour this forecast covers (ISO8601 UTC)
    forecasted_high_f REAL,                  -- Predicted temperature for that hour
    precip_prob_pct REAL,                    -- Precipitation probability 0-100
    wind_speed_mph  REAL,                    -- Forecasted wind speed
    wind_dir_card   TEXT,                    -- Forecasted wind direction
    sky_condition   TEXT,                    -- Forecasted cloud cover
    short_forecast  TEXT,                    -- NWS short forecast text
    raw_json        TEXT,
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(city_id, collected_at, forecast_for)
);

CREATE INDEX IF NOT EXISTS idx_forecast_city_time ON nws_forecasts(city_id, forecast_for);

-- Forecast vs. actual comparison: populated post-day, drives forecast accuracy baseline
CREATE TABLE IF NOT EXISTS forecast_vs_actual (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    date_local      TEXT NOT NULL,
    forecast_high_f REAL,                    -- NWS AM forecast for the day's high
    actual_high_f   REAL,                    -- Observed settlement high
    error_f         REAL GENERATED ALWAYS AS (actual_high_f - forecast_high_f) STORED,
    abs_error_f     REAL GENERATED ALWAYS AS (ABS(actual_high_f - forecast_high_f)) STORED,
    nws_forecast_time TEXT,                  -- What time of day the forecast was pulled (7AM, 10AM, etc.)
    kalshi_settled  INTEGER DEFAULT 0,       -- 1 if Kalshi also settled that day
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(city_id, date_local, nws_forecast_time)
);

CREATE INDEX IF NOT EXISTS idx_fva_city_date ON forecast_vs_actual(city_id, date_local);

-- Kalshi markets: current market state per city per contract
CREATE TABLE IF NOT EXISTS kalshi_markets (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    market_ticker   TEXT NOT NULL,           -- Full Kalshi ticker e.g. HIGHNY-2026MAR15-T85
    date_local      TEXT NOT NULL,
    threshold_f     REAL NOT NULL,           -- Temperature threshold the contract is about
    contract_type   TEXT NOT NULL,           -- 'over', 'under', 'between'
    status          TEXT NOT NULL DEFAULT 'open', -- 'open', 'closed', 'settled'
    settlement_value REAL,                   -- Kalshi settlement (1.00 or 0.00)
    settlement_at   TEXT,                    -- Timestamp of settlement
    resolution_station TEXT,                 -- NWS station Kalshi used
    expires_at      TEXT,                    -- Market close time
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(market_ticker, date_local)
);

CREATE INDEX IF NOT EXISTS idx_market_city_date ON kalshi_markets(city_id, date_local);
CREATE INDEX IF NOT EXISTS idx_market_ticker ON kalshi_markets(market_ticker);

-- Kalshi order book snapshots: collected every 10 minutes
-- Stores full depth as JSON blob + indexed top-of-book for fast signal queries
CREATE TABLE IF NOT EXISTS order_book_snapshots (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    market_ticker   TEXT NOT NULL,
    city_id         TEXT NOT NULL,
    collected_at    TEXT NOT NULL,           -- UTC timestamp
    best_yes_bid    REAL,                    -- Best YES bid price (0-1 scale)
    best_yes_ask    REAL,
    best_no_bid     REAL,
    best_no_ask     REAL,
    yes_depth_total INTEGER,                 -- Total YES contracts available at all levels
    no_depth_total  INTEGER,
    spread          REAL,                    -- best_ask - best_bid
    depth_json      TEXT,                    -- Full order book JSON (all levels)
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(market_ticker, collected_at)
);

CREATE INDEX IF NOT EXISTS idx_book_market_time ON order_book_snapshots(market_ticker, collected_at);
CREATE INDEX IF NOT EXISTS idx_book_city_time ON order_book_snapshots(city_id, collected_at);

-- Trade signals and paper trade log
CREATE TABLE IF NOT EXISTS trade_signals (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    signal_id       INTEGER NOT NULL REFERENCES lockin_signals(id),
    city_id         TEXT NOT NULL,
    market_ticker   TEXT NOT NULL,
    date_local      TEXT NOT NULL,
    generated_at    TEXT NOT NULL,
    action          TEXT NOT NULL,           -- 'sell_yes' or 'buy_no'
    contracts       INTEGER NOT NULL,        -- Recommended position size
    target_price    REAL NOT NULL,           -- Price at signal time (0-1 scale)
    expected_value  REAL,                    -- EV after fees
    kalshi_fee_est  REAL,                    -- Estimated fee cost
    mode            TEXT NOT NULL DEFAULT 'paper', -- 'paper' or 'live'
    executed        INTEGER DEFAULT 0,
    fill_price      REAL,                    -- Actual fill price (null if paper/not filled)
    slippage_f      REAL,                    -- target_price minus fill_price
    pnl             REAL,                    -- Realized P&L after fees
    settled_at      TEXT,
    notes           TEXT,
    created_at      TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_signal_trade ON trade_signals(signal_id);
CREATE INDEX IF NOT EXISTS idx_trade_date ON trade_signals(date_local);

-- City temperature correlation matrix: updated weekly from historical data
CREATE TABLE IF NOT EXISTS city_correlations (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_a          TEXT NOT NULL REFERENCES cities(city_id),
    city_b          TEXT NOT NULL REFERENCES cities(city_id),
    corr_coefficient REAL NOT NULL,          -- -1 to 1
    sample_days     INTEGER NOT NULL,        -- Days of data used
    window_days     INTEGER NOT NULL DEFAULT 90, -- Rolling window
    computed_at     TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(city_a, city_b, computed_at)
);

-- Peak time model: learned from data, updated rolling
CREATE TABLE IF NOT EXISTS peak_time_model (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    month           INTEGER NOT NULL,        -- 1-12
    avg_peak_hour   REAL NOT NULL,           -- Average hour of day (local time) peak occurs
    stddev_hours    REAL NOT NULL,           -- Standard deviation
    sample_days     INTEGER NOT NULL,
    conservative_cutoff REAL NOT NULL,       -- avg_peak - 1*stddev (safe threshold)
    updated_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(city_id, month)
);

-- Data collection health log: one row per collection cycle per city
CREATE TABLE IF NOT EXISTS collector_health (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL,
    collection_type TEXT NOT NULL,           -- 'temperature', 'forecast', 'order_book', 'alerts'
    attempted_at    TEXT NOT NULL,
    success         INTEGER NOT NULL,        -- 1=ok, 0=failed
    http_status     INTEGER,
    error_msg       TEXT,
    latency_ms      INTEGER,
    data_age_sec    INTEGER,                 -- How old was the data returned (staleness check)
    created_at      TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_health_city_type ON collector_health(city_id, collection_type, attempted_at);

-- NWS weather alerts: active alerts per city
CREATE TABLE IF NOT EXISTS nws_alerts (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    city_id         TEXT NOT NULL REFERENCES cities(city_id),
    alert_id        TEXT NOT NULL,           -- NWS alert ID
    event_type      TEXT NOT NULL,           -- e.g., 'Severe Thunderstorm Warning'
    severity        TEXT,                    -- 'Minor', 'Moderate', 'Severe', 'Extreme'
    headline        TEXT,
    onset_at        TEXT,
    expires_at      TEXT,
    collected_at    TEXT NOT NULL,
    is_active       INTEGER DEFAULT 1,
    created_at      TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(city_id, alert_id)
);

CREATE INDEX IF NOT EXISTS idx_alert_city_active ON nws_alerts(city_id, is_active, expires_at);

7. NWS API Endpoints

All endpoints are free. No API key required. Base URL: https://api.weather.gov

7.1 Temperature Observations (Primary Data Source)

GET /stations/{stationId}/observations/latest

Returns: temperature, dew point, wind speed, wind direction, sky condition, humidity, precipitation, barometric pressure.

Example: /stations/KNYC/observations/latest (Central Park, NYC)

Response fields used: properties.timestamp, properties.temperature, properties.dewpoint, properties.relativeHumidity, properties.windSpeed, properties.windDirection, and properties.cloudLayers (sky condition).

Important: Check properties.timestamp against current time. If observation is older than 30 minutes, treat as stale and log to collector_health as a staleness error.
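A sketch of the pull with that staleness rule applied; the observation JSON reports temperature in Celsius and is converted at ingest, and the User-Agent contact string is a placeholder:

// Fetch the latest observation and reject anything older than 30 minutes.
const HEADERS = { "User-Agent": "weather-lockin-desk (ops@example.com)" }; // placeholder

async function latestObservation(stationId: string) {
  const res = await fetch(
    `https://api.weather.gov/stations/${stationId}/observations/latest`,
    { headers: HEADERS }
  );
  if (!res.ok) throw new Error(`obs fetch failed: ${res.status}`);
  const p = (await res.json()).properties;
  if (p.temperature.value == null) throw new Error("observation missing temperature");

  const ageSec = (Date.now() - Date.parse(p.timestamp)) / 1000;
  if (ageSec > 30 * 60) throw new Error(`stale observation: ${Math.round(ageSec)}s old`);

  return {
    observedAt: p.timestamp,
    tempF: (p.temperature.value * 9) / 5 + 32, // NWS reports °C; schema stores °F
    dewPointF: p.dewpoint.value != null ? (p.dewpoint.value * 9) / 5 + 32 : null,
    humidityPct: p.relativeHumidity.value,
    windDirDeg: p.windDirection.value,
    ageSec, // logged to collector_health.data_age_sec
  };
}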

7.2 Hourly Forecast (Forward-Looking Data)

First, get the forecast grid point for a city (one-time setup):

GET /points/{lat},{lon}

Response gives properties.forecastHourly URL.

Then fetch hourly forecast:

GET /gridpoints/{office}/{x},{y}/forecast/hourly

Returns: 168-hour hourly forecast with temperature, wind, sky cover, precipitation probability.

Collection cadence: Every 60 minutes (forecasts update at most hourly).

7.3 Active Weather Alerts

GET /alerts/active?point={lat},{lon}

Returns all active NWS alerts (warnings, watches, advisories) for a geographic point.

Collection cadence: Every 10 minutes (same cycle as temperature observations).

7.4 Historical Observations (For Backtest Data Pull)

GET /stations/{stationId}/observations?start={ISO8601}&end={ISO8601}&limit=500

Returns up to 500 observations per call. Paginate using the Link: <url>; rel="next" header.
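A sketch of that pagination loop; the Link-header parsing is simplified and the User-Agent contact is a placeholder:

// Page through historical observations for a station between two ISO8601 times.
async function* pagedObservations(stationId: string, start: string, end: string) {
  const headers = { "User-Agent": "weather-lockin-desk (ops@example.com)" }; // placeholder
  let url: string | null =
    `https://api.weather.gov/stations/${stationId}/observations` +
    `?start=${encodeURIComponent(start)}&end=${encodeURIComponent(end)}&limit=500`;

  while (url) {
    const res = await fetch(url, { headers });
    if (!res.ok) throw new Error(`history fetch failed: ${res.status}`);
    yield (await res.json()).features; // one page of observation features

    // Follow Link: <url>; rel="next" until no next page remains.
    const link = res.headers.get("link") ?? "";
    const match = /<([^>]+)>;\s*rel="next"/.exec(link);
    url = match ? match[1] : null;
  }
}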

For bulk historical data (multi-year), prefer NOAA ISD:

https://www.ncei.noaa.gov/access/services/data/v1/
?dataset=local-climatological-data
&stations={stationId}
&startDate={YYYY-MM-DD}
&endDate={YYYY-MM-DD}
&dataTypes=HourlyDryBulbTemperature,HourlyDewPointTemperature,HourlyWindSpeed,HourlyWindDirection,HourlySkyConditions
&format=json
&units=standard

8. Collection Cadence

| Data Type | Interval | Source | Notes |
| --- | --- | --- | --- |
| Temperature observations | Every 10 min | NWS /stations/{id}/observations/latest | Primary signal data |
| NWS weather alerts | Every 10 min | NWS /alerts/active | Same cron as observations |
| Kalshi order book (top-of-book) | Every 10 min | Kalshi API | Only best bid/ask + quantities |
| Kalshi order book (full depth) | Every 60 min | Kalshi API | Full level data for liquidity analysis |
| NWS hourly forecasts | Every 60 min | NWS /gridpoints/.../forecast/hourly | Forecasts update at most hourly |
| Solar position | Calculated at startup | suncalc library | No network call, pure math |
| NOAA historical data | One-time pull + weekly incremental | NOAA ISD API | For backtesting |
| City correlation matrix | Weekly recompute | Derived from daily_highs | Run Sunday night |
| Peak time model | Monthly update | Derived from daily_highs | Update rolling per-month averages |
| Collector health check | Every 10 min | Self-check | Alert if city misses 2+ cycles |

Cron schedule (on EdgeClaw, Oracle VPS — primary):

*/10 * * * *   node /home/ubuntu/edgeclaw/scripts/weather-collector.js
0 * * * *      node /home/ubuntu/edgeclaw/scripts/forecast-collector.js
0 4 * * 0      node /home/ubuntu/edgeclaw/scripts/compute-correlations.js
0 3 1 * *      node /home/ubuntu/edgeclaw/scripts/update-peak-model.js

Backup on Vultr (passive mirror, writes to separate file):

*/10 * * * *   node /opt/cron-scripts/weather-backup-collector.js

9. Storage Estimates

| Table | Rows/Day | Row Size | 30-Day | 1-Year |
| --- | --- | --- | --- | --- |
| temperature_obs | 20 cities × 144 = 2,880 | ~500B | ~41MB | ~500MB |
| daily_highs | 20 | ~300B | ~180KB | ~2MB |
| lockin_signals | ~5-15 | ~400B | ~180KB | ~2MB |
| nws_forecasts | 20 cities × 24 hours × 4 pulls = 1,920 | ~300B | ~17MB | ~200MB |
| forecast_vs_actual | 20 | ~200B | ~120KB | ~1.5MB |
| kalshi_markets | ~200 (all contracts) | ~300B | ~1.8MB | ~22MB |
| order_book_snapshots | 200 markets × 144 cycles = 28,800 | ~2KB (top only) / ~8KB (full) | ~240MB | ~2.9GB |
| trade_signals | ~5-20 | ~300B | ~90KB | ~1MB |
| collector_health | 20 cities × 4 types × 144 cycles = 11,520 | ~200B | ~66MB | ~800MB |
| nws_alerts | ~5-10 | ~400B | ~6KB | ~72KB |

Total estimated storage: roughly 370MB for 30 days and ~4.5GB for one year, dominated by order_book_snapshots and collector_health.

Optimization: Store order book depth_json as compressed blobs (zlib). Prune collector_health to the last 90 days. Prune raw_json from temperature_obs after 30 days (keep derived fields permanently).


10. Key SQL Queries

10.1 Detect Lock-In Signal (Run After Each Temperature Observation)

-- Get current running high and check if drop threshold crossed
SELECT
    dh.city_id,
    dh.date_local,
    dh.running_high_f,
    dh.running_high_at,
    t.temp_f AS current_temp,
    t.observed_at,
    (dh.running_high_f - t.temp_f) AS drop_size_f,
    ((dh.running_high_f - t.temp_f) * 5.0 / 9.0) AS drop_size_c,
    t.sky_condition,
    t.wind_speed_mph,
    t.solar_angle_deg,
    dh.est_peak_time
FROM daily_highs dh
JOIN temperature_obs t ON t.city_id = dh.city_id
    AND t.observed_at = (
        SELECT MAX(observed_at) FROM temperature_obs WHERE city_id = dh.city_id
    )
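-- Caution: date('now','localtime') is the collector server's local date, not the
-- city's local date (see flaw 3.10); bind the city-local date as a parameter in
-- production rather than relying on the server clock.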
WHERE dh.date_local = date('now', 'localtime')
    AND dh.lock_in_at IS NULL           -- Signal not already fired
    AND dh.did_recover IS NULL          -- Not already resolved
    AND (dh.running_high_f - t.temp_f) >= 3.6  -- 2°C drop in Fahrenheit
    AND t.observed_at > dh.est_peak_time        -- Past estimated peak time
    AND NOT EXISTS (                    -- No active severe weather alert
        SELECT 1 FROM nws_alerts a
        WHERE a.city_id = dh.city_id
        AND a.is_active = 1
        AND a.expires_at > datetime('now')
        AND a.severity IN ('Severe', 'Extreme')
    );

10.2 Find Mispriced Kalshi Contracts When Signal Fires

-- After lock-in fires for a city, find all 'over' contracts above current running high
SELECT
    km.market_ticker,
    km.threshold_f,
    obs.best_yes_bid,
    obs.best_no_bid,
    obs.yes_depth_total,
    obs.no_depth_total,
    -- Expected value: we expect YES to settle at 0 (lock-in = temp won't exceed threshold)
    -- Selling YES at best_yes_bid = profit of best_yes_bid per contract (minus fees)
    (obs.best_yes_bid - 0.02) AS ev_after_fee,
    obs.collected_at
FROM kalshi_markets km
JOIN order_book_snapshots obs ON obs.market_ticker = km.market_ticker
    AND obs.collected_at = (
        SELECT MAX(collected_at) FROM order_book_snapshots
        WHERE market_ticker = km.market_ticker
    )
WHERE km.city_id = :city_id
    AND km.date_local = :date_local
    AND km.contract_type = 'over'
    AND km.status = 'open'
    AND km.threshold_f > :current_running_high_f  -- Only contracts above the locked high
    AND obs.best_yes_bid >= 0.10                  -- Only bid >= 10c (profitable range)
ORDER BY obs.best_yes_bid DESC;

10.3 Historical Lock-In Rate by City and Drop Threshold

SELECT
    city_id,
    ROUND(drop_size_c) AS drop_c,
    COUNT(*) AS total_signals,
    SUM(CASE WHEN did_recover = 0 THEN 1 ELSE 0 END) AS locked_in,
    SUM(CASE WHEN did_recover = 1 THEN 1 ELSE 0 END) AS recovered,
    ROUND(100.0 * SUM(CASE WHEN did_recover = 0 THEN 1 ELSE 0 END) / COUNT(*), 1) AS lock_in_rate_pct
FROM daily_highs
WHERE lock_in_at IS NOT NULL
    AND did_recover IS NOT NULL
GROUP BY city_id, ROUND(drop_size_c)
ORDER BY city_id, drop_c;

10.4 NWS Forecast Accuracy Baseline Per City

SELECT
    city_id,
    strftime('%m', date_local) AS month,
    nws_forecast_time,
    COUNT(*) AS sample_days,
    ROUND(AVG(error_f), 2) AS avg_error_f,
    ROUND(AVG(abs_error_f), 2) AS avg_abs_error_f,
    ROUND(100.0 * SUM(CASE WHEN abs_error_f >= 3.0 THEN 1 ELSE 0 END) / COUNT(*), 1) AS pct_miss_3f,
    MAX(abs_error_f) AS worst_miss_f
FROM forecast_vs_actual
GROUP BY city_id, strftime('%m', date_local), nws_forecast_time
ORDER BY city_id, month;

10.5 City Correlation Analysis (Joint Lock-In Events)

-- How often do two cities lock in on the same day?
SELECT
    a.city_id AS city_a,
    b.city_id AS city_b,
    COUNT(*) AS days_both_locked_in,
    (SELECT COUNT(*) FROM daily_highs WHERE city_id = a.city_id AND lock_in_at IS NOT NULL) AS city_a_total,
    ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM daily_highs WHERE city_id = a.city_id AND lock_in_at IS NOT NULL), 1) AS joint_pct
FROM daily_highs a
JOIN daily_highs b ON a.date_local = b.date_local
    AND a.city_id < b.city_id          -- Avoid duplicate pairs
    AND b.lock_in_at IS NOT NULL
WHERE a.lock_in_at IS NOT NULL
GROUP BY a.city_id, b.city_id
ORDER BY joint_pct DESC;

10.6 Slippage Analysis (Paper Trade Execution Quality)

SELECT
    city_id,
    COUNT(*) AS total_trades,
    ROUND(AVG(target_price), 3) AS avg_target,
    ROUND(AVG(fill_price), 3) AS avg_fill,
    ROUND(AVG(slippage_f), 4) AS avg_slippage,
    ROUND(AVG(pnl), 2) AS avg_pnl,
    SUM(pnl) AS total_pnl,
    ROUND(100.0 * SUM(CASE WHEN pnl > 0 THEN 1 ELSE 0 END) / COUNT(*), 1) AS win_rate_pct
FROM trade_signals
WHERE mode = 'paper'
    AND executed = 1
    AND settled_at IS NOT NULL
GROUP BY city_id
ORDER BY total_pnl DESC;

10.7 Collector Health Report (Last 24 Hours)

SELECT
    city_id,
    collection_type,
    COUNT(*) AS total_attempts,
    SUM(success) AS successful,
    COUNT(*) - SUM(success) AS failed,
    ROUND(100.0 * SUM(success) / COUNT(*), 1) AS success_rate_pct,
    MAX(CASE WHEN success = 0 THEN attempted_at END) AS last_failure,
    ROUND(AVG(CASE WHEN success = 1 THEN latency_ms END)) AS avg_latency_ms,
    MAX(CASE WHEN success = 1 THEN data_age_sec END) AS max_data_age_sec
FROM collector_health
WHERE attempted_at >= datetime('now', '-24 hours')
GROUP BY city_id, collection_type
ORDER BY city_id, collection_type;

10.8 Peak Time Model Update (Monthly)

-- Compute observed peak hours from historical data and update model
INSERT OR REPLACE INTO peak_time_model (city_id, month, avg_peak_hour, stddev_hours, sample_days, conservative_cutoff)
WITH peaks AS (
    SELECT
        city_id,
        CAST(strftime('%m', date_local) AS INTEGER) AS month,
        CAST(strftime('%H', running_high_at) AS REAL)
          + CAST(strftime('%M', running_high_at) AS REAL) / 60.0 AS peak_hour
    FROM daily_highs
    WHERE running_high_at IS NOT NULL
)
SELECT
    city_id,
    month,
    AVG(peak_hour) AS avg_peak_hour,
    -- SQLite has no STDDEV natively; population stddev via E[X^2] - E[X]^2
    SQRT(AVG(peak_hour * peak_hour) - AVG(peak_hour) * AVG(peak_hour)) AS stddev_hours,
    COUNT(*) AS sample_days,
    -- Conservative cutoff: avg minus 1 stddev (don't fire signal before this time)
    AVG(peak_hour) - SQRT(AVG(peak_hour * peak_hour) - AVG(peak_hour) * AVG(peak_hour)) AS conservative_cutoff
FROM peaks
GROUP BY city_id, month
HAVING COUNT(*) >= 20;        -- Only update if we have enough data

11. Implementation Checklist

Phase 1 — Foundation (Days 1-3)

Phase 2 — Historical Backtest (Days 3-5)

Phase 3 — Kalshi Integration (Days 4-6)

Phase 4 — Signal Engine (Days 5-8)

Phase 5 — Paper Trading (Weeks 2-5)

Phase 6 — Predictive Layer (v2, After Paper Trading Proves Edge)


12. Cross-Desk Reuse Candidates

| Component | Reusable For |
| --- | --- |
| NWS/NOAA API wrapper (observations, forecasts, alerts) | Sports desk (outdoor games), agriculture futures, energy desk (temperature drives nat gas) |
| Kalshi API client (market listing, pricing, order book, settlement) | All Kalshi prediction market desks — politics, economics, event contracts |
| Polymarket API client (when built in v2) | All Polymarket desks — crypto, politics, event contracts |
| Order book depth analyzer and liquidity scoring | Any desk trading on Kalshi/Polymarket, crypto desk |
| Time-series data collection framework (10-min cron, SQLite, gap detection, health alerts) | Crypto desk (price monitoring), forex desk (rate monitoring), any desk requiring regular snapshots |
| Signal-to-contract mapping logic | Any event-driven prediction market desk |
| Backtest framework (historical data → simulated signals → theoretical P&L with fees) | Every single desk — build as a shared utility from day one |
| Collector health monitoring and alerting system | All data collectors across all desks |
| City correlation matrix calculator | Generalize to any multi-market correlated-risk desk |
| Monte Carlo drawdown simulator | All desks with multiple simultaneous positions |
| Execution impact simulator (order placement, slippage, market impact) | All trading desks — shared utility |
| Settlement rule parser and amendment tracker | Kalshi and Polymarket sports desk — settlement ambiguities are universal |
| Brier score and settlement tracking module | All prediction market desks |

13. Reality Check

Is this buildable? Yes. The core reactive system (Steps 1-5) is buildable within 2 weeks. NWS API is free, Kalshi API is free, the main costs are the VPS servers and AI API calls for the analyst pipeline.

Biggest risk: Deploying real capital before adequate data. The 11-day backtest is proof-of-concept only. The temperature resolution artifact means threshold analysis is less granular than it appears. The correlated nature of weather across cities means effective sample size is far smaller than 230 city-days.

[PANEL: Sonar Reasoning Pro] The highest risk is correlated drawdown: a regional weather event affecting the entire Northeast simultaneously can wipe out 5-10 positions at once. This is the single most dangerous unmodeled risk. Build the correlation matrix before trading with real capital.

[PANEL: Gemini 3.1 Pro] Second highest risk is adverse selection: if faster market participants are already repricing contracts before the 10-minute cron fires, you may only get fills on contracts where the outcome is already doubtful. Monitor this explicitly during paper trading by tracking fill rate vs. bid price at signal time.

First to break: The weather data collection pipeline. Cron-based HTTP polling every 10 minutes is fragile — NWS API has intermittent outages, returns stale data without error codes, and rate-limits during severe weather events. Without robust error handling, retry logic, staleness detection, and alerting, the system will break silently exactly when the most unusual weather creates the best opportunities.

Most important first action: Before writing any trading logic, run the lock-in analysis against 6-12 months of NOAA historical hourly data for target cities. This takes 2-3 days and either confirms a real durable edge (proceed with confidence) or reveals the 11-day results were a lucky stretch (save yourself from losses). This is the highest-ROI activity possible right now.

[PANEL: Sonar Reasoning Pro] In parallel with the historical analysis, build the NWS forecast accuracy baseline. If NWS forecasts are wrong by 3°F+ on 15-20% of days, the real predictive edge lives there — in identifying those days before the drop, not reacting after. That reframing (from reactive lock-in detector to NWS-miss predictor) is the v2 upgrade that turns a good strategy into an excellent one.


End of merged final spec. Panel sources: Claude Opus 4.6 (base), GPT-4.1, Gemini 3.1 Pro, Grok 4.1, Grok 4 Fast, DeepSeek V3.1, Sonar Reasoning Pro.


TODO: Upgrade Kalshi REST to WebSocket Feed

Status: NOT BUILT — add when this desk goes live for execution

Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.

Why WebSocket matters:

  1. REST polling sees the order book only at snapshot intervals; WebSocket pushes every book update in real time.
  2. Live execution needs sub-second price freshness; filling off a stale snapshot is exactly how slippage happens.
  3. One streaming connection replaces high-frequency polling across many tickers, staying clear of REST rate limits.

What to build when ready:

  1. Connect to Kalshi WebSocket feed (wss://trading-api.kalshi.com/trade-api/ws/v2)
  2. Subscribe to order book channels for active tickers
  3. Stream trade events for real-time freshness (replaces REST trades polling)
  4. Feed live prices into execution engine for precise entry/exit
  5. Update Fill Quality Monitoring to measure slippage vs quoted price

Note: Sports markets use RFQ (Request for Quote) so the visible order book is usually empty — but the trade feed still matters for freshness signals and the WebSocket is required for order submission. Weather markets DO have real visible order books where this upgrade is even more critical.
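A connection sketch with the ws package; the subscribe payload shape and the auth header are assumptions to check against Kalshi's WebSocket documentation before building:

import WebSocket from "ws"; // npm install ws

const ws = new WebSocket("wss://trading-api.kalshi.com/trade-api/ws/v2", {
  headers: { Authorization: `Bearer ${process.env.KALSHI_TOKEN}` }, // placeholder auth
});

ws.on("open", () => {
  // Assumed subscribe shape; confirm channel names and params in Kalshi's docs.
  ws.send(
    JSON.stringify({
      id: 1,
      cmd: "subscribe",
      params: { channels: ["orderbook_delta"], market_tickers: ["HIGHNY-2026MAR15-T85"] },
    })
  );
});

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  // Apply deltas to an in-memory book; resync from a REST snapshot on reconnect.
  console.log(msg);
});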

Added 2026-03-29 — upgrade REST to WS when desk moves to live execution

Source: ~/edgeclaw/results/spec-panel/weather-lock-in/spec-final.md