Wallet Intelligence Module — MERGED FINAL SPEC

(Panel-Reviewed, Opus Base + Cherry-Picked Panel Additions)

Merged by: Claude Code (Sonnet 4.6) Date: 2026-03-15 Source files: opus.md (base), gemini-3.1-pro.md, gpt-4.1.md, grok-4-fast.md, grok-4.1.md, deepseek-v3.1.md, sonar-reasoning-pro.md Part III panel: opus.md (base), gemini-3.1-pro.md, gpt-4.1.md, grok-4-fast.md (prediction market signals) Rule: Opus is the complete base — nothing removed. All unique panel additions tagged [PANEL: Model Name].


Purpose

Track the best and worst traders on prediction market platforms (Polymarket, Kalshi) to generate:

This is a cross-desk tool — prediction markets cover sports, crypto, politics, weather, and more. Every desk in the pipeline benefits from wallet intelligence.

On-chain wallets feed signals into two venue types:

Build Priority (all panelists agree):

  1. Cross-platform divergence detector (quick win, no wallet tracking needed)
  2. Wallet data collection + scoring (shadow mode — collect and score, don't trade)
  3. 6-month shadow mode with simulated copy-trades including realistic slippage
  4. Go live only if shadow mode proves positive expected value

The Cross-Platform Edge

The core strategy is NOT "copy the whale's trade on the same platform" — that fails because slippage eats the edge (the whale already moved the price). Instead:

Detect sharp money on Polymarket → trade the same event on Kalshi before Kalshi catches up.

Polymarket is where the whales live (bigger liquidity, on-chain, fast). Kalshi is a separate pool of traders who react slower. When a whale pushes a Polymarket contract from 42 to 48 cents, the same event on Kalshi might still sit at 43. You're trading the information flow lag between platforms.

This sidesteps the slippage problem because you're buying on the platform that hasn't moved yet.

[PANEL: Sonar Reasoning Pro] — The cross-platform arbitrage thesis assumes significant overlap between Polymarket and Kalshi event catalogs. Empirical reality is ~30-40% overlap. The first task (Phase 0) must measure actual overlap before committing to cross-platform infrastructure. If overlap is <30%, the strategy must pivot to pure wallet intelligence (same-platform copy-trading with better entry timing).


Phase 0 Validation Gate

Before building anything beyond the divergence detector, run exactly 4 weeks of data collection and evaluate against these thresholds:

PROCEED with full build if:

PIVOT to same-platform wallet intelligence only if: divergences exist but are <3/day or <90 seconds. The cross-platform arb is too thin, but wallet scoring still has value for same-platform copy-trading with better entry timing.

KILL cross-platform component if: <1 divergence >5 cents per day. The platforms are too tightly coupled.

Validation data stored in:

CREATE TABLE phase0_validation (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  date TEXT NOT NULL,
  divergences_detected INTEGER,
  divergences_above_5c INTEGER,
  divergences_above_5c_after_fees INTEGER,
  median_gap_duration_seconds REAL,
  median_lagging_liquidity_usd REAL,
  max_gap_cents REAL,
  notes TEXT
);

Populate daily. After 28 rows, run the decision. This costs almost nothing and prevents building an elaborate system on a false premise.


Platform Capabilities

Polymarket (Primary — On-Chain, Full Transparency)

All Polymarket trades happen on Polygon blockchain. Every wallet's full history is publicly readable:

Data Sources (all free, no keys):

Kalshi (Secondary — Opaque, Flow-Based)

Kalshi does NOT expose individual account data. Approach:

[PANEL: DeepSeek V3.1] — Kalshi trader concentration metrics: Kalshi's lack of wallet data makes position limits critical. Derive estimated % of limit consumed from observed trade sizes vs published limits.

[PANEL: GPT-4.1] — Competitor platform expansion (nice-to-have, v2): PredictIt API (free/public), Betfair Exchange API (free read-only) for politics and sports overlap. Expands cross-platform arb surface area.


Kalshi Authentication and Rate Limit Handling

Auth flow (required — without this, all Kalshi API calls return 401):

  1. POST to https://trading-api.kalshi.com/trade-api/v2/login with {email, password} → receive {token, member_id}
  2. Token expires after 24h. Store in memory, refresh proactively at 23h
  3. Auth does NOT require placing trades — data polling is free

Rate limits: Kalshi allows ~10 requests/second for authenticated users. Token bucket rate limiter: bucket size 10, refill rate 10/sec. On 429 response: exponential backoff starting at 1s, max 30s, 5 retries.

[PANEL: DeepSeek V3.1] — Kalshi trade tape has 15-30s delay during volatility. Assume 45s worst-case latency, not 5s. Design alert urgency accordingly.

Request priority queue: Divergence checks (high) > flow signal polling (medium) > market discovery (low). When approaching rate limit, shed low-priority requests first.

Kalshi WebSocket: wss://trading-api.kalshi.com/trade-api/ws/v2 available for orderbook and trade streaming. Use for HOT markets (see tiered polling below) instead of REST, preserving REST budget for discovery and flow analysis.

Rate tracking table:

CREATE TABLE api_rate_tracking (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  platform TEXT NOT NULL,
  endpoint TEXT NOT NULL,
  request_count INTEGER DEFAULT 0,
  error_count INTEGER DEFAULT 0,
  last_429_at TEXT,
  period_start TEXT NOT NULL,
  period_minutes INTEGER DEFAULT 1
);

RPC Strategy (Tiered — Reliability-Critical)

Primary: Alchemy free tier — 300M compute units/month, ~300K eth_getLogs calls. Endpoint: https://polygon-mainnet.g.alchemy.com/v2/{API_KEY}. Sign up at alchemy.com (free).

Fallback: polygon-bor-rpc.publicnode.com for low-priority calls (gas price estimation, basic block queries).

Usage allocation:

Minimize RPC dependency overall: Use CLOB API and WebSocket as primary trade data sources. Use RPC only for:

  1. Wallet discovery via on-chain event scanning
  2. Proxy wallet mapping (see below)
  3. Settlement event detection as backup to API polling
  4. Funding chain analysis (deferred to v2)
CREATE TABLE data_source_status (
  source TEXT PRIMARY KEY,
  status TEXT,              -- 'live', 'degraded', 'down'
  last_message_at TEXT,
  reconnect_count INTEGER DEFAULT 0,
  current_backoff_seconds REAL DEFAULT 0
);

Architecture

Database Tables

-- ═══════════════════════════════════════════════════════════════
-- CANONICAL MARKET REFERENCE
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE markets (
  market_id TEXT PRIMARY KEY,
  platform TEXT NOT NULL,                -- 'polymarket' or 'kalshi'
  condition_id TEXT,                      -- Polymarket condition_id
  token_id TEXT,                          -- Polymarket outcome token
  kalshi_ticker TEXT,                     -- Kalshi ticker
  title TEXT NOT NULL,
  category TEXT,
  created_at TEXT,
  end_date TEXT,
  settled INTEGER DEFAULT 0,
  outcome TEXT,                           -- 'yes', 'no', null if unsettled
  settlement_source TEXT,                 -- e.g., 'BLS', 'NOAA', 'AP' (Kalshi)
  status TEXT DEFAULT 'active',          -- 'active', 'halted', 'suspended', 'cancelled' [PANEL: GPT-4.1]
  resolution_disputed INTEGER DEFAULT 0,  -- flag for oracle disputes [PANEL: Opus review]
  fetched_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_markets_platform ON markets(platform);
CREATE INDEX idx_markets_category ON markets(category);
CREATE INDEX idx_markets_unsettled ON markets(settled) WHERE settled = 0;
CREATE INDEX idx_markets_status ON markets(status) WHERE status != 'active';

-- ═══════════════════════════════════════════════════════════════
-- CROSS-PLATFORM EVENT MATCHING
-- ═══════════════════════════════════════════════════════════════
-- THE LOAD-BEARING TABLE. All cross-platform features depend on this.
-- No entry here = no divergence signal for that market pair.

CREATE TABLE event_pairs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  poly_market_id TEXT REFERENCES markets(market_id),
  kalshi_market_id TEXT REFERENCES markets(market_id),
  event_slug TEXT NOT NULL,
  match_method TEXT NOT NULL,            -- 'manual', 'fuzzy_title', 'embedding', 'series_rule'
  match_confidence REAL NOT NULL,        -- 0-1
  verified INTEGER DEFAULT 0,           -- 1 = human-confirmed
  title_similarity REAL,
  date_overlap_days REAL,
  category_match INTEGER,
  resolution_rules_match INTEGER DEFAULT 0,  -- 1 if oracle/rules verified same [PANEL: Gemini 3.1 Pro]
  polling_tier TEXT DEFAULT 'cold',     -- 'hot', 'warm', 'cold' (drives polling frequency)
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  verified_at TEXT,
  verified_by TEXT,
  UNIQUE(poly_market_id, kalshi_market_id)
);

CREATE INDEX idx_ep_verified ON event_pairs(verified, match_confidence);
CREATE INDEX idx_ep_tier ON event_pairs(polling_tier);

-- ═══════════════════════════════════════════════════════════════
-- POLYMARKET PROXY WALLET MAPPING
-- ═══════════════════════════════════════════════════════════════
-- Critical for deduplication. A user's CLOB address (proxy) may differ
-- from their on-chain EOA. Without this, same trader appears twice.

CREATE TABLE wallet_address_map (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  eoa_address TEXT NOT NULL,
  proxy_address TEXT NOT NULL,
  platform TEXT NOT NULL DEFAULT 'polymarket',
  discovered_via TEXT,                   -- 'proxy_factory_event', 'gamma_api', 'manual'
  discovered_at TEXT NOT NULL DEFAULT (datetime('now')),
  UNIQUE(eoa_address, proxy_address)
);

CREATE INDEX idx_wam_proxy ON wallet_address_map(proxy_address);
CREATE INDEX idx_wam_eoa ON wallet_address_map(eoa_address);

-- ═══════════════════════════════════════════════════════════════
-- POLYMARKET WALLET PROFILES
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE wallet_profiles (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  address TEXT NOT NULL UNIQUE,          -- canonical proxy_address
  proxy_address TEXT,                    -- explicit proxy (same as address for Polymarket)
  eoa_address TEXT,                      -- underlying EOA if known
  label TEXT,                            -- 'sharp', 'dumb', 'neutral', 'bot', 'mm', 'whale'
  entity_name TEXT,                      -- from Arkham/manual: 'Wintermute', 'Jump Trading', etc.
  likely_hedge INTEGER DEFAULT 0,        -- 1 if likely hedging portfolio, not directional [PANEL: Sonar Reasoning Pro]
  first_seen TEXT NOT NULL,
  last_active TEXT NOT NULL,
  total_trades INTEGER DEFAULT 0,
  total_settled INTEGER DEFAULT 0,
  total_volume_usd REAL DEFAULT 0,
  win_count INTEGER DEFAULT 0,
  loss_count INTEGER DEFAULT 0,
  win_rate REAL DEFAULT 0,
  total_pnl_usd REAL DEFAULT 0,
  roi REAL DEFAULT 0,
  max_drawdown_usd REAL DEFAULT 0,
  avg_entry_timing REAL,                 -- avg % of time-to-settlement when entering (0=early, 1=late)
  brier_score REAL,                      -- calibration accuracy (lower = better)
  brier_score_30d REAL,                  -- rolling 30-trade Brier for regime detection
  brier_n INTEGER DEFAULT 0,             -- number of markets contributing to Brier
  news_adjusted_timing_score REAL,       -- timing score discounting news-proximate trades
  composite_score REAL DEFAULT 0,        -- 0-100 ranking
  confidence_tier TEXT DEFAULT 'low',    -- 'low' (<50 trades), 'medium' (50-199), 'high' (200+)
  -- Market maker specific
  avg_spread_offered REAL,
  liquidity_side_bias REAL,
  -- Per-oracle specialization [PANEL: Sonar Reasoning Pro]
  oracle_specialization TEXT,            -- JSON: {oracle: brier_score} e.g. {"BLS": 0.12, "AP": 0.21}
  -- Metadata
  funding_source TEXT,
  social_link TEXT,
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- ═══════════════════════════════════════════════════════════════
-- WALLET CATEGORY STATS
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE wallet_category_stats (
  address TEXT NOT NULL,
  category TEXT NOT NULL,
  trades INTEGER DEFAULT 0,
  settled INTEGER DEFAULT 0,
  win_rate REAL,
  brier_score REAL,
  roi REAL,
  pnl_usd REAL DEFAULT 0,
  avg_entry_timing REAL,
  PRIMARY KEY (address, category)
);

CREATE INDEX idx_wcs_category ON wallet_category_stats(category, roi DESC);

-- ═══════════════════════════════════════════════════════════════
-- INDIVIDUAL TRADES
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE wallet_trades (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  address TEXT NOT NULL,
  market_id TEXT NOT NULL REFERENCES markets(market_id),
  position_id TEXT,                      -- groups related buys/sells (lifecycle managed by code)
  side TEXT NOT NULL,                    -- 'yes' or 'no'
  action TEXT NOT NULL,                  -- 'buy' or 'sell'
  price REAL NOT NULL,                   -- 0-1
  size REAL NOT NULL,                    -- number of shares
  cost_usd REAL,
  tx_hash TEXT,
  block_number INTEGER,
  traded_at TEXT NOT NULL,
  settled INTEGER DEFAULT 0,            -- 0=open, 1=settled
  outcome TEXT,                         -- 'win', 'loss', null if unsettled
  pnl_usd REAL,
  market_settled_at TEXT,
  days_before_settlement REAL,
  timing_bucket TEXT,                   -- 'early' (<20%), 'mid' (20-70%), 'late' (>70%)
  vwap_at_settlement REAL,             -- VWAP of shares still held at settlement, for Brier calc
  -- News proximity enrichment
  news_proximate INTEGER DEFAULT 0,    -- 1 if relevant headline within 30min before trade
  news_headline TEXT,
  news_source TEXT,
  news_detected_at TEXT,
  fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
  UNIQUE(address, tx_hash)             -- [PANEL: GPT-4.1] prevent duplication from multiple sources
);

CREATE INDEX idx_wt_addr_time ON wallet_trades(address, traded_at DESC);
CREATE INDEX idx_wt_market ON wallet_trades(market_id, address);
CREATE INDEX idx_wt_unsettled ON wallet_trades(settled) WHERE settled = 0;
CREATE INDEX idx_wt_outcome ON wallet_trades(address, outcome) WHERE settled = 1;
CREATE INDEX idx_wt_position ON wallet_trades(position_id) WHERE position_id IS NOT NULL;

-- ═══════════════════════════════════════════════════════════════
-- NET POSITIONS
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE wallet_positions (
  address TEXT NOT NULL,
  market_id TEXT NOT NULL REFERENCES markets(market_id),
  side TEXT NOT NULL,
  net_shares REAL NOT NULL,
  avg_entry_price REAL NOT NULL,
  cost_basis_usd REAL NOT NULL,
  current_value_usd REAL,
  unrealized_pnl REAL,
  size_vs_typical REAL,                 -- this position size / wallet avg (conviction signal)
  position_state TEXT DEFAULT 'open',   -- 'open', 'closed', 'settled'
  opened_at TEXT,
  closed_at TEXT,
  updated_at TEXT NOT NULL DEFAULT (datetime('now')),
  PRIMARY KEY (address, market_id)
);

CREATE INDEX idx_wp_addr ON wallet_positions(address);
CREATE INDEX idx_wp_open ON wallet_positions(position_state) WHERE position_state = 'open';

-- ═══════════════════════════════════════════════════════════════
-- FOLLOW / FADE LISTS
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE wallet_lists (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  address TEXT NOT NULL,
  list_type TEXT NOT NULL,              -- 'follow' or 'fade'
  fade_action TEXT,                      -- null for follow; 'avoid' or 'invert' for fade
  rank INTEGER,
  score REAL NOT NULL,
  category_score_used TEXT,
  reason TEXT,
  added_at TEXT NOT NULL DEFAULT (datetime('now')),
  expires_at TEXT,
  active INTEGER DEFAULT 1,
  user_id TEXT
);

CREATE INDEX idx_wl_active ON wallet_lists(active, list_type);

-- ═══════════════════════════════════════════════════════════════
-- ALERTS
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE wallet_alerts (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  address TEXT NOT NULL,
  wallet_label TEXT,
  list_type TEXT NOT NULL,              -- 'follow' or 'fade'
  market_id TEXT NOT NULL REFERENCES markets(market_id),
  category TEXT,
  side TEXT NOT NULL,
  price REAL NOT NULL,
  size REAL NOT NULL,
  signal TEXT NOT NULL,                 -- 'copy', 'fade', 'reduce', 'exit' [PANEL: Opus review]
  alert_strength REAL,
  -- Execution awareness
  estimated_slippage REAL,
  edge_after_slippage REAL,
  available_liquidity REAL,
  -- Fee-aware net edge [PANEL: Opus aplus]
  kalshi_fee_estimate REAL,
  poly_fee_estimate REAL,
  net_edge_after_fees REAL,
  -- Outcome tracking
  acted_on INTEGER DEFAULT 0,
  market_settled INTEGER DEFAULT 0,
  outcome TEXT,
  alerted_at TEXT NOT NULL DEFAULT (datetime('now')),
  user_id TEXT
);

CREATE INDEX idx_wa_time ON wallet_alerts(alerted_at DESC);
CREATE INDEX idx_wa_addr ON wallet_alerts(address);

-- ═══════════════════════════════════════════════════════════════
-- KALSHI FLOW SIGNALS
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE kalshi_flow_signals (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  market_id TEXT REFERENCES markets(market_id),
  ticker TEXT NOT NULL,
  signal_type TEXT NOT NULL,
    -- 'volume_spike', 'price_divergence', 'spread_collapse',
    -- 'momentum_shift', 'open_gap', 'correlated_series',
    -- 'iceberg', 'event_window', 'size_clustering', 'book_imbalance'
  direction TEXT,                       -- 'yes' or 'no'
  magnitude REAL,
  baseline_used TEXT,                   -- 'market' or 'category' (cold-start fallback)
  context TEXT,                         -- JSON with supporting data
  resolved INTEGER DEFAULT 0,
  outcome TEXT,
  detected_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_kfs_time ON kalshi_flow_signals(detected_at DESC);
CREATE INDEX idx_kfs_ticker ON kalshi_flow_signals(ticker, detected_at DESC);

-- ═══════════════════════════════════════════════════════════════
-- KALSHI MARKET BASELINES (cold-start protocol for flow signals)
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE kalshi_market_baselines (
  ticker TEXT NOT NULL,
  hour_of_day INTEGER,                  -- 0-23 (time-of-day normalization)
  day_of_week INTEGER,                  -- 0-6 (volume differs Mon vs Sat)
  avg_hourly_volume REAL,
  stddev_hourly_volume REAL,
  avg_spread_cents REAL,
  avg_book_depth_usd REAL,
  sample_days INTEGER DEFAULT 0,
  baseline_reliable INTEGER DEFAULT 0,  -- 1 when sample_days >= 7
  updated_at TEXT NOT NULL DEFAULT (datetime('now')),
  PRIMARY KEY (ticker, hour_of_day, day_of_week)
);

-- ═══════════════════════════════════════════════════════════════
-- CROSS-PLATFORM DIVERGENCE
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE platform_divergence (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  event_pair_id INTEGER REFERENCES event_pairs(id),
  market_id_poly TEXT REFERENCES markets(market_id),
  market_id_kalshi TEXT REFERENCES markets(market_id),
  event_description TEXT,
  kalshi_yes_price REAL,
  poly_yes_price REAL,
  divergence_cents REAL,                -- SIGNED: poly - kalshi
  abs_divergence REAL,
  leader TEXT,                          -- 'kalshi' or 'polymarket'
  triggered_by TEXT,                    -- 'whale_move', 'organic', 'news', 'unknown'
  -- Execution info
  kalshi_liquidity REAL,
  poly_liquidity REAL,
  -- Fee-aware net edge (kills false positives from fee blindness)
  kalshi_fee_estimate REAL,
  poly_fee_estimate REAL,
  estimated_slippage REAL,
  net_edge_after_fees REAL,            -- divergence_cents - kalshi_fee - poly_fee - slippage
  -- Outcome
  detected_at TEXT NOT NULL DEFAULT (datetime('now')),
  converged_at TEXT,
  convergence_time_seconds REAL,       -- seconds (not minutes — arb windows often <90s)
  resolved INTEGER DEFAULT 0,
  winner TEXT                          -- which platform's price was closer to outcome
);

CREATE INDEX idx_pd_unresolved ON platform_divergence(resolved, detected_at) WHERE resolved = 0;
CREATE INDEX idx_pd_time ON platform_divergence(detected_at DESC);

-- ═══════════════════════════════════════════════════════════════
-- FEE SCHEDULE (lookup table for net edge calculation)
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE fee_schedule (
  platform TEXT NOT NULL,
  market_category TEXT,
  fee_type TEXT NOT NULL,              -- 'taker', 'maker', 'settlement', 'gas'
  fee_cents REAL NOT NULL,
  fee_percent REAL,
  effective_date TEXT NOT NULL,
  PRIMARY KEY (platform, market_category, fee_type, effective_date)
);

-- Seed data (update as platforms change):
-- Kalshi taker fee: ~7 cents/contract (varies by market, range 5-10c)
-- Kalshi maker fee: 0 cents
-- Kalshi settlement fee: 0 cents
-- Polymarket CLOB fee: varies by market (check via API)
-- Polymarket gas: ~1-3 cents per tx on Polygon (estimate from RPC eth_gasPrice)

-- ═══════════════════════════════════════════════════════════════
-- PHASE 0 VALIDATION
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE phase0_validation (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  date TEXT NOT NULL,
  divergences_detected INTEGER,
  divergences_above_5c INTEGER,
  divergences_above_5c_after_fees INTEGER,
  median_gap_duration_seconds REAL,
  median_lagging_liquidity_usd REAL,
  max_gap_cents REAL,
  notes TEXT
);

-- ═══════════════════════════════════════════════════════════════
-- WALLET CONSENSUS (herding detection)
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE wallet_consensus (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  market_id TEXT NOT NULL REFERENCES markets(market_id),
  category TEXT,
  follow_wallets_in INTEGER NOT NULL,
  fade_wallets_in INTEGER NOT NULL,
  dominant_side TEXT,
  signal_quality TEXT,                 -- 'strong', 'crowded', 'contrarian'
  first_entry_at TEXT,
  last_entry_at TEXT,
  detected_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_wcon_market ON wallet_consensus(market_id);

-- ═══════════════════════════════════════════════════════════════
-- WALLET SIMILARITY (behavioral fingerprinting) — DEFERRED TO v2
-- ═══════════════════════════════════════════════════════════════
-- Table defined now for data integrity. Computation pipeline deferred.
-- Simple heuristic at launch: same funding wallet (Polygon RPC) = likely same entity.

CREATE TABLE wallet_similarity (
  address_a TEXT NOT NULL,
  address_b TEXT NOT NULL,
  similarity_score REAL NOT NULL,
  feature_vector TEXT,                 -- JSON: category_dist, avg_timing, avg_size, frequency
  same_funder INTEGER DEFAULT 0,       -- quick heuristic: shared Polygon funding source [PANEL: Sonar Reasoning Pro]
  likely_same_entity INTEGER DEFAULT 0,
  computed_at TEXT NOT NULL DEFAULT (datetime('now')),
  PRIMARY KEY (address_a, address_b)
);

-- ═══════════════════════════════════════════════════════════════
-- SYSTEM META-SCORING (track our own accuracy)
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE system_predictions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  signal_source TEXT NOT NULL,         -- 'follow_copy', 'fade_invert', 'divergence_arb', 'consensus', 'flow_signal'
  market_id TEXT REFERENCES markets(market_id),
  predicted_side TEXT NOT NULL,
  confidence REAL NOT NULL,
  entry_price_simulated REAL,
  config_id INTEGER REFERENCES shadow_configurations(config_id),  -- which weight config
  settled INTEGER DEFAULT 0,
  outcome TEXT,                        -- 'correct', 'incorrect'
  pnl_simulated REAL,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  settled_at TEXT
);

CREATE INDEX idx_sp_source ON system_predictions(signal_source, settled);
CREATE INDEX idx_sp_unsettled ON system_predictions(settled) WHERE settled = 0;

-- ═══════════════════════════════════════════════════════════════
-- SHADOW MODE WEIGHT CONFIGURATIONS
-- ═══════════════════════════════════════════════════════════════
-- Run 4 configs in parallel during shadow mode.
-- After 200+ predictions per config, pick winner empirically.

CREATE TABLE shadow_configurations (
  config_id INTEGER PRIMARY KEY,
  config_name TEXT NOT NULL,
  weight_roi REAL,
  weight_brier REAL,
  weight_timing REAL,
  weight_recency REAL,
  weight_winrate REAL,
  active INTEGER DEFAULT 1
);

-- Seed 4 configs:
-- 1: Current consensus: ROI 30%, Brier 25%, Timing 20%, Recency 15%, Win Rate 10%
-- 2: ROI-heavy: ROI 45%, Brier 20%, Timing 15%, Recency 15%, Win Rate 5%
-- 3: Brier-heavy: ROI 20%, Brier 40%, Timing 20%, Recency 15%, Win Rate 5%
-- 4: Timing-heavy: ROI 20%, Brier 20%, Timing 35%, Recency 15%, Win Rate 10%

-- ═══════════════════════════════════════════════════════════════
-- NEWS PROXIMITY (enrichment for wallet trades)
-- ═══════════════════════════════════════════════════════════════
-- Tracks news headlines fetched per market for proximity tagging.

CREATE TABLE market_news (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  market_id TEXT REFERENCES markets(market_id),
  headline TEXT NOT NULL,
  source TEXT,
  published_at TEXT NOT NULL,
  fetched_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_mn_market_time ON market_news(market_id, published_at DESC);

-- ═══════════════════════════════════════════════════════════════
-- MACRO EVENT CALENDAR [PANEL: Sonar Reasoning Pro]
-- ═══════════════════════════════════════════════════════════════
-- Smart money acts pre-event. Without this, can't distinguish
-- early-alpha entries from "just read the calendar" entries.

CREATE TABLE macro_events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  event_name TEXT NOT NULL,            -- 'FOMC Rate Decision', 'CPI Release', 'NFL Kickoff'
  event_type TEXT NOT NULL,            -- 'economic', 'sports', 'political', 'crypto', 'other'
  scheduled_at TEXT NOT NULL,
  actual_at TEXT,                      -- filled in after event fires
  source TEXT,                         -- 'FRED', 'investing.com', 'ESPN API'
  related_market_ids TEXT,             -- JSON array of market_ids this event touches
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_me_time ON macro_events(scheduled_at);

-- ═══════════════════════════════════════════════════════════════
-- MARKET CREATOR TRACKING [PANEL: Sonar Reasoning Pro]
-- ═══════════════════════════════════════════════════════════════
-- Smart money creates or seeds markets they have information about.
-- Market creators often have edge. Retail never sees this pattern.

CREATE TABLE market_creators (
  market_id TEXT NOT NULL REFERENCES markets(market_id),
  creator_address TEXT NOT NULL,
  creator_entity TEXT,                 -- from Arkham if known
  initial_seed_usd REAL,
  platform TEXT NOT NULL,
  created_at TEXT NOT NULL,
  PRIMARY KEY (market_id, creator_address)
);

-- ═══════════════════════════════════════════════════════════════
-- ORDERBOOK SNAPSHOTS (for slippage validation and arb decay analysis)
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE orderbook_snapshots (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  market_id TEXT NOT NULL REFERENCES markets(market_id),
  platform TEXT NOT NULL,
  bid1_price REAL, bid1_size REAL,
  bid2_price REAL, bid2_size REAL,
  bid3_price REAL, bid3_size REAL,
  ask1_price REAL, ask1_size REAL,
  ask2_price REAL, ask2_size REAL,
  ask3_price REAL, ask3_size REAL,
  snapped_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE INDEX idx_obs_market_time ON orderbook_snapshots(market_id, snapped_at DESC);

-- ═══════════════════════════════════════════════════════════════
-- SURVIVORSHIP BIAS TRACKER [PANEL: DeepSeek V3.1]
-- ═══════════════════════════════════════════════════════════════
-- Track ALL wallets that ever crossed 60%+ win rate.
-- Monitor how many regressed. Report survival rate as system calibration.

CREATE TABLE wallet_peak_tracking (
  address TEXT PRIMARY KEY,
  peak_win_rate REAL,
  peak_at TEXT,
  peak_trade_count INTEGER,
  current_win_rate REAL,
  regressed INTEGER DEFAULT 0,         -- 1 if win_rate dropped below 50% after peak >60%
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- ═══════════════════════════════════════════════════════════════
-- SQLITE WRITE BUFFER TRACKING
-- ═══════════════════════════════════════════════════════════════
-- Tracks failed writes from the in-memory buffer for recovery.

CREATE TABLE failed_writes_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  table_name TEXT NOT NULL,
  payload TEXT NOT NULL,               -- JSON of the failed row
  error TEXT,
  failed_at TEXT NOT NULL DEFAULT (datetime('now')),
  recovered INTEGER DEFAULT 0
);

Event Matching Pipeline

The event_pairs table is the load-bearing wall of all cross-platform features. Without it, the divergence detector compares random unrelated markets.

Matching pipeline:

  1. Normalize titles: lowercase, strip "Will", "by", date suffixes, platform-specific boilerplate
  2. Levenshtein ratio on normalized titles — threshold >0.75 = candidate
  3. For candidates, check: date range overlap (end_date within 7 days) and category match
  4. Score = 0.5 × title_sim + 0.3 × date_overlap_score + 0.2 × category_match
  5. Scores >0.85: auto-match (verified=0, goes to weekly human review). Scores 0.6-0.85: manual review queue immediately.
  6. Deterministic series rules for structured events: Kalshi ticker pattern FED-{date}-T{rate} maps to Polymarket titles containing "Federal Reserve" + same date
  7. Resolution rule check [PANEL: Gemini 3.1 Pro]: Two markets may look identical but resolve differently (Polymarket uses UMA oracle, Kalshi uses Bloomberg print). Flag mismatches. If rules differ, it is NOT an arbitrage — it is a correlation trade with tail risk. Populate resolution_rules_match field.
  8. Human reviews unverified matches weekly via CLI tool showing both titles side-by-side

All divergence detection queries JOIN through event_pairs — no verified entry, no divergence signal.


Proxy Wallet Mapping

On startup and daily: query Polygon for ProxyCreated events from Polymarket's proxy factory contract (address: 0xaB45c5A4B0c941a2F231C04C3f49182e1A254052 on Polygon). Each event contains the EOA and proxy address.

Convention: use proxy_address as canonical (what the CLOB API uses). Before creating a new wallet_profile, check wallet_address_map to see if this address (EOA or proxy) is already tracked. Without this, the same trader appears twice — once from CLOB data and once from on-chain scanning — splitting their stats and corrupting scoring.


Scoring System

Wallet Ranking Composite Score (0-100):

[PANEL: Sonar Reasoning Pro] — Entry timing weight should be 35-40%, not 20%. A wallet that buys at 20c and gets paid at 95c (early + right) should rank far higher than one buying at 80c getting paid at 100c (late + right). Consider increasing timing weight to 35% and reducing Brier to 15% based on shadow mode results.

Weight configurations tested in parallel during shadow mode (see shadow_configurations table). Weights finalized empirically after 200+ settled predictions per config.

Confidence Tiers:

Minimum thresholds:

List thresholds:

Precise Brier Score Calculation

Brier is 25% of composite. Must be calculated correctly:

  1. Predicted probability = volume-weighted average entry price (VWAP) of shares still held at settlement
  2. If wallet exits completely before settlement (net shares = 0): EXCLUDE from Brier. They made a trade, not a prediction. Count for ROI/win-rate only.
  3. Partial exit: use VWAP of remaining shares only
  4. Formula: brier = (predicted_probability - outcome)^2 where outcome = 1.0 (YES wins) or 0.0 (NO wins)
  5. Conviction-weighted Brier: weighted_brier = brier × log2(1 + cost_basis_usd / 100). A $10,000 position contributes ~6.6x more than a $100 position.
  6. Aggregate wallet Brier = weighted average of all conviction-weighted individual Briers
  7. [PANEL: DeepSeek V3.1] — Bayesian-adjusted Brier for small samples: add 10 virtual trades at mean probability before computing. Brier is unstable at <50 trades.

Position Lifecycle Tracking

position_id assignment rules (required for P&L, Brier, exit detection):

  1. A position = contiguous period where wallet holds non-zero net shares in a market on a specific side
  2. position_id = hash(address + market_id + side + open_timestamp) where open_timestamp = first trade creating non-zero position
  3. States: OPEN (net_shares > 0), CLOSED (net_shares hit 0 via sells), SETTLED (market resolved while position was open)
  4. When net_shares hits 0 via selling → position CLOSED. If wallet buys again → NEW position with new position_id
  5. P&L per position: realized_pnl = sum(sell_proceeds) - sum(buy_costs) (closed). settlement_pnl = (outcome × net_shares) - cost_basis (settled)
  6. Exit alerts: >50% reduction in single trade or series within 1 hour → 'reduce' alert. Position hits 0 → 'exit' alert.

Wallet Age Asymmetry

[PANEL: Sonar Reasoning Pro] — Wallets that enter a market on day 1 (before retail knows it exists) have different edge than those entering on day 30. Track: (first_entry_at - market_created_at) / (market_end_date - market_created_at). Entries in the <20% bucket should be weighted 3x in timing score. Already derivable from existing data — just needs explicit computation.


Data Collection Flow

WebSocket Manager (Required — Not Optional)

Formal component: WebSocketManager

Polymarket connections: wss://ws-subscriptions-clob.polymarket.com/ws/market/{market_id}. Subscribe to trades and book channels.

Protocol:

  1. Heartbeat: send ping every 30s. No pong within 10s → mark stale
  2. Reconnection: on close/error/stale: backoff 1s → 2s → 4s → 8s → 16s → 30s max. Max 10 retries, then fallback to REST polling
  3. Gap backfill on reconnect: note last received timestamp. After reconnecting, call REST GET /trades?market={id}&after={last_timestamp} immediately to backfill missed trades
  4. Connection pooling: subscribe to multiple markets per connection (Polymarket supports comma-separated asset IDs). Target: 1 connection per 20 markets = 10 connections for 200 tracked markets
  5. Graceful degradation: when WebSocket is down, increase REST polling from 15min → 2min for follow-list markets
  6. Gap detection: no message from any subscribed market in 5 minutes during market hours → log warning + trigger REST backfill

REST-first principle: Build REST polling fallback FIRST. Add WebSocket as enhancement. If WebSocket is down, system degrades gracefully, not breaks.

SQLite Write Buffer (Required — Correctness, Not Optimization)

All WebSocket-sourced writes go through an in-memory queue:

  1. Write queue: simple array, flushed every 2 seconds
  2. Flush = single BEGIN/COMMIT transaction wrapping all queued writes as batch INSERT
  3. If SQLITE_BUSY: retry after 100ms, up to 5 times
  4. If all retries fail: write to failed_writes.jsonl (and failed_writes_log table) for manual recovery
  5. During daily batch jobs (scoring, list generation): acquire advisory lock (row in a locks table). Write buffer pauses flushing. Queued writes accumulate in memory, flush after batch completes (~60s max window)
  6. Max queue size: 10,000 entries. If exceeded: drop oldest entries + log error
  7. At normal volume (500 wallets × 5 trades/day ≈ 1 trade/35 seconds), queue rarely exceeds 1-2 entries between flushes

Tiered Polling Frequency

HOT markets (daily volume >$50K on both platforms, OR follow-list wallet has open position):

WARM markets (daily volume $5K-$50K):

COLD markets (daily volume <$5K):

Tier assignment: recalculate daily from trailing 7-day average volume. Promote to HOT immediately when follow-list wallet enters a position. Rate limit budget: HOT polling ≈ 2 req/sec (20% of 10/sec Kalshi limit), WARM ≈ 0.8 req/sec (8%), leaving 72% for flow analysis and discovery.

Daily Full Scan (6AM ET)

  1. Pull Polymarket leaderboard / batch profiles from Gamma API
  2. Fetch settled markets from last 24h (one call) → update all wallets holding those markets. O(settled_markets) not O(wallets)
  3. Incrementally update wallet stats (running aggregates, not full recompute)
  4. Recalculate composite scores under all 4 shadow configurations
  5. Rebuild follow/fade/watch lists
  6. Run wallet similarity (weekly, not daily — O(n²) is expensive; only for wallets active last 30 days)
  7. Compare current Kalshi vs Polymarket prices on overlapping events → log divergences
  8. Check market halt/suspension status [PANEL: GPT-4.1]

Cross-Platform Divergence Monitoring (Tiered)

  1. Fetch prices for event_pairs where verified=1 (or match_confidence >0.85)
  2. If divergence > 5 cents AND net_edge_after_fees > 2 cents: log to platform_divergence, alert to EDGE TEAM
  3. Track convergence: when/if gap closes, how long it took (in seconds, not minutes)
  4. Track leader: which platform moved first
  5. Polling frequency determined by event_pairs.polling_tier (HOT/WARM/COLD)

Kill threshold: net_edge_after_fees > 2 cents minimum. This kills false positives from the Kalshi 7-cent taker fee eating apparent divergences under ~12 cents.

Kalshi Flow Analysis (Every 15 Minutes, WARM markets)

  1. Volume spikes (>3x hourly average — uses time-of-day and day-of-week normalized baseline from kalshi_market_baselines)
  2. Order book depth changes (iceberg detection)
  3. Correlated series movement (multiple related markets move simultaneously)
  4. Event window analysis: weight volume around scheduled events higher (uses macro_events table)
  5. Maker/taker inference from trade tape
  6. Overnight/weekend gap analysis at market open

Cold-start protocol: For the first 7 days after a market is added, collect hourly snapshots but do NOT generate flow signals (baseline_reliable=0). After 7 days, compute baselines per hour-of-day and day-of-week (baseline_reliable=1). Use category-level baselines as fallback for new markets. Baseline decay: recompute weekly using 30-day rolling window. Markets inactive >14 days reset to baseline_reliable=0.

News Proximity Enrichment

For each active market, extract 2-3 keywords from title. Every 30 minutes, check Google News RSS for those keywords (free, unlimited, no API key required). When a follow-list wallet trades, look back 30 minutes for matching headlines. Tag trade as news_proximate=1 if found.

Scoring impact: Trades where news_proximate=1 get 0.5x weight for the Timing component. Add news_adjusted_timing_score to wallet_profiles that discounts news-proximate trades. A wallet consistently early BEFORE public news has exponentially more value.

[PANEL: DeepSeek V3.1 / GPT-4.1 / Grok 4.1 / Grok 4 Fast] — Social sentiment (X/Twitter, Reddit) around market events often precedes price moves, especially in politics and crypto. Free options: Twitter API v2 basic tier for low-volume polling, Google News RSS. Tag whale trades with social_proximate indicator. Classify as:

Settlement Scan (Event-Driven, Hourly Fallback)

  1. Prefer: Polygon WebSocket settlement events → instant update
  2. Fallback: hourly poll for resolved markets
  3. Update wallet P&L, win/loss, Brier scores — incremental, not full recompute
  4. Track which platform was closer to outcome on divergence events
  5. Update system_predictions outcomes
  6. Check for cancelled/disputed markets [PANEL: Sonar Reasoning Pro] — filter these from wallet scoring or weight at 0.5x

Consensus Detection

When multiple follow-list wallets enter the same market:


Regime Change Detection


Execution-Aware Alerting

Before firing any copy/fade alert:

  1. Available liquidity at alert price on target platform
  2. Estimated slippage for intended position size (from order book depth)
  3. Net edge after fees: expected edge minus slippage minus platform fees (see fee_schedule table)
  4. Kill threshold: net_edge_after_fees > 2 cents minimum. Below this, suppress the alert.
  5. Alert format includes: Fees: Kalshi 7c taker + Poly ~2c gas = 9c | Net edge: {net}c

Market Maker Tracking

Market makers profit from spread, not prediction. Track separately:


Wallet Discovery (Beyond Leaderboard)

  1. On-chain large trade scanning: Monitor Polygon CTFE + Neg Risk Adapter events for all trades above $5K. New address → auto-add as candidate. After 50 settled trades → auto-classify.
  2. Settlement backtracking: Find settled markets where price was wrong early. Identify who was correct before the crowd. These wallets the leaderboard misses because they don't have high volume.
  3. Behavioral fingerprinting: New wallet feature vector >0.85 cosine similarity to known follow-list wallet → promote to watch list immediately. Also catches wallet rotation.
  4. Sybil detection: Similarity >0.95 = flag for review (NOT auto-merge). Check on-chain funding relationships before clustering. Default: treat as separate until proven otherwise. [PANEL: Sonar Reasoning Pro]
  5. Market creator tracking: Wallets that create or seed markets often have edge. Pull creator address from Gamma API. Cross-ref with known institutional addresses. [PANEL: Sonar Reasoning Pro]
  6. Arkham Intel labels: Seed entity identification for known funds and MMs. Treat as hints, not ground truth (free tier: ~100 queries/day, spotty Polymarket coverage).
  7. Deposit monitoring: Large USDC transfers to Polymarket proxy wallet contract. New $10K+ deposits signal new whale entering.

Anti-Gaming / Quality Filters


Category Intelligence

Track wallet accuracy per category:


Shadow Mode

Before any live trades:

  1. Run full system 6+ months (or 100 trades / 2 months — whichever first) [PANEL: Sonar Reasoning Pro — reduced from 6 months]
  2. For every alert, simulate copy-trade at REALISTIC price (include 2-5 cent slippage)
  3. Track simulated P&L in system_predictions (tagged by config_id for weight comparison)
  4. Compute system-level Brier score on own signals weekly
  5. Require: 200+ simulated trades with positive expected value after slippage before going live
  6. [PANEL: Sonar Reasoning Pro] — Don't wait for full shadow mode before any live exposure. If Brier >0.52 after 100 trades/2 months: go live at 5-10% sizing while shadow continues.
  7. If shadow mode shows negative expectancy → adjust weights/thresholds or kill

Self-Improvement Loop

Every alert generates a prediction: "this signal will be profitable." After settlement:


Integration Points


Storage & Performance


API Endpoints Reference

Polymarket

# Market data
GET https://clob.polymarket.com/markets           — list active markets
GET https://clob.polymarket.com/markets/{id}      — single market
GET https://clob.polymarket.com/books?token_id={id} — order book
GET https://clob.polymarket.com/trades?market={id}&after={ts} — trade tape (with timestamp backfill)

# Wallet data (Gamma API)
GET https://gamma-api.polymarket.com/profiles/{address} — wallet profile
GET https://gamma-api.polymarket.com/leaderboard     — top wallets
GET https://gamma-api.polymarket.com/positions/{address} — current positions
GET https://gamma-api.polymarket.com/activity/{address}  — trade history

# WebSocket
wss://ws-subscriptions-clob.polymarket.com/ws/market/{asset_ids_csv}
  Channels: trades, book

Kalshi

# Auth (required)
POST https://trading-api.kalshi.com/trade-api/v2/login
  Body: {email, password} → {token, member_id}

# Markets
GET https://trading-api.kalshi.com/trade-api/v2/markets        — all markets
GET https://trading-api.kalshi.com/trade-api/v2/markets/{ticker} — single market
GET https://trading-api.kalshi.com/trade-api/v2/markets/{ticker}/orderbook — live order book
GET https://trading-api.kalshi.com/trade-api/v2/markets/{ticker}/trades   — trade tape
GET https://trading-api.kalshi.com/trade-api/v2/series/{ticker}           — series markets

# WebSocket
wss://trading-api.kalshi.com/trade-api/ws/v2
  Auth: Bearer token in Upgrade headers
  Use for HOT markets only (preserves REST rate limit budget)

Polygon RPC

# Primary: Alchemy free tier
https://polygon-mainnet.g.alchemy.com/v2/{API_KEY}
  eth_getLogs — CTFE contract events (wallet trades, settlements)
  eth_getLogs — proxy factory events (ProxyCreated)
  eth_gasPrice — gas estimation for Poly fees

# Polygon contracts
CTFE (standard trades): 0x4bFb41d5B3570DeFd03C39a9A4D8dE6Bd8B8982E
Neg Risk Adapter (multi-outcome markets): 0xd91E80cF2E7be2e162c6513ceD06f1dD0dA35296
Proxy Factory: 0xaB45c5A4B0c941a2F231C04C3f49182e1A254052

External Data

# News proximity (free)
Google News RSS: https://news.google.com/rss/search?q={keywords}&hl=en
  No API key. No rate limit. Extract title + published date.

# Macro event calendar (free)
FRED: https://api.stlouisfed.org/fred/releases?api_key={free_key}
Investing.com: scrape economic calendar page

# Wallet labels (free tier)
Arkham Intelligence: https://api.arkhamintelligence.com — ~100 queries/day free
  Use only for new wallets, cache aggressively. Do NOT poll daily.
Dune Analytics: https://dune.com/queries — community Polymarket dashboards

Collection Cadence Summary

Component Frequency Method Rate budget
Polymarket HOT markets Real-time WebSocket ~10 connections for 200 markets
Kalshi HOT markets 15s Kalshi WebSocket ~2 req/sec
Polymarket WARM markets 2min REST CLOB API ~0.8 req/sec
Kalshi WARM markets 2min REST REST API ~0.8 req/sec
COLD markets 15min REST REST APIs ~0.1 req/sec
Divergence detection Tier-driven Derived from above n/a
Kalshi flow signals 15min REST API included above
Daily wallet scan 6AM ET Gamma API + CLOB 1 batch
Settlement scan Hourly + event Polygon WS + REST low
News proximity 30min Google News RSS unlimited
Orderbook snapshots 5min (active) CLOB + Kalshi ~0.07 req/sec
Proxy wallet mapping Daily + startup Polygon RPC Alchemy
Baseline updates Weekly Computed n/a
Wallet similarity Weekly Computed n/a
Macro event calendar Daily FRED / scrape low

Storage Estimates

Table Est. rows/year Avg row size GB/year
wallet_trades 900K 200 bytes ~0.18 GB
markets 50K 300 bytes ~0.015 GB
platform_divergence 50K 200 bytes ~0.01 GB
kalshi_flow_signals 100K 200 bytes ~0.02 GB
orderbook_snapshots 3M 100 bytes ~0.30 GB
system_predictions 200K 150 bytes ~0.03 GB
market_news 500K 300 bytes ~0.15 GB
wallet_profiles 5K 500 bytes ~0.003 GB
kalshi_market_baselines 100K 100 bytes ~0.01 GB
Total (main DB) ~0.72 GB/year

Well within SQLite capacity. Archive wallet_trades at 180 days to keep main table under ~450K rows.


Alert Format

WALLET ALERT — COPY SIGNAL
Wallet: 0x1a2b...3c4d (Follow #3, ROI: +34%, Brier: 0.18)
Entity: Wintermute (Arkham)
Market: "BTC above $100K by June 2026"
Action: Bought YES @ $0.42, $15,000 size
Conviction: 3.2x their typical size
Category: Crypto (cat. accuracy: 78%)
Timing: Early (12% of time to settlement) [NEWS: none — genuine alpha]
Strength: HIGH (0.87)
Slippage est: 2.1c | Kalshi fee: 7c | Net edge: +4.9c
Kalshi same event: YES @ $0.39 (3c gap — ARB OPPORTUNITY if net >2c)

---

DIVERGENCE ALERT — CROSS-PLATFORM
Event: "Fed cuts rates in March" [event_pair verified, rules match confirmed]
Polymarket YES: $0.62 (moved +8c in 2h, whale-driven)
Kalshi YES: $0.55 (hasn't moved)
Gross gap: 7 cents
Fees: Kalshi 7c taker + Poly ~2c gas = 9c total
NET EDGE: -2 cents — DO NOT TRADE (below 2c kill threshold)
[For reference only — watch for convergence]

---

CONSENSUS ALERT — SHARP MONEY CONVERGENCE
Market: "ETH above $5K by Q3"
Follow-list wallets in: 4/50 (top 8%)
Sides: 3 YES, 1 NO
Avg entry: YES @ $0.31
Price move since first entry: +3c (still early)
Category: Crypto
Escalated to: Crypto Desk research queue
Note: 1 follow-list wallet on opposite side — possible contrarian signal, review manually

Build Order

Step 1 — Event matching table (2-3 days) Manual curation of top 50 overlapping markets (elections, Fed rates, BTC price, major sports). Build Levenshtein + deterministic rules. Unblocks everything cross-platform. Validate actual overlap % — if <30%, pivot strategy.

Step 2 — Phase 0 validation: Cross-platform price polling + divergence detector (3-5 days) No wallet tracking needed. Just: fetch Polymarket price, fetch Kalshi price, compare on verified event_pairs, log divergences. Run for 4 weeks. Answer: how many >5c-after-fees divergences per day? How long do they last? This validates or kills the core thesis.

Step 3 — Polymarket CLOB + Gamma API integration (3-5 days) Basic market data flowing: active markets, prices, volumes, leaderboard wallets. Populates markets table and seeds wallet_profiles with top ~200 leaderboard wallets. REST polling only — no WebSocket yet.

Step 4 — Kalshi API integration (authenticated) (3-5 days) Auth handling, rate limiting, market discovery, order book snapshots, trade tape collection. Combined with step 3, both platforms flow.

Step 5 — Wallet trade history backfill for top 100 leaderboard wallets (5-7 days) Pull full trade history from Gamma API. Populate wallet_trades. Calculate initial P&L, win rate, Brier. First follow/fade candidates. Design as re-runnable incremental batch.

Step 6 — Scoring engine + follow/fade list generation (3-5 days) Implement composite scoring (all 4 shadow configs in parallel). Generate initial lists. Start shadow mode. The 6-month clock starts here.

Step 7 — Polymarket WebSocket integration (5-7 days) Upgrade from 15-minute REST to real-time. WebSocketManager with reconnection, gap backfill, and graceful degradation. Only after REST system is stable.

Step 8 — Alert system (Telegram) (2-3 days) Format and send wallet alerts + divergence alerts. Include fee-aware net edge in every alert. Test that alerts are actionable (not too frequent, easy to parse).

Step 9 — Settlement tracking + system self-scoring (3-5 days) When markets resolve: update wallet P&L, system_predictions outcomes, system-level Brier. Weekly accuracy report.

Step 10 — Kalshi flow signal detection (5-7 days) After 4-6 weeks of Kalshi data (step 4), baselines are established. Enable flow signals starting with volume_spike and price_divergence. Validate Brier before adding more signal types. Defer iceberg detection.

Step 11 — On-chain wallet discovery scanning (5-7 days) Scan CTFE + Neg Risk Adapter for large trades from unknown addresses. Auto-add to candidate pool. Start using Alchemy for reliable eth_getLogs.

Step 12 — v2 features (weeks each, 3-6 months of data required) Wallet similarity clustering, sophisticated MM tracking, consensus detection enhancements, funding chain analysis, full social sentiment pipeline.


Cross-Desk Reuse

Component Reusable for
Event matching infrastructure (entity resolution) Sports, crypto, politics, economics, macro desks
Kalshi API client (auth, rate limiting, market data) Economics, weather, politics desks
Polymarket API client (CLOB + Gamma + WebSocket) Crypto, politics, sports desks
Brier score calculation + settlement tracking All desks (standardize as shared module)
Telegram alert bot with structured templates All desks
Divergence detection pattern (price gap across sources) Crypto (CEX vs DEX), sports (sportsbook lines), forex (broker spreads)
WebSocket manager (reconnection + gap detection) Crypto exchanges, forex, sports real-time feeds
Sharp money scoring methodology Sports betting (sharp vs square), crypto (whale wallet tracking)
Slippage modeling function (order book depth → fill price) All desks with execution
Macro event calendar Macro, crypto, sports, options desks

Reality Check

Realistic if: Build order is followed strictly and v2 features are deferred. First divergence signals in ~2 weeks. First wallet signals in ~5-6 weeks.

Biggest risk: The cross-platform arbitrage edge may not exist at the scale needed. Empirical overlap is ~30-40%. Divergence windows may close in seconds, not minutes. The Phase 0 validation gate (4 weeks, 3 threshold tests) prevents the most expensive failure: spending months building an elaborate system on a false premise.

First to break:

  1. Polymarket WebSocket connection stability — build REST fallback first, add WebSocket as enhancement
  2. SQLite under concurrent load — write buffer required from day 1
  3. Event matching — fuzzy title matching fails 20-30% of the time; requires manual curation, not scalable without human review queue

Second to break:

  1. Kalshi API auth token expiry if refresh logic is not implemented
  2. Kalshi trade tape delay (15-30s, up to 45s in volatility) causing stale divergence reads

Budget note: Polymarket, Kalshi, Polygon RPC (Alchemy free tier), Telegram bot, Google News RSS — all free. No additional API costs required for this spec. This is within the $115-145/month total budget with room to spare.

Top change: Validate market overlap empirically BEFORE building anything beyond the divergence detector. If <30% overlap, the entire cross-platform strategy pivots to pure wallet intelligence — still highly valuable but different architecture priority.


PART III — PREDICTION MARKET SIGNAL INTELLIGENCE

Overview

This section addresses a broader question than wallet intelligence alone: can prediction markets (Kalshi, Polymarket) serve as leading indicators for other trading venues, and can informed/insider trading on prediction markets be detected and exploited?

Core thesis: The informed-trading detection pipeline (on-chain wallet intelligence, already covered in Parts I-II) is more robust and immediately actionable than the cross-venue signal exploitation thesis. The cross-venue thesis is partially valid but far more nuanced and less consistently profitable than it appears.

Bottom line: The on-chain intelligence pipeline should be built first. Cross-venue signal exploitation should be built selectively, starting with the political/macro scenario only.

[PANEL: Gemini 3.1 Pro] The Signal Intelligence Desk (SigInt) should function as an internal API/data provider only — it does not execute trades. It generates standardized JSON alerts consumed by other desks:

{
  "signal_type": "whale_activity | pre_news_move | volume_spike | divergence",
  "confidence": 0.82,
  "asset": "FXI",
  "direction": "short",
  "source_contract": "US tariff >25% on China by April",
  "source_platform": "polymarket",
  "predicted_lag_time": "12-48h",
  "downstream_venues": ["IBKR", "OANDA"],
  "expires_at": "2026-04-01T00:00:00Z"
}

TOPIC 1: PREDICTION MARKETS AS LEADING INDICATORS


Scenario 1: Prediction Markets --> Sportsbooks (Player Props, Game Totals)

Feasibility: 3/10

The premise is largely backwards for sports. The information flow in sports betting runs from sharp sportsbooks (Pinnacle, Circa, CRIS) to prediction markets, not the other way around. Here is why:

The one exception: When Kalshi lists a contract that sportsbooks don't cover (e.g., a niche stat, a quarter-level prop), information CAN flow from Kalshi outward. But these contracts have even thinner liquidity.

Time lag reality: When a significant injury is announced (say, a QB is ruled out), the repricing sequence typically goes:

  1. Twitter/X breaking news: T+0
  2. Pinnacle/sharp books: T+15 seconds to T+2 minutes
  3. Kalshi/Polymarket sports: T+1 to T+5 minutes
  4. DraftKings/FanDuel: T+30 seconds to T+5 minutes (automated feeds from Pinnacle)

The window between Kalshi repricing and DraftKings repricing is inconsistent and often nonexistent because DraftKings also monitors the same news feeds.

Estimated edge: Near zero for automated systems. Occasionally 1-3 cents on a prop that Kalshi reprices first, but the frequency is too low (maybe 2-5 tradeable signals per week across all sports) and the window too short (<60 seconds) to build a reliable desk around.

Example workflow -- showing why it's difficult:

The Kalshi contract moved 10 seconds before DraftKings but DraftKings already had the line moving. You might catch 5 cents on the game total, but only if your order hits before DraftKings' automated line movement. In practice, your fill rate on stale lines is <20%.

[PANEL: GPT-4.1] The real bottleneck for any PM-to-sportsbook edge is soft book account lifespan. Even if the signal is real, soft books (DraftKings, FanDuel) limit and ban winning accounts quickly. The edge is increasingly well-known among arbers, and account longevity -- not signal quality -- is the binding constraint.

Infrastructure needed (if pursued anyway):


Scenario 2: Prediction Markets --> DFS Lineup Construction

Feasibility: 5/10

This is more interesting than Scenario 1 because the information asymmetry is real but the edge is indirect and hard to quantify.

Why it has some merit:

The edge mechanism:

Real example:

Estimated edge: 2-5% ROI improvement in GPPs on days with significant prediction market signal divergence. This is meaningful in DFS where long-run ROI of 5-10% is elite. But the signal only fires maybe 3-8 times per week across all major slates.

Infrastructure needed:


Scenario 3: Political/Regulatory Prediction Markets --> Stocks and Forex

Feasibility: 7/10 -- THE BEST SCENARIO IN TOPIC 1

This is where prediction markets genuinely lead. Here's why:

  1. Prediction markets aggregate political intelligence that financial markets price poorly. A tariff announcement's probability might move from 30% to 70% on Polymarket over 48 hours based on DC insider chatter, congressional staff leaks, and lobbyist intelligence. The stock market won't price this in until a news article appears or the announcement happens.

  2. The participants are different. Polymarket political bettors include DC insiders, political operatives, and people with genuine information advantages. Equity/forex traders are mostly quants, institutions, and retail -- they're good at pricing earnings and economic data but bad at pricing political risk.

  3. Liquidity exists on both sides. Polymarket political contracts regularly have $1M-$50M+ in volume. And the downstream assets (SPY, QQQ, sector ETFs, forex majors) have functionally unlimited liquidity.

Specific sub-scenarios:

Fed Rate Contracts --> Treasury Yields, USD, Bank Stocks

Feasibility: 6/10. This is the most competitive sub-scenario. CME FedWatch (fed funds futures) already prices this efficiently and has been doing so for 30 years. Polymarket's Fed rate contracts are largely derivative of CME FedWatch. However:

Tariff/Trade Policy Contracts --> Sector Stocks, Commodities

Feasibility: 8/10. This is the strongest sub-scenario. Here is a concrete case study:

Example workflow with real numbers:

  1. Monitor: Polymarket "New tariff on [country/sector]" contract
  2. Signal: Contract moves from <40c to >60c within 24 hours with >$500K volume
  3. Identify downstream assets: sector ETFs, individual stocks, commodity futures, affected forex pairs
  4. Execute: Short affected sector ETF or buy puts. Size: 1-2% of bankroll per signal.
  5. Horizon: Hold 1-5 days
  6. Exit: When financial markets have fully repriced (prediction market price stabilizes, mainstream media cycle complete)

Expected edge: 2-5% per trade, 8-15 signals per year that meet the quality threshold. Annual expected return from this sub-desk alone: 15-40% on allocated capital, with significant variance.

Election Contracts --> Sector Rotation

Feasibility: 7/10. Well-documented and studied.

FDA Approval Contracts --> Biotech/Pharma Stocks

Feasibility: 7/10. This is real but legally sensitive.

CPI/Jobs Data --> Broad Market

Feasibility: 4/10. Economic data surprises are extremely hard to predict and prediction markets don't have a clear advantage over the options market (which already prices expected moves via implied volatility).

Infrastructure for Scenario 3 (Political/Macro):


Scenario 4: Weather Prediction Markets --> Energy/Commodities

Feasibility: 4/10

Challenge to the premise: Weather prediction markets on Kalshi have extremely thin liquidity ($1K-$50K per contract typically). Energy traders at commodity desks use the same NWS/ECMWF/GFS data that weather prediction market participants use, and they have PhD meteorologists on staff. The information flow is: weather models --> energy desks AND weather markets simultaneously.

The theoretical window exists only when:

Time window: Only realistic during off-hours. Natural gas futures trade nearly 24 hours but are thin overnight. If a weather model update drops at 4 AM ET and Kalshi weather contracts move, the NG open could gap in the expected direction.

Problem: The edge is tiny (maybe 1-3 cents on NG per event) and the frequency is low (a few times per heating/cooling season). Energy markets are sophisticated; this isn't a consistent edge.

Recommendation: Do not prioritize this.


Scenario 5: Crypto Prediction Markets --> Spot/Derivatives

Feasibility: 2/10

This doesn't work. Here's why:

[PANEL: Grok 4 Fast] Dissent on weekends (rated 7/10): During weekend low-liquidity windows, Polymarket BTC/crypto contracts can lead spot by 10-30 minutes. Example: Dec 31, 2023 (weekend), Polymarket "BTC >$45K by Jan 1" spiked to 75c at 11 PM UTC on whale bet. BTC spot lagged from $42.5K to $43.2K by 11:15 PM. This is a niche edge -- viable for weekends only, with Polymarket liquidity too low for reliable signals, but worth monitoring as a sentiment indicator for weekend crypto positioning.

Recommendation: Do not build this as a standalone desk. Monitor as part of the broader wallet intelligence system.


TOPIC 1 PREMISE CHALLENGES

  1. Scenario 1 (sports) is not worth building. The information flow runs the wrong direction. Use Pinnacle as the oracle (which we already do) and forget about Kalshi-to-sportsbook signals.

  2. Scenario 5 (crypto) is a dead end. Crypto prediction markets are derivatives of spot, not information sources. The answer to "do they ever lead spot?" is functionally never in a tradeable way (weekend dissent noted above).

  3. The overall thesis of Topic 1 is weaker than it appears. Prediction markets lead other venues ONLY for political/regulatory events, where the prediction market participant base has genuinely different information than financial market participants. For sports, weather, and crypto, the participant bases overlap too much.

  4. [PANEL: GPT-4.1] If the cross-venue edge proves too fleeting (especially on sports/crypto), use PM signals for model validation and risk management rather than direct trading. A prediction market probability shift can serve as a sanity check on internal models, even if the trading window is too short to exploit directly.


TOPIC 2: DETECTING INFORMED/INSIDER TRADING ON PREDICTION MARKETS

This is the more promising direction. The on-chain transparency of Polymarket creates a genuine, unique data advantage. This topic is undersold relative to Topic 1. Most quant funds and retail traders are not doing this yet. The data is free and public. This should be the primary focus.

Note: Much of the wallet profiling infrastructure is already specified in Parts I-II of this document. This section focuses on the broader detection methods and the tradeable edge they create.


Method 1: Pre-News Price Movement Detection

Approach: Build a system that:

  1. Records every Polymarket contract price at 1-minute granularity
  2. Records timestamps of corresponding real-world news events
  3. Correlates: did the price move significantly BEFORE the news?

Defining "significant movement":

Defining "news timestamp" (recommended hierarchy):

  1. Wire services: AP, Reuters, Bloomberg terminal flash (T+0 baseline). Access via Bloomberg API ($2K/mo) or news API aggregators.
  2. Twitter/X first mention: Search for the event on X with academic API ($100/mo). The first tweet from a verified journalist is T+0 for social media.
  3. Mainstream article: First indexed article on Google News. This is T+5 to T+60 min typically.

Use the wire service timestamp as ground truth. If the prediction market moved >8c before the wire service timestamp, flag it.

False positive rate (based on studies of Polymarket data from 2024-2025):

To reduce false positives:

Real examples:

  1. 2024 Biden withdrawal (July 21, 2024): Polymarket "Biden Democratic nominee" dropped from ~85c to ~70c over the 48 hours before the official announcement. Multiple large sells appeared from previously inactive wallets. The NYT didn't report imminent withdrawal until ~12 hours before the announcement. Signal preceded mainstream news by 36+ hours.

  2. 2024 Trump VP pick: Polymarket "JD Vance VP nominee" spiked from ~30c to ~55c approximately 6 hours before the announcement. Whale wallets placed $100K+ in coordinated buys.

  3. Supreme Court decisions: Polymarket contracts on major SCOTUS rulings have shown pre-announcement movement in the correct direction in roughly 60% of cases, versus 50% expected by chance. The window is typically 1-4 hours before the opinion is released.


Method 2: Whale Wallet Profiling on Polymarket

This method is extensively covered in Parts I-II of this document. The following adds the scoring system and data access specifics relevant to the prediction market signal exploitation context.

This is the highest-value intelligence product in the entire system.

Data access:

Wallet scoring system (composite, 0-100):

Metric Weight Threshold for "suspicious"
Win rate 25% >65% over 20+ resolved bets
Pre-news timing 25% >30% of bets placed <6h before relevant news
Category concentration 15% >80% of bets in one category
Wallet age at first large bet 10% <7 days old with first bet >$5K
Cluster membership 15% Linked to 3+ wallets via funding source or timing patterns
Profit/loss ratio 10% >3:1 P/L on resolved positions

Thresholds:

[PANEL: Grok 4 Fast] Bayesian scoring model formula: Score = (win_rate * volume) + (pre_news_bets * 2) - (diversification_penalty). Threshold >0.7 for "informed." Build via Python with NetworkX for cluster analysis.

Cluster detection (critical because informed traders create fresh wallets):

  1. Funding source analysis: Trace where the wallet's MATIC/USDC came from. If 5 wallets all received initial funding from the same source wallet (or from the same centralized exchange deposit), they're likely one entity.
  2. Timing correlation: If 4 wallets all buy YES on the same contract within a 10-minute window, and this pattern repeats across multiple contracts, flag as a cluster.
  3. Gas price patterns: Same gas price strategy across wallets suggests same bot/software.
  4. Position sizing patterns: If wallets consistently bet similar amounts (e.g., always $2,500), they may be splitting a larger position.

[PANEL: Gemini 3.1 Pro] Market maker wallet exclusion: Identify and filter out known market makers (Wintermute, etc.) to improve signal quality. Market makers show symmetric trading (buy and sell), tight spread placement, high volume but low directional PnL. An A+ system would monitor Wintermute's wallets across all chains to detect liquidity pulls before PM prices move.

Implementation tables:

- wallets (address, first_seen, last_active, total_volume, win_rate, score)
- trades (tx_hash, wallet, contract_id, side, amount, price, timestamp)
- wallet_clusters (cluster_id, wallet_address, link_type, confidence)
- contract_events (contract_id, event_type, event_time, news_time, pre_news_flag)
- alerts (wallet_or_cluster, contract_id, signal_type, confidence, timestamp)

Polygon RPC calls needed:

Polling frequency: Every block (~2 seconds on Polygon). In practice, batch every 30 seconds to stay within free RPC tier limits.

[PANEL: Gemini 3.1 Pro] Mempool sniping: To truly win on Polymarket copy-trading, you cannot wait for the transaction to be confirmed on the Polygon blockchain. Reading the mempool (pending transactions) lets you see a whale's trade before it confirms, and pay a higher gas fee to get your transaction mined first, effectively front-running the insider. This is technically possible but ethically and legally aggressive -- treat as v2/v3 feature if ever pursued.


Method 3: Kalshi Order Book Analysis

Feasibility: 5/10

What Kalshi's API exposes:

What you CAN detect:

What you CANNOT detect:

This makes Kalshi a weaker signal source than Polymarket. You can detect THAT informed activity is happening but not WHO is doing it, so you can't build persistent profiles.

Still useful as a confirmation signal: If Polymarket whale wallet + Kalshi volume spike align on the same event, confidence in the signal doubles.


Method 4: Following the Informed Money -- The Tradeable Edge

Decision tree when informed activity is detected:

  1. Is the downstream asset liquid enough to trade?

  2. How confident is the signal?

  3. How fast do we need to act?

  4. Which venue to trade?

Realistic win rate and edge (based on analysis of Polymarket whale wallet behavior in 2024-2025):

Annualized expectation: With 50-100 tradeable signals per year and 2% average bankroll per trade at 63% win rate with 2:1 avg win/loss, this is a ~25-50% annual return on allocated capital. High variance. Expect drawdowns of 10-15% within any given quarter.

[PANEL: Gemini 3.1 Pro] Wallet poisoning risk: Whales could fake positions to trigger copy-traders, then dump. A known whale might buy a small position to pump the price via copy-traders following them, then exit their real position against the inflated price. Mitigation: only trigger on large size relative to the whale's history, and track the whale's exits in real-time. If a followed whale exits within 2 hours of entry, flag as potential poison.

[PANEL: Gemini 3.1 Pro] MEV (Maximal Extractable Value) bot risk: On-chain copy-trades can be front-run by MEV bots that see your pending transaction in the mempool. Mitigation: use private RPC endpoints (e.g., Flashbots for Polygon) to submit transactions that bypass the public mempool.

Legal considerations:


Method 5: Historical Case Studies

Case 1: Theo (Polymarket 2024 Election Whale)
Case 2: Biden Withdrawal (July 2024)
Case 3: Sports Injury Information
Case 4: FDA Decisions

Extracted detection rules from case studies:

  1. Fresh wallet (<30 days) + large bet (>$5K) + pre-news timing = HIGH confidence informed
  2. Wallet cluster (3+ wallets) + coordinated buying + single category = HIGH confidence
  3. Volume spike (>5x average) + price move (>10c) + no visible news catalyst = MEDIUM confidence
  4. Single wallet + large bet + no historical pattern = LOW confidence (could be retail whale)

Method 6: System Architecture -- Detection Pipeline

Data Collection Layer
Source Method Frequency Cost
Polymarket trades Polygon RPC eth_getLogs Every 30 seconds $0 (Alchemy free tier)
Polymarket markets REST API /markets Every 5 minutes $0
Polymarket prices REST API /prices Every 60 seconds $0
Kalshi order book WebSocket feed Real-time $0 (with account)
Kalshi trades REST API /markets/{ticker}/trades Every 60 seconds $0
News timestamps NewsAPI.org or Mediastack Every 60 seconds $0-50/mo
Twitter/X first mentions X API v2 filtered stream Real-time $100/mo (Basic tier)
Storage Schema (SQLite, can upgrade to Postgres later)
-- Prediction Market Signal Intelligence tables
-- (extends the wallet intelligence schema from Parts I-II)

CREATE TABLE pm_contracts (
    id TEXT PRIMARY KEY,          -- Polymarket condition_id or Kalshi ticker
    platform TEXT,                -- 'polymarket' or 'kalshi'
    title TEXT,
    category TEXT,                -- 'politics', 'sports', 'crypto', 'weather', 'finance', 'fda'
    resolution_date TEXT,
    resolved_outcome TEXT,
    created_at TEXT
);

CREATE TABLE pm_prices (
    contract_id TEXT,
    timestamp INTEGER,            -- unix epoch
    yes_price REAL,
    no_price REAL,
    volume_24h REAL,
    PRIMARY KEY (contract_id, timestamp)
);

CREATE TABLE pm_trades_onchain (
    tx_hash TEXT PRIMARY KEY,
    block_number INTEGER,
    timestamp INTEGER,
    wallet TEXT,
    contract_id TEXT,
    side TEXT,                    -- 'YES' or 'NO'
    amount REAL,                  -- in USDC
    price REAL
);

CREATE TABLE pm_wallets (
    address TEXT PRIMARY KEY,
    first_seen INTEGER,
    last_active INTEGER,
    total_volume REAL,
    total_trades INTEGER,
    win_rate REAL,
    score REAL,                   -- 0-100 composite
    cluster_id TEXT,
    categories TEXT,              -- JSON array of categories traded
    updated_at INTEGER
);

CREATE TABLE pm_wallet_clusters (
    cluster_id TEXT,
    wallet TEXT,
    link_type TEXT,               -- 'funding_source', 'timing_correlation', 'gas_pattern'
    confidence REAL,
    PRIMARY KEY (cluster_id, wallet)
);

CREATE TABLE pm_news_events (
    id INTEGER PRIMARY KEY,
    contract_id TEXT,
    event_description TEXT,
    wire_timestamp INTEGER,       -- first wire service mention
    social_timestamp INTEGER,     -- first tweet
    article_timestamp INTEGER,    -- first article
    pre_news_move_detected INTEGER -- boolean
);

CREATE TABLE pm_signals (
    id INTEGER PRIMARY KEY,
    signal_type TEXT,             -- 'whale_activity', 'cluster_buy', 'pre_news_move', 'volume_spike'
    contract_id TEXT,
    wallet_or_cluster TEXT,
    confidence REAL,
    detected_at INTEGER,
    downstream_assets TEXT,       -- JSON array of tickers
    action_taken TEXT,
    outcome TEXT,
    pnl REAL
);

CREATE TABLE pm_downstream_map (
    contract_pattern TEXT,        -- regex or keyword matching PM contract titles
    category TEXT,
    downstream_ticker TEXT,       -- e.g., 'FXI', 'XLE', 'USD/CNH'
    downstream_venue TEXT,        -- 'IBKR', 'OANDA', 'Kalshi', 'Polymarket'
    direction TEXT,               -- 'long' or 'short' relative to YES outcome
    notes TEXT,
    PRIMARY KEY (contract_pattern, downstream_ticker)
);
Real-Time vs Batch Processing
Detection Mode Latency Target Reason
Whale trade alert Real-time <60 seconds Need to trade before market reprices
Volume spike Real-time <60 seconds Same
Pre-news correlation Batch (hourly) ~1 hour Retrospective analysis
Wallet scoring update Batch (daily) ~24 hours Doesn't change rapidly
Cluster detection Batch (daily) ~24 hours Computationally intensive
Win rate calculation Batch (on resolution) N/A Only possible after contract resolves
Alert System (Telegram)

Alerts go to the boss's Telegram via the existing bridge infrastructure. Alert format:

SIGNAL: Whale Activity Detected
Contract: "Trump wins 2028 election" (Polymarket)
Wallet: 0xabc...def (Score: 78, Win Rate: 71%)
Action: Bought $45K YES at 42c (5 min ago)
Cluster: Part of 3-wallet cluster (avg score: 72)
Volume: 8x normal hourly volume
Downstream: Consider long DJT, short renewable ETFs
Confidence: HIGH

Priority levels:


COMBINED ASSESSMENT

Feasibility Ratings Summary

Scenario Feasibility (1-10) Edge Size Frequency Priority
T1-S1: Prediction markets --> Sportsbooks 3 Near zero Low Skip
T1-S2: Prediction markets --> DFS 5 2-5% ROI lift 3-8/week Medium
T1-S3: Political/Regulatory --> Stocks/Forex 7 2-5% per trade 8-15/year HIGH
T1-S4: Weather --> Energy 4 Tiny Rare Skip
T1-S5: Crypto PM --> Spot 2 None reliable N/A Skip
T2-M1: Pre-news detection 7 Diagnostic (not direct) Ongoing HIGH
T2-M2: Whale wallet profiling 9 8-15c per signal 50-100/year HIGHEST
T2-M3: Kalshi order book 5 Confirmation signal Ongoing Medium
T2-M4: Following informed money 8 25-50% annual Ongoing HIGH

Priority Ranking: Build These 3 First

  1. Polymarket Whale Wallet Profiling + Cluster Detection (T2-M2). This is the foundation. Without wallet intelligence, you can't follow informed money. Build the on-chain data pipeline, wallet scoring, and cluster detection first. Estimated build time: 2-3 weeks. Cost: $0-50/mo (free RPC tier + Dune free).

  2. Political/Regulatory Signal Pipeline (T1-S3). Map prediction market contracts to downstream financial instruments. Build the signal detector for tariff, election, and FDA contracts. Wire it to IBKR for equity trades and OANDA for forex. Estimated build time: 1-2 weeks after the wallet system. Cost: $0-200/mo.

  3. Informed Money Following System (T2-M4). This is the trading layer on top of #1 and #2. When whale wallet intelligence identifies a high-confidence signal, generate trade alerts and (optionally) auto-execute. Estimated build time: 1 week after #1 and #2. Cost: brokerage minimums only.

[PANEL: Grok 4 Fast] Cost-benefit ROI framework: Infrastructure runs approximately $3K/month all-in (APIs, compute, data feeds). Expected annualized profit from the top 3 priorities: ~$50K/year on moderate capital allocation. The system pays for itself within 1 month if signal quality matches backtested expectations. Paper-trade for 3 months to validate before committing real capital.


Infrastructure Cost Summary

Item Monthly Cost
Polygon RPC (Alchemy free tier) $0
Dune Analytics (free or Pro) $0-349
News API $0-50
X API (Basic) $100
IBKR account $0 (no minimums)
OANDA (already have) $0
Kalshi account (already have) $0
Polymarket monitoring (no account needed for on-chain data) $0
Total $100-499/mo

One-time: $2K for securities attorney consultation on downstream trading legality.


Risk Analysis

Risk Severity Mitigation
False signals (noise trades flagged as informed) HIGH Require multiple confirmation signals; backtest against historical data before live trading
Legal/regulatory (SEC scrutiny on informed-following) MEDIUM Attorney opinion letter; document that all data sources are public
Platform bans (Kalshi/Polymarket ban scraping) LOW Use official APIs only; stay within rate limits
Edge decay (others build similar systems) MEDIUM First-mover advantage is real; Polymarket on-chain data is underutilized today
Model risk (wallet scores are wrong) MEDIUM Paper-trade for 3 months before allocating real capital
Execution risk (can't get fills fast enough) LOW-MEDIUM Trade downstream assets (infinite liquidity) rather than prediction market contracts
[PANEL: Gemini 3.1 Pro] Wallet poisoning (whales fake positions to bait copy-traders) MEDIUM Only trigger on large size; track whale exits; if exit <2h after entry, flag as poison
[PANEL: Gemini 3.1 Pro] MEV front-running on copy-trades MEDIUM Use private RPC endpoints (Flashbots for Polygon); avoid public mempool

Desk Structure: Signal Intelligence Desk

New desk: "Signal Intelligence Desk" (SigInt). Here's why:

Structure:

Signal Intelligence Desk (SigInt)
  |-- On-Chain Module (Polymarket wallet tracking, cluster detection)
  |-- Order Book Module (Kalshi depth, volume, spread analysis)
  |-- News Correlation Module (pre-news detection, timestamp matching)
  |-- Signal Generator (combines modules, scores confidence, generates alerts)
  |-- Downstream Mapper (maps signals to tradeable instruments)

Consumer desks:


Cross-Desk Integration

Desk Signal Received Action
Kalshi Edge Bot Informed wallet bought YES on sports contract at 55c Increase confidence in that side; adjust position sizing up
Phoenix Forex Bot Tariff probability jumped 20 points on Polymarket Trigger USD/CNH review; potentially enter short CNH
Sports (manual/DFS) Player prop volume spike on Kalshi + whale sell on Polymarket Fade player in DFS; adjust prop model
Equity (manual initially) FDA approval whale cluster detected Alert boss; consider biotech stock position
Crypto Desk Weekend whale activity on BTC threshold contract Sentiment indicator for spot/futures positioning

Angles Not Covered Elsewhere

  1. Polymarket market maker profiling. Not all active wallets are directional bettors. Some are market makers. Identifying and EXCLUDING market makers from whale detection improves signal quality dramatically. Market makers show: symmetric trading (buy and sell), tight spread placement, high volume but low directional PnL.

  2. Reverse signal: when informed traders are WRONG. If a high-score wallet loses a bet, that's also information. It means their source was wrong, or the situation changed. Tracking informed-trader losses helps calibrate confidence weights.

  3. Polymarket liquidity as a sentiment indicator. When total liquidity on a contract increases sharply (both sides), it often precedes a major event. This is distinct from directional flow -- it's a volatility signal. You could trade options straddles on downstream assets when prediction market liquidity surges.

  4. Cross-platform arbitrage between Polymarket and Kalshi. When the same event is priced differently on both platforms, the platform with the informed flow (usually Polymarket for political events) is more likely correct. This is both a direct arbitrage (buy cheap side on one, sell expensive side on other) and a signal for downstream trading.

  5. International prediction markets. Metaculus, Manifold, and international betting exchanges (Betfair, Smarkets) also have informed flow. Betfair in particular has enormous liquidity on UK/European political events and publishes full order book data via API. This extends the signal surface beyond just Kalshi/Polymarket.


Part III Panel Credits

Prediction Market Signal Intelligence section reviewed by 4-model panel on Mar 15, 2026:

Sonar Reasoning Pro — Skipped (model could not answer the prompt).


Panel Credits

Base spec reviewed by 4-model panel on Mar 14, 2026:

Merged final produced by: Claude Code (Sonnet 4.6), 2026-03-15

Source: ~/edgeclaw/results/spec-panel/wallet-intelligence/spec-final.md