Stocks Desk Final Ruling — Complete Data Pipeline

Including Penny Stocks, Prediction Markets, and Advanced Metrics

Panel: Opus, Sonnet, Grok 3, Gemini 2.5 Pro (full 4/4)
Judge: Opus (via synthesis)
Date: 2026-03-25
Grade: A (strong consensus on all core components, excellent penny stock coverage)


The Vision

A complete stock intelligence engine covering ALL US equities — from mega-caps to penny stocks. Not just Kalshi binary bets. We want to:


1. Stock Price Data — Store EVERYTHING (ALL 4 AGREED)

Collection: Polygon Grouped Daily
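
A minimal sketch of the collection step, assuming Polygon's grouped daily aggregates endpoint (`/v2/aggs/grouped/locale/us/market/stocks/{date}`) and its short field names (`T`, `o`, `h`, `l`, `c`, `v`). The `DailyBar` shape and both helper names are hypothetical, and the network call itself is left out so the parsing can be checked offline.

```typescript
// Hypothetical row shape for storage; field names are illustrative only.
interface DailyBar {
  ticker: string;
  date: string; // YYYY-MM-DD
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Subset of the grouped daily response that this sketch relies on.
interface GroupedDailyResponse {
  results?: Array<{ T: string; o: number; h: number; l: number; c: number; v: number }>;
}

// Build the request URL for one trading day (endpoint path is an assumption).
function groupedDailyUrl(date: string, apiKey: string): string {
  const base = "https://api.polygon.io/v2/aggs/grouped/locale/us/market/stocks";
  return `${base}/${date}?adjusted=true&apiKey=${encodeURIComponent(apiKey)}`;
}

// Convert the raw payload into storable rows, keeping every ticker returned.
function parseGroupedDaily(date: string, payload: GroupedDailyResponse): DailyBar[] {
  return (payload.results ?? []).map((r) => ({
    ticker: r.T,
    date,
    open: r.o,
    high: r.h,
    low: r.l,
    close: r.c,
    volume: r.v,
  }));
}
```

One request per trading day returns a bar for every ticker, which is what makes "store everything" cheap in API calls: the collector only needs to loop over dates, not symbols.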

Storage Tiers (Sonnet + Gemini):

Penny Stock Definition:


2. Earnings & Corporate Actions (ALL 4 AGREED)

From Polygon (included in $29/mo):

Earnings Calendar: /v3/reference/tickers/{ticker}/events

Dividends: /v3/reference/dividends

Stock Splits: /v3/reference/splits

Analyst Ratings (Gemini)


3. Short Interest & Flow Data (ALL 4 AGREED)

Short Interest:

Flow Proxy (Gemini — clever):


4. Kalshi/Prediction Market Stock Series (ALL 4 AGREED)

Must-Track:

Cross-Venue Arb (ALL 4):

Compare the Kalshi implied probability with the options-implied probability (Black-Scholes N(d2)) for the same event. A gap of more than 10 percentage points after transaction costs is a trade signal.


5. Advanced Metrics — THE EDGE (ALL 4 AGREED on core set)

Metric 1: Insider Cluster Score (ALL 4 — HIGHEST PRIORITY)

Already have EDGAR Form 4 data. Pure SQL computation.

Score per company (rolling 30 days):
+3 points: CEO/CFO buy
+2 points: VP/COO buy
+1 point: Director/Officer buy
+1 bonus: Buy size > 50% of current holdings
x1.5 multiplier: 3+ insiders buy within 7 days
x2.0 multiplier: Buys near 52-week low

Alert threshold: Score > 8
High-conviction threshold: Score > 12
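
The scoring rules above can be sketched in TypeScript as follows. The ruling calls for pure SQL, so this is only an illustration of the arithmetic. Several points are assumptions not settled by the text: the size bonus is applied per buy, the two multipliers stack and apply to the whole score, the near-low multiplier fires if any buy is near the 52-week low, and the input is already filtered to the rolling 30-day window. The `InsiderBuy` shape and the `nearYearLow` flag are hypothetical.

```typescript
type InsiderRole = "CEO" | "CFO" | "VP" | "COO" | "Director" | "Officer";

interface InsiderBuy {
  insiderId: string;
  role: InsiderRole;
  date: string;             // ISO date of the Form 4 transaction
  sharesBought: number;
  sharesHeldBefore: number; // holdings prior to the buy
  nearYearLow: boolean;     // hypothetical flag, computed elsewhere
}

const ROLE_POINTS: Record<InsiderRole, number> = {
  CEO: 3, CFO: 3, VP: 2, COO: 2, Director: 1, Officer: 1,
};

const DAY_MS = 24 * 60 * 60 * 1000;

// True when 3+ distinct insiders bought inside any 7-day window.
function hasInsiderCluster(buys: InsiderBuy[]): boolean {
  const sorted = [...buys].sort((a, b) => Date.parse(a.date) - Date.parse(b.date));
  for (let i = 0; i < sorted.length; i++) {
    const start = Date.parse(sorted[i].date);
    const ids = new Set<string>();
    for (let j = i; j < sorted.length && Date.parse(sorted[j].date) - start <= 7 * DAY_MS; j++) {
      ids.add(sorted[j].insiderId);
    }
    if (ids.size >= 3) return true;
  }
  return false;
}

// Score one company's buys (assumed pre-filtered to the last 30 days).
function insiderClusterScore(buys: InsiderBuy[]): number {
  let score = 0;
  for (const b of buys) {
    score += ROLE_POINTS[b.role];
    // Assumption: +1 bonus per buy larger than 50% of prior holdings.
    if (b.sharesHeldBefore > 0 && b.sharesBought > 0.5 * b.sharesHeldBefore) score += 1;
  }
  if (hasInsiderCluster(buys)) score *= 1.5;
  if (buys.some((b) => b.nearYearLow)) score *= 2.0;
  return score;
}
```

Under these assumptions, a CEO, a VP, and a director buying within the same week score 6 × 1.5 = 9, which clears the alert threshold; the same cluster near a 52-week low scores 18 and clears the high-conviction threshold.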

Metric 2: Relative Volume (RVOL) Scanner (ALL 4)

RVOL = Today's Volume / 20-Day Average Volume

Alerts:
- RVOL > 3.0 + price up > 5% = "In Play" (finds 90% of the day's biggest movers)
- RVOL > 5.0 + penny stock = potential runner
- RVOL > 10.0 = extreme event (news, earnings, squeeze)
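
A short sketch of the RVOL calculation and the three alert tiers. It assumes the 20-day average excludes today's session and that only the most severe matching alert is reported; neither choice is specified above. The function names and the `isPenny` flag are hypothetical.

```typescript
type RvolAlert = "extreme" | "potential-runner" | "in-play" | null;

// RVOL = today's volume / mean volume of the prior 20 sessions.
function relativeVolume(todayVolume: number, priorVolumes: number[]): number | null {
  if (priorVolumes.length < 20) return null; // not enough history
  const window = priorVolumes.slice(-20);
  const avg = window.reduce((sum, v) => sum + v, 0) / window.length;
  return avg > 0 ? todayVolume / avg : null;
}

// Map an RVOL reading to the highest-priority alert it triggers.
function classifyRvol(rvol: number, pctChange: number, isPenny: boolean): RvolAlert {
  if (rvol > 10) return "extreme";
  if (rvol > 5 && isPenny) return "potential-runner";
  if (rvol > 3 && pctChange > 5) return "in-play";
  return null;
}
```

For example, a penny stock that normally trades 100,000 shares a day and prints 600,000 today has an RVOL of 6 and lands in the potential-runner tier.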

Metric 3: Penny Stock Breakout Probability Score (ALL 4)

Composite score combining:

Metric 4: Smart Money Velocity (Gemini + Opus)

From existing 13F data. Track top 20-30 high-performing funds.

Metric 5: Options-Prediction Market Arb Score (ALL 4)

Already building this in options-metrics.ts. Extend to all Kalshi stock markets:

arb_score = |Prob(Kalshi) - Prob(Options d2)| - transaction_costs
Trade signal when arb_score > 0.10 (10 percentage points)
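
The formula can be sketched as below. This is not the contents of options-metrics.ts; it is a standalone illustration that assumes a plain Black-Scholes setup with no dividends, where the probability of finishing above the strike is N(d2). The normal CDF uses the Abramowitz-Stegun erf approximation, and all function names are hypothetical.

```typescript
// Abramowitz-Stegun 7.1.26 approximation of erf (absolute error about 1.5e-7).
function erfApprox(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal cumulative distribution function.
function normCdf(x: number): number {
  return 0.5 * (1 + erfApprox(x / Math.SQRT2));
}

// Risk-neutral probability that the underlying finishes above the strike: N(d2).
function probAboveStrike(spot: number, strike: number, vol: number, years: number, rate = 0): number {
  const d2 = (Math.log(spot / strike) + (rate - 0.5 * vol * vol) * years) / (vol * Math.sqrt(years));
  return normCdf(d2);
}

// arb_score = |Prob(Kalshi) - Prob(Options d2)| - transaction_costs
function arbScore(kalshiProb: number, optionsProb: number, transactionCosts: number): number {
  return Math.abs(kalshiProb - optionsProb) - transactionCosts;
}

// Trade signal when the score exceeds 0.10.
function isArbSignal(score: number): boolean {
  return score > 0.10;
}
```

As a worked case, an at-the-money contract with 20% volatility, one year to expiry, and a zero rate gives d2 = -0.1 and N(d2) of about 0.46. If Kalshi prices the same event at 0.60 and costs are 0.02, the score is 0.12 and the signal fires.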

Metric 6: Dilution Risk Score (Gemini — penny stock defense)

Scan SEC EDGAR 8-K and S-1/S-3 filings for keywords:

Metric 7: Sector Momentum Lag (Sonnet)

When a large-cap moves >5% in a day, find penny stocks in the same sector that haven't moved yet. The laggards often follow within 1-3 days.
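
A rough sketch of the laggard search. Three thresholds are assumptions, since the text does not fix them: a penny stock is taken as priced under $5, "hasn't moved yet" is taken as an absolute move under 2%, and a large-cap move counts in either direction. The `SectorQuote` shape and `isLargeCap` flag are hypothetical.

```typescript
interface SectorQuote {
  ticker: string;
  sector: string;
  price: number;
  pctChange: number;   // today's percent change
  isLargeCap: boolean; // hypothetical flag from reference data
}

const PENNY_MAX_PRICE = 5;  // assumption: penny cutoff is not defined in this section
const LAGGARD_MAX_MOVE = 2; // assumption: "hasn't moved yet" means |change| < 2%

// Return penny-stock tickers, grouped by sector, that sat still while a
// large-cap in the same sector moved more than 5%.
function findSectorLaggards(quotes: SectorQuote[]): Map<string, string[]> {
  const hotSectors = new Set(
    quotes.filter((q) => q.isLargeCap && Math.abs(q.pctChange) > 5).map((q) => q.sector),
  );
  const laggards = new Map<string, string[]>();
  for (const q of quotes) {
    if (q.isLargeCap || !hotSectors.has(q.sector)) continue;
    if (q.price < PENNY_MAX_PRICE && Math.abs(q.pctChange) < LAGGARD_MAX_MOVE) {
      laggards.set(q.sector, [...(laggards.get(q.sector) ?? []), q.ticker]);
    }
  }
  return laggards;
}
```

The output is a watchlist for the following 1-3 sessions rather than an immediate trade signal.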

Metric 8: Earnings Surprise Predictor (Gemini + Sonnet)

Before earnings, combine:


6. Penny Stock Specific Data (Gemini + Grok strongest here)

Must-Collect:

Pre-Runner Signals:

  1. Ignition Signal (Gemini): Price crosses above 50-day MA for first time in 30+ days AND RVOL > 5
  2. Consecutive momentum: 3+ days of >5% gains with increasing volume each day (Grok)
  3. Float rotation >50%: massive turnover = retail piling in
  4. Insider buy + no dilution: cluster score > 5 AND no recent S-3 filing
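
The first three signals can be sketched as simple checks over daily data. Interpretations are assumptions: "first time in 30+ days" is read as the prior 30 closes all sitting at or below their own 50-day MA, and "increasing volume each day" compares each streak day with the session before it. The `Session` shape and function names are hypothetical, and the RVOL value is assumed to be computed elsewhere.

```typescript
interface Session {
  close: number;
  volume: number;
}

// Simple moving average of `window` closes ending at index `end`.
function smaAt(closes: number[], end: number, window: number): number {
  let sum = 0;
  for (let i = end - window + 1; i <= end; i++) sum += closes[i];
  return sum / window;
}

// Ignition: today's close is above the 50-day MA, none of the prior 30 closes
// were above theirs, and RVOL > 5. Needs at least 80 closes, oldest first.
function isIgnition(closes: number[], rvol: number): boolean {
  const last = closes.length - 1;
  if (closes.length < 80 || rvol <= 5) return false;
  if (closes[last] <= smaAt(closes, last, 50)) return false;
  for (let i = last - 30; i < last; i++) {
    if (closes[i] > smaAt(closes, i, 50)) return false;
  }
  return true;
}

// Consecutive momentum: each of the last `days` sessions gained more than 5%
// on higher volume than the session before it.
function hasMomentumStreak(sessions: Session[], days = 3): boolean {
  if (sessions.length < days + 1) return false;
  for (let i = sessions.length - days; i < sessions.length; i++) {
    const prev = sessions[i - 1];
    const cur = sessions[i];
    const gain = (cur.close - prev.close) / prev.close;
    if (gain <= 0.05 || cur.volume <= prev.volume) return false;
  }
  return true;
}

// Float rotation: fraction of the float that traded today (signal when > 0.5).
function floatRotation(volume: number, floatShares: number): number {
  return floatShares > 0 ? volume / floatShares : 0;
}
```

The fourth signal combines the insider cluster score with a filing check and is omitted here because it depends on data from other collectors.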

7. Additional Data Sources

FREE (build now):

| Source | What | Why |
|---|---|---|
| SEC EDGAR RSS feed | Real-time 8-K/S-3 filing alerts | Penny stock catalysts + dilution detection. Free. |
| QuiverQuant | Congress trades, govt contracts | Congress members beat the market. Free tier. |
| Polygon reference data | Float, shares outstanding, market cap | Already paid for. Not collecting yet. |

BOSS DECISION NEEDED:

| Source | Cost | What | Impact |
|---|---|---|---|
| Social sentiment (StockTwits/Reddit API) | $0-100/mo | Penny stock mention velocity | HIGH for penny stocks |
| Benzinga Pro API | ~$99/mo | Breaking news, unusual activity | HIGH for all stocks |
| FINVIZ API | ~$40/mo | Screener data, analyst ratings | MEDIUM |

SKIP for now:

| Source | Why Skip |
|---|---|
| Dark pool data (SqueezeMetrics) | $50-100/mo, can proxy from options flow |
| Bloomberg | $24K/yr, overkill |
| Satellite/alt data | Expensive, narrow use case |

8. Strategies Enabled

Tier 1 (Build First):

  1. Penny Stock Runner Scanner — RVOL + float rotation + breakout score = catch runners early
  2. Insider Cluster Following — buy what C-suite is buying, especially near 52-week lows
  3. Short Squeeze Engine — SI% >20% + Days to Cover >10 + catalyst = explosive
  4. Kalshi-Options Probability Arb — systematic mispricing exploitation
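
As an illustration of how one of these strategies reduces to a screen, the Short Squeeze Engine criteria can be sketched as below. It assumes SI% is measured against the float (the text does not say float or shares outstanding) and uses the standard definition of days to cover as shares short divided by average daily volume. The `ShortSnapshot` shape and the `hasCatalyst` flag are hypothetical.

```typescript
interface ShortSnapshot {
  ticker: string;
  sharesShort: number;
  floatShares: number;    // assumption: SI% is measured against the float
  avgDailyVolume: number;
  hasCatalyst: boolean;   // hypothetical flag set by news or filing scanners
}

// Short interest as a percentage of the float.
function shortInterestPct(s: ShortSnapshot): number {
  return s.floatShares > 0 ? (s.sharesShort / s.floatShares) * 100 : 0;
}

// Days to cover: sessions of average volume needed to buy back all shorted shares.
function daysToCover(s: ShortSnapshot): number {
  return s.avgDailyVolume > 0 ? s.sharesShort / s.avgDailyVolume : Infinity;
}

// SI% > 20% + Days to Cover > 10 + catalyst.
function isSqueezeCandidate(s: ShortSnapshot): boolean {
  return shortInterestPct(s) > 20 && daysToCover(s) > 10 && s.hasCatalyst;
}
```

With 3 million shares short against a 10 million share float and 200,000 shares of average daily volume, SI% is 30 and days to cover is 15, so the name qualifies once a catalyst is flagged.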

Tier 2 (Build After):

  1. Smart Money Mimicry — follow top 20 fund new positions from 13F
  2. Earnings Surprise Predictor — combine insider + options + prediction market signals
  3. Sector Momentum Lag — large-cap moves, penny stock follows
  4. Macro Regime Overlay — use VIX/FRED regime classifier to weight strategies

Build Order

Phase 1 — Data Collection (Week 1):

  1. stock-collector.ts — Polygon grouped daily for ALL stocks + reference data (float, market cap, shares outstanding)
  2. earnings-collector.ts — Polygon earnings calendar + actuals + dividends + splits + analyst ratings
  3. short-interest-collector.ts — Polygon daily (top 1000) + FINRA biweekly (all)
  4. Cron wiring — grouped daily 2x/day, earnings daily 7AM, short interest daily 8AM + biweekly

Phase 2 — Metrics & Scanners (Week 2):

  1. stock-metrics.ts — insider cluster score, RVOL scanner, penny stock breakout score, smart money velocity, dilution risk score, sector lag detector
  2. Cron wiring — metrics after each data collection run

Phase 3 — Cross-Venue (Week 3):

  1. Extend options-metrics.ts — add Kalshi-options arb score for stock events
  2. SEC 8-K RSS parser — real-time filing alerts for penny stocks

Architecture Note

"Penny stocks are the most mispriced securities on earth. They have the least analyst coverage, the worst information efficiency, and the highest retail participation. Every data signal matters more here because fewer people are looking." — Synthesis principle

Source: ~/edgeclaw/results/panel-results/stocks-data-final-ruling.md