This is the complete data collection specification for the WNCAAB (Women's College Basketball) desk. An AI builder should be able to read this and know exactly what data to collect, from where, how often, and why.
We trade on Kalshi and Polymarket, prediction markets where you buy and sell contracts priced from 0 to 100 cents. We find mispriced lines by comparing prediction-market prices to Pinnacle (the sharpest traditional sportsbook). When a Kalshi or Polymarket price is wrong relative to Pinnacle's fair value, we buy the underpriced side.
WNCAAB has thinner markets than NCAAB. Fewer traders, wider spreads, slower price updates. This means mispricings are often LARGER when they exist, but liquidity is lower and Pinnacle may not always offer lines.
Only bet when a Pinnacle anchor exists. No Pinnacle line = no reliable fair value = no trade.
WNCAAB is 2-way only (no draw). Standard multiplicative de-vig applies to all markets.
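A minimal sketch of the multiplicative de-vig for a two-way market. It assumes Pinnacle prices arrive as American odds and converts them to decimal first; the function names are illustrative, not taken from any existing codebase.

```python
from __future__ import annotations


def american_to_decimal(american: float) -> float:
    """Convert American odds (e.g. -110, +150) to decimal odds."""
    if american >= 100:
        return 1 + american / 100
    if american <= -100:
        return 1 + 100 / -american
    raise ValueError("American odds must be <= -100 or >= +100")


def devig_multiplicative(decimal_a: float, decimal_b: float) -> tuple[float, float]:
    """Scale both implied probabilities by the same factor so they sum to 1."""
    implied_a = 1 / decimal_a
    implied_b = 1 / decimal_b
    overround = implied_a + implied_b  # above 1 when the book charges a margin
    return implied_a / overround, implied_b / overround


# Example: a -150 / +130 moneyline.
fair_fav, fair_dog = devig_multiplicative(american_to_decimal(-150), american_to_decimal(130))
```

Because every implied probability is divided by the same overround, each side keeps its raw share of the total; other methods (power, Shin) spread the margin differently.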
Pull EVERY line offered for every game — every alternate spread, every alternate total, every player prop.
What to pull per market/line:
Two snapshots: Early (~11 AM ET) and Closing (~10 minutes before tip-off).
What to pull:
How to get Pinnacle data:
Note: Pinnacle coverage for WNCAAB is limited, and not all games will have lines. Only proceed with edge detection when a Pinnacle anchor exists.
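A sketch of the anchor gate, assuming the Pinnacle fair probability has already been de-vigged and is passed as None when Pinnacle has no line. Exchange fees and slippage are ignored, and `edge_cents` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


def edge_cents(pinnacle_fair_prob: Optional[float], market_price_cents: float) -> Optional[float]:
    """Expected value, in cents per contract, of buying at market_price_cents.

    Returns None when there is no Pinnacle anchor: no fair value means no trade.
    Fees and slippage are ignored in this sketch.
    """
    if pinnacle_fair_prob is None:
        return None
    if not 0 <= market_price_cents <= 100:
        raise ValueError("prediction-market prices are quoted from 0 to 100 cents")
    # A contract pays 100 cents if it wins, so fair value in cents is prob * 100.
    return pinnacle_fair_prob * 100 - market_price_cents
```

A positive result means the contract is cheap relative to Pinnacle; a None result short-circuits the rest of the pipeline for that game.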
After each game: final score, total points, winner, margin, ATS results, O/U results. All derived metrics are calculated from the final score and the stored lines.
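One way to derive the result fields from score plus stored lines, as a sketch. It assumes spreads are stored from the home team's perspective (negative = home favored); `grade_game` and the output keys are hypothetical.

```python
def grade_game(home_score: int, away_score: int, home_spread: float, total_line: float) -> dict:
    """Derive result fields from the final score and the stored lines.

    home_spread uses the home team's sign convention: -5.5 means home is favored by 5.5.
    """
    margin = home_score - away_score  # positive when home wins
    total_points = home_score + away_score
    ats_value = margin + home_spread
    if ats_value > 0:
        ats = "home_cover"
    elif ats_value < 0:
        ats = "away_cover"
    else:
        ats = "push"
    if total_points > total_line:
        ou = "over"
    elif total_points < total_line:
        ou = "under"
    else:
        ou = "push"
    return {
        "total_points": total_points,
        "winner": "home" if margin > 0 else "away",  # basketball games cannot end tied
        "margin": margin,
        "ats": ats,
        "ou": ou,
    }
```

Running the same function once per stored alt line grades every alternate spread and total from a single final score.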
After each Kalshi snapshot, all alt lines for a game are grouped by market type (spread, total) and converted into a probability curve. WNCAAB has less alt-line coverage than NCAAB, but curves are still valuable when available.
What gets stored per curve: sport, game key, home/away teams, market type (spread or total), snapshot type (early/closing), array of threshold values, array of implied probabilities, number of data points (alt lines) in the curve, mean probability, and curve slope.
DB Table: market_implied_curves (in research-pipeline.db)
Frequency: Runs automatically after every Kalshi snapshot (every 30 minutes).
Minimum: 3+ alt lines required to form a curve.
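A sketch of the curve build for one game and one market type. It assumes each alt line is reduced to a (threshold, yes price in cents) pair and that price / 100 is the implied probability; "curve slope" is not defined above, so the ordinary least-squares slope of probability against threshold is an assumed definition.

```python
from __future__ import annotations

from statistics import fmean

MIN_ALT_LINES = 3  # from the spec: at least 3 alt lines to form a curve


def build_curve(alt_lines: list[tuple[float, float]]) -> dict | None:
    """Turn (threshold, yes_price_cents) pairs for one game/market type into a curve.

    Returns None when there are too few alt lines.
    """
    if len(alt_lines) < MIN_ALT_LINES:
        return None
    points = sorted(alt_lines)  # order by threshold
    thresholds = [t for t, _ in points]
    probs = [price / 100 for _, price in points]  # cents -> implied probability
    mean_t = fmean(thresholds)
    mean_p = fmean(probs)
    # Least-squares slope of probability against threshold (assumed definition).
    sxx = sum((t - mean_t) ** 2 for t in thresholds)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(thresholds, probs))
    return {
        "thresholds": thresholds,
        "probabilities": probs,
        "num_points": len(points),
        "mean_probability": mean_p,
        "slope": sxy / sxx if sxx else 0.0,
    }
```

For an "over" total curve the slope should be negative (higher thresholds are less likely to be cleared); a positive slope is a quick sign of a stale or mispriced alt line.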
Game state + all Kalshi/Polymarket prices every minute during live games. No Pinnacle live tracking.
Thinner than NCAAB. Monitor star players in major games only.
DraftKings salaries + ownership projections when WNCAAB slates are offered.
| Category | Stats | Source |
|---|---|---|
| Efficiency | Adjusted offensive efficiency (AdjOE), adjusted defensive efficiency (AdjDE), adjusted tempo | HerHoopStats + BartTorvik WNCAAB |
| Ratings | Power ratings, SOS (strength of schedule) | HerHoopStats + BartTorvik |
| Four Factors | eFG%, ORB%, TOV%, FTR + defensive versions | HerHoopStats |
| Situational | Overall/conference/home-away win% | ESPN + calculated |
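The Four Factors row can be recomputed from box-score totals as a cross-check on scraped values. This sketch uses the common conventions (0.44 free-throw coefficient in TOV%, FTR as FTA/FGA; some sources use FTM/FGA instead); the `BoxTotals` field names are hypothetical. Defensive versions are the same function applied to the opponent's totals.

```python
from dataclasses import dataclass


@dataclass
class BoxTotals:
    """Team totals for one game or a season; field names are hypothetical."""
    fgm: int      # field goals made
    fga: int      # field goals attempted
    fg3m: int     # three-pointers made
    fta: int      # free throws attempted
    tov: int      # turnovers
    orb: int      # offensive rebounds
    opp_drb: int  # opponent defensive rebounds


def four_factors(box: BoxTotals) -> dict:
    return {
        # Effective FG%: a made three counts as 1.5 made twos.
        "efg_pct": (box.fgm + 0.5 * box.fg3m) / box.fga,
        # Turnovers per estimated possession-ending play.
        "tov_pct": box.tov / (box.fga + 0.44 * box.fta + box.tov),
        # Share of available offensive rebounds grabbed.
        "orb_pct": box.orb / (box.orb + box.opp_drb),
        # Free throw rate (FTA per FGA convention).
        "ftr": box.fta / box.fga,
    }
```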
| Model | Source |
|---|---|
| HerHoopStats | herhoopstats.com (scrape) |
| BartTorvik WNCAAB | barttorvik.com (scrape) |
| ESPN rankings | ESPN API (when available) |
Fewer models are available than for NCAAB. Average whatever is available and silently exclude NULLs.
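The averaging rule as a small sketch: NULL model outputs (None after reading from the DB) are dropped without warning, and a game with no model outputs at all yields None. The model keys in the usage line are hypothetical.

```python
from __future__ import annotations

from statistics import fmean
from typing import Mapping, Optional


def consensus(predictions: Mapping[str, Optional[float]]) -> Optional[float]:
    """Average the model predictions that exist; NULL (None) entries are skipped silently."""
    available = [value for value in predictions.values() if value is not None]
    return fmean(available) if available else None


# Hypothetical keys: BartTorvik has no number for this game, so only two models count.
spread_consensus = consensus({"herhoopstats": 4.0, "barttorvik": None, "espn": 6.0})
```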
How teams perform in the final 5 minutes of close games (within 5 points). College women's basketball has significant variance here, and free throw shooting under pressure is especially variable.
| Stat | What It Measures |
|---|---|
| Clutch scoring differential | Points scored minus points allowed in final 5 min of close games |
| Clutch turnover rate | Turnovers per possession in clutch situations |
| Clutch foul rate | Fouls committed in clutch situations |
| Clutch FT% | Free throw shooting in clutch situations |
| Clutch win rate | Win % when game is within 5 points with 5 min left |
Source: ESPN API play-by-play data (free). Filter for score margin <= 5 with <= 5:00 remaining.
Frequency: Once daily, calculated from rolling season game logs.
DB Table: wncaab_clutch_stats
Why it creates edge: Same principle as NCAAB: teams that choke late cover tight spreads but fail on fat alt lines. Thinner WNCAAB markets make this even more exploitable because fewer traders are watching.
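A sketch of the clutch filter and the clutch scoring differential. It assumes ESPN play-by-play has already been normalized into the hypothetical `Play` tuple below (real ESPN field names are not modeled), and that a play counts as clutch when the clock at that play is within 5:00 of the end of the fourth quarter or an overtime and the score before the play is within 5. NCAA women's games use four quarters, so the final 5:00 falls in period 4.

```python
from __future__ import annotations

from typing import NamedTuple

CLUTCH_SECONDS = 300          # <= 5:00 remaining
CLUTCH_MARGIN = 5             # score margin <= 5
FINAL_REGULATION_PERIOD = 4   # four quarters; periods 5+ are overtime


class Play(NamedTuple):
    period: int
    seconds_left: int  # seconds remaining in the period at this play
    home_score: int    # score after this play
    away_score: int


def is_clutch(period: int, seconds_left: int, home_score: int, away_score: int) -> bool:
    return (
        period >= FINAL_REGULATION_PERIOD
        and seconds_left <= CLUTCH_SECONDS
        and abs(home_score - away_score) <= CLUTCH_MARGIN
    )


def clutch_scoring_differential(plays: list[Play]) -> int:
    """Home points minus away points on plays that begin in a clutch state."""
    diff = 0
    for prev, cur in zip(plays, plays[1:]):
        # Clock from the current play, score from just before it.
        if is_clutch(cur.period, cur.seconds_left, prev.home_score, prev.away_score):
            diff += (cur.home_score - prev.home_score) - (cur.away_score - prev.away_score)
    return diff
```

The same clutch flag can be reused to count turnovers, fouls, and free throws for the other rows in the table.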
Track prop line movement from FanDuel. This is less applicable to WNCAAB because prop markets are thin, but monitor star players in major games.
Monitor breaking injury/lineup news faster than prediction markets react.
Real-time X monitoring for injury leaks and lineup news.
Track momentum indicators during live games: scoring runs, foul trouble, pace variance.
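Scoring runs are the simplest of these indicators to compute. This sketch assumes the live feed has been reduced to a chronological list of (team_id, points) for scoring plays only; `current_run` is a hypothetical helper.

```python
from __future__ import annotations


def current_run(scoring_plays: list[tuple[str, int]]) -> tuple[str | None, int]:
    """Return (team, points) for the unanswered run that ends at the latest score.

    scoring_plays is a chronological list of (team_id, points) for scoring plays only.
    """
    if not scoring_plays:
        return None, 0
    team = scoring_plays[-1][0]
    points = 0
    for scorer, pts in reversed(scoring_plays):
        if scorer != team:
            break  # the other team answered; the run starts after this play
        points += pts
    return team, points
```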
Pull full bid/ask depth from Kalshi/Polymarket. This is especially important for WNCAAB, where liquidity is thinner.
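Depth matters because the top-of-book price rarely holds for a full-size order in a thin market. This sketch walks the ask side to get the average fill price for a given number of contracts. It assumes the book has already been normalized into (price_cents, size) levels; the actual Kalshi and Polymarket payload shapes are not modeled here.

```python
from __future__ import annotations


def average_fill_price(asks: list[tuple[float, int]], contracts: int) -> tuple[float, int] | None:
    """Walk the asks from the best price up; return (avg price in cents, contracts filled)."""
    if contracts <= 0:
        raise ValueError("contracts must be positive")
    remaining = contracts
    cost = 0.0
    for price, size in sorted(asks):  # cheapest asks first
        take = min(size, remaining)
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    filled = contracts - remaining
    return (cost / filled, filled) if filled else None
```

Comparing this average price, rather than the best ask, against the Pinnacle fair value gives the edge actually available at the intended size.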
Calculate scoring volatility. WNCAAB has larger talent gaps, which produce more extreme outcomes and therefore more fat-tail opportunities.
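A minimal volatility sketch, assuming a team's game log is available as (points_for, points_against) pairs. Sample standard deviation is used here; the spec does not say which estimator to use.

```python
from __future__ import annotations

from statistics import fmean, stdev


def scoring_volatility(games: list[tuple[int, int]]) -> dict:
    """games: (points_for, points_against) per game for one team; needs at least 2 games."""
    if len(games) < 2:
        raise ValueError("need at least two games to estimate volatility")
    margins = [pf - pa for pf, pa in games]
    totals = [pf + pa for pf, pa in games]
    return {
        "margin_mean": fmean(margins),
        "margin_sd": stdev(margins),  # sample standard deviation
        "total_sd": stdev(totals),
    }
```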
Track travel, timezone crossings, schedule density.
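A sketch of the rest, density, and timezone features. The 7-day window and the use of integer UTC offsets per venue are assumptions; the helper names are hypothetical.

```python
from __future__ import annotations

from datetime import date


def schedule_context(game_dates: list[date], target: date, window_days: int = 7) -> dict:
    """Rest and density features for the game on `target` (window_days is a hypothetical default)."""
    previous = [d for d in game_dates if d < target]
    last = max(previous) if previous else None
    return {
        # Full days off between the previous game and this one (back-to-back = 0).
        "rest_days": (target - last).days - 1 if last is not None else None,
        "games_in_window": sum(1 for d in previous if (target - d).days <= window_days),
    }


def timezone_shift(prev_utc_offset_hours: int, venue_utc_offset_hours: int) -> int:
    """Number of hours crossed since the previous game's venue."""
    return abs(venue_utc_offset_hours - prev_utc_offset_hours)
```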
Model full distribution of scoring margins. WNCAAB talent gaps make fat tails more common.
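One way to check whether margins really have fat tails is to compare the empirical blowout frequency against a normal distribution with the same mean and standard deviation, alongside the excess kurtosis. This is a sketch under that assumed approach; the threshold is a free parameter, not a value from the spec.

```python
from __future__ import annotations

from statistics import NormalDist, fmean, pstdev


def tail_report(margins: list[float], threshold: float) -> dict:
    """Compare the empirical blowout frequency with a normal fit of the same mean/SD."""
    mu = fmean(margins)
    sd = pstdev(margins)
    if sd == 0:
        raise ValueError("margins have zero spread")
    empirical_tail = sum(1 for m in margins if abs(m) >= threshold) / len(margins)
    normal = NormalDist(mu, sd)
    normal_tail = normal.cdf(-threshold) + (1 - normal.cdf(threshold))
    # Excess kurtosis > 0 means heavier tails than a normal distribution.
    fourth_moment = fmean([(m - mu) ** 4 for m in margins])
    return {
        "empirical_tail": empirical_tail,
        "normal_tail": normal_tail,
        "excess_kurtosis": fourth_moment / sd ** 4 - 3,
    }
```

When the empirical tail beats the normal tail, alt lines priced off a normal-shaped assumption are likely to undervalue large margins.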
Break down scoring by half. Some teams are strong first-half starters, others are strong closers.
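NCAA women's games are played in four quarters, so a half split sums Q1+Q2 and Q3+Q4. This sketch assumes per-quarter points are available for both teams and ignores overtime periods; the input shape is hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def half_splits(games: list[tuple[list[int], list[int]]]) -> dict:
    """Average first-half and second-half scoring margins for one team.

    Each game is (team_quarter_points, opponent_quarter_points). Q1+Q2 is the
    first half and Q3+Q4 the second; overtime periods (index 4 onward) are
    ignored here by assumption.
    """
    first = [sum(team[:2]) - sum(opp[:2]) for team, opp in games]
    second = [sum(team[2:4]) - sum(opp[2:4]) for team, opp in games]
    return {"first_half_margin": fmean(first), "second_half_margin": fmean(second)}
```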
| Source | What | Access |
|---|---|---|
| Kalshi API | ML, spread, O/U, all alt lines, player props, volume | API key (have it) |
| Polymarket | Event markets, prices, volume | Free API |
| ESPN API | Scores, game state, play-by-play, BPI | Free, no key |
| Odds API | Pinnacle odds (fallback only) | API key, rate-limited |
| Source | What |
|---|---|
| Pinnacle | Sharp lines (when available for WNCAAB) |
| HerHoopStats | Efficiency ratings, team stats |
| BartTorvik | WNCAAB team rankings |
| DraftKings | Opening/current lines |
| Data Type | Frequency | Source |
|---|---|---|
| Team statistics | Once daily | HerHoopStats, BartTorvik, ESPN |
| Model predictions | Once daily | HerHoopStats, BartTorvik, ESPN |
| Injury/lineup | Continuous | ESPN API, team feeds |
| Results | After games | ESPN API + settlement |
Pinnacle: when available (check the Odds API). Kalshi + Polymarket (FREE): every 30 minutes from when lines open until game time.
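A sketch of the 30-minute Kalshi/Polymarket snapshot schedule. It assumes both timestamps are in the same timezone (or both UTC) and that no snapshot is taken at or after tip-off; `snapshot_times` is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def snapshot_times(lines_open: datetime, tip_off: datetime, interval_minutes: int = 30) -> list[datetime]:
    """Snapshot timestamps from lines open up to (not including) tip-off."""
    step = timedelta(minutes=interval_minutes)
    times = []
    t = lines_open
    while t < tip_off:
        times.append(t)
        t += step
    return times
```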