Purpose: This document is a universal template for building a new player props desk in EdgeClaw. An AI reading this should know exactly what to build, how to wire it up, and what questions to ask before starting.
Reference implementations: MLB Player Props Desk (docs/mlb-player-props-desk-construction-spec.md), MLB Desk (docs/mlb-desk-construction-spec.md)
Every player props desk follows the same 4-table pipeline:
Table 1: Kalshi Prop Data (soft book — what we trade against)
↓
Table 2: Anchor Book Prop Data (sharp book — our fair value reference)
↓
Table 3: Prop Probability Curves (book curve + model curve vs Kalshi price)
↓
Table 4: Prop Edge Scanner Output (mispricings found)
The rule: Each table is sport-specific. No mixing data across sports. No mixing props data with game-level data. Each props desk gets its own isolated databases, separate from the game desk for the same sport.
What: Raw player prop contract prices pulled from the Kalshi API.
Database: kalshi-{sport}-props.db
Table name: kalshi_{sport}_props
Standard columns:
| Column | Type | Description |
|---|---|---|
| ticker | TEXT | Full Kalshi ticker (e.g., KXMLBHR-26APR07-A.BREGMAN-O1) |
| player_name | TEXT | Parsed player name (standardized) |
| prop_type | TEXT | What stat: pitcher_strikeouts, batter_hits, batter_home_runs, etc. |
| threshold | REAL | The line (e.g., 6.5 strikeouts, 1.5 hits) |
| yes_bid | INTEGER | Cents (0-100) |
| yes_ask | INTEGER | Cents (0-100) |
| yes_exec | REAL | Midpoint of yes bid/ask |
| no_bid | INTEGER | Cents (0-100) |
| no_ask | INTEGER | Cents (0-100) |
| no_exec | REAL | Midpoint of no bid/ask |
| spread | INTEGER | Ask minus bid |
| volume | INTEGER | Contracts traded |
| scan_type | TEXT | Collection window: 6am, 8am, 10am, 2pm, 6pm, close |
| snapshot_type | TEXT | scheduled / event_triggered / closing |
| captured_at | TEXT | ISO timestamp |
Data quality rules:
Player name parsing: Kalshi tickers encode player names in abbreviated format (e.g., "A.BREGMAN"). You need a parser to extract this and a crosswalk table to map it to the full name used by other sources. This is critical — without it, you can't match Kalshi props to anchor book props.
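A minimal sketch of the parsing step, assuming the hyphen-delimited ticker format shown in the example above; the crosswalk contents here are illustrative, the real table is a DB lookup:

```typescript
// Sketch of a Kalshi prop ticker parser. Assumes the format from the
// example above (e.g., KXMLBHR-26APR07-A.BREGMAN-O1); real tickers may
// need more cases.
interface ParsedPropTicker {
  series: string;       // e.g., KXMLBHR
  date: string;         // e.g., 26APR07
  playerAbbrev: string; // e.g., A.BREGMAN
  suffix: string;       // e.g., O1 (side/threshold segment)
}

function parsePropTicker(ticker: string): ParsedPropTicker | null {
  const parts = ticker.split("-");
  if (parts.length < 4) return null;
  return {
    series: parts[0],
    date: parts[1],
    // Everything between the date and the final segment is the player.
    playerAbbrev: parts.slice(2, -1).join("-"),
    suffix: parts[parts.length - 1],
  };
}

// Crosswalk lookup: abbreviated ticker name -> full name used by other
// sources. Entries here are illustrative only.
const crosswalk: Record<string, string> = {
  "A.BREGMAN": "Alex Bregman",
};

function fullName(abbrev: string): string | undefined {
  return crosswalk[abbrev.toUpperCase()];
}
```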
What: Sharp book prop lines that serve as our fair value reference.
For player props desks, the anchor is typically FanDuel (the sharpest prop book). SBR multi-book lines serve as an alternative reference for cross-validation.
Database: fd-{sport}-props.db (FanDuel)
Table names: fd_{sport}_prop_lines
Standard columns:
| Column | Type | Description |
|---|---|---|
| player_name | TEXT | Full player name as listed by the book |
| market | TEXT | Base prop type: pitcher_strikeouts, batter_hits, batter_home_runs, etc. |
| threshold | REAL | The rung: 1 (1+), 2 (2+), 3 (3+), etc. |
| side | TEXT | Yes (always for FD direct data) |
| price | INTEGER | American odds |
| implied_prob | REAL | Implied probability for Yes (0-1) |
| no_implied_prob | REAL | Implied probability for No = 1 - implied_prob |
| line | REAL | Raw handicap from API (0 for batter props, not used) |
| event_id | TEXT | FD event ID |
| home_team | TEXT | |
| away_team | TEXT | |
| game_date | TEXT | YYYY-MM-DD |
| scan_type | TEXT | Collection window |
| captured_at | TEXT | ISO timestamp |
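The implied_prob and no_implied_prob columns can be derived from the stored American odds with the standard conversion; a minimal sketch:

```typescript
// Convert American odds to implied probability (still includes the
// book's vig; de-vigging happens in Table 3).
function americanToProb(odds: number): number {
  return odds < 0
    ? -odds / (-odds + 100) // favorite: -150 -> 150/250 = 0.60
    : 100 / (odds + 100);   // underdog: +150 -> 100/250 = 0.40
}

// Per the schema above: no_implied_prob = 1 - implied_prob.
function noImpliedProb(yesImplied: number): number {
  return 1 - yesImplied;
}
```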
Key insight from MLB build: FanDuel's direct API gives you a full prop ladder as separate Yes/No markets at each threshold. "To Record A Hit" (1+), "To Record 2+ Hits" (2+), etc. The FD price IS the implied probability at each threshold — no distribution math needed for the book side. Store with market = batter_hits and threshold = 1, 2, 3, 4 — NOT as separate market types per threshold.
FD Direct API pattern: sbapi.{state}.sportsbook.fanduel.com/api/event-page?_ak={key}&eventId={id}&tab=batter-props (and tab=pitcher-props) per event. No auth needed. Returns markets with runners; each runner has American odds.
Cross-validation rule: If FanDuel and SBR multi-book consensus both offer the same prop and their de-vigged probabilities agree within 2%, that edge gets tagged HIGH confidence. If they disagree by more than 10%, tag it LOW confidence.
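The de-vig and agreement check can be sketched as follows; treating the 2-10% band as MEDIUM is an assumption consistent with the HIGH/MEDIUM/LOW tiers used in Table 4:

```typescript
// Two-way de-vig by normalization: a book's raw Over/Under implied
// probabilities sum to more than 1 (the vig); dividing by the sum
// recovers a fair probability pair.
function devig(rawOver: number, rawUnder: number): [number, number] {
  const total = rawOver + rawUnder;
  return [rawOver / total, rawUnder / total];
}

// Cross-validation rule from above: within 2% -> HIGH, more than 10%
// apart -> LOW. The MEDIUM middle band is an assumption.
function confidenceTier(fdProb: number, sbrProb: number): "HIGH" | "MEDIUM" | "LOW" {
  const gap = Math.abs(fdProb - sbrProb);
  if (gap <= 0.02) return "HIGH";
  if (gap > 0.10) return "LOW";
  return "MEDIUM";
}
```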
What: For each player, for each prop type, for each game, build TWO probability curves and compare both against every Kalshi threshold:
Book curve — Derived from FanDuel's de-vigged prop ladder. FD gives you the probability at each threshold directly. De-vig their over/under prices to get true probability.
Model curve — Derived from the desk's own player model. This uses player baselines (EWMA of recent performance), matchup adjustments (opponent quality, venue factors), and a statistical distribution to generate an independent probability at each threshold.
Both curves are compared against Kalshi's price at each threshold. When either curve says a Kalshi contract is mispriced by more than the fee (typically 7%), that's an edge.
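That comparison reduces to a simple per-threshold check; a sketch, assuming the 7% fee stated above and Kalshi exec prices stored in cents (Table 1):

```typescript
const KALSHI_FEE = 0.07; // typical fee, per the rule above

// A Kalshi YES contract is an edge when a curve's fair probability
// exceeds the exec price (cents -> probability) by more than the fee.
function isYesEdge(curveProb: number, yesExecCents: number): boolean {
  return curveProb - yesExecCents / 100 > KALSHI_FEE;
}
```

Either curve (book or model) can be passed as curveProb; the NO side is the mirror-image check against the NO exec price.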
Database: {sport}-prop-edges.db
Table name: {sport}_prop_probability_curves
Standard columns:
| Column | Type | Description |
|---|---|---|
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| player_name | TEXT | Standardized player name |
| player_id | TEXT | Crosswalk ID |
| prop_type | TEXT | pitcher_strikeouts, batter_hits, etc. |
| threshold | REAL | The alt-line value (e.g., 5.5, 6.5, 7.5 for strikeouts) |
| fd_anchor | REAL | FanDuel's posted line for this prop |
| fd_yes | REAL | FD de-vigged probability of Over at this threshold |
| fd_no | REAL | FD de-vigged probability of Under at this threshold |
| model_yes | REAL | Model probability of Over at this threshold |
| model_no | REAL | Model probability of Under at this threshold |
| kalshi_yes | REAL | Kalshi exec price for YES |
| kalshi_no | REAL | Kalshi exec price for NO |
| rung | INTEGER | 0 = main line, positive = further from 50/50 |
| is_main_line | INTEGER | 1 if this threshold is the book's headline line (closest to 50/50), 0 otherwise |
| actual_stat | REAL | What actually happened (filled by settlement after game ends) |
| outcome | TEXT | "over" or "under" relative to this row's threshold |
| fd_was_right | INTEGER | 1 if FD's implied probability favored the correct side, 0 if not |
| model_was_right | INTEGER | 1 if the model favored the correct side, 0 if not |
| fd_error | REAL | How far off FD was (positive = underconfident on correct side, negative = wrong side) |
| model_error | REAL | Same for model |
| settled_at | TEXT | Timestamp when result was recorded |
| captured_at | TEXT | ISO timestamp |
The model curve needs a statistical distribution to convert a player baseline into probabilities at each threshold. The right distribution depends on the stat type:
General rules:
| Stat characteristic | Distribution | Why |
|---|---|---|
| Counting stats, low mean (0-2 range) | Poisson or Zero-Inflated Poisson | Many zeros, rare events (home runs, stolen bases) |
| Counting stats, medium mean (2-8 range) | Negative Binomial | Overdispersed counts — variance > mean (strikeouts, hits) |
| Sum of multiple stats | Normal approximation | Central limit theorem (H+R+RBI, PRA) |
| High-count stats (20+ range) | Normal | Large enough for Normal to work (points, fantasy score) |
MLB examples (for reference):
| Prop type | Distribution | Parameters |
|---|---|---|
| Pitcher strikeouts | Negative Binomial | mean from EWMA, dispersion from game log variance |
| Batter hits | Negative Binomial | mean from EWMA |
| Batter home runs | Zero-Inflated Poisson | lambda from per-PA rate × projected PA |
| Batter total bases | Negative Binomial | mean from EWMA |
| H+R+RBI combo | Normal | mean and sigma from component sums |
| Stolen bases | Zero-Inflated Negative Binomial | Very rare event, many zeros |
| Runs scored | Poisson | Low-count event |
| Pitcher outs | Normal | High enough count for Normal |
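As one concrete case, the Negative Binomial tail used for strikeouts and hits can be computed from a mean (the EWMA baseline) and a dispersion parameter without any special-function library, using the pmf recurrence; a sketch:

```typescript
// P(X >= threshold) for a Negative Binomial with the given mean and
// dispersion r (variance = mean + mean^2 / r). Uses the recurrence
// pmf(k+1) = pmf(k) * (k + r) / (k + 1) * (1 - p), with p = r / (r + mean).
function negBinomTail(mean: number, dispersion: number, threshold: number): number {
  const p = dispersion / (dispersion + mean);
  let pmf = Math.pow(p, dispersion); // P(X = 0)
  let cdf = 0;
  for (let k = 0; k < threshold; k++) {
    cdf += pmf;
    pmf *= ((k + dispersion) / (k + 1)) * (1 - p);
  }
  return 1 - cdf; // P(X >= threshold)
}
```

For example, a pitcher with an EWMA mean of 6 strikeouts clears a 6.5 line with probability negBinomTail(6, r, 7), where r comes from game log variance.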
The model curve is built from player-specific baselines adjusted for matchup context. Standard inputs:
Player baselines:
blended = (career × k + season × n) / (k + n) where k varies by stat (HR needs ~170 PA to stabilize, K needs ~60 PA, Hits needs ~800 PA)

Matchup adjustments (sport-specific):
The adjustment formula:
adjusted_rate = blended_rate × opponent_factor × venue_factor × other_factors
Cap adjustments to reasonable range (e.g., 0.75 to 1.30) to prevent extreme outputs.
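A sketch of the blending and adjustment steps above (the rates and factor values in the comments are illustrative):

```typescript
// Blend of career and season rates, weighted by the stat's
// stabilization constant k and the season sample size n.
function blendedRate(careerRate: number, seasonRate: number, k: number, n: number): number {
  return (careerRate * k + seasonRate * n) / (k + n);
}

// Apply matchup multipliers (opponent, venue, etc.), clamping the
// combined factor to the 0.75-1.30 band suggested above so extreme
// inputs cannot produce extreme outputs.
function adjustedRate(blended: number, factors: number[], lo = 0.75, hi = 1.3): number {
  const combined = factors.reduce((acc, f) => acc * f, 1);
  return blended * Math.min(hi, Math.max(lo, combined));
}
```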
What: The final output — player prop mispricings found by comparing Kalshi prices against both the book curve and model curve.
Database: {sport}-prop-edges.db (same DB as curves)
Table name: player_prop_edges
Standard columns:
| Column | Type | Description |
|---|---|---|
| player_name | TEXT | |
| prop_type | TEXT | pitcher_strikeouts, batter_hits, etc. |
| side | TEXT | over / under |
| line | REAL | Kalshi threshold |
| anchor_line | REAL | FD's posted line |
| fd_prob | REAL | FD de-vigged fair probability |
| sbr_prob | REAL | SBR multi-book consensus probability (cross-validation) |
| model_prob | REAL | Model probability |
| kalshi_price | REAL | What Kalshi is offering |
| execution_price | REAL | What you'd actually pay |
| raw_edge_book | REAL | fd_prob - kalshi_price |
| raw_edge_model | REAL | model_prob - kalshi_price |
| net_edge | REAL | Edge after fee (7% Kalshi fee) |
| confidence_tier | TEXT | HIGH / MEDIUM / LOW |
| executable | BOOLEAN | Spread tight enough to trade? |
| distribution_type | TEXT | Which distribution was used |
| player_sigma | REAL | Player-specific variance parameter |
| scan_type | TEXT | Collection window |
| detected_at | TEXT | ISO timestamp |
| actual_outcome | TEXT | win / loss / push (filled after settlement) |
| settled_at | TEXT | |
| closing_price | REAL | For CLV tracking |
| clv | REAL | Closing line value |
Confidence tier rules: per the cross-validation rule in Table 2 — FD and SBR de-vigged probabilities within 2% → HIGH; more than 10% apart → LOW; otherwise MEDIUM.
In addition to edges, track sharp line movement:
Table: {sport}_prop_steam
Signals to detect:
Everything runs on Eastern Time (ET).
| Minute | What fires | Cron pattern |
|---|---|---|
| :00 | FanDuel prop lines pull | 0 6,8,10,14,18 * * * |
| :00 | Kalshi prop contracts pull | 0 6,8,10,14,18 * * * |
| :00 | Game-level anchor (Pinnacle) for matchup context | 0 6,8,10,14,18 * * * |
Game-day windows: 6 AM, 8 AM, 10 AM, 2 PM, 6 PM ET
Closing snapshot: 1 minute before each game's start time (staggered per game)
Three staggered groups:
| Group | Time | What runs |
|---|---|---|
| Group 1 — Raw Data | 9:00 AM | Stats API pulls: player game logs, season stats, external scrapes |
| Group 2 — Baselines | 9:05 AM | EWMA baselines, career stats, blended rates, player crosswalk |
| Group 3 — Derived | 9:10 AM | Matchup context, player variance, adjusted rates, correlations |
| Minute | What fires | Cron pattern |
|---|---|---|
| :03 | Steam detection (compare consecutive snapshots) | 3 6,8,10,14,18 * * * |
| :10 | Prop probability curves rebuild | 10 6,8,10,14,18 * * * |
| :12 | Prop edge scanner | 12 6,8,10,14,18 * * * |
Critical: Same rule as game desks — edge scanner runs AFTER curves, curves run AFTER data collection. The timing chain is: data (:00) → steam (:03) → curves (:10) → edges (:12).
If the sport has season-long player props or award futures:
| Phase | Frequency | When to transition |
|---|---|---|
| Phase 1 | Weekly (Monday) | Start of season → ~2 months before end |
| Phase 2 | Every 3 days | ~2 months before end → ~1 month before end |
| Phase 3 | Daily | ~1 month before end → season end |
| Expired | Stop scanning | After season ends |
Database: kalshi-{sport}-prop-futures.db (season props), kalshi-{sport}-awards.db (awards)
{
name: '{Sport} Player Props',
slug: '{sport}-player-props',
kalshi: [{
category: 'Sports / {Category} / {Sport} Props',
hasScraper: true,
scraperName: 'collector.ts (Kalshi sports cron)',
freshnessKey: 'kalshi-{sport}-props',
series: [
{ ticker: 'KXSPORTPROPTYPE', label: 'Human Label', dataViewKey: 'kalshi-{sport}-prop-type' },
// one per Kalshi prop series
],
}],
sources: [
// Sources organized by group — see Standard Groups below
],
}
Every player props desk should have these groups on the dashboard:
Same pattern as game desks. Every freshnessKey needs a mapping:
'freshnessKey': {
db: 'database-name',
tables: ['table_name'],
filter?: { column: 'col', value: 'val' },
}
Naming convention for freshnessKeys:
- kalshi-{sport}-props — all Kalshi props for this sport
- kalshi-{sport}-prop-{type} — filtered by prop type (hr, hits, ks, etc.)
- fd-{sport}-props — FanDuel prop lines
- prop-edge-{sport}-{type} — edges per prop type
- {sport}-prop-curves-{type} — probability curves per prop type
- {sport}-prop-model — player baselines and model data
- {sport}-prop-matchup-context — matchup adjustments
- {sport}-batter-baselines / {sport}-player-baselines — player stat baselines
- {sport}-blended-rates — Bayesian shrinkage output
- {sport}-adjusted-rates — matchup-adjusted rates

Same pattern as game desks. Register each cron job, update freshness after each run.
| Source type | Yellow (stale) | Red (alert) |
|---|---|---|
| FD prop lines (6/8/10/2/6) | 30 min after window | 90 min after window |
| Kalshi props (6/8/10/2/6) | 30 min after window | 90 min after window |
| Season props/awards (adaptive) | 2 days after expected | 7 days after expected |
| Prop edge scanner | 30 min after window | 90 min after window |
| Player baselines (9 AM) | 60 min after expected | 180 min after expected |
| Matchup context (9 AM) | 60 min after expected | 180 min after expected |
| Steam detection | 60 min after last signal | 180 min after last signal |
Player props desks READ from the parent game desk's databases but don't write to them. This gives props access to game-level context without duplicating scrapers.
What the props desk reads from the game desk:
The props desk does NOT duplicate these scrapers. It just reads the tables that the game desk already populates.
| # | Item | What to check |
|---|---|---|
| 1 | Databases isolated | Props in own .db files, separate from game desk |
| 2 | Scraper queue independent | Props scrapers don't depend on game desk timing |
| 3 | Recovery queue | Props scrapers in recovery queue |
| 4 | Freshness tracking | Every props source has freshnessKey on dashboard |
| 5 | Dashboard views | Every freshnessKey resolves to working view |
| 6 | Edge scanner | Props scanner separate from game scanner |
| 7 | Scan windows | Every row tagged with scan_type |
| 8 | Data cleanliness | No live, no settled, same-day for game-day props |
| 9 | Column formatting | Human-readable on dashboard |
| 10 | Filters | Column filters on all views |
| 11 | No cross-desk dependencies | Props desk works even if game desk scraper fails |
| 12 | Player name crosswalk | Names standardized across all sources |
| 13 | Alerts | Freshness alerts for all props sources |
| 14 | Season props isolated | Separate DB from game-day props |
| 15 | Awards isolated | Separate DB from game-day props |
| 16 | Probability curves | Both book and model curves, per prop type |
| 17 | Matchup context | Daily refresh wired to Group 3 cron |
Hard-won lessons from building the MLB Player Props desk. Read these before starting any new props desk.
Player name mismatch is the #1 problem. Kalshi uses abbreviated tickers ("A. BREGMAN"), FanDuel uses full names ("Alex Bregman"), and stats APIs use yet another format. Without a crosswalk table that maps names across ALL sources, you can't match Kalshi props to FD props, and your curves table will have nulls everywhere. Build the crosswalk FIRST. Handle edge cases: Jr./Sr. suffixes, accented characters, misspellings (Kalshi had "SUREZ" for Suarez), nicknames.
Wrong scanner wired to cron. The MLB prop cron was calling the generic scanner (scanPropEdges('mlb') which read from an empty player_props table) instead of the dedicated MLB scanner (scanMlbPropEdges() which read from mlb_prop_lines). The edges table was empty for days. Always test the cron manually after wiring.
Filter too aggressive on minimum price. Initial 5-cent minimum filter was dropping valid alt lines. Lowered to 1 cent. Be conservative with data-side filters — you can always filter tighter on the dashboard.
Kalshi API pagination. Some sports have 5+ pages of prop contracts. The initial pull only fetched page 1 and missed most data. Always paginate fully. Add 500ms+ delays between pages to avoid rate limits.
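The pagination pattern can be isolated from any specific endpoint; a sketch assuming cursor-style paging, where fetchPage is a stand-in for whatever wraps the real Kalshi API call:

```typescript
// Generic cursor pagination with a delay between pages. fetchPage
// returns one page of items plus a cursor for the next page
// (undefined when there are no more pages).
type Page<T> = { items: T[]; cursor?: string };

async function fetchAllPages<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  delayMs = 500, // per the lesson above: 500ms+ between pages
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    cursor = page.cursor;
    if (cursor) await new Promise((resolve) => setTimeout(resolve, delayMs));
  } while (cursor);
  return all;
}
```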
FD price IS the probability (for book curve). FanDuel gives you a full ladder of over/under lines at different thresholds. You de-vig each line and that's your book probability. No distribution math needed for the book side. The distribution math is only for the MODEL curve.
FanDuel direct API vs Odds API — use the direct API. The Odds API only returned 3 prop types for MLB FanDuel (stolen bases, strikeouts, pitcher outs). FanDuel's own public API (sbapi.il.sportsbook.fanduel.com/api/event-page) returns 25 prop types including all hits, HR, TB, RBI, runs, doubles, triples, singles, combos, and alt strikeout lines. No auth needed — just the public _ak key. The Odds API is a backup, not the primary source. Build a direct scraper first.
FD stores one market name per prop type with a threshold column. FanDuel's data uses a single market value per prop type (e.g. batter_hits, pitcher_strikeouts, batter_total_bases) with a separate threshold column (1, 2, 3, 4, etc.) for each rung. Do NOT create separate market names per threshold like batter_hits_1plus, batter_hits_2plus — that creates 25 market types instead of 11 and makes filtering/grouping impossible. When building probability curves, query by market and group by threshold. The line column is always 0 for props — the real value is in threshold.
Filter out combined player props. FanDuel offers combined pitcher props like "Zac Gallen & Freddy Peralta" for total strikeouts between both starters. Skip any player name containing " & " — there's no model rate for pairs, Kalshi doesn't offer combined markets, and they produce meaningless curve rows.
Store both Yes and No implied probabilities. MLB props on Kalshi have both Yes and No contracts. Store implied_prob (Yes) and no_implied_prob (1 - Yes) for FD data. For Kalshi, store yes_bid, yes_ask, no_bid (= 100 - yes_ask), no_ask (= 100 - yes_bid). The edge scanner needs both sides to find the best trade.
FD runner name parsing is tricky. For batter props, the runner name IS the player name ("Alex Bregman"). For pitcher K alt lines, the runner name includes the threshold ("Gavin Williams 3+ Strikeouts") — strip the "3+ Strikeouts" part. For pitcher K O/U, the runner name includes Over/Under ("Mike Burrows Over") — if you don't catch this, "Mike Burrows Over" becomes a player name. Handle all three cases in the parser.
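A sketch of a parser handling the three runner-name shapes described; the regexes are inferred from the examples above:

```typescript
// Extract a player name from a FanDuel runner name. Three cases:
// batter props ("Alex Bregman"), pitcher K alt lines
// ("Gavin Williams 3+ Strikeouts"), pitcher K O/U ("Mike Burrows Over").
function parseRunnerPlayerName(runner: string): string {
  return runner
    .replace(/\s+\d+\+\s+Strikeouts$/i, "") // strip "3+ Strikeouts"
    .replace(/\s+(Over|Under)$/i, "")       // strip "Over" / "Under"
    .trim();                                // plain names pass through
}
```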
Edge scanner must run AFTER curves. Curves must run AFTER data collection. The timing chain is data (:00) → curves (:10) → edges (:12). If the scanner runs at :00, it reads yesterday's curves.
Staggered morning schedule matters. Raw stats (Group 1, 9:00) → baselines (Group 2, 9:05) → derived metrics & matchup context (Group 3, 9:10). If you compute matchup adjustments before baselines are updated, you get yesterday's matchup context.
Live data leaks in without explicit checks. Always check game start time. Skip props for games that have already started. Filter settled contracts (bid <= 5 or bid >= 95). This isn't just about cleanliness — live data corrupts your curves and produces fake edges.
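A sketch of the two checks, assuming a row shape with a game start timestamp and a yes bid in cents (field names are illustrative):

```typescript
// Skip props for games that have already started, and contracts that
// are effectively settled (bid <= 5 or bid >= 95, per the rule above).
interface PropRow {
  gameStartIso: string; // ISO timestamp of the game's start
  yesBid: number;       // cents, 0-100
}

function shouldSkipRow(row: PropRow, now: Date = new Date()): boolean {
  if (new Date(row.gameStartIso) <= now) return true; // live game
  if (row.yesBid <= 5 || row.yesBid >= 95) return true; // settled
  return false;
}
```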
Kalshi API rate limits are aggressive. Kalshi returns 429 (too many requests) if you hit their API too fast. When scanning multiple series in a loop, use at least 2-second delays between calls. The collector already handles this with sleep between series, but one-off scripts and new scrapers need it too. If you're scanning 15+ series (like league leaders), expect it to take 30+ seconds. Don't retry 429s immediately — wait and try on the next cron cycle. The cadence system handles this automatically for scheduled runs.
An AI starting a new player props desk MUST ask and get answers for these before writing any code:
This template is updated as new desks are built and new lessons are learned. The "Things to Watch Out For" section grows with every desk.
Last updated: 2026-04-08
Running log of changes and decisions for the player props pipeline. Each entry has two sections: Changes and Boss Notes.
If you're building this from scratch, read every Boss Notes section first — that's how the boss thinks about this system.
Changes:
The probability curves table had threshold = 0 for all pitcher strikeout rows, and batter prop rows were mostly missing entirely. Three bugs:
- Threshold came from fdData.line (always 0) instead of fdData.threshold (the actual number like 5, 6, 7). Fix: use fdData.threshold - 0.5 for the exceedance calculation.
- Rows were keyed by market name only — for pitcher strikeouts, every threshold shares the same market name, so only 1 threshold per pitcher was kept. Fix: key by market + threshold.
- Prop type configs expected per-threshold market names (batter_hits_1plus, pitcher_alt_strikeouts, etc.). Actual FD data uses single market names (batter_hits, pitcher_strikeouts) with a threshold column. Fix: updated all 5 prop type configs.

Also filtered out combined pitcher props ("Zac Gallen & Freddy Peralta") — useless for curves (no model rate for pairs, Kalshi doesn't offer them).
Result: 424 broken rows → 4,257 correct rows. Lessons #7 and #8 in "Things to Watch Out For" rewritten.
Files: src/pipeline/data/scrapers/mlb-prop-probability-curves.ts
Boss Notes:
- He reviews everything through the dashboard at /data-status/view/. If a column looks wrong (all zeros, weird names, missing data), he'll flag it. The dashboard is his primary QA tool.

Changes:
Built MLB prop settlement and added tracking columns to the probability curves table.
New columns on mlb_prop_probability_curves:
- is_main_line — flags which threshold is FD's headline line (closest to 50/50 implied probability)
- actual_stat — what actually happened (e.g., 7 strikeouts)
- outcome — "over" or "under" relative to each threshold
- fd_was_right / model_was_right — who called the correct side
- fd_error / model_error — how far off each probability was from reality
- settled_at — when result was recorded

New file: src/pipeline/data/scrapers/settle-mlb-props.ts
Cron: 0 15,17,19,21,23,1 * * * (every 2h from 3PM-1AM ET) — settles as games finish, not waiting until morning.
Schema update added to ensureTable() with migration logic (ALTERs if columns missing).
Updated Table 3 standard schema in this template with all settlement columns.
Files: src/pipeline/data/scrapers/mlb-prop-probability-curves.ts, src/pipeline/data/scrapers/settle-mlb-props.ts (new), src/cron/scheduler.ts
Boss Notes:
- He wants one threshold flagged (is_main_line = 1) so he can filter the dashboard to just headline lines and see the scorecard at a glance. Not a value on every row — just a flag on the one threshold closest to 50/50.
- No separate fd_main_threshold column — redundant, just filter by is_main_line = 1 and the threshold column tells you what it is.

Changes:
Added normalizeName() to the curve builder and settlement function. Strips accents (ñ→n, é→e), removes Jr./Sr. suffixes, lowercases. Applied on both sides — when building the model rates map and when looking up rates for FD players. Same normalizer added to settlement for matching box score names.
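A minimal sketch of that normalizer (same transforms: accent stripping, Jr./Sr. removal, lowercasing):

```typescript
// Normalize player names for cross-source matching: strip accents,
// drop Jr./Sr. suffixes, lowercase, collapse whitespace.
function normalizeName(name: string): string {
  return name
    .normalize("NFD")                  // decompose: ñ -> n + combining tilde
    .replace(/[\u0300-\u036f]/g, "")   // strip combining marks
    .replace(/\s+(Jr\.?|Sr\.?)$/i, "") // drop generational suffix
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}
```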
Players like "Ronald Acuña Jr." (model) now match "Ronald Acuna Jr." (FanDuel). Missing model data went from 42 players to 14. Remaining 14 are genuinely not in the model (rookies, bench players, or unusual name formats like "C.J. Abrams", "Max P. Muncy").
Files: src/pipeline/data/scrapers/mlb-prop-probability-curves.ts, src/pipeline/data/scrapers/settle-mlb-props.ts
Boss Notes:
Changes:
The dashboard's percentage formatter was multiplying edge_yes/edge_no by 100, but these columns are already stored as 0-100 percentages. A value of 1.25 (meaning 1.25%) was displayed as 125%. Removed edge_yes and edge_no from the explicit ×100 formatting list in data-view.ts. These columns only exist in probability curves tables, which all store values as 0-100.
Files: src/pipeline/data-status/data-view.ts
Boss Notes:
Changes:
- mlb-tier2-metrics.ts now reads actual SP hand from sp_baselines and passes it through to matchup context. Was missing, so platoon adjustments had no hand to work with.
- mlb-prop-matchup-context.ts now reads individual batter splits (vs LHP/RHP) from mlb-batting.db and computes a platoon_factor. Team-level platoon splits moved off the props desk (too noisy).
- mlb-prop-edge-scanner.ts and mlb-props-edge-scanner.ts now apply the platoon factor to adjust model probability before computing edges.
- batter_game_logs. Column naming fixed: vsSplit→statSplits, LHP/RHP→L/R.
- New savant_batter_stats table in mlb-batting.db. Scrapes the Statcast leaderboard for barrel%, exit velo, hard hit%, xBA, xSLG, etc. Replaced the old team batting scraper that was pulling wrong data.
- final_total, result, and settlement cron now work against mlb-edges.db (was only running against research-pipeline.db, which was empty for MLB).
- (mlb-pitching.db). Cron schedules fixed to match actual run times.

Files: mlb-tier2-metrics.ts, mlb-prop-matchup-context.ts, mlb-prop-edge-scanner.ts, mlb-props-edge-scanner.ts, scrape-baseball-savant.ts, data-view.ts, desk-config.ts, source-tables.ts, scheduler.ts
Boss Notes:
Changes:
Added all 15 KXLEADERMLB series to the collector, dual-writer, and dashboard. New DB: kalshi-mlb-leaders.db with kalshi_mlb_leaders table. Columns: ticker, leader_type, player_name, yes_bid, yes_ask, volume, snapshot_type, captured_at. Initial scan pulled 951 contracts across 13 active series (Saves and Batter Strikeouts have 0 markets).
Series: HR, hits, RBI, runs, steals, doubles, triples, batting avg, OPS, ERA, pitcher wins, saves, pitcher K, batter K, WAR.
Dashboard: new "League Leaders" group with filtered views for HR, hits, ERA, WAR.
Adaptive schedule via existing cadence system: weekly when 3+ months out, every 3 days at 1-3 months, daily in final month.
Added lesson #9 to "Things to Watch Out For": Kalshi API rate limits — 2-second delays between series scans.
Files: src/pipeline/sports/collector.ts, src/pipeline/sports/mlb-dual-write.ts, src/pipeline/mlb-db.ts, src/pipeline/data-status/source-tables.ts, src/pipeline/data-status/desk-config.ts, scripts/scan-mlb-leaders.ts (new)
Boss Notes: