UFC/MMA Desk — Data Collection Spec (Mar 14, 2026)

Full design document: /home/ubuntu/edgeclaw/UFC_MMA_Analytics_Full_Design.txt (1310 lines, SQL schema included)

What This Document Is

Summary of the UFC/MMA desk specification. The full design document (linked above) is self-contained — an AI builder can read it and build the entire module from scratch. This inventory covers what data we collect, from where, and the key differences from team sports.

The Business Model

Same as all other desks — trade on Kalshi + Polymarket, compare to Pinnacle fair value, find mispriced lines. See sports-desk-data-inventory.md for full explanation.

How UFC Is Fundamentally Different From Team Sports

Individual vs team — one "player" IS the team. No teammates to compensate.
Multiple win conditions — KO/TKO, submission, decision (unanimous/split/majority), DQ, doctor stoppage. Method of Victory markets are often MORE exploitable than moneyline.
Tiny sample sizes — fighters fight 2-4x/year, many have only 5-10 UFC fights total. Rolling averages are volatile.
Nonlinear decline — chin degradation is permanent and cumulative. Age 35 cliff below 185 lbs is a step function, not gradual.
Style geometry > raw talent — wrestler beats striker, BJJ beats wrestler, striker beats BJJ. Matchup matrix matters more than rankings.
Weight classes = separate ecosystems — all stats must be calculated WITHIN weight class.
Judging is subjective — judge bias tracking (aggressor vs control vs damage) is a real edge vector.
Cage size matters — UFC Apex 25ft cage vs standard 30ft. Smaller cage increases finish rates, favors wrestlers/pressure fighters.
Weight cuts — severity of missed weight (how many lbs over) is predictive, not just if they missed.
No season — event-driven, ~45 events/year. No standings, no playoffs.

Data We Collect

Fighter Statistics (from UFCStats.com/FightMetric — FREE)

Striking: SLpM, SApM, accuracy %, defense %, striking differential, distance/clinch/ground strikes, head/body/leg distribution, knockdowns scored/received per 15 min
Grappling: TD accuracy %, TD defense %, TDs per 15 min, control time per 15 min, submission attempts, reversals
Finishing: KO/TKO win rate, sub win rate, decision rate, finish rate, avg fight duration, KO losses (with nonlinear chin degradation penalty)
Cardio: Round-by-round output (R1 through R5), output decline %, cardio archetype (Sustainer / Gradual Decliner / Cardio Cliff)
Physical: Reach, height, leg reach, stance (orthodox/southpaw/switch), age, days since last fight
Weight: Weight class history, weigh-in weights, missed weight history + severity (lbs over), rehydration estimates
Style: Primary + secondary style tags (Boxer, Pressure Fighter, Counter Striker, Wrestler, Grappler/BJJ, Clinch Fighter, Kickboxer, Well-Rounded)
Camp: Current gym, previous gym, gym change date

All stats tracked as career averages + rolling last 3 + rolling last 5 fights, computed WITHIN weight class.
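
A minimal sketch of the career / last-3 / last-5 split, assuming fights arrive as chronologically ordered records. The field names (`weight_class`, `slpm`) and the function name are illustrative and are not taken from the design document.

```python
from statistics import mean

def rolling_stats(fights, weight_class, stat):
    """Career, last-3 and last-5 averages of `stat` within one weight class.

    `fights` is a list of dicts ordered oldest to newest (assumed layout).
    Returns None when the fighter has no fights in that weight class.
    """
    values = [f[stat] for f in fights if f["weight_class"] == weight_class]
    if not values:
        return None
    return {
        "career": mean(values),
        "last_3": mean(values[-3:]),   # slices shrink gracefully on short careers
        "last_5": mean(values[-5:]),
        "n_fights": len(values),
    }

fights = [
    {"weight_class": "Lightweight", "slpm": 3.0},
    {"weight_class": "Lightweight", "slpm": 4.0},
    {"weight_class": "Welterweight", "slpm": 9.0},  # excluded: other weight class
    {"weight_class": "Lightweight", "slpm": 5.0},
    {"weight_class": "Lightweight", "slpm": 6.0},
]
stats = rolling_stats(fights, "Lightweight", "slpm")
```

With fewer than five fights in a class, `last_5` collapses to the career average, which is why the sketch returns `n_fights` alongside the averages.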

Style Matchup Matrix

Historical win rates for each style pairing within each weight class (e.g., Wrestler vs Boxer at Lightweight). Used as 10% of composite formula.
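
One way the matrix could be built from historical results is sketched below. The input layout (a tuple of weight class, winner's style, loser's style) and the function name are assumptions for illustration. The sample size is kept next to each rate because many pairings will have few fights.

```python
from collections import defaultdict

def build_matchup_matrix(fights):
    """Win rate of style A over style B, per weight class.

    `fights` is an iterable of (weight_class, winner_style, loser_style).
    Returns {(weight_class, style_a, style_b): (win_rate_of_a, sample_size)}.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for wc, winner, loser in fights:
        if winner == loser:
            continue  # mirror matchups carry no style signal
        wins[(wc, winner, loser)] += 1
        totals[(wc, frozenset((winner, loser)))] += 1

    matrix = {}
    for (wc, pair), n in totals.items():
        a, b = sorted(pair)
        matrix[(wc, a, b)] = (wins[(wc, a, b)] / n, n)
        matrix[(wc, b, a)] = (wins[(wc, b, a)] / n, n)
    return matrix

history = [("Lightweight", "Wrestler", "Boxer")] * 3 + [("Lightweight", "Boxer", "Wrestler")]
matrix = build_matchup_matrix(history)
```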

Betting Data (same pipeline as team sports)

Kalshi: ML, method (KO/Sub/Dec), rounds (O/U), props — early + closing + live 1-min
Polymarket: Same markets — cross-market arbitrage when prices diverge
Pinnacle: Sharpest book — early + closing for fair value calculation
DraftKings: Opening + closing lines, method props, round props
Odds API: Fallback for multi-book aggregation
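
The fair value comparison can be sketched as follows. This assumes two-way moneyline prices in American odds format and removes the vig by proportional normalization, which is only one of several de-vig methods; this summary does not say which one the design document uses. Function names are illustrative.

```python
def american_to_implied(odds):
    """Implied probability (vig included) from American odds."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)

def no_vig_probs(odds_a, odds_b):
    """Fair probabilities for a two-way market via proportional normalization."""
    pa = american_to_implied(odds_a)
    pb = american_to_implied(odds_b)
    total = pa + pb  # exceeds 1.0 by the book's overround
    return pa / total, pb / total

def edge_vs_fair(contract_price, fair_prob):
    """Edge from buying a binary contract priced 0-1 against the fair probability."""
    return fair_prob - contract_price

# Hypothetical prices: sharp book at -150 / +130, contract on fighter A at 0.55.
fair_a, fair_b = no_vig_probs(-150, 130)
edge = edge_vs_fair(0.55, fair_a)
```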

Prediction Models (5-7 sources)

Source	Type	Access
BetMMA.tips	Community predictions with verified track records	Scrape
Tapology	Community + staff picks	Scrape
FightMatrix	Elo-based fighter rankings	Scrape
Action Network	Staff model predictions	Scrape
Bloody Elbow	Expert panel picks (5-8 writers)	Scrape
Market-implied (Pinnacle)	No-vig closing line baseline	Calculated
Internal Elo/Glicko-2	EdgeClaw model trained on UFCStats backfill	Built in-house

Judge Data (from MMADecisions.com — FREE)

Per-judge tendencies: % unanimous, % split, % favoring aggressor vs control vs damage, agreement rate with other judges, controversial decision rate.

Data Sources Summary

Source	What	Cost
UFCStats.com (FightMetric)	All fighter stats, per-round breakdowns, 67+ categories	FREE (scrape)
Sherdog.com	Fighter records, results, event history, pre-UFC record	FREE (scrape)
Tapology.com	Records, rankings, weigh-in results, regional data	FREE (scrape)
MMADecisions.com	Judge scorecards, judge tendencies, bias analysis	FREE (scrape)
BestFightOdds.com / FightOdds.io	Historical odds, line movement, multi-book comparison	FREE (scrape)
Kalshi API	Prediction market prices, volume, order book	FREE (API key)
Polymarket	Prediction market prices, volume, on-chain data	FREE
Odds API	Multi-book aggregation (Pinnacle, DK, FanDuel, BetMGM)	FREE tier
MMA Junkie / MMA Fighting	News, camp reports, injury updates, camp changes	RSS (free)

Quality Composite Formula (0-100, matchup-specific)

Component	Weight	What It Captures
Striking Differential	25%	Percentile rank of SLpM - SApM within weight class. Bonuses for reach advantage >3" and knockdown rate.
Grappling Advantage	20%	Weighted blend: 40% TD accuracy + 30% TD defense + 20% control time + 10% sub attempts
Finishing Ability	15%	Career finish rate + knockdown rate + opponent's chin vulnerability (3+ KO losses = +20)
Cardio/Pace	10%	100 minus output decline %. Bonus for championship round history. Penalty for Cardio Cliff archetype.
Durability/Chin	10%	Nonlinear KO loss penalty: 0/-15/-30/-50 for 0/1/2/3+ KO losses. Doubled if KO loss in last 2 fights.
Style Matchup	10%	Historical win rate for this style pairing in this weight class from matchup matrix.
Experience/Recency	5%	UFC fights * 5 (capped at 50). Layoff penalties: -10 short camp, -15 ring rust, -25 severe.
Situational	5%	Title fight bonus, PPV bonus, missed weight penalty, weight class change, cage size modifier, age 35 cliff (step function below 185 lbs).
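
A minimal sketch of how the weights in the table combine, assuming each component has already been scaled to 0-100 (bonuses and penalties included) before weighting. The clamping, the durability base of 100, and all names are assumptions; only the weights, the grappling blend, and the KO penalty steps come from the table.

```python
WEIGHTS = {
    "striking": 0.25, "grappling": 0.20, "finishing": 0.15, "cardio": 0.10,
    "durability": 0.10, "style_matchup": 0.10, "experience": 0.05, "situational": 0.05,
}
GRAPPLING_BLEND = {"td_accuracy": 0.40, "td_defense": 0.30,
                   "control_time": 0.20, "sub_attempts": 0.10}
KO_PENALTY = (0, 15, 30, 50)  # for 0 / 1 / 2 / 3+ KO losses

def clamp(x, lo=0.0, hi=100.0):
    return max(lo, min(hi, x))

def grappling_score(parts):
    """Weighted blend of four 0-100 grappling inputs."""
    return sum(GRAPPLING_BLEND[k] * parts[k] for k in GRAPPLING_BLEND)

def durability_score(ko_losses, ko_in_last_2, base=100.0):
    """Nonlinear chin penalty; base=100 is an assumed starting score."""
    penalty = KO_PENALTY[min(ko_losses, 3)]
    if ko_in_last_2:
        penalty *= 2  # doubled for a KO loss in the last 2 fights
    return clamp(base - penalty)

def composite(components):
    """Weighted sum of the eight 0-100 component scores."""
    return clamp(sum(WEIGHTS[k] * clamp(components[k]) for k in WEIGHTS))

example = {
    "striking": 70,
    "grappling": grappling_score({k: 60 for k in GRAPPLING_BLEND}),
    "finishing": 55, "cardio": 80,
    "durability": durability_score(ko_losses=1, ko_in_last_2=False),
    "style_matchup": 50, "experience": 40, "situational": 50,
}
score = composite(example)
```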

Compound Boosters (stack on top of base composite)

Compound Decline (up to -15): 3+ KO losses, age 35+, and a 365+ day layoff
Weight Cut Severity (up to -15): Based on lbs over limit (0.5-1 = -5, 1.1-3 = -10, 3.1+ = -15)
Camp Change: +5 for move to elite gym, -5 for leaving elite gym, -3 if change < 6 months ago
Market Resistance Filter: Downgrade by one tier when the model shows a large edge but the line does not move for 3+ hours
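
The weight cut and camp change boosters can be sketched as simple lookups. Two points are assumptions rather than part of the spec: misses under 0.5 lb score zero, and the three camp change adjustments are additive. Values that fall between the listed bands (for example 1.05 lbs) are assigned to the higher band here.

```python
def weight_cut_penalty(lbs_over):
    """Penalty from the miss size in lbs over the limit."""
    if lbs_over < 0.5:
        return 0      # assumption: below the first listed band, no penalty
    if lbs_over <= 1.0:
        return -5
    if lbs_over <= 3.0:
        return -10
    return -15

def camp_change_adjustment(to_elite, from_elite, months_since_change):
    """Camp change booster; treating the three terms as additive is an assumption."""
    adj = 0
    if to_elite:
        adj += 5
    if from_elite:
        adj -= 5
    if months_since_change is not None and months_since_change < 6:
        adj -= 3      # recent change: less time to settle into the new camp
    return adj

penalty = weight_cut_penalty(2.5)
camp = camp_change_adjustment(to_elite=True, from_elite=False, months_since_change=2)
```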

Edge Tiers (stricter than team sports — smaller samples)

Tier	Composite	Model Edge	Action
S	85-100	+7% vs market	Max unit
A	70-84	+5% vs market	Standard unit
B	55-69	+3% vs market	Half unit
C	40-54	Any positive	Track only
D	0-39	N/A	Skip
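
A sketch of the tier lookup follows. The table does not say what happens when the composite and the edge point to different tiers, so this assumes a bet takes the highest tier for which both thresholds hold, and that anything without a positive edge is skipped. The `downgrade` helper illustrates the one-tier Market Resistance downgrade; all names are hypothetical.

```python
TIERS = [  # (tier, min composite, min model edge, action)
    ("S", 85, 0.07, "Max unit"),
    ("A", 70, 0.05, "Standard unit"),
    ("B", 55, 0.03, "Half unit"),
]
ACTIONS = {"S": "Max unit", "A": "Standard unit", "B": "Half unit",
           "C": "Track only", "D": "Skip"}
ORDER = ["S", "A", "B", "C", "D"]

def classify(composite, edge):
    """Return (tier, action) for a composite score and a model edge (0.05 = 5%)."""
    for tier, min_comp, min_edge, action in TIERS:
        if composite >= min_comp and edge >= min_edge:
            return tier, action
    if composite >= 40 and edge > 0:
        return "C", ACTIONS["C"]
    return "D", ACTIONS["D"]

def downgrade(tier):
    """Move one tier down, stopping at D."""
    lower = ORDER[min(ORDER.index(tier) + 1, len(ORDER) - 1)]
    return lower, ACTIONS[lower]

tier, action = classify(composite=90, edge=0.08)
```

Under this reading, a composite of 90 with only a 4% edge lands in tier B rather than S, because the edge threshold is the binding constraint.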

Method of Victory Modeling (THE unique UFC edge)

Method markets (KO/Sub/Decision) are less efficient than moneyline because books must price three-way probabilities rather than a two-way outcome. The public systematically overprices KOs and underprices decisions.

Method probability calculation:

Start with each fighter's historical method rates (win by KO/Sub/Dec)
Blend with opponent's loss profile (how they lose — KO/Sub/Dec)
Weight by sample size for stability
Apply adjustments: chin damage (+KO), TD defense <60% (+Sub), both >60% decision rate (+Dec), weight class finish rate priors, cardio disparity, reach advantage
Compare to DraftKings method prop prices — edge = model prob minus market implied

Two approaches are specified for the blend: a sample-size weighted blend and a geometric mean. Both will be backtested, and whichever performs better will be used.
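
The two blending approaches can be sketched side by side. This covers only the first three steps (method rates, opponent loss profile, sample-size weighting) and leaves out the adjustment step. It assumes the method rates are conditional on the fighter winning, so the joint probability of "wins by method" is the win probability times the blended rate. All names and numbers are illustrative.

```python
from math import sqrt

METHODS = ("KO", "Sub", "Dec")

def normalise(probs):
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("no probability mass to normalise")
    return {m: v / total for m, v in probs.items()}

def weighted_blend(win_rates, n_wins, opp_loss_rates, n_opp_losses):
    """Blend the two profiles in proportion to their sample sizes."""
    total = n_wins + n_opp_losses
    if total == 0:
        raise ValueError("no fights to blend; fall back to weight class priors")
    w = n_wins / total
    return normalise({m: w * win_rates[m] + (1 - w) * opp_loss_rates[m]
                      for m in METHODS})

def geometric_blend(win_rates, opp_loss_rates):
    """Geometric mean of the two profiles, renormalised to sum to 1."""
    return normalise({m: sqrt(win_rates[m] * opp_loss_rates[m]) for m in METHODS})

def method_edge(p_win, method_probs, method, market_implied):
    """Model probability of 'wins by method' minus the market-implied probability."""
    return p_win * method_probs[method] - market_implied

# Hypothetical fighter: 10 wins (50% KO, 10% Sub, 40% Dec).
# Hypothetical opponent: 5 losses (20% KO, 40% Sub, 40% Dec).
wins = {"KO": 0.5, "Sub": 0.1, "Dec": 0.4}
losses = {"KO": 0.2, "Sub": 0.4, "Dec": 0.4}
blended = weighted_blend(wins, 10, losses, 5)
geo = geometric_blend(wins, losses)
edge = method_edge(p_win=0.6, method_probs=blended, method="KO", market_implied=0.20)
```

The geometric mean penalizes disagreement between the two profiles more heavily than the weighted blend, so the two approaches can rank the same method differently.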

Round-by-Round Modeling

Cardio curves (output per round) feed into:

P(finish in round N) = base finish rate * cardio multipliers * cumulative damage factor
P(goes to decision) = 1 - sum of all round finish probabilities
Over/under 1.5, 2.5, 3.5, 4.5 rounds pricing from these probabilities
The 2.5 round market in title fights is often mispriced (market anchors on 3-round base rates)
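
A sketch of how the round probabilities could feed over/under pricing. It rests on three assumptions that are not stated in this summary: each round's finish probability is unconditional (so the values sum to at most 1), an N.5 line settles at the midpoint of round N+1 as sportsbooks commonly define it, and a finish within a round is equally likely before or after that midpoint. Check the venue's settlement rules before relying on the midpoint convention.

```python
def round_finish_probs(base_rate, cardio_multipliers, damage_factors):
    """Per-round finish probability: base rate * cardio multiplier * damage factor."""
    return [base_rate * c * d for c, d in zip(cardio_multipliers, damage_factors)]

def decision_prob(finish_probs):
    """P(goes to decision) = 1 - sum of round finish probabilities."""
    total = sum(finish_probs)
    if total > 1:
        raise ValueError("round finish probabilities exceed 1")
    return 1 - total

def over_prob(finish_probs, line):
    """P(fight lasts beyond `line` rounds) for half-round lines such as 2.5."""
    if line % 1 != 0.5:
        raise ValueError("expected a half-round line such as 1.5 or 2.5")
    full_rounds = int(line)  # rounds that must be completed in full
    if full_rounds >= len(finish_probs):
        raise ValueError("line is beyond the scheduled number of rounds")
    ended_before = sum(finish_probs[:full_rounds])
    ended_first_half = 0.5 * finish_probs[full_rounds]  # uniform-in-round assumption
    return 1 - ended_before - ended_first_half

# Hypothetical 3-round fight.
probs = [0.15, 0.10, 0.08]
p_decision = decision_prob(probs)
p_over_1_5 = over_prob(probs, 1.5)
p_over_2_5 = over_prob(probs, 2.5)
```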

Live Tracking (1-minute intervals)

Kalshi + Polymarket: Every ML, method, rounds, and prop contract — price + volume + liquidity, every 1 minute during fights
Track through entire fight (no halftime cutoff like team sports)
Round breaks are natural recalibration points
After knockdowns, prices swing 10-30 cents — 1-minute capture catches the aftermath

Database

Full SQL schema (PostgreSQL, 16 tables) in the design document. No Google Sheets. Tables: weight_classes, fighters, events, fights, fight_results, judges, bout_judges, fighter_fight_stats, fighter_round_stats, fighter_rolling_stats, fighter_weight_history, style_matchup_matrix, betting_odds, predictions, calculated_edges, live_tracking.

Implementation Roadmap (12 weeks, 6 phases)

Phase 1 (Weeks 1-3): SQL schema, UFCStats scraper, Sherdog/Tapology scrapers, historical backfill (2015+), ETL for rolling stats
Phase 2 (Weeks 3-5): Odds API, Kalshi + Polymarket scrapers, Pinnacle historical, live tracking pipeline (1-min), betting odds snapshots
Phase 3 (Weeks 5-7): Prediction ensemble scrapers, internal Elo/Glicko-2 model, Brier scoring
Phase 4 (Weeks 7-9): Composite formula, percentile rankings per weight class, style classification, matchup matrix, edge calculation, MoV model, round-by-round model
Phase 5 (Weeks 9-12): Automation, fight card detection, pre-event reports, post-event grading, judge alerts, closing odds automation
Phase 6 (Ongoing): Composite tuning, inverse-MSE weighting, camp change pipeline, prop bet scanner, weight class specific tuning, MoV approach backtest

Source: ~/.claude/projects/-home-ubuntu-edgeclaw/memory/ufc-desk-data-inventory.md