UFC/MMA Desk — Data Collection Spec (Mar 14, 2026)
Full design document: /home/ubuntu/edgeclaw/UFC_MMA_Analytics_Full_Design.txt (1310 lines, SQL schema included)
What This Document Is
Summary of the UFC/MMA desk specification. The full design document (linked above) is self-contained — an AI builder can read it and build the entire module from scratch. This inventory covers what data we collect, from where, and the key differences from team sports.
The Business Model
Same as all other desks — trade on Kalshi + Polymarket, compare to Pinnacle fair value, find mispriced lines. See sports-desk-data-inventory.md for full explanation.
How UFC Is Fundamentally Different From Team Sports
- Individual vs team — one "player" IS the team. No teammates to compensate.
- Multiple win conditions — KO/TKO, submission, decision (unanimous/split/majority), DQ, doctor stoppage. Method of Victory markets are often MORE exploitable than moneyline.
- Tiny sample sizes — fighters fight 2-4x/year, many have only 5-10 UFC fights total. Rolling averages are volatile.
- Nonlinear decline — chin degradation is permanent and cumulative. Age 35 cliff below 185 lbs is a step function, not gradual.
- Style geometry > raw talent — wrestler beats striker, BJJ beats wrestler, striker beats BJJ. Matchup matrix matters more than rankings.
- Weight classes = separate ecosystems — all stats must be calculated WITHIN weight class.
- Judging is subjective — judge bias tracking (aggressor vs control vs damage) is a real edge vector.
- Cage size matters — UFC Apex 25ft cage vs standard 30ft. Smaller cage increases finish rates, favors wrestlers/pressure fighters.
- Weight cuts — severity of missed weight (how many lbs over) is predictive, not just if they missed.
- No season — event-driven, ~45 events/year. No standings, no playoffs.
Data We Collect
Fighter Statistics (from UFCStats.com/FightMetric — FREE)
- Striking: SLpM, SApM, accuracy %, defense %, striking differential, distance/clinch/ground strikes, head/body/leg distribution, knockdowns scored/received per 15 min
- Grappling: TD accuracy %, TD defense %, TDs per 15 min, control time per 15 min, submission attempts, reversals
- Finishing: KO/TKO win rate, sub win rate, decision rate, finish rate, avg fight duration, KO losses (with nonlinear chin degradation penalty)
- Cardio: Round-by-round output (R1 through R5), output decline %, cardio archetype (Sustainer / Gradual Decliner / Cardio Cliff)
- Physical: Reach, height, leg reach, stance (orthodox/southpaw/switch), age, days since last fight
- Weight: Weight class history, weigh-in weights, missed weight history + severity (lbs over), rehydration estimates
- Style: Primary + secondary style tags (Boxer, Pressure Fighter, Counter Striker, Wrestler, Grappler/BJJ, Clinch Fighter, Kickboxer, Well-Rounded)
- Camp: Current gym, previous gym, gym change date
All stats tracked as career averages + rolling last 3 + rolling last 5 fights, computed WITHIN weight class.
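The three aggregation windows can be sketched as follows. This is a minimal illustration, assuming fights arrive as plain dicts ordered oldest to newest; the field names (`weight_class`, `slpm`) and the function name `rolling_stats` are hypothetical and not taken from the design document's schema.

```python
from statistics import mean

def rolling_stats(fights, weight_class, stat):
    """Career, last-3 and last-5 averages of `stat` within one weight class.

    `fights` is a list of dicts ordered oldest to newest (assumed layout).
    Returns None when the fighter has no fights in that weight class.
    """
    values = [f[stat] for f in fights if f["weight_class"] == weight_class]
    if not values:
        return None
    return {
        "career": mean(values),
        "last_3": mean(values[-3:]),   # fewer than 3 fights: uses what exists
        "last_5": mean(values[-5:]),
        "n_fights": len(values),       # keep the sample size next to the average
    }

fights = [
    {"weight_class": "Lightweight", "slpm": 3.0},
    {"weight_class": "Lightweight", "slpm": 4.0},
    {"weight_class": "Welterweight", "slpm": 9.0},  # excluded: other weight class
    {"weight_class": "Lightweight", "slpm": 5.0},
    {"weight_class": "Lightweight", "slpm": 6.0},
]
stats = rolling_stats(fights, "Lightweight", "slpm")
```

Returning `n_fights` alongside the averages lets downstream code discount windows built on very few fights, which matters given the small samples noted above.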
Style Matchup Matrix
Historical win rates for each style pairing within each weight class (e.g., Wrestler vs Boxer at Lightweight). Feeds the Style Matchup component, which carries 10% of the composite formula.
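A rough sketch of how such a matrix could be built from historical results. The input layout (tuples of weight class, winner style, loser style) and the `min_fights` threshold are assumptions for illustration; draws and no-contests are simply ignored here.

```python
from collections import defaultdict

def build_matchup_matrix(fights, min_fights=1):
    """Win rate of style A over style B, keyed by (weight_class, A, B).

    `fights` holds (weight_class, winner_style, loser_style) tuples (assumed layout).
    Pairings with fewer than `min_fights` total bouts are left out.
    """
    wins = defaultdict(int)
    for wc, winner, loser in fights:
        if winner != loser:  # mirror matchups carry no style information
            wins[(wc, winner, loser)] += 1

    matrix = {}
    for (wc, a, b), a_wins in wins.items():
        b_wins = wins.get((wc, b, a), 0)
        total = a_wins + b_wins
        if total >= min_fights:
            matrix[(wc, a, b)] = a_wins / total
            matrix[(wc, b, a)] = b_wins / total
    return matrix

history = [("Lightweight", "Wrestler", "Boxer")] * 3 + [("Lightweight", "Boxer", "Wrestler")]
matrix = build_matchup_matrix(history)
```

In practice a higher `min_fights` would likely be needed before a pairing's win rate is trusted.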
Betting Data (same pipeline as team sports)
- Kalshi: ML, method (KO/Sub/Dec), rounds (O/U), props — early + closing + live 1-min
- Polymarket: Same markets — cross-market arbitrage when prices diverge
- Pinnacle: Sharpest book — early + closing for fair value calculation
- DraftKings: Opening + closing lines, method props, round props
- Odds API: Fallback for multi-book aggregation
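The fair value calculation from Pinnacle prices can be illustrated as below. This sketch assumes decimal odds and proportional (multiplicative) vig removal, which is one common method; the spec does not say which de-vig method the pipeline uses. Function names are illustrative, and exchange fees are ignored.

```python
def no_vig_probs(decimal_odds):
    """Strip the bookmaker margin by proportional normalization.

    `decimal_odds` lists the decimal prices for every outcome of one market.
    """
    raw = [1.0 / o for o in decimal_odds]   # implied probabilities, sum > 1
    total = sum(raw)
    return [p / total for p in raw]          # rescaled to sum to 1

def edge_vs_fair(contract_price, fair_prob):
    """Edge on a binary contract priced between 0 and 1 (fees ignored)."""
    return fair_prob - contract_price

# Example: a two-way moneyline priced 1.80 / 2.10
fair_a, fair_b = no_vig_probs([1.80, 2.10])
edge = edge_vs_fair(0.50, fair_a)  # prediction-market contract at 50 cents
```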
Prediction Models (5-7 sources)
| Source | Type | Access |
|---|---|---|
| BetMMA.tips | Community predictions with verified track records | Scrape |
| Tapology | Community + staff picks | Scrape |
| FightMatrix | ELO-based fighter rankings | Scrape |
| Action Network | Staff model predictions | Scrape |
| Bloody Elbow | Expert panel picks (5-8 writers) | Scrape |
| Market-implied (Pinnacle) | No-vig closing line baseline | Calculated |
| Internal ELO/Glicko-2 | EdgeClaw model trained on UFCStats backfill | Built in-house |
Judge Data (from MMADecisions.com — FREE)
Per-judge tendencies: % unanimous, % split, % favoring aggressor vs control vs damage, agreement rate with other judges, controversial decision rate.
Data Sources Summary
| Source | What | Cost |
|---|---|---|
| UFCStats.com (FightMetric) | All fighter stats, per-round breakdowns, 67+ categories | FREE (scrape) |
| Sherdog.com | Fighter records, results, event history, pre-UFC record | FREE (scrape) |
| Tapology.com | Records, rankings, weigh-in results, regional data | FREE (scrape) |
| MMADecisions.com | Judge scorecards, judge tendencies, bias analysis | FREE (scrape) |
| BestFightOdds.com / FightOdds.io | Historical odds, line movement, multi-book comparison | FREE (scrape) |
| Kalshi API | Prediction market prices, volume, order book | FREE (API key) |
| Polymarket | Prediction market prices, volume, on-chain data | FREE |
| Odds API | Multi-book aggregation (Pinnacle, DK, FanDuel, BetMGM) | FREE tier |
| MMA Junkie / MMA Fighting | News, camp reports, injury updates, camp changes | RSS (free) |
Quality Composite Formula (0-100, matchup-specific)
| Component | Weight | What It Captures |
|---|---|---|
| Striking Differential | 25% | Percentile rank of SLpM - SApM within weight class. Bonuses for reach advantage >3" and knockdown rate. |
| Grappling Advantage | 20% | Weighted blend: 40% TD accuracy + 30% TD defense + 20% control time + 10% sub attempts |
| Finishing Ability | 15% | Career finish rate + knockdown rate + opponent's chin vulnerability (3+ KO losses = +20) |
| Cardio/Pace | 10% | 100 minus output decline %. Bonus for championship round history. Penalty for Cardio Cliff archetype. |
| Durability/Chin | 10% | Nonlinear KO loss penalty: 0/-15/-30/-50 for 0/1/2/3+ KO losses. Doubled if KO loss in last 2 fights. |
| Style Matchup | 10% | Historical win rate for this style pairing in this weight class from matchup matrix. |
| Experience/Recency | 5% | UFC fights * 5 (capped at 50). Layoff penalties: -10 short camp, -15 ring rust, -25 severe. |
| Situational | 5% | Title fight bonus, PPV bonus, missed weight penalty, weight class change, cage size modifier, age 35 cliff (step function below 185 lbs). |
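The weighted sum and the nonlinear chin penalty from the table can be sketched as below. This assumes each component score has already been scaled to 0-100; how each component is scaled is defined in the full design document and is not reproduced here. The dictionary keys and function names are illustrative.

```python
WEIGHTS = {
    "striking_differential": 0.25,
    "grappling_advantage": 0.20,
    "finishing_ability": 0.15,
    "cardio_pace": 0.10,
    "durability_chin": 0.10,
    "style_matchup": 0.10,
    "experience_recency": 0.05,
    "situational": 0.05,
}

def chin_penalty(ko_losses, ko_loss_in_last_2):
    """KO loss penalty: 0/-15/-30/-50 for 0/1/2/3+ losses, doubled if recent."""
    if ko_losses <= 0:
        base = 0
    elif ko_losses == 1:
        base = -15
    elif ko_losses == 2:
        base = -30
    else:
        base = -50
    return base * 2 if ko_loss_in_last_2 else base

def composite(scores):
    """Weighted sum of component scores, each clamped to the 0-100 range."""
    clamped = {k: min(100.0, max(0.0, scores[k])) for k in WEIGHTS}
    return sum(WEIGHTS[k] * clamped[k] for k in WEIGHTS)

base_score = composite({k: 60 for k in WEIGHTS})  # every component at 60
```

Because the weights sum to 1, the composite stays on the same 0-100 scale as its inputs.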
Compound Boosters (stack on top of base composite)
- Compound Decline (up to -15): 3+ KO losses + age 35+ + 365+ day layoff
- Weight Cut Severity (up to -15): Based on lbs over limit (0.5-1 = -5, 1.1-3 = -10, 3.1+ = -15)
- Camp Change: +5 for move to elite gym, -5 for leaving elite gym, -3 if change < 6 months ago
- Market Resistance Filter: Downgrade by one tier when the model shows a large edge but the line has not moved for 3+ hours
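The weight cut severity bands translate directly into a lookup. In this sketch the handling of misses under 0.5 lbs (no penalty) and the clamping of the adjusted composite to 0-100 are assumptions, since the summary does not state either; the function names are illustrative.

```python
def weight_cut_penalty(lbs_over):
    """Penalty by pounds over the limit: 0.5-1 = -5, 1.1-3 = -10, 3.1+ = -15."""
    if lbs_over < 0.5:
        return 0      # assumption: made weight or a negligible miss
    if lbs_over <= 1.0:
        return -5
    if lbs_over <= 3.0:
        return -10
    return -15

def apply_adjustments(base_composite, adjustments):
    """Stack adjustments on the base composite, clamped to 0-100 (assumed)."""
    return min(100.0, max(0.0, base_composite + sum(adjustments)))

# Example: base composite of 72, fighter came in 2 lbs over, moved to an elite gym (+5)
adjusted = apply_adjustments(72.0, [weight_cut_penalty(2.0), 5])
```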
Edge Tiers (stricter than team sports — smaller samples)
| Tier | Composite | Model Edge | Action |
|---|---|---|---|
| S | 85-100 | +7% vs market | Max unit |
| A | 70-84 | +5% vs market | Standard unit |
| B | 55-69 | +3% vs market | Half unit |
| C | 40-54 | Any positive | Track only |
| D | 0-39 | N/A | Skip |
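One possible reading of the tier table in code. The table does not say what happens when the composite qualifies for a tier but the edge does not, so this sketch assumes a pick receives the highest tier whose composite floor and edge floor are both met, and falls to D when the edge is not positive. The `market_resistance` flag applies the one-tier downgrade from the Market Resistance Filter. All names are illustrative.

```python
# (tier, minimum composite, minimum model edge as a fraction)
TIERS = [
    ("S", 85, 0.07),
    ("A", 70, 0.05),
    ("B", 55, 0.03),
    ("C", 40, 0.0),
]
ORDER = ["S", "A", "B", "C", "D"]
ACTIONS = {"S": "Max unit", "A": "Standard unit", "B": "Half unit",
           "C": "Track only", "D": "Skip"}

def assign_tier(composite, edge, market_resistance=False):
    """Highest tier whose composite and edge floors are both met (assumed rule)."""
    tier = "D"
    if edge > 0:
        for name, min_composite, min_edge in TIERS:
            if composite >= min_composite and edge >= min_edge:
                tier = name
                break
    if market_resistance and tier != "D":
        tier = ORDER[ORDER.index(tier) + 1]  # downgrade by one tier
    return tier

tier = assign_tier(composite=90, edge=0.08)
action = ACTIONS[tier]
```

Under this reading a 90 composite with only a 4% edge lands in tier B, not S; if the design document intends a different rule, only `assign_tier` needs to change.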
Method of Victory Modeling (THE unique UFC edge)
Method markets (KO/Sub/Decision) are less efficient than moneyline because books must price three-way probabilities. The public systematically overvalues KOs and undervalues decisions.
Method probability calculation:
- Start with each fighter's historical method rates (win by KO/Sub/Dec)
- Blend with opponent's loss profile (how they lose — KO/Sub/Dec)
- Weight by sample size for stability
- Apply adjustments: chin damage (+KO), TD defense <60% (+Sub), both >60% decision rate (+Dec), weight class finish rate priors, cardio disparity, reach advantage
- Compare to DraftKings method prop prices — edge = model prob minus market implied
Two approaches specified: sample-size weighted blend vs geometric mean. Both to be backtested; use whichever performs better.
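The two blending approaches can be sketched side by side. This is an illustration only: it covers the first three steps (historical rates, opponent loss profile, sample-size weighting) and omits the adjustment step, and the exact weighting scheme in the design document may differ. The rates are conditional on the fighter winning, so the edge calculation multiplies by a win probability `p_win`, which is assumed to come from the moneyline model.

```python
from math import sqrt

METHODS = ("KO", "Sub", "Dec")

def _normalize(probs):
    total = sum(probs.values())
    if total == 0:
        raise ValueError("all method probabilities are zero")
    return {m: v / total for m, v in probs.items()}

def weighted_blend(win_rates, n_wins, loss_rates, n_losses):
    """Sample-size weighted average of fighter win methods and opponent loss methods."""
    n = n_wins + n_losses
    if n == 0:
        raise ValueError("no fights to blend")
    return _normalize({
        m: (n_wins * win_rates[m] + n_losses * loss_rates[m]) / n for m in METHODS
    })

def geometric_blend(win_rates, loss_rates):
    """Geometric mean of the two profiles, renormalized to sum to 1."""
    return _normalize({m: sqrt(win_rates[m] * loss_rates[m]) for m in METHODS})

def method_edge(p_win, method_probs, market_implied):
    """Edge per method: model probability minus market-implied probability."""
    return {m: p_win * method_probs[m] - market_implied[m] for m in METHODS}

# Fighter A: 10 wins (60% KO, 10% Sub, 30% Dec); opponent: 4 losses (50/25/25)
wins = {"KO": 0.6, "Sub": 0.1, "Dec": 0.3}
losses = {"KO": 0.5, "Sub": 0.25, "Dec": 0.25}
blend = weighted_blend(wins, 10, losses, 4)
geo = geometric_blend(wins, losses)
edges = method_edge(0.6, blend, {"KO": 0.30, "Sub": 0.08, "Dec": 0.20})
```

The geometric mean pulls a method toward zero whenever either profile shows it rarely, while the weighted blend lets the larger sample dominate; that difference is what the backtest would be comparing.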
Round-by-Round Modeling
Cardio curves (output per round) feed into:
- P(finish in round N) = base finish rate * cardio multipliers * cumulative damage factor
- P(goes to decision) = 1 - sum of all round finish probabilities
- Over/under 1.5, 2.5, 3.5, 4.5 rounds pricing from these probabilities
- The 2.5 round market in title fights is often mispriced (market anchors on 3-round base rates)
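A small sketch of the round-by-round chain from finish probabilities to over/under prices. Two assumptions are labeled here and in the code: sportsbooks conventionally settle an X.5 rounds line at the midpoint of round X+1 (the settlement rules of the specific Kalshi or Polymarket contract should be checked), and finishes are treated as uniformly spread within a round, so half of that round's finish probability counts toward the under. The multiplier values are made up for the example.

```python
def round_finish_probs(base_rate, cardio_mult, damage_factor):
    """P(finish in round N) = base finish rate * cardio multiplier * damage factor."""
    probs = [base_rate * c * d for c, d in zip(cardio_mult, damage_factor)]
    if sum(probs) > 1.0:
        raise ValueError("round finish probabilities sum to more than 1")
    return probs

def p_decision(probs):
    """P(goes to decision) = 1 - sum of all round finish probabilities."""
    return 1.0 - sum(probs)

def p_over(probs, line, share_before_mark=0.5):
    """P(fight lasts past `line` rounds), e.g. line=2.5.

    Assumes the line settles at the midpoint of the next round and that a
    fraction `share_before_mark` of that round's finishes occur before it.
    """
    full_rounds = int(line)                 # rounds that must be fully completed
    p_under = sum(probs[:full_rounds])
    if full_rounds < len(probs):
        p_under += share_before_mark * probs[full_rounds]
    return 1.0 - p_under

# Three-round fight with illustrative multipliers
probs = round_finish_probs(0.10, cardio_mult=[1.0, 1.0, 1.2], damage_factor=[1.0, 1.1, 1.3])
decision = p_decision(probs)
over_1_5 = p_over(probs, 1.5)
over_2_5 = p_over(probs, 2.5)
```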
Live Tracking (1-minute intervals)
- Kalshi + Polymarket: Every ML, method, rounds, and prop contract — price + volume + liquidity, every 1 minute during fights
- Track through entire fight (no halftime cutoff like team sports)
- Round breaks are natural recalibration points
- After knockdowns, prices swing 10-30 cents — 1-minute capture catches the aftermath
Database
Full SQL schema (PostgreSQL, 16 tables) in the design document. No Google Sheets. Tables: weight_classes, fighters, events, fights, fight_results, judges, bout_judges, fighter_fight_stats, fighter_round_stats, fighter_rolling_stats, fighter_weight_history, style_matchup_matrix, betting_odds, predictions, calculated_edges, live_tracking.
Implementation Roadmap (12 weeks, 6 phases)
- Phase 1 (Weeks 1-3): SQL schema, UFCStats scraper, Sherdog/Tapology scrapers, historical backfill (2015+), ETL for rolling stats
- Phase 2 (Weeks 3-5): Odds API, Kalshi + Polymarket scrapers, Pinnacle historical, live tracking pipeline (1-min), betting odds snapshots
- Phase 3 (Weeks 5-7): Prediction ensemble scrapers, internal ELO/Glicko-2 model, Brier scoring
- Phase 4 (Weeks 7-9): Composite formula, percentile rankings per weight class, style classification, matchup matrix, edge calculation, MoV model, round-by-round model
- Phase 5 (Weeks 9-12): Automation, fight card detection, pre-event reports, post-event grading, judge alerts, closing odds automation
- Phase 6 (Ongoing): Composite tuning, inverse-MSE weighting, camp change pipeline, prop bet scanner, weight class specific tuning, MoV approach backtest
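For reference, the Brier scoring in Phase 3 and the inverse-MSE weighting in Phase 6 fit together roughly as below. This is a generic sketch of the two techniques, not the ensemble code from the design document; the source names and scores are invented for the example.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted win probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def inverse_mse_weights(scores, floor=1e-9):
    """Ensemble weights proportional to 1 / Brier score, normalized to sum to 1.

    `floor` guards against division by zero for a source with a perfect score.
    """
    inverse = {name: 1.0 / max(score, floor) for name, score in scores.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

# Two hypothetical sources graded on past fights (lower Brier score is better)
score = brier_score([0.8, 0.3], [1, 0])
weights = inverse_mse_weights({"source_a": 0.20, "source_b": 0.25})
```

A source with a lower historical Brier score receives proportionally more weight in the ensemble.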
Source: ~/.claude/projects/-home-ubuntu-edgeclaw/memory/ufc-desk-data-inventory.md