UFC/MMA Data Collection Desk — Merged Final Spec

Status: MERGED FINAL — Panel-reviewed, Opus as base, unique additions cherry-picked from all other models Source: UFC_MMA_Analytics_Full_Design.txt (primary), panel review from Opus, Gemini 3.1 Pro, Grok 4.1 Fast, Grok 4.1, DeepSeek V3.1, Sonar Reasoning Pro Date: March 2026 Purpose: Data collection spec for EdgeClaw's UFC/MMA Sports Desk. Feeds the Sports Desk analyst. Primary data source: UFCStats.com (FightMetric backend). Output format: SQLite (WAL mode).


Table of Contents

  1. System Context & Architecture
  2. Why UFC/MMA Is Fundamentally Different
  3. UFC-Specific Statistics (The Predictive Engine)
  4. Data Sources (APIs, Scrapers, Cost Tiers)
  5. Prediction Model Sources (Ensemble Inputs)
  6. Full SQL Database Schema (SQLite, 16+ Tables)
  7. Quality Composite Formula (0-100 Score)
  8. Edge Tier Classification (S/A/B/C/D)
  9. Method of Victory (MoV) Modeling
  10. Round-by-Round Modeling & Cardio Curves
  11. Live Tracking Architecture
  12. Closing Odds & Results Grading
  13. Implementation Roadmap (10 Steps, Build Order)
  14. Collection Cadence & Storage Estimates
  15. Key Queries (SQL)
  16. Implementation Checklist
  17. Critical Flaws & Mitigations (Panel)
  18. Over-Engineered Items (Defer/Simplify)
  19. Cross-Desk Reusable Components
  20. What Would Make This A+ (Panel)
  21. Appendices

1. System Context & Architecture

EdgeClaw is an automated, multi-sport betting analytics platform. It currently runs four sport modules: NHL, NBA, NCAAB (college basketball), and MLB. Each module tracks ~200-215 data points per game/matchup using the same core architecture:

SHARED ARCHITECTURE (all sports):

INFRASTRUCTURE:


2. Why UFC/MMA Is Fundamentally Different

Every design decision flows from these structural realities:

INDIVIDUAL vs TEAM: In team sports, regression to the mean is your friend — an injured star alters the line, but the team still functions. In MMA, the "team" is a single biological entity. There are no teammates to compensate.

MULTIPLE WIN CONDITIONS: Team sports have one outcome type. MMA has KO/TKO, submission, decision (unanimous/split/majority), DQ, doctor stoppage, draw, and no contest. This creates METHOD OF VICTORY markets that are often MORE exploitable than moneyline because they require three-way probability modeling that books struggle to price efficiently.

TINY SAMPLE SIZES: An NBA team plays 82+ games/season. A UFC fighter fights 2-4 times per year. Many fighters have only 5-10 UFC fights total. Career stats are noisy, rolling averages are volatile, and style matchups matter more than raw aggregate skill.

STEEP NONLINEAR DECLINE: Fighter decline is not gradual like team sports. Chin degradation is permanent and cumulative — once a fighter has been KO'd, they are statistically more likely to be KO'd again. Below 185 lbs (Middleweight), there is a sharp statistical cliff near age 35. [See Section 7.5 for improved decay model from panel review.]

STYLE GEOMETRY > RAW TALENT: A great wrestler often beats a great striker, but a great BJJ fighter often beats a great wrestler, and a great striker often beats a great BJJ fighter. The matchup matrix is more predictive than individual rankings.

WEIGHT CLASSES = SEPARATE ECOSYSTEMS: Heavyweight (265 lbs) is essentially a different sport than Flyweight (125 lbs). All statistics and composite scores must be calculated WITHIN weight class.

JUDGING IS SUBJECTIVE: Unlike team sports, MMA decisions depend on three judges with known biases — some favor aggression/damage, others favor takedowns/control time. Judge assignment is a real edge vector.

CAGE SIZE MATTERS: The UFC Apex facility uses a 25-foot diameter cage. Standard arenas use a 30-foot cage. The smaller Apex cage dramatically increases finish rates and favors wrestlers/pressure fighters who can cut off the cage.

WEIGHT CUTS: Fighters dehydrate to make weight, then rehydrate. Severe weight cuts impair performance. Missed weight history, moving up/down divisions, and rehydration weight are all predictive variables.

NO SEASON: The UFC runs ~45 events per year, roughly weekly. Each card is independent. Data ingestion is event-driven, not season-driven.

FIRST 5 / ROUND PROPS: Over/under rounds, exact round finish, and "fight to go the distance" (GTD) props are major UFC betting markets with significant inefficiency.


3. UFC-Specific Statistics to Track

All stats derived from UFCStats.com (FightMetric). Stored as both per-fight snapshots and rolling career averages (career, last 3, last 5 fights). All rolling stats computed WITHIN weight class.

3.1 Striking (~40% of predictive variance)

Metric Source Predictive Value
Sig. strikes landed/min (SLpM) UFCStats High — primary offensive output
Sig. strikes absorbed/min (SApM) UFCStats High — defensive efficiency
Striking accuracy % UFCStats Medium — quality over volume
Striking defense % UFCStats High — evasiveness, longevity
Striking differential (SLpM - SApM) Calculated VERY HIGH — single best striking metric
Distance strikes landed % UFCStats Medium — range management
Clinch strikes landed % UFCStats Medium — infighting style
Ground strikes landed % UFCStats Medium — ground-and-pound capability
Head strikes landed % UFCStats High — KO probability proxy
Body strikes landed % UFCStats Medium — body work wears down opponent
Leg strikes landed % UFCStats Medium — mobility degradation
Knockdowns scored per 15 min UFCStats VERY HIGH — best KO predictor
Knockdowns received per 15 min UFCStats High — chin durability
First strike landed % Calculated Medium — fast starter indicator

[PANEL: Opus] Sig. strike accuracy BY DISTANCE (range/clinch/ground breakdown): A fighter who is 55% accurate at range but 30% in the clinch has a completely different profile than one who is 40%/50%. This directly feeds the style matchup matrix. UFCStats already provides this breakdown per fight — parse at the range level rather than aggregating. Priority: MUST-HAVE.

3.2 Grappling (~25-30% of predictive variance)

Metric Source Predictive Value
Takedown accuracy % UFCStats High — offensive wrestling success
Takedown defense % UFCStats VERY HIGH — keeps fight in preferred range
Takedowns landed per 15 min UFCStats High — wrestling volume
Control time per 15 min (sec) UFCStats High — top position dominance
Submission attempts per 15 min UFCStats Medium — submission threat level
Reversals per 15 min UFCStats Medium — scramble ability
Time in guard vs mount vs back Calculated High — positional quality of control
Clinch control time Calculated Medium — clinch fighting ability

[PANEL: Grok 4.1 Fast] Opponent-specific historical performance: Fighters perform differently against similar styles (e.g., a wrestler might have 70% TD success vs strikers but 40% vs grapplers). Calculate from existing UFCStats per-fight breakdowns by filtering fights by opponent style tags — no new scraping needed. Priority: MUST-HAVE.

3.3 Finishing Ability & Durability

Metric Source Predictive Value
KO/TKO win rate (career) UFCStats VERY HIGH — primary MoV input
Submission win rate (career) UFCStats VERY HIGH — primary MoV input
Finish rate (non-decision %) Calculated High — over/under rounds input
Decision rate Calculated High — inverse of finish rate
Average fight duration (sec) Calculated High — rounds prop pricing
Times knocked down (career) UFCStats High — chin durability proxy
KO/TKO losses (count) UFCStats VERY HIGH — chin degradation (nonlinear)
Absorbed strikes before finish Calculated High — damage tolerance trend
Late finish rate (Round 3+) Calculated Medium — cardio-dependent finishing

3.4 Pace, Cardio & Physical Attributes

Metric Source Predictive Value
Output by round (strikes/min R1-R5) UFCStats VERY HIGH — cardio curve modeling
Output decline % (R1 to final) Calculated VERY HIGH — cardio cliff detection
Championship round performance Calculated High — title fight specific
Reach advantage (inches) UFCStats High — structural striking edge
Height advantage (inches) UFCStats Medium — correlated with reach
Leg reach (inches) UFCStats Medium — kicks/distance management
Age at fight time Calculated High — decline curve input
Days since last fight Calculated High — ring rust / recovery
Weight class change history Scraped High — weight cut impact
Missed weight history Scraped Medium — discipline / stress
Stance (orthodox/southpaw/switch) UFCStats Medium — stylistic matchup factor

[PANEL: Gemini 3.1 Pro] Stance Matchup Matrix (Southpaw vs Orthodox): The spec tracks stance, but not the matchup. Open stance (Southpaw vs Orthodox) fundamentally changes striking geometry (body kicks, lead hand control) and significantly alters SLpM/SApM baselines. Calculated from existing UFCStats data. Priority: MUST-HAVE.

3.5 Style Classification

Each fighter is tagged with a primary and secondary style based on their statistical profile:

Style Tag Definition Key Indicators
Boxer Strikes at distance High SLpM, high distance %, low TD attempts
Pressure Fighter High-volume forward pressure Very high SLpM, high SApM, low striking defense
Counter Striker Low volume, high accuracy Low SLpM, high accuracy %, high defense %
Wrestler Takedown-centric control High TD attempts, high control time, low SLpM
Grappler/BJJ Submission-focused High sub attempts, moderate TDs, finish via sub
Clinch Fighter Inside fighting / dirty boxing High clinch strikes %, high clinch control time
Kickboxer Diverse striking with kicks High distance %, balanced head/body/leg strikes
Well-Rounded No dominant dimension Balanced across all categories

A style_matchup_matrix stores historical win rates for each style pairing within each weight class (e.g., Wrestler vs Boxer at Lightweight has a known edge historically).

[PANEL: Opus] Style classification should be CONTINUOUS, not categorical: Replace categorical tags with a continuous style vector — [striking_offense, striking_defense, wrestling_offense, wrestling_defense, submission_offense, submission_defense, clinch_work, pressure_rate] where each dimension is 0-100 based on actual stats. The matchup matrix then operates on vector distances rather than category lookups. This eliminates the subjective "who decides the tag" problem. Implement as v2 after categorical tags are working.

3.6 Cardio Archetypes

Based on output decline percentage from Round 1 to final round:

Cardio Cliff fighters are where over/under rounds and late-round finish props are most exploitable.

[PANEL: DeepSeek V3.1] Output under duress: Current round-by-round output doesn't account for opponent pressure. Track "output under pressure" flag — strikes output when opponent has attempted a takedown or is in chase mode. Source: FightMetric/UFCStats granular data if available. Tag fights where one fighter was significantly ahead on output vs. when being pressured.


4. Data Sources

4.1 Fighter Stats & Career Data (Primary)

Source Data Access Cost
UFCStats.com (FightMetric) Official UFC stats: strikes, TDs, control time, per-round breakdowns, 67+ categories Scrape (HTML) FREE
Sherdog.com Fighter records, results, event history, pre-UFC record Scrape (HTML) FREE
Tapology.com Fighter records, rankings, event cards, weigh-in results, regional/pre-UFC data Scrape (HTML) FREE
ESPN MMA / UFC.com Fighter profiles, confirmed cards, official rankings Scrape/RSS FREE

[PANEL: Opus] UFCStats.com scraping is the SINGLE POINT OF FAILURE. Its HTML structure changes periodically and it has no API. MUST build secondary scraper for Sherdog/Tapology capturing at minimum: records, results, basic striking/grappling. Cache aggressively — once a fight's stats are scraped, they never change. Implement hash-based change detection on page structure to alert on breaking changes within minutes, not days.

[PANEL: DeepSeek V3.1] UFCStats uses Cloudflare protection. Budget for rotating residential proxies (~$120/month) if you hit blocks. Rate-limit to 1 req/sec, exponential backoff on 429/503.

4.2 Advanced Metrics & Judging

Source Data Access Cost
MMADecisions.com Judge scorecards, judge-by-judge tendencies, split decision history, bias analysis Scrape FREE
Verdict MMA Community global scorecards, fan consensus vs judges API/Scrape FREE tier
FightOdds.io Historical odds, line movement Scrape FREE
BestFightOdds.com Multi-book odds comparison, historical closing lines Scrape FREE

[PANEL: Sonar Reasoning Pro] MMADecisions.com may be incomplete or poorly maintained for recent fights. This is a single-point dependency for judge data. Add: (1) UFC.com scorecards (scrape if possible), (2) backup from MMAJunkie if they republish scores, (3) manual spot-check for data freshness on build start.

4.3 Betting Data (Same Pipeline as Team Sports)

Source Data Access Cost
The Odds API Multi-book aggregation: DK, Pinnacle, FanDuel, BetMGM REST API FREE tier (500 req/mo)
Pinnacle Sharpest book. Early + closing moneyline, method, rounds Historical API FREE
Kalshi Prediction market: ML, method, rounds. Candlestick API for historical closing prices REST API FREE (account)
Polymarket Prediction market: event markets, prices, volume, on-chain order data Free API FREE (no account)
DraftKings Opening moneyline, method, round, props Manual/Scrape FREE (account)

[PANEL: Grok 4.1 Fast] Free tier of Odds API limits to 500 calls/month. Upgrade to paid ($10-20/mo) or build custom scrapers for Pinnacle/DK as primary; treat Odds API as backup only.

[PANEL: Opus] Betting line movement history / steam moves: Track line movement velocity, not just opening and closing. Scrape BestFightOdds every 6 hours pre-fight week, every 1 hour fight week, every 15 min post-weigh-in. Store as time series in odds_line_movement table. The Market Resistance Filter requires this underlying movement data to function. Priority: MUST-HAVE.

4.4 Situational & News

Source Data Access Cost
Sherdog / Tapology Confirmed fight cards, weigh-in results, cancellations Scrape FREE
MMA Junkie / MMA Fighting News, camp reports, sparring reports, injury updates RSS/Scrape FREE
RotoWire / FantasyPros Injury reports, fight previews Scrape FREE

[PANEL: Opus] Fighter injury / medical suspension history: UFC mandates medical suspensions after fights (30-180 days). A fighter returning from a 180-day medical suspension for a broken orbital bone is materially different from routine 30-day. The spec tracks "days since last fight" but not WHY the layoff occurred. Source: State athletic commission reports (public, ABC MMA website), MMA Junkie injury RSS, UFC event medical suspension lists. Priority: MUST-HAVE.

[PANEL: Opus] Late replacements / short-notice fights: Short-notice replacements (<4 weeks camp) are one of the most exploitable edges in UFC betting. The spec mentions "short camp" as a penalty but doesn't track WHO replaced whom, notice period in days, or whether the replacement was moving up/down in weight. Source: Sherdog/Tapology event pages (scrape bout history changes), MMA Junkie news RSS. Cross-reference announced date vs event date to calculate camp length. Priority: MUST-HAVE.

[PANEL: Sonar Reasoning Pro] Pre-fight press conference quotes and opponent game plan signals provide directional signal for method props and round overs that historical style win rates miss. Scrape pre-fight quotes from MMA Junkie, The Athletic. Process with basic intent extraction. Priority: MUST-HAVE.

[PANEL: Opus] Pre-UFC / regional record quality adjustment: A 10-0 fighter whose wins came against 2-5 regional opponents is fundamentally different from a 10-0 fighter who beat ranked Bellator/ONE fighters. Apply Bayesian shrinkage toward weight class average for fighters with <5 UFC bouts. Source: Sherdog full records, Tapology regional promotion quality tiers. Priority: NICE-TO-HAVE.


5. Prediction Model Sources (Ensemble Inputs)

Unlike team sports (Sagarin, Massey, DRatings, Dimers), MMA has LIMITED free prediction sources. Target: 5-7 independent sources minimum.

Source Type Outputs Access
BetMMA.tips Community prediction platform with verified track records Win %, method %, crowd consensus Scrape
Tapology.com predictions Community + staff picks Win %, method predictions Scrape
FightMatrix.com ELO-based fighter rankings Implied win prob from ELO diff Scrape
Action Network UFC Staff model predictions Win %, best bets Scrape
Bloody Elbow staff picks Expert panel picks (5-8 writers) Win % from pick consensus Scrape
MMABettingOdds.com / FightOdds.io Market-implied from closing lines No-vig probability baseline Scrape
Internal ELO/Glicko-2 model EdgeClaw internal model trained on UFCStats backfill Win probability from rating diff Built in-house

[PANEL: Opus] Remove Bloody Elbow from prediction sources: Bloody Elbow's editorial staff has turned over significantly and the site's reliability as a prediction source is low. Individual writer picks from media outlets have poor calibration and no accountability. Replace with: BetMMA.tips (which has verified, tracked records) and Tapology community consensus. Two well-calibrated external sources plus Pinnacle closing line plus the internal model is sufficient. Adding noisy sources degrades the ensemble.

The ensemble average (model_avg_prob) is compared to Pinnacle closing no-vig probability to find edge. After 100+ fights of tracking, switch from equal-weight averaging to inverse-MSE weighting (better models get more weight based on Brier score accuracy).

[PANEL: Opus] The internal ELO/Glicko-2 model should NOT be blended naively with the composite. The model's predictions are partially derived from the same inputs as the composite — this creates circular dependency. Either: (1) use ELO/Glicko-2 as a REPLACEMENT for the composite, or (2) keep the composite as the primary model and use external predictions only for calibration/validation.


6. Full SQL Database Schema (SQLite with WAL Mode)

SQLite setup note: Enable WAL mode immediately on DB creation: PRAGMA journal_mode=WAL;. This allows concurrent reads during writes. All 16 tables below use SQLite-compatible types (INTEGER, REAL, TEXT, BLOB) instead of PostgreSQL ENUMs.

[PANEL: Gemini 3.1 Pro] Fighter entity disambiguation: Implement unique fighter IDs (UFCStats/Tapology/Sherdog crosswalk) immediately. Always use IDs, never fighter names, as join keys. Fighter name collisions and misspellings will corrupt data if not handled at schema level.

PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;

-- ============================================================
-- 6.1 weight_classes
-- ============================================================
CREATE TABLE IF NOT EXISTS weight_classes (
    weight_class_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    name             TEXT NOT NULL UNIQUE,
    upper_limit_lbs  REAL NOT NULL,
    gender           TEXT DEFAULT 'male' CHECK(gender IN ('male','female'))
);
-- Seed:
-- Strawweight 115 female, Flyweight 125 male/female, Bantamweight 135 male/female,
-- Featherweight 145 male, Lightweight 155 male, Welterweight 170 male,
-- Middleweight 185 male, Light Heavyweight 205 male, Heavyweight 265 male

-- ============================================================
-- 6.2 fighters
-- ============================================================
CREATE TABLE IF NOT EXISTS fighters (
    fighter_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    -- Cross-platform IDs (entity disambiguation)
    ufcstats_id       TEXT UNIQUE,            -- UFCStats internal fighter ID
    sherdog_id        TEXT,                   -- Sherdog URL slug
    tapology_id       TEXT,                   -- Tapology URL slug
    -- Identity
    name              TEXT NOT NULL,
    nickname          TEXT,
    date_of_birth     TEXT,                   -- ISO 8601 date
    nationality       TEXT,
    home_city         TEXT,                   -- [PANEL: Opus] for travel distance calc
    -- Physical
    height_inches     REAL,
    reach_inches      REAL,
    leg_reach_inches  REAL,
    stance            TEXT CHECK(stance IN ('orthodox','southpaw','switch')),
    -- Style (categorical for v1, continuous vector for v2)
    primary_style     TEXT,
    secondary_style   TEXT,
    -- Style vector (v2 — leave NULL for now)
    style_striking_offense    REAL,           -- 0-100
    style_striking_defense    REAL,
    style_wrestling_offense   REAL,
    style_wrestling_defense   REAL,
    style_submission_offense  REAL,
    style_submission_defense  REAL,
    style_clinch_work         REAL,
    style_pressure_rate       REAL,
    -- Camp / Training
    team_gym          TEXT,
    previous_gym      TEXT,
    gym_change_date   TEXT,                   -- ISO 8601 date
    is_elite_gym      INTEGER DEFAULT 0,      -- 1 = top camp (ATT, CKB, Sanford, etc.)
    -- Record aggregates
    wins              INTEGER DEFAULT 0,
    losses            INTEGER DEFAULT 0,
    draws             INTEGER DEFAULT 0,
    no_contests       INTEGER DEFAULT 0,
    ko_wins           INTEGER DEFAULT 0,
    sub_wins          INTEGER DEFAULT 0,
    dec_wins          INTEGER DEFAULT 0,
    ko_losses         INTEGER DEFAULT 0,
    sub_losses        INTEGER DEFAULT 0,
    dec_losses        INTEGER DEFAULT 0,
    -- UFC-specific record
    ufc_wins          INTEGER DEFAULT 0,
    ufc_losses        INTEGER DEFAULT 0,
    ufc_debut_date    TEXT,
    last_fight_date   TEXT,
    is_active         INTEGER DEFAULT 1,
    -- Source URLs
    sherdog_url       TEXT,
    ufcstats_url      TEXT,
    tapology_url      TEXT,
    -- Timestamps
    created_at        TEXT DEFAULT (datetime('now')),
    updated_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_fighters_name ON fighters(name);
CREATE INDEX IF NOT EXISTS idx_fighters_active ON fighters(is_active);
CREATE INDEX IF NOT EXISTS idx_fighters_ufcstats_id ON fighters(ufcstats_id);

-- ============================================================
-- 6.3 events
-- ============================================================
CREATE TABLE IF NOT EXISTS events (
    event_id          INTEGER PRIMARY KEY AUTOINCREMENT,
    ufcstats_event_id TEXT UNIQUE,
    name              TEXT NOT NULL,
    event_type        TEXT CHECK(event_type IN ('ppv','fight_night','contender_series','dwcs')),
    event_date        TEXT NOT NULL,           -- ISO 8601 date
    venue             TEXT,
    city              TEXT,
    state_country     TEXT,
    elevation_ft      INTEGER,                 -- [PANEL: Opus/Gemini] altitude affects cardio
    is_apex           INTEGER DEFAULT 0,       -- 1 = UFC Apex (25ft cage)
    cage_size_ft      INTEGER DEFAULT 30,      -- 25 for Apex, 30 standard
    total_bouts       INTEGER,
    attendance        INTEGER,                 -- [PANEL: Sonar] crowd size effect
    ufcstats_url      TEXT,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_events_date ON events(event_date);
CREATE INDEX IF NOT EXISTS idx_events_type ON events(event_type);

-- ============================================================
-- 6.4 fights
-- ============================================================
CREATE TABLE IF NOT EXISTS fights (
    fight_id              INTEGER PRIMARY KEY AUTOINCREMENT,
    ufcstats_fight_id     TEXT UNIQUE,
    event_id              INTEGER NOT NULL REFERENCES events(event_id),
    fighter1_id           INTEGER NOT NULL REFERENCES fighters(fighter_id),
    fighter2_id           INTEGER NOT NULL REFERENCES fighters(fighter_id),
    weight_class_id       INTEGER REFERENCES weight_classes(weight_class_id),
    scheduled_rounds      INTEGER CHECK(scheduled_rounds IN (3,5)) DEFAULT 3,
    is_title_fight        INTEGER DEFAULT 0,
    is_main_event         INTEGER DEFAULT 0,
    is_co_main            INTEGER DEFAULT 0,
    card_position         TEXT CHECK(card_position IN ('main_card','prelim','early_prelim')),
    bout_order            INTEGER,
    catchweight_lbs       REAL,
    fighter1_weigh_in     REAL,
    fighter2_weigh_in     REAL,
    fighter1_missed_weight INTEGER DEFAULT 0,
    fighter2_missed_weight INTEGER DEFAULT 0,
    -- Referee (tracked for stoppage tendency analysis)
    referee_name          TEXT,               -- [PANEL: Gemini/GPT-4.1/Sonar]
    -- Replacement tracking
    is_late_replacement   INTEGER DEFAULT 0,  -- [PANEL: Opus/GPT-4.1]
    replacement_fighter_id INTEGER REFERENCES fighters(fighter_id), -- who replaced
    original_fighter_name TEXT,               -- who was originally scheduled
    notice_days           INTEGER,            -- days of camp (announcement to fight)
    fight_status          TEXT CHECK(fight_status IN ('scheduled','completed','cancelled','no_contest')) DEFAULT 'scheduled',
    created_at            TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_fights_event ON fights(event_id);
CREATE INDEX IF NOT EXISTS idx_fights_f1 ON fights(fighter1_id);
CREATE INDEX IF NOT EXISTS idx_fights_f2 ON fights(fighter2_id);
CREATE INDEX IF NOT EXISTS idx_fights_wc ON fights(weight_class_id);
CREATE INDEX IF NOT EXISTS idx_fights_status ON fights(fight_status);

-- ============================================================
-- 6.5 fight_results
-- ============================================================
CREATE TABLE IF NOT EXISTS fight_results (
    result_id             INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id              INTEGER NOT NULL UNIQUE REFERENCES fights(fight_id),
    winner_id             INTEGER REFERENCES fighters(fighter_id),
    loser_id              INTEGER REFERENCES fighters(fighter_id),
    method                TEXT NOT NULL,
        -- 'KO/TKO', 'Submission', 'Decision - Unanimous',
        -- 'Decision - Split', 'Decision - Majority', 'DQ',
        -- 'Doctor Stoppage', 'Draw', 'No Contest'
    method_detail         TEXT,    -- 'rear naked choke', 'head kick', 'body shot', etc
    finish_round          INTEGER,
    finish_time_sec       INTEGER,
    total_fight_time_sec  INTEGER,
    went_to_decision      INTEGER DEFAULT 0,
    performance_bonus     TEXT,    -- 'FOTN', 'POTN', NULL
    created_at            TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_results_fight ON fight_results(fight_id);
CREATE INDEX IF NOT EXISTS idx_results_winner ON fight_results(winner_id);
CREATE INDEX IF NOT EXISTS idx_results_method ON fight_results(method);

-- ============================================================
-- 6.6 judges
-- ============================================================
CREATE TABLE IF NOT EXISTS judges (
    judge_id              INTEGER PRIMARY KEY AUTOINCREMENT,
    name                  TEXT NOT NULL UNIQUE,
    home_state            TEXT,
    total_fights_judged   INTEGER DEFAULT 0,
    pct_unanimous         REAL,
    pct_split             REAL,
    pct_majority          REAL,
    pct_favor_aggressor   REAL,   -- % of close rounds given to higher-output fighter
    pct_favor_control     REAL,   -- % given to fighter with more control time
    pct_favor_damage      REAL,   -- % given to fighter with more knockdowns
    pct_favor_wrestler    REAL,   -- tendency to reward takedowns + control
    agreement_rate        REAL,   -- % agreement with other two judges
    controversial_rate    REAL,   -- % of fights with media/community disagreement
    created_at            TEXT DEFAULT (datetime('now')),
    updated_at            TEXT DEFAULT (datetime('now'))
);

-- ============================================================
-- 6.7 bout_judges (junction table)
-- ============================================================
CREATE TABLE IF NOT EXISTS bout_judges (
    fight_id              INTEGER NOT NULL REFERENCES fights(fight_id),
    judge_id              INTEGER NOT NULL REFERENCES judges(judge_id),
    score_fighter1        TEXT,   -- e.g. '29-28'
    score_fighter2        TEXT,   -- e.g. '28-29'
    PRIMARY KEY (fight_id, judge_id)
);

-- [PANEL: Sonar] Pre-fight judge panel snapshot table
-- When judges are announced (1-2 days before event), store for pre-fight edge calc
CREATE TABLE IF NOT EXISTS pre_fight_judges (
    snapshot_id           INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id              INTEGER NOT NULL REFERENCES fights(fight_id),
    judge_id              INTEGER NOT NULL REFERENCES judges(judge_id),
    captured_at           TEXT NOT NULL,
    UNIQUE(fight_id, judge_id)
);

-- ============================================================
-- 6.8 fighter_fight_stats
-- ============================================================
-- Per-fighter, per-fight statistics. Two rows per fight (one per fighter).
CREATE TABLE IF NOT EXISTS fighter_fight_stats (
    stat_id                   INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id                  INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id                INTEGER NOT NULL REFERENCES fighters(fighter_id),
    -- Striking
    sig_strikes_landed        INTEGER,
    sig_strikes_attempted     INTEGER,
    sig_strikes_absorbed      INTEGER,
    total_strikes_landed      INTEGER,
    total_strikes_attempted   INTEGER,
    knockdowns_scored         INTEGER DEFAULT 0,
    knockdowns_received       INTEGER DEFAULT 0,
    head_strikes_landed       INTEGER,
    head_strikes_attempted    INTEGER,
    body_strikes_landed       INTEGER,
    body_strikes_attempted    INTEGER,
    leg_strikes_landed        INTEGER,
    leg_strikes_attempted     INTEGER,
    distance_strikes_landed   INTEGER,
    distance_strikes_att      INTEGER,
    clinch_strikes_landed     INTEGER,
    clinch_strikes_att        INTEGER,
    ground_strikes_landed     INTEGER,
    ground_strikes_att        INTEGER,
    -- [PANEL: Opus] Accuracy by range (MUST-HAVE — already in UFCStats, just parse it)
    distance_accuracy_pct     REAL,   -- sig strikes at range: landed/attempted
    clinch_accuracy_pct       REAL,   -- sig strikes in clinch: landed/attempted
    ground_accuracy_pct       REAL,   -- sig strikes on ground: landed/attempted
    -- Grappling
    takedowns_landed          INTEGER,
    takedowns_attempted       INTEGER,
    takedowns_defended        INTEGER,
    submission_attempts       INTEGER DEFAULT 0,
    reversals                 INTEGER DEFAULT 0,
    control_time_sec          INTEGER DEFAULT 0,
    -- Calculated per-minute rates (denormalized for query speed)
    sig_strikes_per_min       REAL,
    sig_strikes_abs_per_min   REAL,
    striking_accuracy_pct     REAL,
    striking_defense_pct      REAL,
    takedown_accuracy_pct     REAL,
    takedown_defense_pct      REAL,
    control_time_per_min      REAL,
    -- Meta
    fight_time_sec            INTEGER,
    created_at                TEXT DEFAULT (datetime('now')),
    UNIQUE(fight_id, fighter_id)
);

CREATE INDEX IF NOT EXISTS idx_ffs_fight ON fighter_fight_stats(fight_id);
CREATE INDEX IF NOT EXISTS idx_ffs_fighter ON fighter_fight_stats(fighter_id);

-- ============================================================
-- 6.9 fighter_round_stats
-- ============================================================
CREATE TABLE IF NOT EXISTS fighter_round_stats (
    round_stat_id             INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id                  INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id                INTEGER NOT NULL REFERENCES fighters(fighter_id),
    round_number              INTEGER NOT NULL,
    sig_strikes_landed        INTEGER,
    sig_strikes_attempted     INTEGER,
    sig_strikes_absorbed      INTEGER,
    knockdowns_scored         INTEGER DEFAULT 0,
    knockdowns_received       INTEGER DEFAULT 0,
    takedowns_landed          INTEGER,
    takedowns_attempted       INTEGER,
    submission_attempts       INTEGER DEFAULT 0,
    reversals                 INTEGER DEFAULT 0,
    control_time_sec          INTEGER DEFAULT 0,
    head_strikes_landed       INTEGER,
    body_strikes_landed       INTEGER,
    leg_strikes_landed        INTEGER,
    distance_strikes_landed   INTEGER,
    clinch_strikes_landed     INTEGER,
    ground_strikes_landed     INTEGER,
    round_duration_sec        INTEGER DEFAULT 300,
    created_at                TEXT DEFAULT (datetime('now')),
    UNIQUE(fight_id, fighter_id, round_number)
);

CREATE INDEX IF NOT EXISTS idx_frs_fight ON fighter_round_stats(fight_id);
CREATE INDEX IF NOT EXISTS idx_frs_fighter ON fighter_round_stats(fighter_id);
CREATE INDEX IF NOT EXISTS idx_frs_round ON fighter_round_stats(round_number);

-- ============================================================
-- 6.10 fighter_rolling_stats
-- ============================================================
-- Pre-computed rolling career averages for each fighter AT THE TIME of
-- each fight. Avoids recalculating from raw stats on every query.
-- Updated BEFORE each fight via ETL. Enables point-in-time backtesting
-- without future data leak.
CREATE TABLE IF NOT EXISTS fighter_rolling_stats (
    rolling_id                INTEGER PRIMARY KEY AUTOINCREMENT,
    fighter_id                INTEGER NOT NULL REFERENCES fighters(fighter_id),
    as_of_fight_id            INTEGER NOT NULL REFERENCES fights(fight_id),
    as_of_date                TEXT NOT NULL,
    num_fights                INTEGER,
    num_ufc_fights            INTEGER,
    -- Rolling averages (career, last 3, last 5)
    career_slpm               REAL,
    career_sapm               REAL,
    career_str_acc            REAL,
    career_str_def            REAL,
    career_td_acc             REAL,
    career_td_def             REAL,
    career_sub_att_per15      REAL,
    career_ctrl_per15         REAL,
    career_kd_per15           REAL,
    last3_slpm                REAL,
    last3_sapm                REAL,
    last3_str_acc             REAL,
    last3_str_def             REAL,
    last3_td_acc              REAL,
    last3_td_def              REAL,
    last5_slpm                REAL,
    last5_sapm                REAL,
    -- [PANEL: Opus] Bayesian shrinkage applied values (shrunk toward weight class mean)
    -- Formula: adj_stat = (n * fighter_stat + k * class_mean) / (n + k), k=5
    adj_slpm                  REAL,   -- shrinkage-adjusted SLpM
    adj_sapm                  REAL,
    adj_str_acc               REAL,
    adj_td_acc                REAL,
    adj_td_def                REAL,
    adj_finish_rate           REAL,
    -- Finishing rates
    career_ko_rate            REAL,
    career_sub_rate           REAL,
    career_dec_rate           REAL,
    career_finish_rate        REAL,
    -- Durability
    career_kd_absorbed_per15  REAL,
    career_ko_losses          INTEGER,
    -- Cardio (output decline)
    avg_r1_output             REAL,
    avg_r2_output             REAL,
    avg_r3_output             REAL,
    avg_r4_output             REAL,   -- NULL if never fought 4+ rounds
    avg_r5_output             REAL,   -- NULL if never fought 5 rounds
    output_decline_pct        REAL,
    -- Physical at fight time
    age_at_fight              REAL,
    days_since_last_fight     INTEGER,
    weight_class_id           INTEGER REFERENCES weight_classes(weight_class_id),
    created_at                TEXT DEFAULT (datetime('now')),
    UNIQUE(fighter_id, as_of_fight_id)
);

CREATE INDEX IF NOT EXISTS idx_rolling_fighter ON fighter_rolling_stats(fighter_id);
CREATE INDEX IF NOT EXISTS idx_rolling_date ON fighter_rolling_stats(as_of_date);

-- ============================================================
-- 6.11 fighter_weight_history
-- ============================================================
CREATE TABLE IF NOT EXISTS fighter_weight_history (
    history_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    weight_class_id   INTEGER REFERENCES weight_classes(weight_class_id),
    weigh_in_lbs      REAL,
    missed_weight     INTEGER DEFAULT 0,
    missed_weight_lbs REAL,
    rehydration_est   REAL,
    moved_up          INTEGER DEFAULT 0,
    moved_down        INTEGER DEFAULT 0,
    UNIQUE(fighter_id, fight_id)
);

CREATE INDEX IF NOT EXISTS idx_wh_fighter ON fighter_weight_history(fighter_id);

-- ============================================================
-- 6.12 style_matchup_matrix
-- ============================================================
CREATE TABLE IF NOT EXISTS style_matchup_matrix (
    matchup_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    style_a           TEXT NOT NULL,
    style_b           TEXT NOT NULL,
    weight_class_id   INTEGER REFERENCES weight_classes(weight_class_id),
    weight_class_group TEXT,     -- [PANEL: Gemini] 'lower'(125-145), 'middle'(155-170), 'upper'(185-265)
    style_a_win_rate  REAL,
    sample_size       INTEGER,
    last_updated      TEXT,
    UNIQUE(style_a, style_b, weight_class_id)
);

-- [PANEL: Gemini] Group weight classes for style matchup to get statistically significant sample sizes.
-- Calculating 'Wrestler vs Boxer at Lightweight' alone yields tiny samples (8-12 fights).
-- Group: lower (125-145), middle (155-170), upper (185-265).

-- ============================================================
-- 6.13 betting_odds
-- ============================================================
CREATE TABLE IF NOT EXISTS betting_odds (
    odds_id           INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    book              TEXT NOT NULL CHECK(book IN ('draftkings','pinnacle','kalshi','polymarket','fanduel','betmgm','betmma','tapology')),
    market_type       TEXT NOT NULL,
        -- 'moneyline', 'method_ko', 'method_sub', 'method_dec',
        -- 'over_rounds', 'under_rounds', 'round_1', 'round_2', 'round_3',
        -- 'fight_gtd', 'itd'
    odds_american     INTEGER,
    odds_decimal      REAL,
    implied_prob      REAL,
    line_value        REAL,          -- for over/under (e.g., 2.5 rounds)
    snapshot_type     TEXT CHECK(snapshot_type IN ('opening','early','midweek','closing')),
    volume            INTEGER,       -- Kalshi contract volume
    captured_at       TEXT NOT NULL,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_odds_fight ON betting_odds(fight_id);
CREATE INDEX IF NOT EXISTS idx_odds_fighter ON betting_odds(fighter_id);
CREATE INDEX IF NOT EXISTS idx_odds_book ON betting_odds(book);
CREATE INDEX IF NOT EXISTS idx_odds_market ON betting_odds(market_type);
CREATE INDEX IF NOT EXISTS idx_odds_snapshot ON betting_odds(snapshot_type);
CREATE INDEX IF NOT EXISTS idx_odds_captured ON betting_odds(captured_at);

-- [PANEL: Opus] Line movement history table (MUST-HAVE)
-- Track line velocity, not just opening/closing.
CREATE TABLE IF NOT EXISTS odds_line_movement (
    movement_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    book              TEXT NOT NULL,
    market_type       TEXT NOT NULL,
    odds_american     INTEGER,
    implied_prob      REAL,
    movement_magnitude REAL,        -- delta from previous snapshot
    captured_at       TEXT NOT NULL,
    hours_to_event    REAL,         -- negative = post-event
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_lm_fight ON odds_line_movement(fight_id);
CREATE INDEX IF NOT EXISTS idx_lm_captured ON odds_line_movement(captured_at);

-- ============================================================
-- 6.14 predictions
-- ============================================================
CREATE TABLE IF NOT EXISTS predictions (
    prediction_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    source            TEXT NOT NULL,
        -- 'betmma', 'tapology', 'fightmatrix', 'action_network',
        -- 'edgeclaw_elo', 'market_implied'
    win_prob          REAL,
    ko_prob           REAL,
    sub_prob          REAL,
    dec_prob          REAL,
    confidence        REAL,
    captured_at       TEXT NOT NULL,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_pred_fight ON predictions(fight_id);
CREATE INDEX IF NOT EXISTS idx_pred_source ON predictions(source);

-- ============================================================
-- 6.15 calculated_edges
-- ============================================================
CREATE TABLE IF NOT EXISTS calculated_edges (
    edge_id              INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id             INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id           INTEGER NOT NULL REFERENCES fighters(fighter_id),
    -- No-vig probabilities
    pinnacle_novig_prob  REAL,
    dk_novig_prob        REAL,
    kalshi_novig_prob    REAL,
    -- Model ensemble
    model_avg_prob       REAL,
    model_median_prob    REAL,
    model_count          INTEGER,
    model_std_dev        REAL,
    -- Edge calculations
    edge_vs_pinnacle     REAL,
    edge_vs_kalshi       REAL,
    edge_vs_dk           REAL,
    cross_book_edge      REAL,
    -- Sharp action indicators
    reverse_line_movement INTEGER DEFAULT 0,
    steam_move           INTEGER DEFAULT 0,
    vig_squeeze          INTEGER DEFAULT 0,
    kalshi_volume_surge  INTEGER DEFAULT 0,
    line_move_direction  TEXT,
    line_move_magnitude  REAL,
    line_move_velocity   REAL,       -- [PANEL: Opus] rate of change per hour
    -- Method edges
    model_ko_prob        REAL,
    model_sub_prob       REAL,
    model_dec_prob       REAL,
    market_ko_prob       REAL,
    market_sub_prob      REAL,
    market_dec_prob      REAL,
    ko_edge              REAL,
    sub_edge             REAL,
    dec_edge             REAL,
    -- Composite & tier
    quality_composite    REAL,
    edge_tier            TEXT CHECK(edge_tier IN ('S','A','B','C','D')),
    -- Post-fight grading
    clv_pinnacle         REAL,
    clv_kalshi           REAL,
    prediction_correct   INTEGER,
    method_correct       INTEGER,
    brier_score          REAL,       -- per-prediction Brier score
    created_at           TEXT DEFAULT (datetime('now')),
    updated_at           TEXT DEFAULT (datetime('now')),
    UNIQUE(fight_id, fighter_id)
);

CREATE INDEX IF NOT EXISTS idx_edges_fight ON calculated_edges(fight_id);
CREATE INDEX IF NOT EXISTS idx_edges_fighter ON calculated_edges(fighter_id);
CREATE INDEX IF NOT EXISTS idx_edges_tier ON calculated_edges(edge_tier);
CREATE INDEX IF NOT EXISTS idx_edges_composite ON calculated_edges(quality_composite);

-- ============================================================
-- 6.16 live_tracking
-- ============================================================
-- WAL mode handles concurrent writes. Batch writes every 1 min for ML,
-- every 5 min for method/round props to reduce contention.
CREATE TABLE IF NOT EXISTS live_tracking (
    tracking_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    book              TEXT DEFAULT 'kalshi',
    market_type       TEXT DEFAULT 'moneyline',
    price             REAL,
    implied_prob      REAL,
    volume            INTEGER,
    volume_delta      INTEGER,
    captured_at       TEXT NOT NULL,
    minutes_to_event  INTEGER,       -- negative = event started
    round_number      INTEGER,       -- which round is live (NULL if between rounds)
    data_latency_ms   INTEGER,       -- [PANEL: Sonar] SLA monitoring
    is_stale          INTEGER DEFAULT 0,  -- 1 if capture missed SLA window
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_live_fight ON live_tracking(fight_id);
CREATE INDEX IF NOT EXISTS idx_live_time ON live_tracking(captured_at);

-- ============================================================
-- 6.17 fighter_injuries (PANEL ADDITION — MUST-HAVE)
-- ============================================================
-- [PANEL: Opus, GPT-4.1, Grok 4.1, DeepSeek, Grok 4.1 Fast]
-- Medical suspensions and injury history. All panel models flagged as MUST-HAVE.
CREATE TABLE IF NOT EXISTS fighter_injuries (
    injury_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    fight_id          INTEGER REFERENCES fights(fight_id),  -- fight that caused it
    injury_type       TEXT,           -- 'knee', 'shoulder', 'orbital', 'hand', 'back', etc
    suspension_days   INTEGER,        -- mandatory medical suspension length
    suspension_end    TEXT,           -- ISO date when cleared to return
    is_cleared        INTEGER DEFAULT 0,
    clearance_type    TEXT,           -- 'time_served', 'medical_clearance'
    source            TEXT,           -- 'nsac', 'csac', 'mmajunkie', 'twitter'
    notes             TEXT,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_injuries_fighter ON fighter_injuries(fighter_id);

-- ============================================================
-- 6.18 referees
-- ============================================================
-- [PANEL: Gemini, GPT-4.1, Sonar, Grok 4.1 Fast]
-- Referee stoppage tendencies affect method/round market pricing.
CREATE TABLE IF NOT EXISTS referees (
    referee_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    name              TEXT NOT NULL UNIQUE,
    total_fights      INTEGER DEFAULT 0,
    early_stoppage_rate REAL,         -- % of fights stopped before most refs would
    late_stoppage_rate  REAL,         -- % of fights allowed to continue longer
    standup_rate        REAL,         -- standups per fight (favors strikers)
    finish_rate_as_ref  REAL,         -- KO+Sub rate in fights they ref
    -- Apex vs arena tendency (referees may behave differently in smaller cage)
    apex_finish_rate    REAL,
    arena_finish_rate   REAL,
    created_at          TEXT DEFAULT (datetime('now')),
    updated_at          TEXT DEFAULT (datetime('now'))
);

-- Junction: which referee is assigned to a fight (pre-event, scraped from UFC.com)
CREATE TABLE IF NOT EXISTS fight_referees (
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    referee_id        INTEGER NOT NULL REFERENCES referees(referee_id),
    confirmed         INTEGER DEFAULT 0,   -- 0 = expected, 1 = confirmed on fight card
    captured_at       TEXT NOT NULL,
    PRIMARY KEY (fight_id, referee_id)
);

-- ============================================================
-- 6.19 data_quality_log
-- ============================================================
-- [PANEL: Sonar] Sanity checks on scraped data. Flag and quarantine bad rows.
CREATE TABLE IF NOT EXISTS data_quality_log (
    log_id            INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name        TEXT NOT NULL,
    record_id         INTEGER,
    field_name        TEXT,
    raw_value         TEXT,
    issue             TEXT,           -- 'out_of_range', 'impossible_value', 'missing_required'
    resolution        TEXT,           -- 'quarantined', 'corrected', 'accepted_with_flag'
    created_at        TEXT DEFAULT (datetime('now'))
);

7. Quality Composite Formula (0-100 Score)

Computed FRESH for each fighter in each specific matchup. Matchup-dependent — the same fighter will have different composite scores against different opponents.

quality_composite = (
    0.25 * striking_differential_score
  + 0.20 * grappling_advantage_score
  + 0.15 * finishing_ability_score
  + 0.10 * cardio_pace_score
  + 0.10 * durability_score
  + 0.10 * style_matchup_score
  + 0.05 * experience_recency_score
  + 0.05 * situational_score
)

[PANEL: Opus — CRITICAL] These weights (25/20/15/10/10/10/5/5) are educated guesses with zero empirical backing. The plan: launch with these weights as starting point, then implement automated weight optimization using logistic regression on historical fight outcomes. The composite should be a learned model, not a hand-tuned formula. At minimum, run a backtest on 2018-2024 data before going live and adjust weights based on actual predictive power per component. [PANEL: Sonar] Use inverse-MSE weighted averaging as an intermediate approach until logistic regression has enough data.

[PANEL: Opus] The optimal weights will vary BY WEIGHT CLASS (grappling matters more at 170+, striking differential matters more at 135-155) and BY ERA (wrestling dominance has declined since 2018). Per-class tuning deferred to v2 (need 12+ months of data per class).

7.1 Striking Differential Score (25%)

Percentile rank of (last3_slpm - last3_sapm) within the fighter's weight class. A fighter at the 80th percentile in striking differential gets 80. Bonuses: +5 for reach advantage >3 inches. +5 for knockdowns_per_15 in top quartile of division.

Use Bayesian-shrinkage-adjusted stats (adj_slpm, adj_sapm) for fighters with <5 UFC fights.

7.2 Grappling Advantage Score (20%)

Weighted blend within weight class:

Use adjusted stats for small-sample fighters.

7.3 Finishing Ability Score (15%)

career_finish_rate * 50
+ knockdowns_per_15_percentile * 30
+ (opponent career_ko_losses > 2 ? +20 : 0)

Capped at 100.

7.4 Cardio / Pace Score (10%)

100 - (output_decline_pct * 100)

A fighter who maintains 90% of R1 output in R3 scores 90. Bonus: +10 for positive championship round history (win % in R4-R5 fights). Penalty: -10 if categorized as "Cardio Cliff" archetype.

7.5 Durability / Chin Score (10%)

100 - (career_kd_absorbed_per15_percentile * 0.5 + ko_losses_penalty)

KO losses penalty — improved decay model [PANEL: Opus]:

The original step function (0/-15/-30/-50) ignores context. Use a decay function:

chin_penalty = sum over each KO loss:
    base_penalty * (1 - recovery_rate * months_since_KO)

where:
    base_penalty = 20 per KO loss
    recovery_rate = 0.02 per month
    minimum recovered penalty = 5 per KO loss (never fully goes away)
    recency flag: if KO loss within last 2 fights, apply 1.5x multiplier

For v1 implementation simplicity, the original step function (0/-15/-30/-50) with recency doubling is acceptable. Upgrade to the decay model in v2.

7.6 Style Matchup Score (10%)

Lookup style_a_win_rate from style_matchup_matrix for this fighter's primary style vs opponent's primary style within weight class GROUP (lower/middle/upper). win_rate * 100 = score. If wrestler vs boxer and wrestler wins 62% historically, score = 62. If no data, default to 50.

[PANEL: Gemini] Group weight classes (lower 125-145, middle 155-170, upper 185-265) to get statistically significant sample sizes. Per-class data is too sparse for most style pairings.

[PANEL: Gemini] Add Stance Matchup modifier: Open-stance (Southpaw vs Orthodox) fights have measurably different SLpM/SApM baselines. Apply ±3 points modifier based on historical open vs closed stance win rates at this weight class group.

7.7 Experience / Recency Score (5%)

Base: min(ufc_fights * 5, 50)
Layoff adjustment:
  60-180 days since last fight = 0   (optimal)
  30-59 days = -10                   (short camp)
  181-365 days = -15                 (ring rust)
  365+ days = -25                    (severe ring rust)

Short notice replacement: If this fighter is a late replacement with notice_days < 28, apply an additional -8 penalty. [PANEL: Opus/GPT-4.1]

7.8 Situational Score (5%)

Base: 50
Modifiers:
  Title fight / main event: +10
  PPV main card: +5
  Missed weight: -15
  Moving up in weight: -5
  Moving down in weight: +5 (if successful history)
  Altitude >5000ft: -5 (both fighters equally)
  Apex cage (25ft): +5 for wrestlers/pressure fighters, -5 for counter strikers
  Returning from medical suspension: -8 if cleared from 90+ day suspension  [PANEL: Opus]
  Camp change to elite gym (within 12 months): +5                           [PANEL: Opus/GPT-4.1]
  Camp change within 6 months (still adjusting): -3
  Left elite gym for smaller camp: -5

CRITICAL — AGE CLIFF (improved) [PANEL: Opus/Gemini/Sonar]:

The original spec uses a step function at exactly age 35 for fighters below 185 lbs. This creates discontinuities (34.9 vs 35.1 year olds treated wildly differently).

Improved model: Use a sigmoid decay curve that starts at 32 and reaches full penalty by 38. Curve steepness by division:

For v1 implementation simplicity, the original step function with correct division groupings is acceptable. Upgrade to sigmoid in v2.

7.9 Compound Situation Boosters (additive modifiers)

These stack on top of the 0-100 base. Apply a floor of 10 — the composite cannot drop below 10 after all boosters. [PANEL: Opus]

Compound Decline Penalty (up to -15):

Weight Cut Severity Penalty (up to -15):

Camp Change Boost/Penalty:

Elite Gym List [PANEL: GPT-4.1]: Maintain as a config object (not hardcoded strings), updated annually with win rates:

{
  "elite_gyms": [
    "American Top Team", "City Kickboxing", "Sanford MMA",
    "Jackson-Wink MMA", "AKA", "Tristar Gym", "Glory MMA",
    "10th Planet Jiu-Jitsu", "Rafael Lovato Jr MMA"
  ]
}

Market Resistance Filter: When model shows big edge (S or A tier) but the line doesn't move for 3+ hours while other fight lines are moving, flag and downgrade by one tier.


8. Edge Tier Classification

BOTH composite threshold AND model edge must be met simultaneously.

Tier Composite Model Edge Required Action
S 85-100 +7% vs market Strong play — max unit
A 70-84 +5% vs market Standard play — 1 unit
B 55-69 +3% vs market Lean play — half unit
C 40-54 Any positive edge Track only — no bet
D 0-39 N/A Skip — insufficient data

A fight with a great composite but no market edge is still C-tier. A fight with a huge market edge but weak composite is suspicious.

[PANEL: Opus] S-tier threshold of +7% may be too strict for moneyline. On Kalshi/Polymarket, UFC markets often have wide spreads (5-10 cents). The real edges are in METHOD and ROUNDS props where pricing is less efficient. Consider: S-tier at +5% for method/round props (higher inefficiency) and +7% only for moneyline.


9. Method of Victory (MoV) Modeling

The biggest UFC-specific edge vector. Method markets are less efficient than moneyline because books must price three-way probabilities. Public bettors systematically overvalue knockouts and underprice decisions.

9.1 Base Rates

fighter_A_ko_rate    = A.ko_wins / A.total_wins
fighter_A_sub_rate   = A.sub_wins / A.total_wins
fighter_A_dec_rate   = A.dec_wins / A.total_wins

fighter_B_ko_loss_rate  = B.ko_losses / B.total_losses
fighter_B_sub_loss_rate = B.sub_losses / B.total_losses
fighter_B_dec_loss_rate = B.dec_losses / B.total_losses

9.2 Combined Method Probability (sample-size weighted blend)

-- For Fighter A winning by KO:
raw_ko_prob = win_prob_A * (w1 * fighter_A_ko_rate + w2 * fighter_B_ko_loss_rate)

-- Sample-size weights:
w1 = A.total_wins / (A.total_wins + B.total_losses)
w2 = B.total_losses / (A.total_wins + B.total_losses)

-- Normalize so KO + Sub + Dec = win_prob_A

[PANEL: Sonar — CRITICAL] Method outcomes are NOT independent: High TD accuracy correlates with submission wins; weak chin correlates with KO losses. The sample-size weighted blend treats these as independent probabilities. In v2, use multinomial logistic regression or categorical outcome trees to model the full joint distribution. For v1, the blend is adequate.

Alternative approach (geometric mean) — deferred to v2 per panel consensus. Build sample-size weighted first, backtest, then compare.

9.3 Adjustments

9.4 Market Comparison

Compare model method probabilities to DraftKings method props. Edge = model_ko_prob - market_ko_implied_prob. Method markets typically have 15-25% vig, so significant edges exist. Biggest inefficiencies: decision markets (underpriced) and submission markets (mispriced in grappling matchups).


10. Round-by-Round Modeling & Cardio Curves

10.1 Cardio Curves

-- Compute from fighter_round_stats:
SELECT
    fighter_id,
    round_number,
    AVG(sig_strikes_landed * 1.0 / (round_duration_sec / 60.0)) AS avg_strikes_per_min
FROM fighter_round_stats
GROUP BY fighter_id, round_number
ORDER BY fighter_id, round_number;

10.2 Round Finish Probability

P(finish_in_round_N) = base_finish_rate
    * cardio_multiplier(fighter_A, N)
    * cardio_multiplier(fighter_B, N)
    * cumulative_damage_factor(N)

base_finish_rate = (A.career_finish_rate + B.career_finish_rate) / 2

cardio_multiplier(fighter, N) = fighter.cardio_curve[N] / fighter.cardio_curve[1]
  -- The WEAKER fighter's decline INCREASES opponent's finish probability

cumulative_damage_factor(N) = 1.0 + (0.05 * N)
  -- 5% increase per round (accumulated damage)

P(goes_to_decision) = 1 - SUM(P(finish_in_round_N) for N in 1..scheduled_rounds)

[PANEL: Sonar] Round probabilities are path-dependent, not independent. If Fighter A is damaged in R2, R3 finish probability is conditional on surviving R2 damaged. In v2, use Markov chain or hidden-state model (HMM). For v1, the naive independence model is acceptable.

10.3 Over/Under Rounds Pricing

-- Standard 3-round fight:
P(under_1.5) = P(finish_R1)
P(over_1.5)  = 1 - P(under_1.5)
P(under_2.5) = P(finish_R1) + P(finish_R2)
P(over_2.5)  = 1 - P(under_2.5)

-- 5-round title fights: extend to 2.5, 3.5, 4.5

The 2.5 round market in title fights is often mispriced because the market anchors on 3-round base rates.


11. Live Tracking Architecture

1-minute cadence for ML prices; 5-minute cadence for method/round props. This cuts data volume by ~70% vs tracking all markets every minute. [PANEL: Opus]

11.1 What to Capture

Every 1 minute (ML only):

Every 5 minutes (method/round):

Round breaks (critical snapshots):

11.2 UFC-Specific Considerations

[PANEL: Sonar — HIGH] Live tracking needs a data latency SLA. Define: (1) max acceptable staleness per data point (e.g., 90 seconds for ML), (2) fallback hierarchy: Kalshi > Polymarket > Pinnacle, (3) capture timestamp vs target timestamp in data_latency_ms column. A 10-second latency can be the difference between catching a 0.30 cent swing and being late.

11.3 SQLite WAL Mode for Live Tracking

SQLite in WAL mode handles this use case adequately at the scale of 1-2 UFC events. If contention occurs, write live tracking to a separate SQLite file (live_tracking.db) to isolate write contention from the main database. [PANEL: Opus]


12. Closing Odds & Results Grading

Closing Odds Snapshot:

Results Grading (post-fight):


13. Implementation Roadmap (10-Step Build Order)

Panel consensus on build order: backfill and backtest first, live tracking last.

Step 1 — UFCStats scraper + fighter database + historical backfill (2018+)

Step 2 — Rolling stats calculator + Bayesian shrinkage + percentile rankings

Step 3 — Pinnacle closing line scraper + Odds API + historical odds backfill (2018+)

Step 4 — Composite formula v1 + backtesting framework

Step 5 — Method of Victory model + round probability model

Step 6 — Kalshi + Polymarket scrapers (pre-event snapshots only, no live tracking yet)

Step 7 — External prediction scrapers (BetMMA.tips + Tapology) + ensemble

Step 8 — Judge bias database (MMADecisions.com) + referee tendencies + decision probability adjustment

Step 9 — Automation: event detection, pre-event reports, post-event grading, Brier scoring

Step 10 — Live 1-minute tracking (ML prices only) + late replacement alerts + line movement alerts

[PANEL: Sonar] Pre-launch production simulation (week 0): Before building anything, run the entire pipeline on a historical fight card (e.g., 5 fights from Jan 2026). Scrape all sources, calculate edges, generate a pre-event report, grade against actual outcomes. Catches broken scrapers, API limits, and schema surprises before real money is involved.

[PANEL: Sonar] Explicit launch gate: Launch when moneyline Brier score > 0.52 on last 50 backtested fights AND Method of Victory model backtest > 0.51. If Phase 10 is reached without these benchmarks, either the model doesn't work or it's untested.


14. Collection Cadence & Storage Estimates

Collection Schedule

Source Cadence Notes
UFCStats fighter profiles Weekly (Monday) Detect new fighters, update active rosters
UFCStats fight stats After each event Cache-first: check if fight exists before scraping
UFCStats round stats After each event Same event trigger as fight stats
Sherdog records Weekly Fallback / supplemental to UFCStats
Tapology fight cards Daily Detect upcoming fights, late replacements
MMA Junkie RSS Hourly Injury reports, camp news, replacement announcements
UFC.com event pages Daily Judge/referee assignments (appear 1-2 days before event)
The Odds API 4x/day baseline, hourly fight week Opening/early/midweek/closing snapshots
BestFightOdds 6h pre-fight week, 1h fight week, 15min post-weigh-in Line movement tracking
Kalshi / Polymarket 4x/day baseline, 6h pre-fight week Pre-event snapshots
Live: ML prices Every 1 minute during event Fight window only
Live: method/round Every 5 minutes during event Fight window only
MMADecisions.com One-time bulk + after each event Judge scorecards
BetMMA.tips 24h before event Prediction ensemble input
FightMatrix 48h before event ELO-based predictions

Storage Estimates

Table Rows/year Row size Annual storage
fighters ~200 new ~500 bytes ~100 KB
events ~45 ~200 bytes ~9 KB
fights ~600 ~400 bytes ~240 KB
fight_results ~600 ~200 bytes ~120 KB
fighter_fight_stats ~1200 ~600 bytes ~720 KB
fighter_round_stats ~3600 ~400 bytes ~1.4 MB
fighter_rolling_stats ~1200 ~800 bytes ~960 KB
fighter_weight_history ~1200 ~200 bytes ~240 KB
betting_odds ~15000 ~200 bytes ~3 MB
odds_line_movement ~50000 ~200 bytes ~10 MB
predictions ~6000 ~200 bytes ~1.2 MB
calculated_edges ~1200 ~500 bytes ~600 KB
live_tracking ~180000 ~200 bytes ~36 MB
fighter_injuries ~300 ~200 bytes ~60 KB
judges / bout_judges ~500 ~300 bytes ~150 KB
referees / fight_referees ~200 ~300 bytes ~60 KB
Total (year 1) ~55 MB

Historical backfill (2018-2026, ~8 years): ~400 MB. Well within SQLite's practical limits. After 5 years of live operation: ~650 MB total — still fine for SQLite.


15. Key Queries (SQL)

Fighter composite inputs at time of fight

-- Get all rolling stats for a specific fighter going into a specific fight
SELECT
    f.name,
    frs.age_at_fight,
    frs.days_since_last_fight,
    frs.career_finish_rate,
    frs.career_ko_losses,
    frs.adj_slpm,        -- Bayesian-shrunk SLpM
    frs.adj_sapm,
    frs.adj_str_acc,
    frs.adj_td_acc,
    frs.adj_td_def,
    frs.output_decline_pct,
    frs.career_ko_rate,
    frs.career_sub_rate,
    frs.career_dec_rate,
    f.primary_style,
    f.team_gym,
    f.gym_change_date
FROM fighters f
JOIN fighter_rolling_stats frs ON f.fighter_id = frs.fighter_id
WHERE frs.as_of_fight_id = ?   -- specific fight ID
  AND frs.fighter_id = ?;       -- specific fighter ID

Weight-class percentile rankings (used in composite scoring)

-- Compute percentile rank of a fighter's striking differential within their weight class
WITH class_fighters AS (
    SELECT
        frs.fighter_id,
        (frs.adj_slpm - frs.adj_sapm) AS str_diff,
        frs.weight_class_id,
        COUNT(*) OVER (PARTITION BY frs.weight_class_id) AS class_count,
        RANK() OVER (PARTITION BY frs.weight_class_id ORDER BY (frs.adj_slpm - frs.adj_sapm)) AS rank_in_class
    FROM fighter_rolling_stats frs
    WHERE frs.as_of_date = (
        SELECT MAX(as_of_date) FROM fighter_rolling_stats WHERE fighter_id = frs.fighter_id
    )
)
SELECT
    fighter_id,
    str_diff,
    ROUND(1.0 * rank_in_class / class_count * 100, 1) AS percentile
FROM class_fighters
WHERE weight_class_id = ?;

Method of Victory edge vs market

-- Compare model method probs to market for all upcoming fights
SELECT
    f1.name AS fighter1,
    f2.name AS fighter2,
    ce.model_ko_prob,
    ce.market_ko_prob,
    ce.ko_edge,
    ce.model_sub_prob,
    ce.market_sub_prob,
    ce.sub_edge,
    ce.model_dec_prob,
    ce.market_dec_prob,
    ce.dec_edge,
    ce.edge_tier
FROM calculated_edges ce
JOIN fights fi ON ce.fight_id = fi.fight_id
JOIN fighters f1 ON fi.fighter1_id = f1.fighter_id
JOIN fighters f2 ON fi.fighter2_id = f2.fighter_id
JOIN events ev ON fi.event_id = ev.event_id
WHERE ev.event_date >= date('now')
  AND ce.fighter_id = fi.fighter1_id
ORDER BY ABS(ce.ko_edge) + ABS(ce.sub_edge) + ABS(ce.dec_edge) DESC;

Line movement velocity alert

-- Detect significant line moves (>10% shift) in last 6 hours
SELECT
    lm.fight_id,
    lm.fighter_id,
    f.name,
    lm.book,
    MIN(lm.implied_prob) AS prob_low,
    MAX(lm.implied_prob) AS prob_high,
    (MAX(lm.implied_prob) - MIN(lm.implied_prob)) AS movement_magnitude,
    COUNT(*) AS snapshot_count
FROM odds_line_movement lm
JOIN fighters f ON lm.fighter_id = f.fighter_id
WHERE lm.captured_at >= datetime('now', '-6 hours')
  AND lm.market_type = 'moneyline'
GROUP BY lm.fight_id, lm.fighter_id, lm.book
HAVING movement_magnitude > 0.10
ORDER BY movement_magnitude DESC;

S and A tier picks for upcoming event

-- All S and A tier picks for the next event
SELECT
    ev.name AS event,
    ev.event_date,
    f.name AS fighter,
    ce.quality_composite,
    ce.edge_tier,
    ce.edge_vs_pinnacle,
    ce.edge_vs_kalshi,
    ce.model_avg_prob,
    ce.pinnacle_novig_prob
FROM calculated_edges ce
JOIN fights fi ON ce.fight_id = fi.fight_id
JOIN events ev ON fi.event_id = ev.event_id
JOIN fighters f ON ce.fighter_id = f.fighter_id
WHERE ev.event_date = (SELECT MIN(event_date) FROM events WHERE event_date >= date('now'))
  AND ce.edge_tier IN ('S','A')
ORDER BY ce.quality_composite DESC;

Pre-event judge panel impact

-- For upcoming fights where judges are announced, show judge tendencies
SELECT
    f1.name AS fighter1,
    f2.name AS fighter2,
    j.name AS judge,
    j.pct_favor_aggressor,
    j.pct_favor_control,
    j.pct_favor_damage,
    j.pct_split,
    j.controversial_rate
FROM pre_fight_judges pfj
JOIN fights fi ON pfj.fight_id = fi.fight_id
JOIN judges j ON pfj.judge_id = j.judge_id
JOIN fighters f1 ON fi.fighter1_id = f1.fighter_id
JOIN fighters f2 ON fi.fighter2_id = f2.fighter_id
JOIN events ev ON fi.event_id = ev.event_id
WHERE ev.event_date = (SELECT MIN(event_date) FROM events WHERE event_date >= date('now'))
ORDER BY fi.bout_order, j.name;

Cardio curve for fighter

-- Output per round for a specific fighter (cardio curve)
SELECT
    fighter_id,
    round_number,
    AVG(CASE WHEN round_duration_sec > 0
        THEN sig_strikes_landed * 1.0 / (round_duration_sec / 60.0)
        ELSE NULL END) AS avg_strikes_per_min,
    COUNT(*) AS sample_size
FROM fighter_round_stats
WHERE fighter_id = ?
GROUP BY fighter_id, round_number
ORDER BY round_number;

Brier score by prediction source (model accountability)

-- Brier score per source over last 100 fights
SELECT
    p.source,
    COUNT(*) AS fights_tracked,
    AVG(ce.brier_score) AS avg_brier,
    SUM(CASE WHEN ce.prediction_correct = 1 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS accuracy
FROM predictions p
JOIN calculated_edges ce ON p.fight_id = ce.fight_id AND p.fighter_id = ce.fighter_id
WHERE ce.prediction_correct IS NOT NULL
GROUP BY p.source
ORDER BY avg_brier ASC;

Data freshness / staleness check

-- Alert if any data source hasn't been updated in >24 hours
SELECT
    'ufcstats_events' AS source,
    MAX(created_at) AS last_update,
    ROUND((julianday('now') - julianday(MAX(created_at))) * 24, 1) AS hours_stale
FROM events
UNION ALL
SELECT
    'betting_odds',
    MAX(captured_at),
    ROUND((julianday('now') - julianday(MAX(captured_at))) * 24, 1)
FROM betting_odds
UNION ALL
SELECT
    'live_tracking',
    MAX(captured_at),
    ROUND((julianday('now') - julianday(MAX(captured_at))) * 24, 1)
FROM live_tracking
WHERE captured_at >= date('now', '-7 days');

16. Implementation Checklist

Phase 1: Foundation (Weeks 1-3)

Phase 2: Betting Data Pipeline (Weeks 3-5)

Phase 3: Prediction Ensemble (Weeks 5-7)

Phase 4: Edge Calculation & Composite (Weeks 7-9)

Phase 5: Injury, Referee & Judge Data (Weeks 9-10)

Phase 6: Automation & Live Tracking (Weeks 10-12)


17. Critical Flaws & Mitigations (Panel Review)

CRITICAL: UFCStats.com Scraping is Fragile

Severity: CRITICAL Sources: Opus, DeepSeek, Gemini, all panel models Fix:

  1. Build Sherdog/Tapology as parallel fallback scrapers from day 1
  2. Cache aggressively — historical fight stats never change after completion
  3. Hash-based change detection on page structure with immediate alerts
  4. Rate-limit to 1 req/sec, exponential backoff on 429/503
  5. Rotating residential proxies if needed (~$120/month)

CRITICAL: Composite Formula Weights Are Untested

Severity: CRITICAL Sources: Opus, Sonar, DeepSeek Fix: Launch with hand-tuned weights, but immediately set up logistic regression backtest on 2018-2024 data. Weights should be learned, not guessed. The 25/20/15/10/10/10/5/5 split is an educated starting point. Adjust before going live based on backtest results.

HIGH: Rolling Averages Without Bayesian Shrinkage

Severity: HIGH Source: Opus Fix: All rolling stats shrunk toward weight-class mean using adj_stat = (n * fighter_stat + k * class_mean) / (n + k) where k=5. Implemented in fighter_rolling_stats table as adj_* columns.

HIGH: Live Tracking Latency SLA Missing

Severity: HIGH Source: Sonar Fix: Track data_latency_ms in live_tracking. Define max staleness per market (90s for ML, 5min for method/round). Implement fallback hierarchy: Kalshi > Polymarket > Pinnacle. Flag stale captures with is_stale = 1.

HIGH: No Data Validation Layer

Severity: HIGH Source: Sonar Fix: Sanity checks on all scraped data:

HIGH: Style Classification is Categorical — Lossy

Severity: HIGH Source: Opus Fix: V1 uses categorical tags (acceptable). V2 replaces with continuous style vector stored in fighters table (columns already included in schema: style_striking_offense, style_wrestling_offense, etc.).

HIGH: BestFightOdds.com Will Block Your IP

Severity: HIGH Source: Opus Fix: Do the historical backfill as a ONE-TIME operation with delays and rotating proxies. Once historical data is cached, losing access is survivable because Odds API covers current lines.

MEDIUM: Age Cliff is a Step Function

Severity: MEDIUM Source: Opus, Gemini, Sonar Fix: V1 uses the step function with correct division groupings (see Section 7.8). V2 upgrades to sigmoid decay curve per division.

MEDIUM: Internal ELO Creates Circular Dependency with Composite

Severity: MEDIUM Source: Opus Fix: Do NOT blend ELO as one of the ensemble sources while also using the composite formula. Either: use ELO/Glicko-2 as the model (replacing the composite), or use ELO only for calibration. For v1: the composite formula IS the model. ELO is a separate validation check.

MEDIUM: Chin Degradation Step Function is Too Simple

Severity: MEDIUM Source: Opus Fix: V1 uses the step function (0/-15/-30/-50) with recency doubling. V2 upgrades to the decay function detailed in Section 7.5. Schema is ready for either approach.


18. Over-Engineered Items (Defer or Simplify)

Item Recommendation Rationale
Live 1-minute tracking for ALL contract types SIMPLIFY: ML at 1min, method/round at 5min Cuts data volume 70%, captures 90% of signal
Geometric mean as alternative MoV approach DEFER to v2: Build sample-size weighted first Building two parallel systems doubles code debt
Per-weight-class composite formula tuning DEFER to v2: Need 12+ months of live data per class Too little per-class data in year 1
67+ statistical categories from UFCStats SIMPLIFY: Start with top 20 most predictive Feature importance analysis will show which add value
Bloody Elbow staff picks as ensemble source REMOVE: Editorial turnover, poor calibration BetMMA.tips + Tapology + internal model is sufficient
Compound booster stacking to -45 ADD FLOOR: Composite cannot drop below 10 after boosters Prevents extreme values that break prediction calibration
Judge bias with 14 bias vectors per judge SIMPLIFY for v1: Track 4 key tendencies per judge, add more in v2 Small edge (<3%); not worth full complexity at launch
Full cardio curves for all historical fights SIMPLIFY: Compute on-demand via query, don't pre-store all curves The query in Section 15 computes this efficiently
5-7 prediction site scrapers simultaneously PHASE: Build BetMMA + Tapology first, add FightMatrix + Action Network after baseline is working Ensemble only improves a working model
Cardio archetype classification (Sustainer/Decliner/Cliff) SIMPLIFY: Use raw output_decline_pct in formula directly Archetypes add interpretive complexity without clear gain over the raw metric

19. Cross-Desk Reusable Components

These are worth building as shared services, not per-desk implementations:

Component Reusable For
Scraper infrastructure (rate limiting, Cloudflare handling, exponential backoff, health checks, local caching) All desks — team sports, crypto, forex
Odds aggregation pipeline (Kalshi, Polymarket, Pinnacle, DraftKings standardization, closing line calculation) All desks
Bayesian shrinkage for small-sample rolling stats Tennis (individual sport, same problem), Boxing, Golf, any individual sport
ELO/Glicko-2 rating system Tennis, Boxing, Chess/Esports, any 1v1 competition
Brier scoring + settlement pipeline All desks — build as shared service
Line movement velocity detection + steam move alerts NFL, NBA, MLB, NHL, Tennis, Boxing
Inverse-MSE ensemble weighting (better models get more weight) All desks
Judge/referee bias tracking methodology Boxing (judge bias even more impactful), NBA (referee tendencies), MLB (umpire strike zones)
RSS-based news/injury alert pipeline All sports desks (different feeds, same infrastructure)
Backtest harness (predictions → outcomes → Brier score → edge tier ROI) All desks
Feature engineering templates (percentile ranking, rolling averages, composite normalization) All desks
Live tracking SLA architecture (prioritization, fallback hierarchy, staleness detection) All desks doing live markets
Data quality log + sanity checks All desks

20. What Would Make This A+ (Panel)

[PANEL: Sonar] — Three big moves:

  1. Explicit data quality SLAs for every source: Commit to: UFCStats (check daily, alert if >24h stale), MMADecisions (manual validation on 5 random fights every Friday), judge panel (scraped <24h before fight). When data fails SLA, document it and adjust weights or skip that bet. Right now there is no accountability for data freshness. The data_quality_log table supports this, but you need a daily cron that checks it.

  2. Pre-launch production simulation: Before building anything, run the entire pipeline on a historical fight card (e.g., 5 fights from Jan 2026). Scrape all sources, calculate all edges, generate a pre-event report, then grade against actual outcomes. This catches broken scrapers, API limits, and schema surprises before real money is on the line.

  3. Explicit launch gate with pass/fail criteria: Add a hard requirement: "Launch when moneyline Brier score > 0.52 on last 50 backtested fights AND Method of Victory model backtest > 0.51." If Phase 10 is reached without these benchmarks, either the model doesn't work or it was shipped untested.

[PANEL: Opus] — Replace the composite formula with logistic regression as primary model: Keep the composite formula as a human-readable "explainability layer" but use a logistic regression model trained on the same features for actual edge calculation. A logistic regression on 2018-2024 fight outcomes using the same 8 feature groups will: (1) find the actual optimal weights, (2) reveal which features are predictive and which are noise, (3) output calibrated probabilities directly without needing to map composite scores to win probabilities, (4) be trivially updatable as new data comes in.


21. Appendices

Appendix A: Weight Class Reference

Division Upper Limit (lbs) Gender
Strawweight 115 Female
Flyweight 125 Male / Female
Bantamweight 135 Male / Female
Featherweight 145 Male
Lightweight 155 Male
Welterweight 170 Male
Middleweight 185 Male
Light Heavyweight 205 Male
Heavyweight 265 Male

Weight class groups for style matchup matrix:

Appendix B: Method Classification Reference

Method Sub-Types Betting Market
KO/TKO Punch KO, head kick KO, body shot TKO, ground & pound TKO, doctor stoppage (strikes) KO/TKO prop
Submission RNC, guillotine, armbar, triangle, heel hook, D'Arce, anaconda, etc. Submission prop
Decision - Unanimous All 3 judges score same winner Decision prop
Decision - Split 2 of 3 judges score same winner Decision prop
Decision - Majority 2 judges score winner, 1 draws Decision prop
Draw Split draw, majority draw N/A (void most bets)
No Contest Accidental foul, failed drug test N/A (void)
DQ Illegal strikes, fouls Rare / N/A

Appendix C: Style Tags

Tag Key Statistical Indicators
Boxer High SLpM, high distance %, low TD attempts
Pressure Fighter Very high SLpM, high SApM, low striking defense
Counter Striker Low SLpM, high accuracy %, high defense %
Wrestler High TD attempts, high control time, low SLpM
Grappler/BJJ High sub attempts, moderate TDs, finish via sub
Clinch Fighter High clinch strikes %, high clinch control time
Kickboxer High distance %, balanced head/body/leg strikes
Well-Rounded Balanced across all categories

Appendix D: Entity Relationship Summary

fighters ──┬── fights (as fighter1 or fighter2)
           ├── fighter_fight_stats (per-fight stats, 2 rows/fight)
           ├── fighter_round_stats (per-round stats, 2×N rows/fight)
           ├── fighter_rolling_stats (career snapshot at each fight)
           ├── fighter_weight_history (weight class + weigh-in per fight)
           ├── fighter_injuries (injury + suspension history)
           ├── betting_odds (odds per fighter per fight per book)
           ├── odds_line_movement (timestamped line changes)
           ├── predictions (per source per fighter per fight)
           └── calculated_edges (final edge calculations per fighter per fight)

events ──── fights (1:many)

fights ──┬── fight_results (1:1)
         ├── fighter_fight_stats (1:2)
         ├── fighter_round_stats (1:many)
         ├── betting_odds (1:many)
         ├── odds_line_movement (1:many)
         ├── predictions (1:many)
         ├── calculated_edges (1:2)
         ├── live_tracking (1:many)
         ├── bout_judges (1:3, for decisions)
         ├── pre_fight_judges (1:3, pre-event snapshot)
         └── fight_referees (1:1 or 1:many during build)

weight_classes ──── fights (1:many)
                    fighter_rolling_stats (1:many)
                    fighter_weight_history (1:many)
                    style_matchup_matrix (optional FK)

judges ──── bout_judges (1:many)
judges ──── pre_fight_judges (1:many)

referees ──── fight_referees (1:many)

Total data points per fight: ~220-240 (fighter stats x2 + betting odds across books + line movement + predictions across sources + calculated edges + live tracking snapshots + judge + referee data).

Appendix E: API Endpoints Reference

Service Endpoint Pattern Auth Notes
The Odds API https://api.the-odds-api.com/v4/sports/mma_mixed_martial_arts/odds API key header 500 req/mo free
Kalshi REST https://api.elections.kalshi.com/trade-api/v2/markets RSA-PSS signature Series filter: MMA
Kalshi Candlestick https://api.elections.kalshi.com/trade-api/v2/markets/{ticker}/candlesticks RSA-PSS signature Historical closing prices
Polymarket https://gamma-api.polymarket.com/markets None No account needed
UFCStats.com http://www.ufcstats.com/statistics/events/completed None HTML scrape, rate-limit aggressively
Sherdog https://www.sherdog.com/fighter/{id} None HTML scrape
Tapology https://www.tapology.com/fightcenter None HTML scrape

Appendix F: Bayesian Shrinkage Parameters

Applied to all rolling stats for fighters with <10 UFC fights:

adj_stat = (n * fighter_stat + k * class_mean) / (n + k)

where:
  n = number of UFC fights (sample size)
  k = shrinkage coefficient (how much we pull toward class mean)
  class_mean = weight-class average for this stat

Recommended k values:
  k = 5  for most stats (SLpM, SApM, accuracy, defense)
  k = 3  for high-variance stats (KO rate, finish rate)
  k = 8  for very stable stats (reach, height — these don't change)

At n=5: adj_stat is weighted 50/50 between fighter and class mean (k=5)
At n=10: adj_stat is weighted 67/33 toward fighter (k=5)
At n=20: adj_stat is weighted 80/20 toward fighter (k=5)

This is the standard Marcel projection approach used in baseball analytics, adapted for MMA's sample-size problem.


End of merged final spec — UFC/MMA Data Collection Desk


TODO: Upgrade Kalshi REST to WebSocket Feed

Status: NOT BUILT — add when this desk goes live for execution

Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.

Why WebSocket matters:

What to build when ready:

  1. Connect to Kalshi WebSocket feed (wss://trading-api.kalshi.com/trade-api/ws/v2)
  2. Subscribe to order book channels for active tickers
  3. Stream trade events for real-time freshness (replaces REST trades polling)
  4. Feed live prices into execution engine for precise entry/exit
  5. Update Fill Quality Monitoring to measure slippage vs quoted price

Note: Sports markets use RFQ (Request for Quote) so the visible order book is usually empty — but the trade feed still matters for freshness signals and the WebSocket is required for order submission. Weather markets DO have real visible order books where this upgrade is even more critical.

Added 2026-03-29 — upgrade REST to WS when desk moves to live execution

Source: ~/edgeclaw/results/spec-panel/ufc-desk/spec-final.md