UFC/MMA Data Collection Desk — Merged Final Spec

Status: MERGED FINAL — Panel-reviewed, Opus as base, unique additions cherry-picked from all other models Source: UFC_MMA_Analytics_Full_Design.txt (primary), panel review from Opus, Gemini 3.1 Pro, Grok 4.1 Fast, Grok 4.1, DeepSeek V3.1, Sonar Reasoning Pro Date: March 2026 Purpose: Data collection spec for EdgeClaw's UFC/MMA Sports Desk. Feeds the Sports Desk analyst. Primary data source: UFCStats.com (FightMetric backend). Output format: SQLite (WAL mode).

System Context & Architecture
Why UFC/MMA Is Fundamentally Different
UFC-Specific Statistics (The Predictive Engine)
Data Sources (APIs, Scrapers, Cost Tiers)
Prediction Model Sources (Ensemble Inputs)
Full SQL Database Schema (SQLite, 16+ Tables)
Quality Composite Formula (0-100 Score)
Edge Tier Classification (S/A/B/C/D)
Method of Victory (MoV) Modeling
Round-by-Round Modeling & Cardio Curves
Live Tracking Architecture
Closing Odds & Results Grading
Implementation Roadmap (10 Steps, Build Order)
Collection Cadence & Storage Estimates
Key Queries (SQL)
Implementation Checklist
Critical Flaws & Mitigations (Panel)
Over-Engineered Items (Defer/Simplify)
Cross-Desk Reusable Components
What Would Make This A+ (Panel)
Appendices

1. System Context & Architecture

EdgeClaw is an automated, multi-sport betting analytics platform. It currently runs four sport modules: NHL, NBA, NCAAB (college basketball), and MLB. Each module tracks ~200-215 data points per game/matchup using the same core architecture:

SHARED ARCHITECTURE (all sports):

Betting Data: DraftKings opening lines, Pinnacle early + closing odds (sharpest mainstream book), Kalshi prediction market early + closing prices with contract volume
No-Vig Probabilities: Remove bookmaker margin from Pinnacle closing lines to get "true" implied probabilities. Formula: trueProb = (1/decimal) / (1/price1 + 1/price2)
Cross-Book Edge: Delta between Pinnacle no-vig probability and Kalshi probability. When these diverge, one market is mispricing
Model Ensemble: 4-7 independent prediction sources averaged for a consensus win probability. Compare model_avg to market-implied probability to find edge
Sharp Action Indicators: Reverse Line Movement (RLM), Steam Moves, Vig Squeeze
Live Tracking: Kalshi in-game line movement tracked during events
Closing Odds Snapshot: Pinnacle historical API + Kalshi candlestick API at event_start - 20 minutes
Results Grading: Final outcomes graded against both Pinnacle and Kalshi closing lines. Closing Line Value (CLV) tracked as primary performance metric
Quality Composite: Sport-specific weighted formula producing a 0-100 score
Edge Tier: S/A/B/C/D classification based on composite score + model edge vs market
Brier Score Tracking: Each prediction source scored for accuracy over time

INFRASTRUCTURE:

Team sports modules run on Google Apps Script / Google Sheets
UFC module: SQLite with WAL mode (enables concurrent reads during writes). See Section 6 for full schema
Automation via VPS cron jobs
Kalshi API for prediction market data (REST, RSA-PSS auth)
The Odds API for Pinnacle/DraftKings odds (REST, API key)

2. Why UFC/MMA Is Fundamentally Different

Every design decision flows from these structural realities:

INDIVIDUAL vs TEAM: In team sports, regression to the mean is your friend — an injured star alters the line, but the team still functions. In MMA, the "team" is a single biological entity. There are no teammates to compensate.

MULTIPLE WIN CONDITIONS: Team sports have one outcome type. MMA has KO/TKO, submission, decision (unanimous/split/majority), DQ, doctor stoppage, draw, and no contest. This creates METHOD OF VICTORY markets that are often MORE exploitable than moneyline because they require three-way probability modeling that books struggle to price efficiently.

TINY SAMPLE SIZES: An NBA team plays 82+ games/season. A UFC fighter fights 2-4 times per year. Many fighters have only 5-10 UFC fights total. Career stats are noisy, rolling averages are volatile, and style matchups matter more than raw aggregate skill.

STEEP NONLINEAR DECLINE: Fighter decline is not gradual like team sports. Chin degradation is permanent and cumulative — once a fighter has been KO'd, they are statistically more likely to be KO'd again. Below 185 lbs (Middleweight), there is a sharp statistical cliff near age 35. [See Section 7.5 for improved decay model from panel review.]

STYLE GEOMETRY > RAW TALENT: A great wrestler often beats a great striker, but a great BJJ fighter often beats a great wrestler, and a great striker often beats a great BJJ fighter. The matchup matrix is more predictive than individual rankings.

WEIGHT CLASSES = SEPARATE ECOSYSTEMS: Heavyweight (265 lbs) is essentially a different sport than Flyweight (125 lbs). All statistics and composite scores must be calculated WITHIN weight class.

JUDGING IS SUBJECTIVE: Unlike team sports, MMA decisions depend on three judges with known biases — some favor aggression/damage, others favor takedowns/control time. Judge assignment is a real edge vector.

CAGE SIZE MATTERS: The UFC Apex facility uses a 25-foot diameter cage. Standard arenas use a 30-foot cage. The smaller Apex cage dramatically increases finish rates and favors wrestlers/pressure fighters who can cut off the cage.

WEIGHT CUTS: Fighters dehydrate to make weight, then rehydrate. Severe weight cuts impair performance. Missed weight history, moving up/down divisions, and rehydration weight are all predictive variables.

NO SEASON: The UFC runs ~45 events per year, roughly weekly. Each card is independent. Data ingestion is event-driven, not season-driven.

FIRST 5 / ROUND PROPS: Over/under rounds, exact round finish, and "fight to go the distance" (GTD) props are major UFC betting markets with significant inefficiency.

3. UFC-Specific Statistics to Track

All stats derived from UFCStats.com (FightMetric). Stored as both per-fight snapshots and rolling career averages (career, last 3, last 5 fights). All rolling stats computed WITHIN weight class.

3.1 Striking (~40% of predictive variance)

Metric	Source	Predictive Value
Sig. strikes landed/min (SLpM)	UFCStats	High — primary offensive output
Sig. strikes absorbed/min (SApM)	UFCStats	High — defensive efficiency
Striking accuracy %	UFCStats	Medium — quality over volume
Striking defense %	UFCStats	High — evasiveness, longevity
Striking differential (SLpM - SApM)	Calculated	VERY HIGH — single best striking metric
Distance strikes landed %	UFCStats	Medium — range management
Clinch strikes landed %	UFCStats	Medium — infighting style
Ground strikes landed %	UFCStats	Medium — ground-and-pound capability
Head strikes landed %	UFCStats	High — KO probability proxy
Body strikes landed %	UFCStats	Medium — body work wears down opponent
Leg strikes landed %	UFCStats	Medium — mobility degradation
Knockdowns scored per 15 min	UFCStats	VERY HIGH — best KO predictor
Knockdowns received per 15 min	UFCStats	High — chin durability
First strike landed %	Calculated	Medium — fast starter indicator

[PANEL: Opus] Sig. strike accuracy BY DISTANCE (range/clinch/ground breakdown): A fighter who is 55% accurate at range but 30% in the clinch has a completely different profile than one who is 40%/50%. This directly feeds the style matchup matrix. UFCStats already provides this breakdown per fight — parse at the range level rather than aggregating. Priority: MUST-HAVE.

3.2 Grappling (~25-30% of predictive variance)

Metric	Source	Predictive Value
Takedown accuracy %	UFCStats	High — offensive wrestling success
Takedown defense %	UFCStats	VERY HIGH — keeps fight in preferred range
Takedowns landed per 15 min	UFCStats	High — wrestling volume
Control time per 15 min (sec)	UFCStats	High — top position dominance
Submission attempts per 15 min	UFCStats	Medium — submission threat level
Reversals per 15 min	UFCStats	Medium — scramble ability
Time in guard vs mount vs back	Calculated	High — positional quality of control
Clinch control time	Calculated	Medium — clinch fighting ability

[PANEL: Grok 4.1 Fast] Opponent-specific historical performance: Fighters perform differently against similar styles (e.g., a wrestler might have 70% TD success vs strikers but 40% vs grapplers). Calculate from existing UFCStats per-fight breakdowns by filtering fights by opponent style tags — no new scraping needed. Priority: MUST-HAVE.

3.3 Finishing Ability & Durability

Metric	Source	Predictive Value
KO/TKO win rate (career)	UFCStats	VERY HIGH — primary MoV input
Submission win rate (career)	UFCStats	VERY HIGH — primary MoV input
Finish rate (non-decision %)	Calculated	High — over/under rounds input
Decision rate	Calculated	High — inverse of finish rate
Average fight duration (sec)	Calculated	High — rounds prop pricing
Times knocked down (career)	UFCStats	High — chin durability proxy
KO/TKO losses (count)	UFCStats	VERY HIGH — chin degradation (nonlinear)
Absorbed strikes before finish	Calculated	High — damage tolerance trend
Late finish rate (Round 3+)	Calculated	Medium — cardio-dependent finishing

3.4 Pace, Cardio & Physical Attributes

Metric	Source	Predictive Value
Output by round (strikes/min R1-R5)	UFCStats	VERY HIGH — cardio curve modeling
Output decline % (R1 to final)	Calculated	VERY HIGH — cardio cliff detection
Championship round performance	Calculated	High — title fight specific
Reach advantage (inches)	UFCStats	High — structural striking edge
Height advantage (inches)	UFCStats	Medium — correlated with reach
Leg reach (inches)	UFCStats	Medium — kicks/distance management
Age at fight time	Calculated	High — decline curve input
Days since last fight	Calculated	High — ring rust / recovery
Weight class change history	Scraped	High — weight cut impact
Missed weight history	Scraped	Medium — discipline / stress
Stance (orthodox/southpaw/switch)	UFCStats	Medium — stylistic matchup factor

[PANEL: Gemini 3.1 Pro] Stance Matchup Matrix (Southpaw vs Orthodox): The spec tracks stance, but not the matchup. Open stance (Southpaw vs Orthodox) fundamentally changes striking geometry (body kicks, lead hand control) and significantly alters SLpM/SApM baselines. Calculated from existing UFCStats data. Priority: MUST-HAVE.

3.5 Style Classification

Each fighter is tagged with a primary and secondary style based on their statistical profile:

Style Tag	Definition	Key Indicators
Boxer	Strikes at distance	High SLpM, high distance %, low TD attempts
Pressure Fighter	High-volume forward pressure	Very high SLpM, high SApM, low striking defense
Counter Striker	Low volume, high accuracy	Low SLpM, high accuracy %, high defense %
Wrestler	Takedown-centric control	High TD attempts, high control time, low SLpM
Grappler/BJJ	Submission-focused	High sub attempts, moderate TDs, finish via sub
Clinch Fighter	Inside fighting / dirty boxing	High clinch strikes %, high clinch control time
Kickboxer	Diverse striking with kicks	High distance %, balanced head/body/leg strikes
Well-Rounded	No dominant dimension	Balanced across all categories

A style_matchup_matrix stores historical win rates for each style pairing within each weight class (e.g., Wrestler vs Boxer at Lightweight has a known edge historically).

[PANEL: Opus] Style classification should be CONTINUOUS, not categorical: Replace categorical tags with a continuous style vector — [striking_offense, striking_defense, wrestling_offense, wrestling_defense, submission_offense, submission_defense, clinch_work, pressure_rate] where each dimension is 0-100 based on actual stats. The matchup matrix then operates on vector distances rather than category lookups. This eliminates the subjective "who decides the tag" problem. Implement as v2 after categorical tags are working.

3.6 Cardio Archetypes

Based on output decline percentage from Round 1 to final round:

Sustainers: <10% decline over 3 rounds (elite cardio)
Gradual Decliners: 10-25% decline (normal)
Cardio Cliffs: >25% decline (gas completely in R3+)

Cardio Cliff fighters are where over/under rounds and late-round finish props are most exploitable.

[PANEL: DeepSeek V3.1] Output under duress: Current round-by-round output doesn't account for opponent pressure. Track "output under pressure" flag — strikes output when opponent has attempted a takedown or is in chase mode. Source: FightMetric/UFCStats granular data if available. Tag fights where one fighter was significantly ahead on output vs. when being pressured.

4. Data Sources

4.1 Fighter Stats & Career Data (Primary)

Source	Data	Access	Cost
UFCStats.com (FightMetric)	Official UFC stats: strikes, TDs, control time, per-round breakdowns, 67+ categories	Scrape (HTML)	FREE
Sherdog.com	Fighter records, results, event history, pre-UFC record	Scrape (HTML)	FREE
Tapology.com	Fighter records, rankings, event cards, weigh-in results, regional/pre-UFC data	Scrape (HTML)	FREE
ESPN MMA / UFC.com	Fighter profiles, confirmed cards, official rankings	Scrape/RSS	FREE

[PANEL: Opus] UFCStats.com scraping is the SINGLE POINT OF FAILURE. Its HTML structure changes periodically and it has no API. MUST build secondary scraper for Sherdog/Tapology capturing at minimum: records, results, basic striking/grappling. Cache aggressively — once a fight's stats are scraped, they never change. Implement hash-based change detection on page structure to alert on breaking changes within minutes, not days.

[PANEL: DeepSeek V3.1] UFCStats uses Cloudflare protection. Budget for rotating residential proxies (~$120/month) if you hit blocks. Rate-limit to 1 req/sec, exponential backoff on 429/503.

4.2 Advanced Metrics & Judging

Source	Data	Access	Cost
MMADecisions.com	Judge scorecards, judge-by-judge tendencies, split decision history, bias analysis	Scrape	FREE
Verdict MMA	Community global scorecards, fan consensus vs judges	API/Scrape	FREE tier
FightOdds.io	Historical odds, line movement	Scrape	FREE
BestFightOdds.com	Multi-book odds comparison, historical closing lines	Scrape	FREE

[PANEL: Sonar Reasoning Pro] MMADecisions.com may be incomplete or poorly maintained for recent fights. This is a single-point dependency for judge data. Add: (1) UFC.com scorecards (scrape if possible), (2) backup from MMAJunkie if they republish scores, (3) manual spot-check for data freshness on build start.

4.3 Betting Data (Same Pipeline as Team Sports)

Source	Data	Access	Cost
The Odds API	Multi-book aggregation: DK, Pinnacle, FanDuel, BetMGM	REST API	FREE tier (500 req/mo)
Pinnacle	Sharpest book. Early + closing moneyline, method, rounds	Historical API	FREE
Kalshi	Prediction market: ML, method, rounds. Candlestick API for historical closing prices	REST API	FREE (account)
Polymarket	Prediction market: event markets, prices, volume, on-chain order data	Free API	FREE (no account)
DraftKings	Opening moneyline, method, round, props	Manual/Scrape	FREE (account)

[PANEL: Grok 4.1 Fast] Free tier of Odds API limits to 500 calls/month. Upgrade to paid ($10-20/mo) or build custom scrapers for Pinnacle/DK as primary; treat Odds API as backup only.

[PANEL: Opus] Betting line movement history / steam moves: Track line movement velocity, not just opening and closing. Scrape BestFightOdds every 6 hours pre-fight week, every 1 hour fight week, every 15 min post-weigh-in. Store as time series in odds_line_movement table. The Market Resistance Filter requires this underlying movement data to function. Priority: MUST-HAVE.

4.4 Situational & News

Source	Data	Access	Cost
Sherdog / Tapology	Confirmed fight cards, weigh-in results, cancellations	Scrape	FREE
MMA Junkie / MMA Fighting	News, camp reports, sparring reports, injury updates	RSS/Scrape	FREE
RotoWire / FantasyPros	Injury reports, fight previews	Scrape	FREE

[PANEL: Opus] Fighter injury / medical suspension history: UFC mandates medical suspensions after fights (30-180 days). A fighter returning from a 180-day medical suspension for a broken orbital bone is materially different from routine 30-day. The spec tracks "days since last fight" but not WHY the layoff occurred. Source: State athletic commission reports (public, ABC MMA website), MMA Junkie injury RSS, UFC event medical suspension lists. Priority: MUST-HAVE.

[PANEL: Opus] Late replacements / short-notice fights: Short-notice replacements (<4 weeks camp) are one of the most exploitable edges in UFC betting. The spec mentions "short camp" as a penalty but doesn't track WHO replaced whom, notice period in days, or whether the replacement was moving up/down in weight. Source: Sherdog/Tapology event pages (scrape bout history changes), MMA Junkie news RSS. Cross-reference announced date vs event date to calculate camp length. Priority: MUST-HAVE.

[PANEL: Sonar Reasoning Pro] Pre-fight press conference quotes and opponent game plan signals provide directional signal for method props and round overs that historical style win rates miss. Scrape pre-fight quotes from MMA Junkie, The Athletic. Process with basic intent extraction. Priority: MUST-HAVE.

[PANEL: Opus] Pre-UFC / regional record quality adjustment: A 10-0 fighter whose wins came against 2-5 regional opponents is fundamentally different from a 10-0 fighter who beat ranked Bellator/ONE fighters. Apply Bayesian shrinkage toward weight class average for fighters with <5 UFC bouts. Source: Sherdog full records, Tapology regional promotion quality tiers. Priority: NICE-TO-HAVE.

5. Prediction Model Sources (Ensemble Inputs)

Unlike team sports (Sagarin, Massey, DRatings, Dimers), MMA has LIMITED free prediction sources. Target: 5-7 independent sources minimum.

Source	Type	Outputs	Access
BetMMA.tips	Community prediction platform with verified track records	Win %, method %, crowd consensus	Scrape
Tapology.com predictions	Community + staff picks	Win %, method predictions	Scrape
FightMatrix.com	ELO-based fighter rankings	Implied win prob from ELO diff	Scrape
Action Network UFC	Staff model predictions	Win %, best bets	Scrape
Bloody Elbow staff picks	Expert panel picks (5-8 writers)	Win % from pick consensus	Scrape
MMABettingOdds.com / FightOdds.io	Market-implied from closing lines	No-vig probability baseline	Scrape
Internal ELO/Glicko-2 model	EdgeClaw internal model trained on UFCStats backfill	Win probability from rating diff	Built in-house

[PANEL: Opus] Remove Bloody Elbow from prediction sources: Bloody Elbow's editorial staff has turned over significantly and the site's reliability as a prediction source is low. Individual writer picks from media outlets have poor calibration and no accountability. Replace with: BetMMA.tips (which has verified, tracked records) and Tapology community consensus. Two well-calibrated external sources plus Pinnacle closing line plus the internal model is sufficient. Adding noisy sources degrades the ensemble.

The ensemble average (model_avg_prob) is compared to Pinnacle closing no-vig probability to find edge. After 100+ fights of tracking, switch from equal-weight averaging to inverse-MSE weighting (better models get more weight based on Brier score accuracy).

[PANEL: Opus] The internal ELO/Glicko-2 model should NOT be blended naively with the composite. The model's predictions are partially derived from the same inputs as the composite — this creates circular dependency. Either: (1) use ELO/Glicko-2 as a REPLACEMENT for the composite, or (2) keep the composite as the primary model and use external predictions only for calibration/validation.

6. Full SQL Database Schema (SQLite with WAL Mode)

SQLite setup note: Enable WAL mode immediately on DB creation: PRAGMA journal_mode=WAL;. This allows concurrent reads during writes. All 16 tables below use SQLite-compatible types (INTEGER, REAL, TEXT, BLOB) instead of PostgreSQL ENUMs.

[PANEL: Gemini 3.1 Pro] Fighter entity disambiguation: Implement unique fighter IDs (UFCStats/Tapology/Sherdog crosswalk) immediately. Always use IDs, never fighter names, as join keys. Fighter name collisions and misspellings will corrupt data if not handled at schema level.

PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;

-- ============================================================
-- 6.1 weight_classes
-- ============================================================
CREATE TABLE IF NOT EXISTS weight_classes (
    weight_class_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    name             TEXT NOT NULL UNIQUE,
    upper_limit_lbs  REAL NOT NULL,
    gender           TEXT DEFAULT 'male' CHECK(gender IN ('male','female'))
);
-- Seed:
-- Strawweight 115 female, Flyweight 125 male/female, Bantamweight 135 male/female,
-- Featherweight 145 male, Lightweight 155 male, Welterweight 170 male,
-- Middleweight 185 male, Light Heavyweight 205 male, Heavyweight 265 male

-- ============================================================
-- 6.2 fighters
-- ============================================================
CREATE TABLE IF NOT EXISTS fighters (
    fighter_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    -- Cross-platform IDs (entity disambiguation)
    ufcstats_id       TEXT UNIQUE,            -- UFCStats internal fighter ID
    sherdog_id        TEXT,                   -- Sherdog URL slug
    tapology_id       TEXT,                   -- Tapology URL slug
    -- Identity
    name              TEXT NOT NULL,
    nickname          TEXT,
    date_of_birth     TEXT,                   -- ISO 8601 date
    nationality       TEXT,
    home_city         TEXT,                   -- [PANEL: Opus] for travel distance calc
    -- Physical
    height_inches     REAL,
    reach_inches      REAL,
    leg_reach_inches  REAL,
    stance            TEXT CHECK(stance IN ('orthodox','southpaw','switch')),
    -- Style (categorical for v1, continuous vector for v2)
    primary_style     TEXT,
    secondary_style   TEXT,
    -- Style vector (v2 — leave NULL for now)
    style_striking_offense    REAL,           -- 0-100
    style_striking_defense    REAL,
    style_wrestling_offense   REAL,
    style_wrestling_defense   REAL,
    style_submission_offense  REAL,
    style_submission_defense  REAL,
    style_clinch_work         REAL,
    style_pressure_rate       REAL,
    -- Camp / Training
    team_gym          TEXT,
    previous_gym      TEXT,
    gym_change_date   TEXT,                   -- ISO 8601 date
    is_elite_gym      INTEGER DEFAULT 0,      -- 1 = top camp (ATT, CKB, Sanford, etc.)
    -- Record aggregates
    wins              INTEGER DEFAULT 0,
    losses            INTEGER DEFAULT 0,
    draws             INTEGER DEFAULT 0,
    no_contests       INTEGER DEFAULT 0,
    ko_wins           INTEGER DEFAULT 0,
    sub_wins          INTEGER DEFAULT 0,
    dec_wins          INTEGER DEFAULT 0,
    ko_losses         INTEGER DEFAULT 0,
    sub_losses        INTEGER DEFAULT 0,
    dec_losses        INTEGER DEFAULT 0,
    -- UFC-specific record
    ufc_wins          INTEGER DEFAULT 0,
    ufc_losses        INTEGER DEFAULT 0,
    ufc_debut_date    TEXT,
    last_fight_date   TEXT,
    is_active         INTEGER DEFAULT 1,
    -- Source URLs
    sherdog_url       TEXT,
    ufcstats_url      TEXT,
    tapology_url      TEXT,
    -- Timestamps
    created_at        TEXT DEFAULT (datetime('now')),
    updated_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_fighters_name ON fighters(name);
CREATE INDEX IF NOT EXISTS idx_fighters_active ON fighters(is_active);
CREATE INDEX IF NOT EXISTS idx_fighters_ufcstats_id ON fighters(ufcstats_id);

-- ============================================================
-- 6.3 events
-- ============================================================
CREATE TABLE IF NOT EXISTS events (
    event_id          INTEGER PRIMARY KEY AUTOINCREMENT,
    ufcstats_event_id TEXT UNIQUE,
    name              TEXT NOT NULL,
    event_type        TEXT CHECK(event_type IN ('ppv','fight_night','contender_series','dwcs')),
    event_date        TEXT NOT NULL,           -- ISO 8601 date
    venue             TEXT,
    city              TEXT,
    state_country     TEXT,
    elevation_ft      INTEGER,                 -- [PANEL: Opus/Gemini] altitude affects cardio
    is_apex           INTEGER DEFAULT 0,       -- 1 = UFC Apex (25ft cage)
    cage_size_ft      INTEGER DEFAULT 30,      -- 25 for Apex, 30 standard
    total_bouts       INTEGER,
    attendance        INTEGER,                 -- [PANEL: Sonar] crowd size effect
    ufcstats_url      TEXT,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_events_date ON events(event_date);
CREATE INDEX IF NOT EXISTS idx_events_type ON events(event_type);

-- ============================================================
-- 6.4 fights
-- ============================================================
CREATE TABLE IF NOT EXISTS fights (
    fight_id              INTEGER PRIMARY KEY AUTOINCREMENT,
    ufcstats_fight_id     TEXT UNIQUE,
    event_id              INTEGER NOT NULL REFERENCES events(event_id),
    fighter1_id           INTEGER NOT NULL REFERENCES fighters(fighter_id),
    fighter2_id           INTEGER NOT NULL REFERENCES fighters(fighter_id),
    weight_class_id       INTEGER REFERENCES weight_classes(weight_class_id),
    scheduled_rounds      INTEGER CHECK(scheduled_rounds IN (3,5)) DEFAULT 3,
    is_title_fight        INTEGER DEFAULT 0,
    is_main_event         INTEGER DEFAULT 0,
    is_co_main            INTEGER DEFAULT 0,
    card_position         TEXT CHECK(card_position IN ('main_card','prelim','early_prelim')),
    bout_order            INTEGER,
    catchweight_lbs       REAL,
    fighter1_weigh_in     REAL,
    fighter2_weigh_in     REAL,
    fighter1_missed_weight INTEGER DEFAULT 0,
    fighter2_missed_weight INTEGER DEFAULT 0,
    -- Referee (tracked for stoppage tendency analysis)
    referee_name          TEXT,               -- [PANEL: Gemini/GPT-4.1/Sonar]
    -- Replacement tracking
    is_late_replacement   INTEGER DEFAULT 0,  -- [PANEL: Opus/GPT-4.1]
    replacement_fighter_id INTEGER REFERENCES fighters(fighter_id), -- who replaced
    original_fighter_name TEXT,               -- who was originally scheduled
    notice_days           INTEGER,            -- days of camp (announcement to fight)
    fight_status          TEXT CHECK(fight_status IN ('scheduled','completed','cancelled','no_contest')) DEFAULT 'scheduled',
    created_at            TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_fights_event ON fights(event_id);
CREATE INDEX IF NOT EXISTS idx_fights_f1 ON fights(fighter1_id);
CREATE INDEX IF NOT EXISTS idx_fights_f2 ON fights(fighter2_id);
CREATE INDEX IF NOT EXISTS idx_fights_wc ON fights(weight_class_id);
CREATE INDEX IF NOT EXISTS idx_fights_status ON fights(fight_status);

-- ============================================================
-- 6.5 fight_results
-- ============================================================
CREATE TABLE IF NOT EXISTS fight_results (
    result_id             INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id              INTEGER NOT NULL UNIQUE REFERENCES fights(fight_id),
    winner_id             INTEGER REFERENCES fighters(fighter_id),
    loser_id              INTEGER REFERENCES fighters(fighter_id),
    method                TEXT NOT NULL,
        -- 'KO/TKO', 'Submission', 'Decision - Unanimous',
        -- 'Decision - Split', 'Decision - Majority', 'DQ',
        -- 'Doctor Stoppage', 'Draw', 'No Contest'
    method_detail         TEXT,    -- 'rear naked choke', 'head kick', 'body shot', etc
    finish_round          INTEGER,
    finish_time_sec       INTEGER,
    total_fight_time_sec  INTEGER,
    went_to_decision      INTEGER DEFAULT 0,
    performance_bonus     TEXT,    -- 'FOTN', 'POTN', NULL
    created_at            TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_results_fight ON fight_results(fight_id);
CREATE INDEX IF NOT EXISTS idx_results_winner ON fight_results(winner_id);
CREATE INDEX IF NOT EXISTS idx_results_method ON fight_results(method);

-- ============================================================
-- 6.6 judges
-- ============================================================
CREATE TABLE IF NOT EXISTS judges (
    judge_id              INTEGER PRIMARY KEY AUTOINCREMENT,
    name                  TEXT NOT NULL UNIQUE,
    home_state            TEXT,
    total_fights_judged   INTEGER DEFAULT 0,
    pct_unanimous         REAL,
    pct_split             REAL,
    pct_majority          REAL,
    pct_favor_aggressor   REAL,   -- % of close rounds given to higher-output fighter
    pct_favor_control     REAL,   -- % given to fighter with more control time
    pct_favor_damage      REAL,   -- % given to fighter with more knockdowns
    pct_favor_wrestler    REAL,   -- tendency to reward takedowns + control
    agreement_rate        REAL,   -- % agreement with other two judges
    controversial_rate    REAL,   -- % of fights with media/community disagreement
    created_at            TEXT DEFAULT (datetime('now')),
    updated_at            TEXT DEFAULT (datetime('now'))
);

-- ============================================================
-- 6.7 bout_judges (junction table)
-- ============================================================
CREATE TABLE IF NOT EXISTS bout_judges (
    fight_id              INTEGER NOT NULL REFERENCES fights(fight_id),
    judge_id              INTEGER NOT NULL REFERENCES judges(judge_id),
    score_fighter1        TEXT,   -- e.g. '29-28'
    score_fighter2        TEXT,   -- e.g. '28-29'
    PRIMARY KEY (fight_id, judge_id)
);

-- [PANEL: Sonar] Pre-fight judge panel snapshot table
-- When judges are announced (1-2 days before event), store for pre-fight edge calc
CREATE TABLE IF NOT EXISTS pre_fight_judges (
    snapshot_id           INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id              INTEGER NOT NULL REFERENCES fights(fight_id),
    judge_id              INTEGER NOT NULL REFERENCES judges(judge_id),
    captured_at           TEXT NOT NULL,
    UNIQUE(fight_id, judge_id)
);

-- ============================================================
-- 6.8 fighter_fight_stats
-- ============================================================
-- Per-fighter, per-fight statistics. Two rows per fight (one per fighter).
CREATE TABLE IF NOT EXISTS fighter_fight_stats (
    stat_id                   INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id                  INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id                INTEGER NOT NULL REFERENCES fighters(fighter_id),
    -- Striking
    sig_strikes_landed        INTEGER,
    sig_strikes_attempted     INTEGER,
    sig_strikes_absorbed      INTEGER,
    total_strikes_landed      INTEGER,
    total_strikes_attempted   INTEGER,
    knockdowns_scored         INTEGER DEFAULT 0,
    knockdowns_received       INTEGER DEFAULT 0,
    head_strikes_landed       INTEGER,
    head_strikes_attempted    INTEGER,
    body_strikes_landed       INTEGER,
    body_strikes_attempted    INTEGER,
    leg_strikes_landed        INTEGER,
    leg_strikes_attempted     INTEGER,
    distance_strikes_landed   INTEGER,
    distance_strikes_att      INTEGER,
    clinch_strikes_landed     INTEGER,
    clinch_strikes_att        INTEGER,
    ground_strikes_landed     INTEGER,
    ground_strikes_att        INTEGER,
    -- [PANEL: Opus] Accuracy by range (MUST-HAVE — already in UFCStats, just parse it)
    distance_accuracy_pct     REAL,   -- sig strikes at range: landed/attempted
    clinch_accuracy_pct       REAL,   -- sig strikes in clinch: landed/attempted
    ground_accuracy_pct       REAL,   -- sig strikes on ground: landed/attempted
    -- Grappling
    takedowns_landed          INTEGER,
    takedowns_attempted       INTEGER,
    takedowns_defended        INTEGER,
    submission_attempts       INTEGER DEFAULT 0,
    reversals                 INTEGER DEFAULT 0,
    control_time_sec          INTEGER DEFAULT 0,
    -- Calculated per-minute rates (denormalized for query speed)
    sig_strikes_per_min       REAL,
    sig_strikes_abs_per_min   REAL,
    striking_accuracy_pct     REAL,
    striking_defense_pct      REAL,
    takedown_accuracy_pct     REAL,
    takedown_defense_pct      REAL,
    control_time_per_min      REAL,
    -- Meta
    fight_time_sec            INTEGER,
    created_at                TEXT DEFAULT (datetime('now')),
    UNIQUE(fight_id, fighter_id)
);

CREATE INDEX IF NOT EXISTS idx_ffs_fight ON fighter_fight_stats(fight_id);
CREATE INDEX IF NOT EXISTS idx_ffs_fighter ON fighter_fight_stats(fighter_id);

-- ============================================================
-- 6.9 fighter_round_stats
-- ============================================================
CREATE TABLE IF NOT EXISTS fighter_round_stats (
    round_stat_id             INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id                  INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id                INTEGER NOT NULL REFERENCES fighters(fighter_id),
    round_number              INTEGER NOT NULL,
    sig_strikes_landed        INTEGER,
    sig_strikes_attempted     INTEGER,
    sig_strikes_absorbed      INTEGER,
    knockdowns_scored         INTEGER DEFAULT 0,
    knockdowns_received       INTEGER DEFAULT 0,
    takedowns_landed          INTEGER,
    takedowns_attempted       INTEGER,
    submission_attempts       INTEGER DEFAULT 0,
    reversals                 INTEGER DEFAULT 0,
    control_time_sec          INTEGER DEFAULT 0,
    head_strikes_landed       INTEGER,
    body_strikes_landed       INTEGER,
    leg_strikes_landed        INTEGER,
    distance_strikes_landed   INTEGER,
    clinch_strikes_landed     INTEGER,
    ground_strikes_landed     INTEGER,
    round_duration_sec        INTEGER DEFAULT 300,
    created_at                TEXT DEFAULT (datetime('now')),
    UNIQUE(fight_id, fighter_id, round_number)
);

CREATE INDEX IF NOT EXISTS idx_frs_fight ON fighter_round_stats(fight_id);
CREATE INDEX IF NOT EXISTS idx_frs_fighter ON fighter_round_stats(fighter_id);
CREATE INDEX IF NOT EXISTS idx_frs_round ON fighter_round_stats(round_number);

-- ============================================================
-- 6.10 fighter_rolling_stats
-- ============================================================
-- Pre-computed rolling career averages for each fighter AT THE TIME of
-- each fight. Avoids recalculating from raw stats on every query.
-- Updated BEFORE each fight via ETL. Enables point-in-time backtesting
-- without future data leak.
CREATE TABLE IF NOT EXISTS fighter_rolling_stats (
    rolling_id                INTEGER PRIMARY KEY AUTOINCREMENT,
    fighter_id                INTEGER NOT NULL REFERENCES fighters(fighter_id),
    as_of_fight_id            INTEGER NOT NULL REFERENCES fights(fight_id),
    as_of_date                TEXT NOT NULL,
    num_fights                INTEGER,
    num_ufc_fights            INTEGER,
    -- Rolling averages (career, last 3, last 5)
    career_slpm               REAL,
    career_sapm               REAL,
    career_str_acc            REAL,
    career_str_def            REAL,
    career_td_acc             REAL,
    career_td_def             REAL,
    career_sub_att_per15      REAL,
    career_ctrl_per15         REAL,
    career_kd_per15           REAL,
    last3_slpm                REAL,
    last3_sapm                REAL,
    last3_str_acc             REAL,
    last3_str_def             REAL,
    last3_td_acc              REAL,
    last3_td_def              REAL,
    last5_slpm                REAL,
    last5_sapm                REAL,
    -- [PANEL: Opus] Bayesian shrinkage applied values (shrunk toward weight class mean)
    -- Formula: adj_stat = (n * fighter_stat + k * class_mean) / (n + k), k=5
    adj_slpm                  REAL,   -- shrinkage-adjusted SLpM
    adj_sapm                  REAL,
    adj_str_acc               REAL,
    adj_td_acc                REAL,
    adj_td_def                REAL,
    adj_finish_rate           REAL,
    -- Finishing rates
    career_ko_rate            REAL,
    career_sub_rate           REAL,
    career_dec_rate           REAL,
    career_finish_rate        REAL,
    -- Durability
    career_kd_absorbed_per15  REAL,
    career_ko_losses          INTEGER,
    -- Cardio (output decline)
    avg_r1_output             REAL,
    avg_r2_output             REAL,
    avg_r3_output             REAL,
    avg_r4_output             REAL,   -- NULL if never fought 4+ rounds
    avg_r5_output             REAL,   -- NULL if never fought 5 rounds
    output_decline_pct        REAL,
    -- Physical at fight time
    age_at_fight              REAL,
    days_since_last_fight     INTEGER,
    weight_class_id           INTEGER REFERENCES weight_classes(weight_class_id),
    created_at                TEXT DEFAULT (datetime('now')),
    UNIQUE(fighter_id, as_of_fight_id)
);

CREATE INDEX IF NOT EXISTS idx_rolling_fighter ON fighter_rolling_stats(fighter_id);
CREATE INDEX IF NOT EXISTS idx_rolling_date ON fighter_rolling_stats(as_of_date);

-- ============================================================
-- 6.11 fighter_weight_history
-- ============================================================
CREATE TABLE IF NOT EXISTS fighter_weight_history (
    history_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    weight_class_id   INTEGER REFERENCES weight_classes(weight_class_id),
    weigh_in_lbs      REAL,
    missed_weight     INTEGER DEFAULT 0,
    missed_weight_lbs REAL,
    rehydration_est   REAL,
    moved_up          INTEGER DEFAULT 0,
    moved_down        INTEGER DEFAULT 0,
    UNIQUE(fighter_id, fight_id)
);

CREATE INDEX IF NOT EXISTS idx_wh_fighter ON fighter_weight_history(fighter_id);

-- ============================================================
-- 6.12 style_matchup_matrix
-- ============================================================
CREATE TABLE IF NOT EXISTS style_matchup_matrix (
    matchup_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    style_a           TEXT NOT NULL,
    style_b           TEXT NOT NULL,
    weight_class_id   INTEGER REFERENCES weight_classes(weight_class_id),
    weight_class_group TEXT,     -- [PANEL: Gemini] 'lower'(125-145), 'middle'(155-170), 'upper'(185-265)
    style_a_win_rate  REAL,
    sample_size       INTEGER,
    last_updated      TEXT,
    UNIQUE(style_a, style_b, weight_class_id)
);

-- [PANEL: Gemini] Group weight classes for style matchup to get statistically significant sample sizes.
-- Calculating 'Wrestler vs Boxer at Lightweight' alone yields tiny samples (8-12 fights).
-- Group: lower (125-145), middle (155-170), upper (185-265).

-- ============================================================
-- 6.13 betting_odds
-- ============================================================
CREATE TABLE IF NOT EXISTS betting_odds (
    odds_id           INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    book              TEXT NOT NULL CHECK(book IN ('draftkings','pinnacle','kalshi','polymarket','fanduel','betmgm','betmma','tapology')),
    market_type       TEXT NOT NULL,
        -- 'moneyline', 'method_ko', 'method_sub', 'method_dec',
        -- 'over_rounds', 'under_rounds', 'round_1', 'round_2', 'round_3',
        -- 'fight_gtd', 'itd'
    odds_american     INTEGER,
    odds_decimal      REAL,
    implied_prob      REAL,
    line_value        REAL,          -- for over/under (e.g., 2.5 rounds)
    snapshot_type     TEXT CHECK(snapshot_type IN ('opening','early','midweek','closing')),
    volume            INTEGER,       -- Kalshi contract volume
    captured_at       TEXT NOT NULL,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_odds_fight ON betting_odds(fight_id);
CREATE INDEX IF NOT EXISTS idx_odds_fighter ON betting_odds(fighter_id);
CREATE INDEX IF NOT EXISTS idx_odds_book ON betting_odds(book);
CREATE INDEX IF NOT EXISTS idx_odds_market ON betting_odds(market_type);
CREATE INDEX IF NOT EXISTS idx_odds_snapshot ON betting_odds(snapshot_type);
CREATE INDEX IF NOT EXISTS idx_odds_captured ON betting_odds(captured_at);

-- [PANEL: Opus] Line movement history table (MUST-HAVE)
-- Track line velocity, not just opening/closing.
CREATE TABLE IF NOT EXISTS odds_line_movement (
    movement_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    book              TEXT NOT NULL,
    market_type       TEXT NOT NULL,
    odds_american     INTEGER,
    implied_prob      REAL,
    movement_magnitude REAL,        -- delta from previous snapshot
    captured_at       TEXT NOT NULL,
    hours_to_event    REAL,         -- negative = post-event
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_lm_fight ON odds_line_movement(fight_id);
CREATE INDEX IF NOT EXISTS idx_lm_captured ON odds_line_movement(captured_at);

-- ============================================================
-- 6.14 predictions
-- ============================================================
CREATE TABLE IF NOT EXISTS predictions (
    prediction_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    source            TEXT NOT NULL,
        -- 'betmma', 'tapology', 'fightmatrix', 'action_network',
        -- 'edgeclaw_elo', 'market_implied'
    win_prob          REAL,
    ko_prob           REAL,
    sub_prob          REAL,
    dec_prob          REAL,
    confidence        REAL,
    captured_at       TEXT NOT NULL,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_pred_fight ON predictions(fight_id);
CREATE INDEX IF NOT EXISTS idx_pred_source ON predictions(source);

-- ============================================================
-- 6.15 calculated_edges
-- ============================================================
CREATE TABLE IF NOT EXISTS calculated_edges (
    edge_id              INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id             INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id           INTEGER NOT NULL REFERENCES fighters(fighter_id),
    -- No-vig probabilities
    pinnacle_novig_prob  REAL,
    dk_novig_prob        REAL,
    kalshi_novig_prob    REAL,
    -- Model ensemble
    model_avg_prob       REAL,
    model_median_prob    REAL,
    model_count          INTEGER,
    model_std_dev        REAL,
    -- Edge calculations
    edge_vs_pinnacle     REAL,
    edge_vs_kalshi       REAL,
    edge_vs_dk           REAL,
    cross_book_edge      REAL,
    -- Sharp action indicators
    reverse_line_movement INTEGER DEFAULT 0,
    steam_move           INTEGER DEFAULT 0,
    vig_squeeze          INTEGER DEFAULT 0,
    kalshi_volume_surge  INTEGER DEFAULT 0,
    line_move_direction  TEXT,
    line_move_magnitude  REAL,
    line_move_velocity   REAL,       -- [PANEL: Opus] rate of change per hour
    -- Method edges
    model_ko_prob        REAL,
    model_sub_prob       REAL,
    model_dec_prob       REAL,
    market_ko_prob       REAL,
    market_sub_prob      REAL,
    market_dec_prob      REAL,
    ko_edge              REAL,
    sub_edge             REAL,
    dec_edge             REAL,
    -- Composite & tier
    quality_composite    REAL,
    edge_tier            TEXT CHECK(edge_tier IN ('S','A','B','C','D')),
    -- Post-fight grading
    clv_pinnacle         REAL,
    clv_kalshi           REAL,
    prediction_correct   INTEGER,
    method_correct       INTEGER,
    brier_score          REAL,       -- per-prediction Brier score
    created_at           TEXT DEFAULT (datetime('now')),
    updated_at           TEXT DEFAULT (datetime('now')),
    UNIQUE(fight_id, fighter_id)
);

CREATE INDEX IF NOT EXISTS idx_edges_fight ON calculated_edges(fight_id);
CREATE INDEX IF NOT EXISTS idx_edges_fighter ON calculated_edges(fighter_id);
CREATE INDEX IF NOT EXISTS idx_edges_tier ON calculated_edges(edge_tier);
CREATE INDEX IF NOT EXISTS idx_edges_composite ON calculated_edges(quality_composite);

-- ============================================================
-- 6.16 live_tracking
-- ============================================================
-- WAL mode handles concurrent writes. Batch writes every 1 min for ML,
-- every 5 min for method/round props to reduce contention.
CREATE TABLE IF NOT EXISTS live_tracking (
    tracking_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    book              TEXT DEFAULT 'kalshi',
    market_type       TEXT DEFAULT 'moneyline',
    price             REAL,
    implied_prob      REAL,
    volume            INTEGER,
    volume_delta      INTEGER,
    captured_at       TEXT NOT NULL,
    minutes_to_event  INTEGER,       -- negative = event started
    round_number      INTEGER,       -- which round is live (NULL if between rounds)
    data_latency_ms   INTEGER,       -- [PANEL: Sonar] SLA monitoring
    is_stale          INTEGER DEFAULT 0,  -- 1 if capture missed SLA window
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_live_fight ON live_tracking(fight_id);
CREATE INDEX IF NOT EXISTS idx_live_time ON live_tracking(captured_at);

-- ============================================================
-- 6.17 fighter_injuries (PANEL ADDITION — MUST-HAVE)
-- ============================================================
-- [PANEL: Opus, GPT-4.1, Grok 4.1, DeepSeek, Grok 4.1 Fast]
-- Medical suspensions and injury history. All panel models flagged as MUST-HAVE.
CREATE TABLE IF NOT EXISTS fighter_injuries (
    injury_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    fighter_id        INTEGER NOT NULL REFERENCES fighters(fighter_id),
    fight_id          INTEGER REFERENCES fights(fight_id),  -- fight that caused it
    injury_type       TEXT,           -- 'knee', 'shoulder', 'orbital', 'hand', 'back', etc
    suspension_days   INTEGER,        -- mandatory medical suspension length
    suspension_end    TEXT,           -- ISO date when cleared to return
    is_cleared        INTEGER DEFAULT 0,
    clearance_type    TEXT,           -- 'time_served', 'medical_clearance'
    source            TEXT,           -- 'nsac', 'csac', 'mmajunkie', 'twitter'
    notes             TEXT,
    created_at        TEXT DEFAULT (datetime('now'))
);

CREATE INDEX IF NOT EXISTS idx_injuries_fighter ON fighter_injuries(fighter_id);

-- ============================================================
-- 6.18 referees
-- ============================================================
-- [PANEL: Gemini, GPT-4.1, Sonar, Grok 4.1 Fast]
-- Referee stoppage tendencies affect method/round market pricing.
CREATE TABLE IF NOT EXISTS referees (
    referee_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    name              TEXT NOT NULL UNIQUE,
    total_fights      INTEGER DEFAULT 0,
    early_stoppage_rate REAL,         -- % of fights stopped before most refs would
    late_stoppage_rate  REAL,         -- % of fights allowed to continue longer
    standup_rate        REAL,         -- standups per fight (favors strikers)
    finish_rate_as_ref  REAL,         -- KO+Sub rate in fights they ref
    -- Apex vs arena tendency (referees may behave differently in smaller cage)
    apex_finish_rate    REAL,
    arena_finish_rate   REAL,
    created_at          TEXT DEFAULT (datetime('now')),
    updated_at          TEXT DEFAULT (datetime('now'))
);

-- Junction: which referee is assigned to a fight (pre-event, scraped from UFC.com)
CREATE TABLE IF NOT EXISTS fight_referees (
    fight_id          INTEGER NOT NULL REFERENCES fights(fight_id),
    referee_id        INTEGER NOT NULL REFERENCES referees(referee_id),
    confirmed         INTEGER DEFAULT 0,   -- 0 = expected, 1 = confirmed on fight card
    captured_at       TEXT NOT NULL,
    PRIMARY KEY (fight_id, referee_id)
);

-- ============================================================
-- 6.19 data_quality_log
-- ============================================================
-- [PANEL: Sonar] Sanity checks on scraped data. Flag and quarantine bad rows.
CREATE TABLE IF NOT EXISTS data_quality_log (
    log_id            INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name        TEXT NOT NULL,
    record_id         INTEGER,
    field_name        TEXT,
    raw_value         TEXT,
    issue             TEXT,           -- 'out_of_range', 'impossible_value', 'missing_required'
    resolution        TEXT,           -- 'quarantined', 'corrected', 'accepted_with_flag'
    created_at        TEXT DEFAULT (datetime('now'))
);

7. Quality Composite Formula (0-100 Score)

Computed FRESH for each fighter in each specific matchup. Matchup-dependent — the same fighter will have different composite scores against different opponents.

quality_composite = (
    0.25 * striking_differential_score
  + 0.20 * grappling_advantage_score
  + 0.15 * finishing_ability_score
  + 0.10 * cardio_pace_score
  + 0.10 * durability_score
  + 0.10 * style_matchup_score
  + 0.05 * experience_recency_score
  + 0.05 * situational_score
)

[PANEL: Opus — CRITICAL] These weights (25/20/15/10/10/10/5/5) are educated guesses with zero empirical backing. The plan: launch with these weights as starting point, then implement automated weight optimization using logistic regression on historical fight outcomes. The composite should be a learned model, not a hand-tuned formula. At minimum, run a backtest on 2018-2024 data before going live and adjust weights based on actual predictive power per component. [PANEL: Sonar] Use inverse-MSE weighted averaging as an intermediate approach until logistic regression has enough data.

[PANEL: Opus] The optimal weights will vary BY WEIGHT CLASS (grappling matters more at 170+, striking differential matters more at 135-155) and BY ERA (wrestling dominance has declined since 2018). Per-class tuning deferred to v2 (need 12+ months of data per class).

7.1 Striking Differential Score (25%)

Percentile rank of (last3_slpm - last3_sapm) within the fighter's weight class. A fighter at the 80th percentile in striking differential gets 80. Bonuses: +5 for reach advantage >3 inches. +5 for knockdowns_per_15 in top quartile of division.

Use Bayesian-shrinkage-adjusted stats (adj_slpm, adj_sapm) for fighters with <5 UFC fights.

7.2 Grappling Advantage Score (20%)

Weighted blend within weight class:

40% × TD accuracy percentile
30% × TD defense percentile
20% × control time/min percentile
10% × sub attempts/15 percentile

Use adjusted stats for small-sample fighters.

7.3 Finishing Ability Score (15%)

career_finish_rate * 50
+ knockdowns_per_15_percentile * 30
+ (opponent career_ko_losses > 2 ? +20 : 0)

Capped at 100.

7.4 Cardio / Pace Score (10%)

100 - (output_decline_pct * 100)

A fighter who maintains 90% of R1 output in R3 scores 90. Bonus: +10 for positive championship round history (win % in R4-R5 fights). Penalty: -10 if categorized as "Cardio Cliff" archetype.

7.5 Durability / Chin Score (10%)

100 - (career_kd_absorbed_per15_percentile * 0.5 + ko_losses_penalty)

KO losses penalty — improved decay model [PANEL: Opus]:

The original step function (0/-15/-30/-50) ignores context. Use a decay function:

chin_penalty = sum over each KO loss:
    base_penalty * (1 - recovery_rate * months_since_KO)

where:
    base_penalty = 20 per KO loss
    recovery_rate = 0.02 per month
    minimum recovered penalty = 5 per KO loss (never fully goes away)
    recency flag: if KO loss within last 2 fights, apply 1.5x multiplier

For v1 implementation simplicity, the original step function (0/-15/-30/-50) with recency doubling is acceptable. Upgrade to the decay model in v2.

7.6 Style Matchup Score (10%)

Lookup style_a_win_rate from style_matchup_matrix for this fighter's primary style vs opponent's primary style within weight class GROUP (lower/middle/upper). win_rate * 100 = score. If wrestler vs boxer and wrestler wins 62% historically, score = 62. If no data, default to 50.

[PANEL: Gemini] Group weight classes (lower 125-145, middle 155-170, upper 185-265) to get statistically significant sample sizes. Per-class data is too sparse for most style pairings.

[PANEL: Gemini] Add Stance Matchup modifier: Open-stance (Southpaw vs Orthodox) fights have measurably different SLpM/SApM baselines. Apply ±3 points modifier based on historical open vs closed stance win rates at this weight class group.

7.7 Experience / Recency Score (5%)

Base: min(ufc_fights * 5, 50)
Layoff adjustment:
  60-180 days since last fight = 0   (optimal)
  30-59 days = -10                   (short camp)
  181-365 days = -15                 (ring rust)
  365+ days = -25                    (severe ring rust)

Short notice replacement: If this fighter is a late replacement with notice_days < 28, apply an additional -8 penalty. [PANEL: Opus/GPT-4.1]

7.8 Situational Score (5%)

Base: 50
Modifiers:
  Title fight / main event: +10
  PPV main card: +5
  Missed weight: -15
  Moving up in weight: -5
  Moving down in weight: +5 (if successful history)
  Altitude >5000ft: -5 (both fighters equally)
  Apex cage (25ft): +5 for wrestlers/pressure fighters, -5 for counter strikers
  Returning from medical suspension: -8 if cleared from 90+ day suspension  [PANEL: Opus]
  Camp change to elite gym (within 12 months): +5                           [PANEL: Opus/GPT-4.1]
  Camp change within 6 months (still adjusting): -3
  Left elite gym for smaller camp: -5

CRITICAL — AGE CLIFF (improved) [PANEL: Opus/Gemini/Sonar]:

The original spec uses a step function at exactly age 35 for fighters below 185 lbs. This creates discontinuities (34.9 vs 35.1 year olds treated wildly differently).

Improved model: Use a sigmoid decay curve that starts at 32 and reaches full penalty by 38. Curve steepness by division:

125-155 lbs: sigmoid midpoint at age 34, full penalty (-20) by age 37
170-185 lbs: sigmoid midpoint at age 36, full penalty (-20) by age 39
205-265 lbs: sigmoid midpoint at age 38, full penalty (-20) by age 41

For v1 implementation simplicity, the original step function with correct division groupings is acceptable. Upgrade to sigmoid in v2.

7.9 Compound Situation Boosters (additive modifiers)

These stack on top of the 0-100 base. Apply a floor of 10 — the composite cannot drop below 10 after all boosters. [PANEL: Opus]

Compound Decline Penalty (up to -15):

2+ KO losses + age ≥35 (below 185 lbs): -8
2+ KO losses + age ≥35 + layoff 365+: -15
2+ KO losses + layoff 365+ (any age): -5

Weight Cut Severity Penalty (up to -15):

Missed by 0.5-1.0 lbs: -5
Missed by 1.1-3.0 lbs: -10
Missed by 3.1+ lbs: -15
History of 2+ misses: additional -5

Camp Change Boost/Penalty:

Changed to elite gym (City Kickboxing, ATT, Sanford, Wittman): +5
Changed camps within last 6 months (still adjusting): -3
Left elite gym for smaller camp: -5

Elite Gym List [PANEL: GPT-4.1]: Maintain as a config object (not hardcoded strings), updated annually with win rates:

{
  "elite_gyms": [
    "American Top Team", "City Kickboxing", "Sanford MMA",
    "Jackson-Wink MMA", "AKA", "Tristar Gym", "Glory MMA",
    "10th Planet Jiu-Jitsu", "Rafael Lovato Jr MMA"
  ]
}

Market Resistance Filter: When model shows big edge (S or A tier) but the line doesn't move for 3+ hours while other fight lines are moving, flag and downgrade by one tier.

8. Edge Tier Classification

BOTH composite threshold AND model edge must be met simultaneously.

Tier	Composite	Model Edge Required	Action
S	85-100	+7% vs market	Strong play — max unit
A	70-84	+5% vs market	Standard play — 1 unit
B	55-69	+3% vs market	Lean play — half unit
C	40-54	Any positive edge	Track only — no bet
D	0-39	N/A	Skip — insufficient data

A fight with a great composite but no market edge is still C-tier. A fight with a huge market edge but weak composite is suspicious.

[PANEL: Opus] S-tier threshold of +7% may be too strict for moneyline. On Kalshi/Polymarket, UFC markets often have wide spreads (5-10 cents). The real edges are in METHOD and ROUNDS props where pricing is less efficient. Consider: S-tier at +5% for method/round props (higher inefficiency) and +7% only for moneyline.

9. Method of Victory (MoV) Modeling

The biggest UFC-specific edge vector. Method markets are less efficient than moneyline because books must price three-way probabilities. Public bettors systematically overvalue knockouts and underprice decisions.

9.1 Base Rates

fighter_A_ko_rate    = A.ko_wins / A.total_wins
fighter_A_sub_rate   = A.sub_wins / A.total_wins
fighter_A_dec_rate   = A.dec_wins / A.total_wins

fighter_B_ko_loss_rate  = B.ko_losses / B.total_losses
fighter_B_sub_loss_rate = B.sub_losses / B.total_losses
fighter_B_dec_loss_rate = B.dec_losses / B.total_losses

9.2 Combined Method Probability (sample-size weighted blend)

-- For Fighter A winning by KO:
raw_ko_prob = win_prob_A * (w1 * fighter_A_ko_rate + w2 * fighter_B_ko_loss_rate)

-- Sample-size weights:
w1 = A.total_wins / (A.total_wins + B.total_losses)
w2 = B.total_losses / (A.total_wins + B.total_losses)

-- Normalize so KO + Sub + Dec = win_prob_A

[PANEL: Sonar — CRITICAL] Method outcomes are NOT independent: High TD accuracy correlates with submission wins; weak chin correlates with KO losses. The sample-size weighted blend treats these as independent probabilities. In v2, use multinomial logistic regression or categorical outcome trees to model the full joint distribution. For v1, the blend is adequate.

Alternative approach (geometric mean) — deferred to v2 per panel consensus. Build sample-size weighted first, backtest, then compare.

9.3 Adjustments

Knockout: +5-15% if opponent KO'd in last 2 fights (permanent chin damage)
Submission: +10% if fighter has 3+ sub wins vs opponent with TD def <60%
Decision: +10-15% if both fighters have >60% decision rate AND 3-round fight
Weight class baseline: Heavyweight ~65% finish rate, Flyweight ~45%. Apply divisional priors before fighter-specific adjustments
Cardio: If large output decline disparity, increase late-round finish probability for the fresher fighter
Reach/height advantage >4 inches: +5% to distance KO probability
Referee stoppage tendency: Fast ref (e.g., Marc Goddard) → +3-5% KO/TKO, -3-5% Decision [PANEL: Gemini]

9.4 Market Comparison

Compare model method probabilities to DraftKings method props. Edge = model_ko_prob - market_ko_implied_prob. Method markets typically have 15-25% vig, so significant edges exist. Biggest inefficiencies: decision markets (underpriced) and submission markets (mispriced in grappling matchups).

10. Round-by-Round Modeling & Cardio Curves

10.1 Cardio Curves

-- Compute from fighter_round_stats:
SELECT
    fighter_id,
    round_number,
    AVG(sig_strikes_landed * 1.0 / (round_duration_sec / 60.0)) AS avg_strikes_per_min
FROM fighter_round_stats
GROUP BY fighter_id, round_number
ORDER BY fighter_id, round_number;

10.2 Round Finish Probability

P(finish_in_round_N) = base_finish_rate
    * cardio_multiplier(fighter_A, N)
    * cardio_multiplier(fighter_B, N)
    * cumulative_damage_factor(N)

base_finish_rate = (A.career_finish_rate + B.career_finish_rate) / 2

cardio_multiplier(fighter, N) = fighter.cardio_curve[N] / fighter.cardio_curve[1]
  -- The WEAKER fighter's decline INCREASES opponent's finish probability

cumulative_damage_factor(N) = 1.0 + (0.05 * N)
  -- 5% increase per round (accumulated damage)

P(goes_to_decision) = 1 - SUM(P(finish_in_round_N) for N in 1..scheduled_rounds)

[PANEL: Sonar] Round probabilities are path-dependent, not independent. If Fighter A is damaged in R2, R3 finish probability is conditional on surviving R2 damaged. In v2, use Markov chain or hidden-state model (HMM). For v1, the naive independence model is acceptable.

10.3 Over/Under Rounds Pricing

-- Standard 3-round fight:
P(under_1.5) = P(finish_R1)
P(over_1.5)  = 1 - P(under_1.5)
P(under_2.5) = P(finish_R1) + P(finish_R2)
P(over_2.5)  = 1 - P(under_2.5)

-- 5-round title fights: extend to 2.5, 3.5, 4.5

The 2.5 round market in title fights is often mispriced because the market anchors on 3-round base rates.

11. Live Tracking Architecture

1-minute cadence for ML prices; 5-minute cadence for method/round props. This cuts data volume by ~70% vs tracking all markets every minute. [PANEL: Opus]

11.1 What to Capture

Every 1 minute (ML only):

Kalshi + Polymarket: ML contract price + volume + liquidity
Data latency tracked (ms from request to capture)

Every 5 minutes (method/round):

Kalshi + Polymarket: method contracts (KO/Sub/Dec), round O/U contracts

Round breaks (critical snapshots):

Price at end of round and beginning of next round
Round break recalibration point — most significant pricing events

11.2 UFC-Specific Considerations

Track through the entire fight (no "halftime cutoff")
Volume surges on Kalshi/Polymarket during fights indicate sharp live betting
After a knockdown, prices swing 10-30 cents in seconds — 1-minute capture catches the aftermath
Track hi/lo extremes with round labels

[PANEL: Sonar — HIGH] Live tracking needs a data latency SLA. Define: (1) max acceptable staleness per data point (e.g., 90 seconds for ML), (2) fallback hierarchy: Kalshi > Polymarket > Pinnacle, (3) capture timestamp vs target timestamp in data_latency_ms column. A 10-second latency can be the difference between catching a 0.30 cent swing and being late.

11.3 SQLite WAL Mode for Live Tracking

SQLite in WAL mode handles this use case adequately at the scale of 1-2 UFC events. If contention occurs, write live tracking to a separate SQLite file (live_tracking.db) to isolate write contention from the main database. [PANEL: Opus]

12. Closing Odds & Results Grading

Closing Odds Snapshot:

Capture Pinnacle closing ML + method + rounds props at event_start - 20 min
Capture Kalshi closing prices via candlestick API
Capture Polymarket closing prices (cross-reference with Kalshi)
Capture DraftKings closing lines
Store all in betting_odds table with snapshot_type = 'closing'

Results Grading (post-fight):

ML result: W/L based on winner_id
Method result: correct method predicted?
Rounds result: over/under correct?
CLV (Closing Line Value): clv = our_implied_prob - closing_novig_prob (positive = beat the close)
Brier score per prediction source
Edge tier ROI: Cumulative profit by tier (S/A/B/C/D)

13. Implementation Roadmap (10-Step Build Order)

Panel consensus on build order: backfill and backtest first, live tracking last.

Step 1 — UFCStats scraper + fighter database + historical backfill (2018+)

Build the scraper to be idempotent (re-running doesn't duplicate data) and cache-first
Backfill from 2018 (not 2015): ~2500 fights, sufficient for backtesting, avoids pre-modern MMA noise
Include Sherdog as fallback scraper from day 1 (single point of failure mitigation)
Effort: 1-2 weeks

Step 2 — Rolling stats calculator + Bayesian shrinkage + percentile rankings

Populate fighter_rolling_stats with adjusted stats for all fighters
Build percentile ranking engine per weight class (these feed the composite formula)
Effort: 1 week

Step 3 — Pinnacle closing line scraper + Odds API + historical odds backfill (2018+)

Do the BestFightOdds historical backfill as a ONE-TIME operation with delays and rotating proxies
Once historical data is cached, losing access to BestFightOdds is survivable (Odds API covers current)
Effort: 1 week

Step 4 — Composite formula v1 + backtesting framework

Build the composite, calculate historical edges, backtest against actual outcomes
If the composite doesn't generate profitable edges on historical data, fix the model before continuing
Run logistic regression to validate/adjust the hand-tuned weights
Effort: 1-2 weeks

Step 5 — Method of Victory model + round probability model

This is where the real edge is. Moneyline markets are relatively efficient; method and round markets are not
Build the MoV model using the sample-size weighted blend approach
Backtest against historical DraftKings method props
Effort: 1-2 weeks

Step 6 — Kalshi + Polymarket scrapers (pre-event snapshots only, no live tracking yet)

Pre-event snapshots every 6 hours
Compare to model outputs — first time you see live edges
Effort: 1 week

Step 7 — External prediction scrapers (BetMMA.tips + Tapology) + ensemble

Add external calibration sources after internal model is working
Build ensemble blending internal composite, external predictions, and market-implied probabilities
Weight by historical Brier score (inverse-MSE weighting)
Effort: 1 week

Step 8 — Judge bias database (MMADecisions.com) + referee tendencies + decision probability adjustment

One-time bulk scrape of all UFC judge scorecards and referee assignments
Calculate per-judge and per-referee tendencies
When judges/refs are announced (usually 1-2 days before event), adjust decision probability
Effort: 3-5 days

Step 9 — Automation: event detection, pre-event reports, post-event grading, Brier scoring

Detect upcoming UFC events (scrape UFC.com schedule)
Auto-generate pre-event analysis for each bout
Auto-grade predictions after events
Calculate Brier scores for all sources
Effort: 1-2 weeks

Step 10 — Live 1-minute tracking (ML prices only) + late replacement alerts + line movement alerts

Start with ML prices only at 1-minute intervals
Add alerts for late replacements (MMA Junkie RSS) and significant line moves (>10% shift in 24 hours)
Only build this AFTER moneyline + MoV models are backtested and producing edge
Effort: 1 week

[PANEL: Sonar] Pre-launch production simulation (week 0): Before building anything, run the entire pipeline on a historical fight card (e.g., 5 fights from Jan 2026). Scrape all sources, calculate edges, generate a pre-event report, grade against actual outcomes. Catches broken scrapers, API limits, and schema surprises before real money is involved.

[PANEL: Sonar] Explicit launch gate: Launch when moneyline Brier score > 0.52 on last 50 backtested fights AND Method of Victory model backtest > 0.51. If Phase 10 is reached without these benchmarks, either the model doesn't work or it's untested.

14. Collection Cadence & Storage Estimates

Collection Schedule

Source	Cadence	Notes
UFCStats fighter profiles	Weekly (Monday)	Detect new fighters, update active rosters
UFCStats fight stats	After each event	Cache-first: check if fight exists before scraping
UFCStats round stats	After each event	Same event trigger as fight stats
Sherdog records	Weekly	Fallback / supplemental to UFCStats
Tapology fight cards	Daily	Detect upcoming fights, late replacements
MMA Junkie RSS	Hourly	Injury reports, camp news, replacement announcements
UFC.com event pages	Daily	Judge/referee assignments (appear 1-2 days before event)
The Odds API	4x/day baseline, hourly fight week	Opening/early/midweek/closing snapshots
BestFightOdds	6h pre-fight week, 1h fight week, 15min post-weigh-in	Line movement tracking
Kalshi / Polymarket	4x/day baseline, 6h pre-fight week	Pre-event snapshots
Live: ML prices	Every 1 minute during event	Fight window only
Live: method/round	Every 5 minutes during event	Fight window only
MMADecisions.com	One-time bulk + after each event	Judge scorecards
BetMMA.tips	24h before event	Prediction ensemble input
FightMatrix	48h before event	ELO-based predictions

Storage Estimates

Table	Rows/year	Row size	Annual storage
fighters	~200 new	~500 bytes	~100 KB
events	~45	~200 bytes	~9 KB
fights	~600	~400 bytes	~240 KB
fight_results	~600	~200 bytes	~120 KB
fighter_fight_stats	~1200	~600 bytes	~720 KB
fighter_round_stats	~3600	~400 bytes	~1.4 MB
fighter_rolling_stats	~1200	~800 bytes	~960 KB
fighter_weight_history	~1200	~200 bytes	~240 KB
betting_odds	~15000	~200 bytes	~3 MB
odds_line_movement	~50000	~200 bytes	~10 MB
predictions	~6000	~200 bytes	~1.2 MB
calculated_edges	~1200	~500 bytes	~600 KB
live_tracking	~180000	~200 bytes	~36 MB
fighter_injuries	~300	~200 bytes	~60 KB
judges / bout_judges	~500	~300 bytes	~150 KB
referees / fight_referees	~200	~300 bytes	~60 KB
Total (year 1)			~55 MB

Historical backfill (2018-2026, ~8 years): ~400 MB. Well within SQLite's practical limits. After 5 years of live operation: ~650 MB total — still fine for SQLite.

15. Key Queries (SQL)

Fighter composite inputs at time of fight

-- Get all rolling stats for a specific fighter going into a specific fight
SELECT
    f.name,
    frs.age_at_fight,
    frs.days_since_last_fight,
    frs.career_finish_rate,
    frs.career_ko_losses,
    frs.adj_slpm,        -- Bayesian-shrunk SLpM
    frs.adj_sapm,
    frs.adj_str_acc,
    frs.adj_td_acc,
    frs.adj_td_def,
    frs.output_decline_pct,
    frs.career_ko_rate,
    frs.career_sub_rate,
    frs.career_dec_rate,
    f.primary_style,
    f.team_gym,
    f.gym_change_date
FROM fighters f
JOIN fighter_rolling_stats frs ON f.fighter_id = frs.fighter_id
WHERE frs.as_of_fight_id = ?   -- specific fight ID
  AND frs.fighter_id = ?;       -- specific fighter ID

Weight-class percentile rankings (used in composite scoring)

-- Compute percentile rank of a fighter's striking differential within their weight class
WITH class_fighters AS (
    SELECT
        frs.fighter_id,
        (frs.adj_slpm - frs.adj_sapm) AS str_diff,
        frs.weight_class_id,
        COUNT(*) OVER (PARTITION BY frs.weight_class_id) AS class_count,
        RANK() OVER (PARTITION BY frs.weight_class_id ORDER BY (frs.adj_slpm - frs.adj_sapm)) AS rank_in_class
    FROM fighter_rolling_stats frs
    WHERE frs.as_of_date = (
        SELECT MAX(as_of_date) FROM fighter_rolling_stats WHERE fighter_id = frs.fighter_id
    )
)
SELECT
    fighter_id,
    str_diff,
    ROUND(1.0 * rank_in_class / class_count * 100, 1) AS percentile
FROM class_fighters
WHERE weight_class_id = ?;

Method of Victory edge vs market

-- Compare model method probs to market for all upcoming fights
SELECT
    f1.name AS fighter1,
    f2.name AS fighter2,
    ce.model_ko_prob,
    ce.market_ko_prob,
    ce.ko_edge,
    ce.model_sub_prob,
    ce.market_sub_prob,
    ce.sub_edge,
    ce.model_dec_prob,
    ce.market_dec_prob,
    ce.dec_edge,
    ce.edge_tier
FROM calculated_edges ce
JOIN fights fi ON ce.fight_id = fi.fight_id
JOIN fighters f1 ON fi.fighter1_id = f1.fighter_id
JOIN fighters f2 ON fi.fighter2_id = f2.fighter_id
JOIN events ev ON fi.event_id = ev.event_id
WHERE ev.event_date >= date('now')
  AND ce.fighter_id = fi.fighter1_id
ORDER BY ABS(ce.ko_edge) + ABS(ce.sub_edge) + ABS(ce.dec_edge) DESC;

Line movement velocity alert

-- Detect significant line moves (>10% shift) in last 6 hours
SELECT
    lm.fight_id,
    lm.fighter_id,
    f.name,
    lm.book,
    MIN(lm.implied_prob) AS prob_low,
    MAX(lm.implied_prob) AS prob_high,
    (MAX(lm.implied_prob) - MIN(lm.implied_prob)) AS movement_magnitude,
    COUNT(*) AS snapshot_count
FROM odds_line_movement lm
JOIN fighters f ON lm.fighter_id = f.fighter_id
WHERE lm.captured_at >= datetime('now', '-6 hours')
  AND lm.market_type = 'moneyline'
GROUP BY lm.fight_id, lm.fighter_id, lm.book
HAVING movement_magnitude > 0.10
ORDER BY movement_magnitude DESC;

S and A tier picks for upcoming event

-- All S and A tier picks for the next event
SELECT
    ev.name AS event,
    ev.event_date,
    f.name AS fighter,
    ce.quality_composite,
    ce.edge_tier,
    ce.edge_vs_pinnacle,
    ce.edge_vs_kalshi,
    ce.model_avg_prob,
    ce.pinnacle_novig_prob
FROM calculated_edges ce
JOIN fights fi ON ce.fight_id = fi.fight_id
JOIN events ev ON fi.event_id = ev.event_id
JOIN fighters f ON ce.fighter_id = f.fighter_id
WHERE ev.event_date = (SELECT MIN(event_date) FROM events WHERE event_date >= date('now'))
  AND ce.edge_tier IN ('S','A')
ORDER BY ce.quality_composite DESC;

Pre-event judge panel impact

-- For upcoming fights where judges are announced, show judge tendencies
SELECT
    f1.name AS fighter1,
    f2.name AS fighter2,
    j.name AS judge,
    j.pct_favor_aggressor,
    j.pct_favor_control,
    j.pct_favor_damage,
    j.pct_split,
    j.controversial_rate
FROM pre_fight_judges pfj
JOIN fights fi ON pfj.fight_id = fi.fight_id
JOIN judges j ON pfj.judge_id = j.judge_id
JOIN fighters f1 ON fi.fighter1_id = f1.fighter_id
JOIN fighters f2 ON fi.fighter2_id = f2.fighter_id
JOIN events ev ON fi.event_id = ev.event_id
WHERE ev.event_date = (SELECT MIN(event_date) FROM events WHERE event_date >= date('now'))
ORDER BY fi.bout_order, j.name;

Cardio curve for fighter

-- Output per round for a specific fighter (cardio curve)
SELECT
    fighter_id,
    round_number,
    AVG(CASE WHEN round_duration_sec > 0
        THEN sig_strikes_landed * 1.0 / (round_duration_sec / 60.0)
        ELSE NULL END) AS avg_strikes_per_min,
    COUNT(*) AS sample_size
FROM fighter_round_stats
WHERE fighter_id = ?
GROUP BY fighter_id, round_number
ORDER BY round_number;

Brier score by prediction source (model accountability)

-- Brier score per source over last 100 fights
SELECT
    p.source,
    COUNT(*) AS fights_tracked,
    AVG(ce.brier_score) AS avg_brier,
    SUM(CASE WHEN ce.prediction_correct = 1 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS accuracy
FROM predictions p
JOIN calculated_edges ce ON p.fight_id = ce.fight_id AND p.fighter_id = ce.fighter_id
WHERE ce.prediction_correct IS NOT NULL
GROUP BY p.source
ORDER BY avg_brier ASC;

Data freshness / staleness check

-- Alert if any data source hasn't been updated in >24 hours
SELECT
    'ufcstats_events' AS source,
    MAX(created_at) AS last_update,
    ROUND((julianday('now') - julianday(MAX(created_at))) * 24, 1) AS hours_stale
FROM events
UNION ALL
SELECT
    'betting_odds',
    MAX(captured_at),
    ROUND((julianday('now') - julianday(MAX(captured_at))) * 24, 1)
FROM betting_odds
UNION ALL
SELECT
    'live_tracking',
    MAX(captured_at),
    ROUND((julianday('now') - julianday(MAX(captured_at))) * 24, 1)
FROM live_tracking
WHERE captured_at >= date('now', '-7 days');

16. Implementation Checklist

Phase 1: Foundation (Weeks 1-3)

Create SQLite database with PRAGMA journal_mode=WAL and PRAGMA foreign_keys=ON
Deploy all 19 CREATE TABLE statements from Section 6
Build UFCStats.com scraper (rate-limit to 1 req/sec, exponential backoff on 429)
Implement hash-based page structure monitoring (alert on UFCStats schema change)
Build Sherdog fallback scraper (records, results, basic stats)
Backfill historical data: all UFC events 2018+ (~400 events, ~4000 fights)
Build idempotent ETL pipeline (check if fight exists before scraping)
Populate weight_classes seed data
Populate fighter_weight_history from historical weigh-in data
Build fighter_rolling_stats ETL (compute after each fight ingestion)
Implement Bayesian shrinkage for fighters with <5 UFC fights
Implement data sanity checks (SLpM ≤ 15, control_time ≤ round_duration×3, accuracy ≤ 100%)
Set up elite_gyms config object (not hardcoded)

Phase 2: Betting Data Pipeline (Weeks 3-5)

Integrate The Odds API for multi-book odds (DK, Pinnacle, FanDuel, BetMGM)
Build Kalshi UFC market scraper (moneyline + method + rounds)
Build Polymarket UFC market scraper (same markets)
One-time bulk backfill of historical odds from BestFightOdds (2018+)
Build odds_line_movement pipeline (6h pre-fight week, 1h fight week, 15min post-weigh-in)
Build no-vig probability calculator for all books
Implement line movement velocity alerting (>10% shift triggers flag)

Phase 3: Prediction Ensemble (Weeks 5-7)

Build BetMMA.tips scraper
Build Tapology predictions scraper
Build FightMatrix ELO scraper
Build Action Network UFC scraper
Build internal ELO/Glicko-2 model (initialize from UFCStats backfill)
Implement predictions table with ensemble averaging logic
Build model_avg_prob computation across all available sources
Implement Brier scoring for each prediction source

Phase 4: Edge Calculation & Composite (Weeks 7-9)

Implement quality composite formula with all 8 components (Section 7)
Build percentile ranking engine per weight class
Implement style classification algorithm (rule-based from stat thresholds)
Build and populate style_matchup_matrix using weight class groups (lower/middle/upper)
Implement stance matchup modifier (+/-3 per open/closed stance)
Build calculated_edges pipeline (no-vig, cross-book, model edge, sharp action indicators)
Implement edge tier classification (S/A/B/C/D) with composite floor of 10
Build Method of Victory model (sample-size weighted blend — Section 9)
Build round-by-round model and over/under pricing (Section 10)
Run logistic regression backtest on 2018-2024 data to validate/adjust weights

Phase 5: Injury, Referee & Judge Data (Weeks 9-10)

Build MMA Junkie RSS reader for injury/medical reports
Build athletic commission scraper (NSAC/CSAC suspension lists)
Populate fighter_injuries table with historical data
Build MMADecisions.com scraper for judge scorecards
Calculate per-judge tendencies and populate judges table
Build UFC.com fight card scraper for referee + judge assignments
Build pre_fight_judges snapshot pipeline (trigger 48h before event)
Build fight_referees pipeline (trigger when assigned)

Phase 6: Automation & Live Tracking (Weeks 10-12)

Build automated fight card detection (new events → auto-create fight rows)
Build MMA Junkie RSS monitor for late replacement detection
Build pre-event report generation (composites, edges, method probs, round pricing)
Build post-event grading pipeline (CLV, prediction accuracy, method accuracy, Brier)
Build closing odds snapshot automation (event - 20 minutes)
Build live 1-minute ML tracking pipeline (Kalshi + Polymarket)
Build live 5-minute method/round tracking pipeline
Implement data latency SLA monitoring (data_latency_ms column)
Implement fallback hierarchy (Kalshi > Polymarket > Pinnacle) for live data
Test live tracking in WAL mode; if contention, split into separate live_tracking.db
Build performance tracking dashboard: CLV, edge tier ROI, method accuracy, Brier by source

17. Critical Flaws & Mitigations (Panel Review)

CRITICAL: UFCStats.com Scraping is Fragile

Severity: CRITICAL Sources: Opus, DeepSeek, Gemini, all panel models Fix:

Build Sherdog/Tapology as parallel fallback scrapers from day 1
Cache aggressively — historical fight stats never change after completion
Hash-based change detection on page structure with immediate alerts
Rate-limit to 1 req/sec, exponential backoff on 429/503
Rotating residential proxies if needed (~$120/month)

CRITICAL: Composite Formula Weights Are Untested

Severity: CRITICAL Sources: Opus, Sonar, DeepSeek Fix: Launch with hand-tuned weights, but immediately set up logistic regression backtest on 2018-2024 data. Weights should be learned, not guessed. The 25/20/15/10/10/10/5/5 split is an educated starting point. Adjust before going live based on backtest results.

HIGH: Rolling Averages Without Bayesian Shrinkage

Severity: HIGH Source: Opus Fix: All rolling stats shrunk toward weight-class mean using adj_stat = (n * fighter_stat + k * class_mean) / (n + k) where k=5. Implemented in fighter_rolling_stats table as adj_* columns.

HIGH: Live Tracking Latency SLA Missing

Severity: HIGH Source: Sonar Fix: Track data_latency_ms in live_tracking. Define max staleness per market (90s for ML, 5min for method/round). Implement fallback hierarchy: Kalshi > Polymarket > Pinnacle. Flag stale captures with is_stale = 1.

HIGH: No Data Validation Layer

Severity: HIGH Source: Sonar Fix: Sanity checks on all scraped data:

control_time_sec ≤ round_duration_sec × num_rounds
sig_strikes_per_min ≤ 15 (reasonable upper bound)
takedown_accuracy_pct ≤ 1.0
reach_inches > 0 AND reach_inches < 90 Log violations to data_quality_log, quarantine flagged records.

HIGH: Style Classification is Categorical — Lossy

Severity: HIGH Source: Opus Fix: V1 uses categorical tags (acceptable). V2 replaces with continuous style vector stored in fighters table (columns already included in schema: style_striking_offense, style_wrestling_offense, etc.).

HIGH: BestFightOdds.com Will Block Your IP

Severity: HIGH Source: Opus Fix: Do the historical backfill as a ONE-TIME operation with delays and rotating proxies. Once historical data is cached, losing access is survivable because Odds API covers current lines.

MEDIUM: Age Cliff is a Step Function

Severity: MEDIUM Source: Opus, Gemini, Sonar Fix: V1 uses the step function with correct division groupings (see Section 7.8). V2 upgrades to sigmoid decay curve per division.

MEDIUM: Internal ELO Creates Circular Dependency with Composite

Severity: MEDIUM Source: Opus Fix: Do NOT blend ELO as one of the ensemble sources while also using the composite formula. Either: use ELO/Glicko-2 as the model (replacing the composite), or use ELO only for calibration. For v1: the composite formula IS the model. ELO is a separate validation check.

MEDIUM: Chin Degradation Step Function is Too Simple

Severity: MEDIUM Source: Opus Fix: V1 uses the step function (0/-15/-30/-50) with recency doubling. V2 upgrades to the decay function detailed in Section 7.5. Schema is ready for either approach.

18. Over-Engineered Items (Defer or Simplify)

Item	Recommendation	Rationale
Live 1-minute tracking for ALL contract types	SIMPLIFY: ML at 1min, method/round at 5min	Cuts data volume 70%, captures 90% of signal
Geometric mean as alternative MoV approach	DEFER to v2: Build sample-size weighted first	Building two parallel systems doubles code debt
Per-weight-class composite formula tuning	DEFER to v2: Need 12+ months of live data per class	Too little per-class data in year 1
67+ statistical categories from UFCStats	SIMPLIFY: Start with top 20 most predictive	Feature importance analysis will show which add value
Bloody Elbow staff picks as ensemble source	REMOVE: Editorial turnover, poor calibration	BetMMA.tips + Tapology + internal model is sufficient
Compound booster stacking to -45	ADD FLOOR: Composite cannot drop below 10 after boosters	Prevents extreme values that break prediction calibration
Judge bias with 14 bias vectors per judge	SIMPLIFY for v1: Track 4 key tendencies per judge, add more in v2	Small edge (<3%); not worth full complexity at launch
Full cardio curves for all historical fights	SIMPLIFY: Compute on-demand via query, don't pre-store all curves	The query in Section 15 computes this efficiently
5-7 prediction site scrapers simultaneously	PHASE: Build BetMMA + Tapology first, add FightMatrix + Action Network after baseline is working	Ensemble only improves a working model
Cardio archetype classification (Sustainer/Decliner/Cliff)	SIMPLIFY: Use raw `output_decline_pct` in formula directly	Archetypes add interpretive complexity without clear gain over the raw metric

19. Cross-Desk Reusable Components

These are worth building as shared services, not per-desk implementations:

Component	Reusable For
Scraper infrastructure (rate limiting, Cloudflare handling, exponential backoff, health checks, local caching)	All desks — team sports, crypto, forex
Odds aggregation pipeline (Kalshi, Polymarket, Pinnacle, DraftKings standardization, closing line calculation)	All desks
Bayesian shrinkage for small-sample rolling stats	Tennis (individual sport, same problem), Boxing, Golf, any individual sport
ELO/Glicko-2 rating system	Tennis, Boxing, Chess/Esports, any 1v1 competition
Brier scoring + settlement pipeline	All desks — build as shared service
Line movement velocity detection + steam move alerts	NFL, NBA, MLB, NHL, Tennis, Boxing
Inverse-MSE ensemble weighting (better models get more weight)	All desks
Judge/referee bias tracking methodology	Boxing (judge bias even more impactful), NBA (referee tendencies), MLB (umpire strike zones)
RSS-based news/injury alert pipeline	All sports desks (different feeds, same infrastructure)
Backtest harness (predictions → outcomes → Brier score → edge tier ROI)	All desks
Feature engineering templates (percentile ranking, rolling averages, composite normalization)	All desks
Live tracking SLA architecture (prioritization, fallback hierarchy, staleness detection)	All desks doing live markets
Data quality log + sanity checks	All desks

20. What Would Make This A+ (Panel)

[PANEL: Sonar] — Three big moves:

Explicit data quality SLAs for every source: Commit to: UFCStats (check daily, alert if >24h stale), MMADecisions (manual validation on 5 random fights every Friday), judge panel (scraped <24h before fight). When data fails SLA, document it and adjust weights or skip that bet. Right now there is no accountability for data freshness. The data_quality_log table supports this, but you need a daily cron that checks it.
Pre-launch production simulation: Before building anything, run the entire pipeline on a historical fight card (e.g., 5 fights from Jan 2026). Scrape all sources, calculate all edges, generate a pre-event report, then grade against actual outcomes. This catches broken scrapers, API limits, and schema surprises before real money is on the line.
Explicit launch gate with pass/fail criteria: Add a hard requirement: "Launch when moneyline Brier score > 0.52 on last 50 backtested fights AND Method of Victory model backtest > 0.51." If Phase 10 is reached without these benchmarks, either the model doesn't work or it was shipped untested.

[PANEL: Opus] — Replace the composite formula with logistic regression as primary model: Keep the composite formula as a human-readable "explainability layer" but use a logistic regression model trained on the same features for actual edge calculation. A logistic regression on 2018-2024 fight outcomes using the same 8 feature groups will: (1) find the actual optimal weights, (2) reveal which features are predictive and which are noise, (3) output calibrated probabilities directly without needing to map composite scores to win probabilities, (4) be trivially updatable as new data comes in.

21. Appendices

Appendix A: Weight Class Reference

Division	Upper Limit (lbs)	Gender
Strawweight	115	Female
Flyweight	125	Male / Female
Bantamweight	135	Male / Female
Featherweight	145	Male
Lightweight	155	Male
Welterweight	170	Male
Middleweight	185	Male
Light Heavyweight	205	Male
Heavyweight	265	Male

Weight class groups for style matchup matrix:

Lower: 125-145 lbs (Flyweight through Featherweight)
Middle: 155-170 lbs (Lightweight through Welterweight)
Upper: 185-265 lbs (Middleweight through Heavyweight)

Appendix B: Method Classification Reference

Method	Sub-Types	Betting Market
KO/TKO	Punch KO, head kick KO, body shot TKO, ground & pound TKO, doctor stoppage (strikes)	KO/TKO prop
Submission	RNC, guillotine, armbar, triangle, heel hook, D'Arce, anaconda, etc.	Submission prop
Decision - Unanimous	All 3 judges score same winner	Decision prop
Decision - Split	2 of 3 judges score same winner	Decision prop
Decision - Majority	2 judges score winner, 1 draws	Decision prop
Draw	Split draw, majority draw	N/A (void most bets)
No Contest	Accidental foul, failed drug test	N/A (void)
DQ	Illegal strikes, fouls	Rare / N/A

Appendix C: Style Tags

Tag	Key Statistical Indicators
Boxer	High SLpM, high distance %, low TD attempts
Pressure Fighter	Very high SLpM, high SApM, low striking defense
Counter Striker	Low SLpM, high accuracy %, high defense %
Wrestler	High TD attempts, high control time, low SLpM
Grappler/BJJ	High sub attempts, moderate TDs, finish via sub
Clinch Fighter	High clinch strikes %, high clinch control time
Kickboxer	High distance %, balanced head/body/leg strikes
Well-Rounded	Balanced across all categories

Appendix D: Entity Relationship Summary

fighters ──┬── fights (as fighter1 or fighter2)
           ├── fighter_fight_stats (per-fight stats, 2 rows/fight)
           ├── fighter_round_stats (per-round stats, 2×N rows/fight)
           ├── fighter_rolling_stats (career snapshot at each fight)
           ├── fighter_weight_history (weight class + weigh-in per fight)
           ├── fighter_injuries (injury + suspension history)
           ├── betting_odds (odds per fighter per fight per book)
           ├── odds_line_movement (timestamped line changes)
           ├── predictions (per source per fighter per fight)
           └── calculated_edges (final edge calculations per fighter per fight)

events ──── fights (1:many)

fights ──┬── fight_results (1:1)
         ├── fighter_fight_stats (1:2)
         ├── fighter_round_stats (1:many)
         ├── betting_odds (1:many)
         ├── odds_line_movement (1:many)
         ├── predictions (1:many)
         ├── calculated_edges (1:2)
         ├── live_tracking (1:many)
         ├── bout_judges (1:3, for decisions)
         ├── pre_fight_judges (1:3, pre-event snapshot)
         └── fight_referees (1:1 or 1:many during build)

weight_classes ──── fights (1:many)
                    fighter_rolling_stats (1:many)
                    fighter_weight_history (1:many)
                    style_matchup_matrix (optional FK)

judges ──── bout_judges (1:many)
judges ──── pre_fight_judges (1:many)

referees ──── fight_referees (1:many)

Total data points per fight: ~220-240 (fighter stats x2 + betting odds across books + line movement + predictions across sources + calculated edges + live tracking snapshots + judge + referee data).

Appendix E: API Endpoints Reference

Service	Endpoint Pattern	Auth	Notes
The Odds API	`https://api.the-odds-api.com/v4/sports/mma_mixed_martial_arts/odds`	API key header	500 req/mo free
Kalshi REST	`https://api.elections.kalshi.com/trade-api/v2/markets`	RSA-PSS signature	Series filter: MMA
Kalshi Candlestick	`https://api.elections.kalshi.com/trade-api/v2/markets/{ticker}/candlesticks`	RSA-PSS signature	Historical closing prices
Polymarket	`https://gamma-api.polymarket.com/markets`	None	No account needed
UFCStats.com	`http://www.ufcstats.com/statistics/events/completed`	None	HTML scrape, rate-limit aggressively
Sherdog	`https://www.sherdog.com/fighter/{id}`	None	HTML scrape
Tapology	`https://www.tapology.com/fightcenter`	None	HTML scrape

Appendix F: Bayesian Shrinkage Parameters

Applied to all rolling stats for fighters with <10 UFC fights:

adj_stat = (n * fighter_stat + k * class_mean) / (n + k)

where:
  n = number of UFC fights (sample size)
  k = shrinkage coefficient (how much we pull toward class mean)
  class_mean = weight-class average for this stat

Recommended k values:
  k = 5  for most stats (SLpM, SApM, accuracy, defense)
  k = 3  for high-variance stats (KO rate, finish rate)
  k = 8  for very stable stats (reach, height — these don't change)

At n=5: adj_stat is weighted 50/50 between fighter and class mean (k=5)
At n=10: adj_stat is weighted 67/33 toward fighter (k=5)
At n=20: adj_stat is weighted 80/20 toward fighter (k=5)

This is the standard Marcel projection approach used in baseball analytics, adapted for MMA's sample-size problem.

End of merged final spec — UFC/MMA Data Collection Desk

TODO: Upgrade Kalshi REST to WebSocket Feed

Status: NOT BUILT — add when this desk goes live for execution

Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.

Why WebSocket matters:

REST polling introduces lag — by the time you see a price, it may already be gone
On tight edges (2-5 cents), even 1-2 seconds of lag can eat the entire edge
WebSocket gives real-time order book deltas and trade notifications
Essential for hitting specific fill prices and monitoring execution quality

What to build when ready:

Connect to Kalshi WebSocket feed (wss://trading-api.kalshi.com/trade-api/ws/v2)
Subscribe to order book channels for active tickers
Stream trade events for real-time freshness (replaces REST trades polling)
Feed live prices into execution engine for precise entry/exit
Update Fill Quality Monitoring to measure slippage vs quoted price

Note: Sports markets use RFQ (Request for Quote) so the visible order book is usually empty — but the trade feed still matters for freshness signals and the WebSocket is required for order submission. Weather markets DO have real visible order books where this upgrade is even more critical.

Added 2026-03-29 — upgrade REST to WS when desk moves to live execution

Source: ~/edgeclaw/results/spec-panel/ufc-desk/spec-final.md