Schema design — normalize, then index
Hook
Up through Level 3 you’ve been reading from nflverse’s flat tables. As a GM, you start building tables — for your team’s draft board, for your own analyses, for the L5 capstone’s API. Schema design is what separates “a folder of CSVs” from “a database.”
Concept
Three rules that handle most cases:
- One concept per table. A
playerstable holds player attributes. Aplayer_seasonstable holds per-season stats. Mixing them is the classic beginner mistake. - Primary keys are required. Every row needs a stable, unique
identifier.
player_id(nflverse uses GSIS IDs) is better thannamebecause names change and duplicate. - Foreign keys enforce relationships. A
player_seasonsrow referencing aplayer_idthat doesn’t exist should be rejected at the database level.
Example: the draft pick value model needs three tables, not one.
CREATE TABLE players ( player_id TEXT PRIMARY KEY, full_name TEXT NOT NULL, position TEXT, college TEXT, height_inches SMALLINT, weight_lbs SMALLINT);
CREATE TABLE draft_picks ( player_id TEXT REFERENCES players(player_id), draft_year SMALLINT, pick_overall SMALLINT, team TEXT, PRIMARY KEY (draft_year, pick_overall));
CREATE TABLE career_av ( player_id TEXT PRIMARY KEY REFERENCES players(player_id), career_av SMALLINT, seasons_played SMALLINT);Each table has one concept. Joins reconstruct the wider view when you need it.
Lions example
If you were building a Lions-specific scouting board, you’d model it as:
| Table | Concept |
|---|---|
prospects | one row per prospect under consideration |
prospect_scouting_grades | one row per (prospect, scout, date) |
prospect_combine_results | one row per prospect with combine measurables |
prospect_visits | one row per (prospect, visit_date) |
You’d not model it as one prospects table with 60 columns including a
scout_1_grade, scout_2_grade, scout_3_grade triple. That breaks the
moment you add a fourth scout.
Try it
Sketch (just on paper or in markdown) a schema for tracking the Lions’ 2025 draft class with:
- Picks (round, overall, team)
- Player attributes (name, position, college, combine 40 time)
- Year-by-year career AV
Three tables, with primary and foreign keys. Don’t write SQL yet — just draw the boxes and the arrows.
Common mistakes
- One giant table. Wide tables are quick to build and slow to maintain. Splitting them later is painful.
- Names as keys. Names duplicate, change, and have apostrophes that break
joins. Use a stable surrogate key (an integer
idor nflverse’splayer_id). - No foreign keys. Without them, “delete this player” leaves orphan rows in five other tables. The database can enforce this for you — let it.
- Premature denormalization for speed. Don’t pre-flatten tables until you’ve measured the query and confirmed normalization is the bottleneck. Postgres is much faster than you think.
Quick check
- Why use
player_idas the key instead offull_name? - What does a foreign key constraint prevent?
- When would you intentionally denormalize (combine two tables)?