Indexes — when, what, why
Hook
Postgres can read a 50-million-row PBP table in two ways: scan it
top-to-bottom, or jump to the rows it needs via an index. The first takes
seconds. The second takes milliseconds. The difference is one CREATE INDEX.
Concept
An index is a separate data structure that points back into the table. The
default flavor in Postgres is a B-tree — sorted, supports =, <, >, IN,
and range queries.
Three rules:
- Index what you filter on. Columns in
WHEREandJOIN ... ONare candidates. - Composite indexes are ordered.
(season, week, posteam)helps queries that filter onseasonalone, on(season, week), or on all three. It does not help a query that filters only onposteam. - Indexes cost write speed. Every
INSERTupdates every index. Index thoughtfully, not maximally.
-- This query takes 800ms without an index, 8ms with one:SELECT * FROM pbp WHERE posteam = 'DET' AND season = 2024;
-- The index:CREATE INDEX idx_pbp_posteam_season ON pbp (posteam, season);Look at the query plan with EXPLAIN ANALYZE:
EXPLAIN ANALYZESELECT * FROM pbp WHERE posteam = 'DET' AND season = 2024;A “Seq Scan” means full table read. An “Index Scan” or “Bitmap Index Scan” means it’s using your index.
Lions example
The Level 3 4th-down analyzer needs:
SELECT * FROM pbp WHERE posteam = 'DET' AND down = 4 AND season >= 2022;The default schema ships idx_pbp_posteam_season (covering posteam,
season) plus a partial idx_pbp_down on down. Postgres can BitmapAnd
those two — usually fast enough. If you wanted a single composite tuned
for exactly this workload, you’d add:
CREATE INDEX idx_pbp_posteam_down_season ON pbp (posteam, down, season);That’s not in the default schema — EXPLAIN ANALYZE before adding it and
confirm the planner reaches for it.
Sometimes you want a partial index — only index rows that match a predicate. The L3 schema already does this:
CREATE INDEX idx_pbp_down ON pbp (down) WHERE down IS NOT NULL;That’s smaller and faster than a full-column index, because the 4-5% of PBP rows with NULL down (kickoffs, extras) are excluded.
Try it
Open a psql session against your local Postgres. Run:
EXPLAIN ANALYZESELECT COUNT(*) FROM weekly_stats WHERE recent_team = 'DET' AND season = 2024;Note the execution time and which scan type Postgres used. Then drop the
index idx_weekly_team_season_week, re-run the same query, and compare.
Re-create the index afterward.
Common mistakes
- Index every column. Postgres maintains every index on every write. Indexes also take disk space. Two indexes per table is reasonable; fifteen is a smell.
- Forgetting composite order. An index on
(a, b)helps queries filtering onaor(a, b), but not onbalone. Order columns most-selective-first. - Indexing
LIKE '%foo%'. B-tree indexes can’t help leading-wildcard matches. Usepg_trgmor full-text search. - Not measuring. Use
EXPLAIN ANALYZE. Don’t add an index without evidence that the planner is reaching for it.
Quick check
- What’s the difference between a “Seq Scan” and an “Index Scan” in the query plan?
- Does an index on
(season, week, posteam)help a query filtered only onposteam? - Why is “more indexes = always faster” wrong?