Play-by-play with nfl_data_py
Hook
weekly_stats is one row per player per game. Play-by-play (PBP) is one row
per play — every down, every snap, every score state. The 4th-down analyzer
capstone, win probability, EPA — all of it lives in PBP.
Concept
The PBP table has ~50,000 rows per season and 300+ columns. The fields you’ll reach for most:
| Column | What it is |
|---|---|
game_id | unique per game |
posteam | the team on offense |
defteam | the team on defense |
down | 1-4, or NULL on kickoffs / extras |
ydstogo | yards to go for first down |
yardline_100 | distance to the opponent’s end zone (0-100) |
qtr | quarter, 1-5 (OT = 5) |
game_seconds_remaining | clock, in seconds |
score_differential | offense’s lead, negative if trailing |
play_type | pass, run, field_goal, punt, kickoff, extra_point, qb_kneel |
epa | expected points added by this play |
wp / wpa | win probability / win probability added |
import nfl_data_py as nfl
pbp = nfl.import_pbp_data([2024], columns=[ 'game_id', 'season', 'week', 'posteam', 'defteam', 'down', 'ydstogo', 'yardline_100', 'qtr', 'game_seconds_remaining', 'score_differential', 'play_type', 'epa', 'wpa',])The columns= filter cuts the download dramatically. The full PBP is 300+
columns; pulling 13 is plenty for most analyses and 10x faster.
Lions example
Every Lions 4th-down play in 2024 with the situation captured:
import nfl_data_py as nfl
pbp = nfl.import_pbp_data( [2024], columns=['week', 'qtr', 'posteam', 'defteam', 'down', 'ydstogo', 'yardline_100', 'score_differential', 'game_seconds_remaining', 'play_type', 'desc', 'epa'],)
lions_4th = pbp.loc[ (pbp['posteam'] == 'DET') & (pbp['down'] == 4) & pbp['play_type'].isin(['pass', 'run', 'field_goal', 'punt']), ['week', 'qtr', 'ydstogo', 'yardline_100', 'score_differential', 'game_seconds_remaining', 'play_type', 'desc', 'epa']].sort_values(['week', 'qtr', 'game_seconds_remaining'], ascending=[True, True, False])
print(f"Lions 4th-down plays: {len(lions_4th)}")print(lions_4th.head(10).to_string(index=False))Note: nflverse exposes the play description as desc on the DataFrame.
Our Postgres pbp table renames it to description to dodge the SQL
keyword — keep the two contexts straight.
That’s the raw material for the L3 capstone. You add an expected-value model on top, and you’ve got the analyzer.
Try it
Pull 2024 PBP. Count Lions 4th-down attempts by decision type — went for it
(pass or run), field goal, or punt. Group by play_type. Add a column for
the average ydstogo per decision type.
Common mistakes
- Loading PBP without the
columns=filter. The full table is hundreds of MB per season. Filter early. - Forgetting
play_typefilters. PBP includes every snap, including kickoffs, extra points, and special-teams quirks. For offensive analysis, filter topass,run,field_goal,punt. - Treating
score_differentialas “what the score is.” It’s the offense’s lead at the start of the play. Negative means the offense is trailing. - Mixing NULLs on
down. Plays without a down (kickoffs, two-point attempts) haveNULL. Filter them out or your aggregates will skew.
Quick check
- What’s the difference between
weekly_statsandpbpgranularity? - Why use
columns=[...]when callingimport_pbp_data? - What does
score_differential = -7mean about the offense?