dbt basics
Hook
A CTE inside a notebook lives and dies with that notebook. A dbt model is that same SQL, version-controlled, runnable from a CLI, and chained into other models. Same logic, vastly more leverage.
Concept
dbt (data build tool) treats SQL queries as files in a folder. Each file
is a model — a SELECT statement that dbt materializes as a view or a
table in your database. You reference other models with {{ ref('name') }},
and dbt figures out the run order.
-- models/staging/stg_weekly_stats.sqlSELECT player_id, player_display_name, recent_team AS team, season, week, season_type, rushing_yards, receiving_yards, passing_yardsFROM {{ source('nflverse', 'weekly_stats') }}WHERE season_type = 'REG';-- models/marts/lions_season_totals.sqlSELECT player_display_name, season, SUM(rushing_yards) AS rush, SUM(receiving_yards) AS rec, SUM(passing_yards) AS passFROM {{ ref('stg_weekly_stats') }}WHERE team = 'DET'GROUP BY player_display_name, season;Then:
dbt run # builds every modeldbt test # runs data quality checksdbt docs generate && dbt docs serve # auto-generated lineage docsLions example
For the 1PRIDE project, a dbt setup would have:
data/dbt/ dbt_project.yml profiles.yml models/ staging/ stg_weekly_stats.sql (clean weekly, regular season only) stg_schedules.sql (one row per game, DET-perspective columns) stg_pbp_4th_down.sql (PBP filtered to 4th downs) marts/ lions_season_totals.sql lions_4th_down_decisions.sql lions_wr_room_metrics.sqlThe staging layer cleans nflverse data once. The marts layer consumes staging and builds analysis-ready tables. The L5 FastAPI then queries the marts directly — no ad-hoc joins, no per-request transformations.
Try it
(No SQL exercise for this lesson — it’s conceptual.) Sketch a dbt model hierarchy for the L5 capstone. Three staging models, three mart models, one final reporting view. Just titles and dependencies.
If you want hands-on practice, install dbt-core and run through the “Getting Started” tutorial (about 30 minutes) before tackling the L4 capstone.
Common mistakes
- Treating dbt as a runner for one-off queries. It’s a transformation layer. Every model should be reusable.
- Skipping the staging layer. Going from source to mart in one query means every mart re-does the cleaning. Stage once, reuse everywhere.
- No tests.
dbt testchecks uniqueness, not-null, and referential integrity. Add them on primary keys at minimum. - Materializing everything as a table. Views are fine until they’re slow. Default to view; flip to table when query times start hurting.
Quick check
- What’s the difference between a dbt staging model and a mart model?
- How does
{{ ref('other_model') }}differ from naming a table directly? - When would you materialize a model as a
tableinstead of aview?