dbt basics

Level 4 · Lesson 8

Hook

A CTE inside a notebook lives and dies with that notebook. A dbt model is that same SQL, version-controlled, runnable from a CLI, and chained into other models. Same logic, vastly more leverage.

Concept

dbt (data build tool) treats SQL queries as files in a folder. Each file is a model — a SELECT statement that dbt materializes as a view or a table in your database. You reference other models with {{ ref('name') }}, and dbt figures out the run order.

-- models/staging/stg_weekly_stats.sql
SELECT
    player_id,
    player_display_name,
    recent_team       AS team,
    season,
    week,
    season_type,
    rushing_yards,
    receiving_yards,
    passing_yards
FROM {{ source('nflverse', 'weekly_stats') }}
WHERE season_type = 'REG';

-- models/marts/lions_season_totals.sql
SELECT
    player_display_name,
    season,
    SUM(rushing_yards)   AS rush,
    SUM(receiving_yards) AS rec,
    SUM(passing_yards)   AS pass
FROM {{ ref('stg_weekly_stats') }}
WHERE team = 'DET'
GROUP BY player_display_name, season;

Then:

dbt run   # builds every model
dbt test  # runs data quality checks
dbt docs generate && dbt docs serve   # auto-generated lineage docs

Lions example

For the 1PRIDE project, a dbt setup would have:

data/dbt/
  dbt_project.yml
  profiles.yml
  models/
    staging/
      stg_weekly_stats.sql       (clean weekly, regular season only)
      stg_schedules.sql          (one row per game, DET-perspective columns)
      stg_pbp_4th_down.sql       (PBP filtered to 4th downs)
    marts/
      lions_season_totals.sql
      lions_4th_down_decisions.sql
      lions_wr_room_metrics.sql

The staging layer cleans nflverse data once. The marts layer consumes staging and builds analysis-ready tables. The L5 FastAPI then queries the marts directly — no ad-hoc joins, no per-request transformations.

Try it

(No SQL exercise for this lesson — it’s conceptual.) Sketch a dbt model hierarchy for the L5 capstone. Three staging models, three mart models, one final reporting view. Just titles and dependencies.

If you want hands-on practice, install dbt-core and run through the “Getting Started” tutorial (about 30 minutes) before tackling the L4 capstone.

Common mistakes

Treating dbt as a runner for one-off queries. It’s a transformation layer. Every model should be reusable.
Skipping the staging layer. Going from source to mart in one query means every mart re-does the cleaning. Stage once, reuse everywhere.
No tests. dbt test checks uniqueness, not-null, and referential integrity. Add them on primary keys at minimum.
Materializing everything as a table. Views are fine until they’re slow. Default to view; flip to table when query times start hurting.

Quick check

What’s the difference between a dbt staging model and a mart model?
How does {{ ref('other_model') }} differ from naming a table directly?
When would you materialize a model as a table instead of a view?

dbt basics

→ Hook

⊕ Concept

★ Lions example

▶ Try it

⚠ Common mistakes

✓ Quick check