GROUP BY
Hook
Level 1 aggregates collapsed everything into one row. GROUP BY is the
upgrade: aggregate per bucket — per player, per team, per week. One row
out per unique combination of group columns.
Concept
SELECT group_col, AGG(value)FROM tableWHERE ...GROUP BY group_col;The rule: every column in the SELECT list must either be in GROUP BY or
wrapped in an aggregate function. Mixing raw columns with aggregates is the
single most common SQL error in your career.
-- WRONG: player_display_name is not in GROUP BY and not aggregatedSELECT player_display_name, recent_team, SUM(rushing_yards)FROM weekly_statsGROUP BY recent_team;You can group by multiple columns. The bucket is the combination — group by
(season, recent_team) and you get one row per team-season.
Lions example
Lions players’ 2024 regular-season rushing totals, sorted by yards:
SELECT player_display_name, COUNT(*) AS games, SUM(carries) AS carries, SUM(rushing_yards) AS yards, SUM(rushing_tds) AS tds, ROUND(AVG(rushing_yards)::numeric, 1) AS yards_per_gameFROM weekly_statsWHERE recent_team = 'DET' AND season = 2024 AND season_type = 'REG' AND carries IS NOT NULLGROUP BY player_display_nameORDER BY yards DESC;One row per player. Gibbs (≈1,400 yds) and Montgomery (≈775 yds) at the top, then situational backs and the QB-as-runner rows.
You can group across teams to compare position groups. Top NFC North WRs in 2024:
SELECT player_display_name, recent_team, SUM(receiving_yards) AS yards, SUM(receiving_tds) AS tdsFROM weekly_statsWHERE recent_team IN ('DET', 'GB', 'MIN', 'CHI') AND season = 2024 AND season_type = 'REG' AND position_group = 'WR'GROUP BY player_display_name, recent_teamORDER BY yards DESCLIMIT 10;Try it
Compute points scored per Lions player in the 2024 regular season — six
points per rushing TD, six per receiving TD, four per passing TD (standard
fantasy scoring; real football puts 6 on the board regardless of who threw
it, but we’re using fantasy math here to make the QB stat readable). Group
by player. Sort by total descending. Use GROUP BY plus a hand-rolled
SUM(rushing_tds * 6 + receiving_tds * 6 + passing_tds * 4).
Common mistakes
- Selecting columns that aren’t grouped or aggregated. Postgres will reject it; MySQL will silently pick “any value.” Both are bad — be explicit.
- Filtering on aggregates in
WHERE. You can’t.WHEREruns before the group-and-aggregate step. UseHAVING(Lesson 4). - Grouping by too few columns. If you want one row per player-team, you
need
GROUP BY player_display_name, recent_team— otherwise a traded player collapses into one row with two teams’ worth of stats. COUNT(*)≠COUNT(col)within groups.COUNT(*)counts every row in the group;COUNT(col)skips nulls.
Quick check
- Which columns in a
SELECTare legal alongsideGROUP BY? - What does
GROUP BY player_display_name, seasonproduce vs justGROUP BY player_display_name? - Why does
WHERE SUM(rushing_yards) > 100fail, and what should you use instead?