Prompt
Build a model that predicts an NFL WR’s receiving yards in a given week from their previous week’s receiving yards and targets. Use 2021-2024 regular-season data, WRs only, and only weeks 2-18, since every row needs a prior week.
Compare a DummyRegressor(strategy='mean') baseline to a
LinearRegression. Use 5-fold cross-validation. Report mean R² ± std for
both. Comment on whether the linear model adds real signal.
Expected output
Constant R² = 0.000 ± 0.001
Linear R² = 0.0XX ± 0.0XX
Plus a short interpretation (the gap is small in absolute terms; that’s expected for a noisy outcome).
Hint
The SQL is the hard part — you need LAG window functions to pull
last-week stats. The model itself is two cross_val_score calls.
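To make the LAG behaviour concrete, here is a minimal sketch on an in-memory SQLite table rather than the Postgres database used in the solution. The table name toy_stats and its rows are made up for illustration, and it assumes the bundled SQLite is new enough to support window functions (3.25 or later).

```python
import sqlite3

# Hypothetical toy rows: (player, season, week, receiving_yards, targets)
rows = [
    ("A", 2023, 1, 80, 9),
    ("A", 2023, 2, 45, 6),
    ("A", 2023, 4, 110, 11),  # no week 3 row (bye week or inactive)
    ("A", 2024, 1, 60, 7),
    ("B", 2023, 1, 20, 3),
    ("B", 2023, 2, 35, 5),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE toy_stats "
    "(player TEXT, season INT, week INT, receiving_yards INT, targets INT)"
)
con.executemany("INSERT INTO toy_stats VALUES (?, ?, ?, ?, ?)", rows)

# LAG looks one row back inside each (player, season) partition, ordered by week.
lagged = con.execute(
    """
    SELECT player, season, week, receiving_yards,
           LAG(receiving_yards) OVER (PARTITION BY player, season ORDER BY week) AS prev_yards,
           LAG(week) OVER (PARTITION BY player, season ORDER BY week) AS prev_week
    FROM toy_stats
    ORDER BY player, season, week
    """
).fetchall()

for row in lagged:
    print(row)
```

The first row of every player-season comes back with NULL (None) lags, which is why the solution drops those rows. Player A's week 4 row picks up week 2 as its "previous" row, so LAG means the previous row present in the table, not necessarily the prior calendar week.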
Solution
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sqlalchemy import create_engine

eng = create_engine("postgresql+psycopg://onepride:lions@localhost:5432/onepride")
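
# LAG pulls the previous row within each player-season. Because rows with
# targets = 0 are filtered out, that is the player's previous game with a
# target, which is not always the prior calendar week.
df = pd.read_sql(
    """
    SELECT
        player_display_name,
        recent_team,
        season,
        week,
        receiving_yards,
        LAG(receiving_yards) OVER w AS prev_yards,
        LAG(targets) OVER w AS prev_targets
    FROM weekly_stats
    WHERE season BETWEEN 2021 AND 2024
      AND season_type = 'REG'
      AND position_group = 'WR'
      AND targets > 0
    WINDOW w AS (PARTITION BY player_display_name, season ORDER BY week)
    """,
    eng,
)
# Each player's first row of a season has no prior game, so its lags are NULL.
df = df.dropna(subset=['prev_yards', 'prev_targets'])

X = df[['prev_yards', 'prev_targets']]
y = df['receiving_yards']
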
for name, model in [
    ('Constant', DummyRegressor(strategy='mean')),
    ('Linear', LinearRegression()),
]:
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    print(f"{name:10s} R² = {scores.mean():.3f} ± {scores.std():.3f}")

Expect R² in the 0.05-0.15 range for Linear. That’s real in the sense that it beats the baseline, but the absolute number is low because week-over-week receiving is largely noise. The model is finding the small amount of week-over-week signal that does exist.
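Why the constant baseline sits at roughly zero can be checked by hand. The sketch below uses made-up yardage numbers and a hand-written r2_score helper (a stand-in for illustration, not scikit-learn's function) to compute R² = 1 - SS_res / SS_tot for a predictor that always outputs the training mean.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical receiving-yard values for one train/test split.
train_y = [10, 40, 55, 70, 25, 90]
test_y = [30, 60, 45, 85]

# The "mean" baseline always predicts the training-set mean.
constant = mean(train_y)

r2_in_sample = r2_score(train_y, [constant] * len(train_y))
r2_held_out = r2_score(test_y, [constant] * len(test_y))

print(f"in-sample R² = {r2_in_sample:.3f}")
print(f"held-out  R² = {r2_held_out:.3f}")
```

Scored on its own training data, the mean predictor gets exactly 0. On a held-out fold it can only land at or below 0, because the fold's own mean minimises the squared error on that fold. With thousands of rows per fold the gap shrinks toward zero, so the Constant line may print as a very small negative number, and any clearly positive Linear score is improvement over that floor.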