I have created a dataset containing each NBA team's opponent-adjusted offensive and defensive rating, after each day of the season, for the last 20 years. I'd like to use this, along with each team's offensive/defensive rating from the previous season, to predict their season-ending rating. For example, the Bucks have an offensive rating 7 points better than league average right now (after 9 games). Last year they were 2 points better than average. I want to predict where they'll end this year.

Here's an example of what my data looks like.

**oRTG+_ytd** is their year-to-date rating.

**GP_ytd** is the number of games played so far.

**oRTG+_f** is the season-ending rating.

**oRTG+_lastYear** is their rating last year.

I want to predict **oRTG+_f**.

I could use a simple regression, but that would ignore the number of games played so far this year. A +10 rating after 40 games is much more significant than a +10 rating after 5 games. I don't think adding sample weights solves this issue either.

My standard approach to this sort of problem is to optimize X and Y in the following equation using Excel's solver or scipy.optimize.curve_fit.

(<oRTG+_ytd>*<GP_ytd> + <oRTG+_lastYear>*X + 0*Y)/(<GP_ytd>+X+Y) = <oRTG+_f>

This essentially weights each team's current rating by the number of games played so far. Last year's rating gets a weight of X games. I then add a regression-to-the-mean term with a weight of Y games (0 = mean offensive rating). The result is a weighted average. This is similar to the methodology laid out here, except that it adds a feature for last year's rating: http://statitudes.com/blog/2013/11/12/h ... -using-srs
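To make the fitting step concrete, here is a minimal sketch of how X and Y could be estimated with scipy.optimize.curve_fit. Everything in it is an assumption for illustration: the data is synthetic (generated from the same formula with arbitrary "true" values of X and Y plus noise), and the names `shrunk_rating`, `x_games`, and `y_games` are hypothetical, not taken from my actual dataset.

```python
import numpy as np
from scipy.optimize import curve_fit


def shrunk_rating(features, x_games, y_games):
    """Weighted average of the YTD rating, last year's rating, and the mean (0)."""
    ortg_ytd, gp_ytd, ortg_last = features
    numerator = ortg_ytd * gp_ytd + ortg_last * x_games + 0.0 * y_games
    return numerator / (gp_ytd + x_games + y_games)


# Synthetic stand-in for the real dataset (assumption: arbitrary true weights).
rng = np.random.default_rng(0)
n_rows = 2000
true_x, true_y = 20.0, 10.0
gp_ytd = rng.integers(1, 82, size=n_rows).astype(float)
ortg_ytd = rng.normal(0.0, 5.0, size=n_rows)
ortg_last = rng.normal(0.0, 4.0, size=n_rows)
features = np.vstack([ortg_ytd, gp_ytd, ortg_last])  # shape (3, n_rows)
ortg_final = shrunk_rating(features, true_x, true_y) + rng.normal(0.0, 0.3, size=n_rows)

# Fit X and Y, constrained to be non-negative "games" of weight.
(x_hat, y_hat), _ = curve_fit(
    shrunk_rating, features, ortg_final, p0=[10.0, 10.0], bounds=(0.0, np.inf)
)
print(f"fitted X = {x_hat:.2f}, fitted Y = {y_hat:.2f}")

# Prediction for the Bucks example: +7 after 9 games, +2 last year.
bucks = np.array([[7.0], [9.0], [2.0]])
bucks_pred = shrunk_rating(bucks, x_hat, y_hat)[0]
print(f"predicted season-ending rating: {bucks_pred:.2f}")
```

With the real data, `features` and `ortg_final` would be built from the oRTG+_ytd, GP_ytd, oRTG+_lastYear, and oRTG+_f columns instead of being simulated.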

This may not be the only model however, and I'm not sure it's the best model. It's the one that makes intuitive sense to me (a weighted average based on the number of games played so far), but given the number of machine learning options available, I am interested in whether other, potentially more elegant models exist which can do this just as well.

This is a common problem in sports analytics, and generalizable outside of this particular dataset obviously. I am curious about how people go about solving issues like this. Is this an area where a Bayesian Regression is needed?