Adjusting for recency? (bbstats, 2010)

Crow · Post by **Crow** » Fri Apr 29, 2011 4:30 am

bbstats

Joined: 25 Apr 2010
Posts: 46

Posted: Mon Dec 06, 2010 10:44 pm Post subject: Adjusting for recency?
Is there any documented, non-arbitrary method for adjusting team ratings for recent performance?
_________________
http://thebasketballdistribution.blogspot.com

http://twitter.com/bbstats
bbstats

Joined: 25 Apr 2010
Posts: 46

Posted: Tue Dec 07, 2010 3:19 pm
I'll take your silence as a 'no.'

I'm going to try to fit my recency model to whichever ends up being the most predictive, starting with doing 5 predictions at a time.

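My model simply adds residual performance (actual - predicted), with each game weighted by

weight = R*(gameDate - NBAstartDate)/(MostRecentNBAdate - NBAstartDate) + (1 - R)

so R = 0 weights every game equally, and R = 1 scales the weight linearly from 0 on opening day to 1 for the most recent game. The 5 predictions use R = 0, .25, .5, .75, and 1: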

Code:
Tuesday/Dec7                             0%    25%    50%    75%   100%
Denver at Charlotte (7:00 PM)          -1.5   -1.2   -0.9   -0.4    0.3
Cleveland at Philadelphia (7:00 PM)     9.7   10.2   10.9   11.8   13.2
New Jersey at Atlanta (7:00 PM)         9.3    9.3    9.4    9.5    9.6
Detroit at Houston (8:30 PM)            8.8    8.9    9.1    9.4    9.8
Golden State at Dallas (8:30 PM)       13.5   13.7   14.1   14.7   15.5
Phoenix at Portland (10:00 PM)          2.3    2.0    1.5    0.8   -0.3
Washington at LA Lakers (10:30 PM)     16.8   16.7   16.5   16.2   15.7
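
For readers who want to experiment, the weighting formula in this post can be sketched in Python. This is only an illustration: the function names, the Oct 26, 2010 opening date, the toy game values, and the choice to combine residuals as a weighted mean are all assumptions, since the post does not say exactly how the weighted residuals feed back into a rating.

```python
from datetime import date

def recency_weight(game_date, season_start, most_recent, r):
    """Weight from the post: R * (fraction of season elapsed) + (1 - R)."""
    span_days = (most_recent - season_start).days
    if span_days == 0:
        return 1.0  # only one date so far; treat every game equally
    elapsed = (game_date - season_start).days / span_days
    return r * elapsed + (1 - r)

def weighted_mean_residual(games, season_start, most_recent, r):
    """games: list of (game_date, actual_margin, predicted_margin).

    Assumption: residuals (actual - predicted) are combined as a weighted mean.
    """
    weights = [recency_weight(d, season_start, most_recent, r) for d, _, _ in games]
    residuals = [actual - predicted for _, actual, predicted in games]
    total = sum(weights)
    if total == 0:
        return 0.0  # every game has zero weight (R = 1, opening-day games only)
    return sum(w * e for w, e in zip(weights, residuals)) / total

# Hypothetical toy inputs, not real results.
SEASON_START = date(2010, 10, 26)  # assumed opening day
MOST_RECENT = date(2010, 12, 6)
toy_games = [
    (date(2010, 10, 26), 5.0, 2.0),   # residual +3 on opening day
    (date(2010, 12, 6), -4.0, 1.0),   # residual -5 in the latest game
]
for r in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(r, weighted_mean_residual(toy_games, SEASON_START, MOST_RECENT, r))
```

With R = 0 the two toy residuals count equally; with R = 1 the opening-day game drops out entirely, which is why the five columns in the table drift apart for teams whose form has changed.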

Crow

Joined: 20 Jan 2009
Posts: 821

Posted: Tue Dec 07, 2010 6:34 pm
Have you looked at what Hollinger and Pelton (at Basketball Prospectus) do to adjust their power ratings as the season proceeds, or do you have any reaction to it? The adjustments are fairly simple, but are they "arbitrary" in your view? Hollinger gives some additional weight to recent performance and has gotten some criticism for it from those who argue, or point to research, that casts doubt on the greater predictive power of recent performance. Pelton, in a recent article, describes putting diminishing weight on the preseason rating and increasing weight on early / recent performance as the season progresses.

You could try searching past threads or look for any related academic research.

You asked about adjusting team ratings but it was not clear until post two that you were interested in game prediction. I would assume that some of the gambling related sites have non-arbitrary methods that are probably not fully documented but I have seen a few where they give some high-level description of what they are doing. You may already know more about these sites / methods than I do. But, somewhat distinctly, have you looked at the predictions and background material at nbastuffer.com? The author of that site is also active here. Perhaps you and he can discuss it further in some fashion.

Posts often summarize quickly, but it sometimes takes a bit more for readers to latch on, gain interest, and respond on point. The first post in this case was quite light.

Here are some questions:

Are the listed predictions for the home or visiting team? I can't be sure my interpretation is correct.

Did you address home court advantage at all here and how? Something for the reader to be aware of. You have dealt with this detail and consistency in other models as indicated in discussion in other threads. I am guessing you haven't here on purpose to keep this recency model simple for now. Do you eventually intend to use what is learned as a building block in a grander model? Do you have any interest in incorporating the findings in the Effect of Rest Days on Efficiencies thread into such a grand model?

How do the predictions compare to the opening or closing betting lines? Or to what the simple, unadjusted power ratings as of today would suggest? Are you interested in tracking those comparisons here?

Last edited by Crow on Wed Dec 08, 2010 1:41 pm; edited 12 times in total
back2newbelf

Joined: 21 Jun 2005
Posts: 275

Posted: Tue Dec 07, 2010 6:37 pm
Why don't you retrodict all past seasons using data from basketball-reference or elsewhere?
I think I once tried an exponential moving average and it didn't do better than simple point differential.
Also, Hollinger's power rankings formula weights recent games differently, but I think I also found that if you remove the part of his formula that adjusts for recency, the pre(retro)diction results don't change significantly.
mtamada

Joined: 28 Jan 2005
Posts: 377

Posted: Tue Dec 07, 2010 8:32 pm
back2newbelf wrote:
Why don't you retrodict all past seasons using data from basketball-reference or elsewhere?
I think I once tried an exponential moving average and it didn't do better than simple point differential.

Right, the intuitive answer is look at past data to find an optimal weighting scheme.

To switch to jargon, in econometric analysis of time series data you can do moving average models, autoregressive models, or integrated models, or do all three at once with ARIMA (AutoRegressive Integrated Moving Average) models. Or go on from there to GARCH models, etc. (Look at "time series" on Wikipedia as a start.)

For engineers, the equivalent sort of stuff comes from digital signal processing and filters. DeanO likes, or at least used to like, using Kalman filters. But the intuition on all of these is the same: you pick a functional form and, by looking at the data, you estimate what sort of weighting scheme leads to the best results.
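
As a rough illustration of "pick a functional form and let the data choose the weights", here is a minimal Python sketch that retrodicts one team's game margins two ways: with a running mean (simple point differential) and with an exponential moving average whose smoothing factor is chosen by grid search. The function names and the toy margins are hypothetical, and a real test would run over many teams and seasons rather than one short series.

```python
def running_mean_forecasts(margins):
    """Forecast each game (from the 2nd on) with the mean of all earlier games."""
    forecasts, total = [], 0.0
    for i, margin in enumerate(margins):
        if i > 0:
            forecasts.append(total / i)
        total += margin
    return forecasts

def ema_forecasts(margins, alpha):
    """Forecast each game (from the 2nd on) with an exponential moving average.

    alpha near 1 leans on the latest game; alpha near 0 barely updates.
    """
    forecasts = []
    ema = margins[0]
    for margin in margins[1:]:
        forecasts.append(ema)
        ema = alpha * margin + (1 - alpha) * ema
    return forecasts

def mean_squared_error(margins, forecasts):
    """Average squared miss; forecasts line up with margins[1:]."""
    actual = margins[1:]
    return sum((a - f) ** 2 for a, f in zip(actual, forecasts)) / len(actual)

def best_alpha(margins, grid):
    """Grid search: the alpha with the lowest one-step-ahead squared error."""
    return min(grid, key=lambda a: mean_squared_error(margins, ema_forecasts(margins, a)))

# Hypothetical toy margins for one team, in game order (not real data).
toy_margins = [4.0, -3.0, 10.0, 2.0, -6.0, 8.0]
grid = [0.05, 0.1, 0.2, 0.4, 0.8]
alpha = best_alpha(toy_margins, grid)
print("best alpha on toy data:", alpha)
print("EMA MSE:", mean_squared_error(toy_margins, ema_forecasts(toy_margins, alpha)))
print("running-mean MSE:", mean_squared_error(toy_margins, running_mean_forecasts(toy_margins)))
```

Comparing the two error figures on held-out games is the retrodiction test suggested earlier in the thread; if the EMA does not beat the running mean, the data are not asking for extra weight on recent games.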
bbstats

Joined: 25 Apr 2010
Posts: 46

Posted: Tue Dec 07, 2010 10:13 pm
Thanks for the replies and inquiries.

I wasn't counting on getting a super-speedy reply; my 2nd post was more of a, "Hey, I should try this for myself!"

Most of what I thought about doing would take quite a while, and I hadn't seen any non-arbitrary methodologies posted by anyone (e.g. Hollinger's power ratings use a team's last ten games). I suppose the most rigorous answer would be to do what back2newbelf suggested, although on the surface it seems like the results won't be significant (?). I suppose I was really just wondering if anyone had yet found any 'optimal weighting scheme.'

I should check out nbastuffer's stuff too; I hadn't seen predictive models in my prior quick skimming of the site.

Crow wrote:
Are the listed predictions for the home or visiting team? I can't be sure my interpretation is correct.

These are home spreads, sorry for the lack of detail; sometimes I can be a bit sparse on detail as a result of being an Advertising major -- I know that people are less likely to pay attention to what I have to say if I type too much...

Crow wrote:
Did you address home court advantage at all here and how? Something for the reader to be aware of.

Yes. My models are ever-changing - not exactly a positive thing for people who are interested in what I'm doing. Right now my numbers are just using a modified version of the Excel-Solver method, and so I minimize residuals between Actual and Expected performance where expected performance = HomeTmRating - AwayTmRating + homeCourtAdvantage (home court is currently worth the average of every 2010-2011 HomeScore minus AwayScore).
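
A minimal sketch of this kind of fit, for anyone following along without Excel. It is not the Solver setup described in the post: it fixes home court at the average home margin, as the post says, and then finds the ratings with a damped iteration whose fixed point is the least-squares solution of expected = HomeTmRating - AwayTmRating + homeCourtAdvantage. The function name, the damping value, and the toy schedule are assumptions.

```python
from collections import defaultdict

def fit_ratings(games, iterations=200, damping=0.5):
    """games: list of (home_team, away_team, home_margin).

    Returns (ratings, home_court_advantage), with ratings centered on zero.
    """
    # Home court = average of (HomeScore - AwayScore) over all games.
    hca = sum(margin for _, _, margin in games) / len(games)

    # For each team, store (opponent, margin from that team's side, home court removed).
    schedule = defaultdict(list)
    for home, away, margin in games:
        adjusted = margin - hca
        schedule[home].append((away, adjusted))
        schedule[away].append((home, -adjusted))

    ratings = {team: 0.0 for team in schedule}
    for _ in range(iterations):
        updated = {}
        for team, results in schedule.items():
            # Least-squares condition: rating = mean(adjusted margin + opponent rating).
            target = sum(adj + ratings[opp] for opp, adj in results) / len(results)
            updated[team] = (1 - damping) * ratings[team] + damping * target
        center = sum(updated.values()) / len(updated)
        ratings = {team: value - center for team, value in updated.items()}
    return ratings, hca

# Hypothetical toy schedule built from ratings A=+3, B=0, C=-3 and home court = 2.
toy_games = [
    ("A", "B", 5.0), ("B", "A", -1.0),
    ("A", "C", 8.0), ("C", "A", -4.0),
    ("B", "C", 5.0), ("C", "B", -1.0),
]
ratings, hca = fit_ratings(toy_games)
print(ratings, hca)
```

Because the toy schedule is balanced (every pair plays home and away), the average home margin equals the built-in home-court value and the fit recovers the ratings used to generate the games.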

Crow wrote:
You have dealt with this detail and consistency in other models as indicated in discussion in other threads. I am guessing you haven't here on purpose to keep this recency model simple for now.

Sort of. From my understanding of averages (via a high-school stats course and DeanO's B.O.P.), it makes the most sense to predict expected point differentials with averages, and predict chance of win% with these averages in tandem with Variation/Standard Deviation. Since point differentials are intuitive outcomes, (whereas win% isn't exactly an intuitive outcome from a single game) I was just going to use those for the recency model.
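
One common way to turn an expected margin and a standard deviation into a win chance is to treat the final margin as normally distributed. The sketch below does that; the normal assumption, the function name, and the example standard deviation of 11 points are illustrative assumptions rather than values taken from this thread.

```python
import math

def win_probability(expected_margin, margin_sd):
    """P(final margin > 0) if the margin is normal with this mean and SD."""
    z = expected_margin / margin_sd
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example: a 3-point home favorite with an assumed SD of 11 points.
print(win_probability(3.0, 11.0))
```

An expected margin of zero gives exactly 50%, and the same favorite looks less certain as the assumed standard deviation grows.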

Crow wrote:
Do you eventually intend to use what is learned as a building block in a grander model?

Yes. My aim here is to eventually predict how we expect a team to perform currently, or even their expected playoff(s) performance.

Crow wrote:
Do you have any interest in incorporating the findings in the Effect of Rest Days on Efficiencies thread into such a grand model?

I have thought about it, and there seems to be quite a bit of detail about it on this board. However, being the lazy man that I am, I was hoping that their effect would be statistically insignificant ;)

Crow wrote:
How do the predictions compare to the opening or closing betting lines?

My general predictions are relatively close to the opening betting lines (I'm listed as WALK).

I guess if I continue with this very non-retrodictive (read: lazy) method, I might be able to come up with some idea of how much recent play affects 'future' play.

I'm sure as soon as my exams are over, I'll retrodict past seasons to find that 'optimal weighting scheme.'
Italian Stallion

Joined: 04 Mar 2009
Posts: 112

Posted: Wed Dec 08, 2010 10:12 am
I'm trying to do the same thing, but unfortunately I'm operating with much less statistical background and have been putting the pieces of the puzzle together using the insights of others.

My own point spreads are typically very similar to the official ones.

I am tracking instances where my model suggested something much different from the official line to see if it was more or less predictive. So far the results suggest it is less predictive. In fact, my picks were exactly 50% winners and 50% losers.

That suggests that in some instances my line is missing something.

So I started looking at the details of the games when I disagreed with the official line strongly. I came to the conclusion that the official line may be weighting recent performances more heavily than distant performances (I was giving recency no extra weight).

Now the trick is to weight recent performances properly.
Chilltown

Joined: 16 Apr 2010
Posts: 15
Location: Boston

Posted: Tue Dec 14, 2010 8:06 am
Perhaps I'm missing something here, but wouldn't recency best be modeled using time series regressions (particularly ADL models)? The fundamental question is whether recent performance predicts current performance better than past performance.

Off the top of my head, you could have variables for a team's offensive or defensive efficiency, their opponent's stats, etc. (you could make whatever metric you wanted your dependent variable). You might also want to introduce dummy variables for back-to-backs, three games in four days, etc.
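
To make the idea concrete, here is a minimal sketch of an ADL(1,1) regression, y_t = c + a*y_{t-1} + b0*x_t + b1*x_{t-1}, fitted by ordinary least squares in plain Python. Here y could be a team's game margin and x a back-to-back dummy. The function names are hypothetical, and the data are synthetic, generated from known coefficients purely so the fit can be checked; nothing here is an estimate from real games.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_adl_1_1(y, x):
    """OLS for y_t on [1, y_{t-1}, x_t, x_{t-1}]; returns [c, a, b0, b1]."""
    rows = [[1.0, y[t - 1], x[t], x[t - 1]] for t in range(1, len(y))]
    targets = y[1:]
    k = 4
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic check: x is a made-up back-to-back dummy, y follows known coefficients.
true_coefficients = (1.0, 0.5, -2.0, 0.5)  # c, a, b0, b1 (arbitrary test values)
x = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
y = [0.0]
for t in range(1, len(x)):
    c, a, b0, b1 = true_coefficients
    y.append(c + a * y[t - 1] + b0 * x[t] + b1 * x[t - 1])

estimated = fit_adl_1_1(y, x)
print(estimated)
```

On real data the coefficient on y_{t-1} is the piece that speaks to recency: if it is indistinguishable from zero, the previous game adds little beyond the other terms in the model.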

It also might be cool to do a GARCH model of the volatility of teams' performance.
Ryan J. Parker

Joined: 23 Mar 2007
Posts: 711
Location: Raleigh, NC

Posted: Tue Dec 14, 2010 11:09 am
When I took an applied time series class, I tried to apply AR/MA/ARCH/GARCH models to seasonal data. I didn't get far, but I'd be interested in seeing if anyone else has success.
_________________
I am a basketball geek.
gabefarkas

Joined: 31 Dec 2004
Posts: 1313
Location: Durham, NC

Posted: Tue Dec 14, 2010 8:51 pm
What's an ADL model?
DSMok1

Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains

Posted: Tue Dec 14, 2010 10:53 pm
gabefarkas wrote:
What's an ADL model?

Autoregressive Distributed Lag, I think.
_________________
GodismyJudgeOK.com/DStats
Twitter.com/DSMok1
Mike G

Joined: 14 Jan 2005
Posts: 3615
Location: Hendersonville, NC

Posted: Wed Dec 15, 2010 7:36 am
Compared to any improvement one could make by weighing recent games more heavily, one is still better off with some team-specific information.

If a superstar has been out for a month, and he's about to return, is the most recent month really more indicative than the previous one?

If a team is suddenly winning based on a creative lineup/strategy mix, is that element of surprise likely to keep on winning, or will the rest of the league wise up to it?

Several years of predictions based on a bunch of analytic systems have not produced anything more accurate than what the more intuitive predictors have offered.
_________________
36% of all statistics are wrong
DSMok1

Joined: 05 Aug 2009
Posts: 611
Location: Where the wind comes sweeping down the plains

Posted: Wed Dec 15, 2010 9:17 am
Mike G wrote:
Compared to any improvement one could make by weighing recent games more heavily, one is still better off with some team-specific information.

If a superstar has been out for a month, and he's about to return, is the most recent month really more indicative than the previous one?

If a team is suddenly winning based on a creative lineup/strategy mix, is that element of surprise likely to keep on winning, or will the rest of the league wise up to it?

Several years of predictions based on a bunch of analytic systems have not produced anything more accurate than what the more intuitive predictors have offered.

That's why I use regressed SPM and then adjust according to expected minutes played to get my predictions. The standard deviation of game margin from expected is under 10 by this method, I think significantly under 10 (though I haven't done a rigorous evaluation).
mtamada

Joined: 28 Jan 2005
Posts: 377

Posted: Wed Dec 15, 2010 7:04 pm
DSMok1 wrote:
gabefarkas wrote:
What's an ADL model?

Autoregressive Distributed Lag, I think.

Yup. For some concise but readable notes on time series stuff that go into more detail than the Wikipedia article on time series that I mentioned, try Nathaniel Beck's notes from when he taught at UCSD (evidently he's at NYU now): http://www.nyu.edu/classes/nbeck/longdata/tss.pdf
Italian Stallion

Joined: 04 Mar 2009
Posts: 112

Posted: Thu Dec 16, 2010 1:02 am
Mike G wrote:
Compared to any improvement one could make by weighing recent games more heavily, one is still better off with some team-specific information.

If a superstar has been out for a month, and he's about to return, is the most recent month really more indicative than the previous one?

If a team is suddenly winning based on a creative lineup/strategy mix, is that element of surprise likely to keep on winning, or will the rest of the league wise up to it?

Several years of predictions based on a bunch of analytic systems have not produced anything more accurate than what the more intuitive predictors have offered.

I agree that things like that can be critical, but they are also well known and probably being accounted for by most people in their thinking.

Unless I can find things that aren't being built in properly and am including all the relevant information properly myself, I am wasting my time trying to build a model. I should just look at what Vegas thinks. lol

Recent performance seems to be one of the things I am not building in properly. The other may be win/loss record. Even though point differential has proven to be more predictive than win/loss record, I suspect a combination of both is superior to either alone.

EvanZ

Joined: 22 Nov 2010
Posts: 307

Posted: Thu Dec 16, 2010 9:33 am
I would think that if there was some consistent pattern in the data for all teams, it could be found by doing some autocorrelation. Is this what these time series models do?
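
For reference, a sample autocorrelation at a given lag can be computed in a few lines. This is a minimal sketch with a hypothetical function name and a made-up series; in practice the input might be a team's game-by-game residuals (actual minus predicted margin).

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (lag 0 gives 1.0)."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((value - mean) ** 2 for value in series)
    if variance_sum == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    covariance_sum = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance_sum / variance_sum

# Made-up alternating series: strongly negative at lag 1, positive at lag 2.
toy_series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
for lag in (1, 2, 3):
    print(lag, autocorrelation(toy_series, lag))
```

Values near zero at every lag would suggest that a team's recent misses carry little information about its next game; consistently positive values at short lags would support weighting recent games more.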
_________________
http://www.thecity2.com
http://www.ibb.gatech.edu/evan-zamir
greyberger

Joined: 27 Sep 2010
Posts: 53

Posted: Thu Dec 16, 2010 4:46 pm
Quote:
The other may be win/loss record. Even though point differential has proven to be more predictive than win/loss record, I suspect a combination of both is superior to either alone.

This subject is kind of a puzzle to me. I'd always wondered why it's said that point differential is a 'better' indicator of future performance than record, since there's no reason for the two to be mutually exclusive. Considering how the prominent ranking/prediction models (Hollinger rankings, B-R.com rankings, etc) are based on point margin and don't use record, I figure there has to be a good reason.

Perhaps nobody's found the right way to incorporate win% or win-loss records yet, or perhaps it's just redundant and less information-rich than schemes with point margin as an input. If the win-loss component doesn't improve the predictions with half a season or more to predict with, then it's hard to justify using it just at the beginning of the season either.
back2newbelf

Joined: 21 Jun 2005
Posts: 276

Posted: Thu Dec 16, 2010 5:29 pm
Hollinger uses opponent record to compute SoS, which factors into his ratings.
Sagarin lists two methods on his site and states:
"In ELO CHESS, only winning and losing matters; the score margin is of no consequence, which makes it very "politically correct". However it is less accurate in its predictions for upcoming games than is the PURE POINTS, in which the score margin is the only thing that matters. PURE POINTS is also known as PREDICTOR, BALLANTINE, RHEINGOLD, WHITE OWL, and is the best single PREDICTOR of future games."
bbstats

Joined: 25 Apr 2010
Posts: 46

Posted: Thu Dec 16, 2010 8:44 pm
Well, if we're talking about a system that retrodictively (and hopefully in the future) predicts %chance of win --

Points are basically a non-binary version of Wins (or a way to value each 1 or 0 value for 1=win, 0=loss). And since better teams will try to win by more points, and succeed at it, margin ends up being a better indicator. The central limit theorem agrees with this idea, and teams tend to play towards a specific mean point-value. I incorporate eWin% with this mean value alongside standard deviation. And point-margin is honestly just documented to be a better indicator. However, let's entertain the idea for a minute that Wins (or Win%) should also be included.

To incorporate wins alongside margin would effectively be doing something like this:

where
win = roughly 10.4 points (average NBA win margin this year)
loss = -10.4
x = modeled share of the weight given to wins, with 1-x the weight given to margin

Rating = x*(Wins - Losses)/(GamesPlayed)*10.4 + (1-x)*AvgMargin

I think it's pretty self-evident that this is a really silly idea and that x would almost certainly be modeled as 0.

EDIT: Well, dang! I ran this through Excel Solver and, for this season through 12/13, I got an x-value of .26! (Also, I got a home-margin advantage of 3.08 and a home-win advantage of .68* for a Home win and 1.46* for a Road win.)

SECOND EDIT: Nope. Jumped the gun. Fixed my home-court-advantage terms and the x-value went to zero.
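
The blended rating in this post is easy to write as a function. A minimal sketch follows; the function name and the example record are hypothetical, while the 10.4-point value of a win and the x = .26 setting come from the post (the latter being the value later retracted in the second edit).

```python
WIN_VALUE = 10.4  # average NBA win margin this season, per the post

def blended_rating(wins, losses, avg_margin, x, win_value=WIN_VALUE):
    """Rating = x*(Wins - Losses)/GamesPlayed*win_value + (1 - x)*AvgMargin."""
    games_played = wins + losses
    if games_played == 0:
        raise ValueError("team has not played any games")
    record_term = (wins - losses) / games_played * win_value
    return x * record_term + (1 - x) * avg_margin

# Hypothetical team: 15-10 with a +3.0 average margin.
print(blended_rating(15, 10, 3.0, x=0.0))   # margin only
print(blended_rating(15, 10, 3.0, x=0.26))  # the x first reported, then retracted
```

Setting x = 0 reduces the rating to plain average margin, which is where the fitted value ended up once the home-court terms were fixed.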
Chilltown

Joined: 16 Apr 2010
Posts: 15
Location: Boston

Posted: Fri Dec 17, 2010 6:47 pm
Sorry if I wasn't very clear in my first post. Obviously testing for autocorrelation is the first step. If AC is present, I was thinking of an ADL model that minimizes BIC where we test for significant lags in the predictors (specifically the Y variable and any controls). Of course Mike has nicely illustrated some of the drawbacks: team-specific information is always important and going back too far may invalidate the stationarity assumption of the ADL model. Just my thought.
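
As a small illustration of choosing lags by BIC, the sketch below scores candidate models from their residual sum of squares using the usual Gaussian-error form, n*ln(RSS/n) + k*ln(n), and keeps the lowest score. The function names and the RSS figures are hypothetical, and the comparison assumes every candidate was fitted on the same n observations.

```python
import math

def bic(rss, n_obs, n_params):
    """Gaussian-error BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)

def select_lag_by_bic(fits, n_obs):
    """fits maps lag count -> (residual sum of squares, number of parameters)."""
    return min(fits, key=lambda lag: bic(fits[lag][0], n_obs, fits[lag][1]))

# Hypothetical fits: a second lag barely helps, a third helps a lot.
hypothetical_fits = {1: (100.0, 2), 2: (99.0, 3), 3: (80.0, 4)}
print(select_lag_by_bic(hypothetical_fits, n_obs=100))
```

The ln(n) penalty per parameter is what keeps the procedure from always preferring more lags, since adding a lag never increases the residual sum of squares on its own.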