Computing fake RAPM for players of the 90s
Re: Computing fake RAPM for players of the 90s
I have done some research on the distribution of starters and bench players in each quarter, depending on the current point differential of the game.
The score difference was put in bins of size 5. So a team can "lead" by more than 20, by more than 15 but less than 20, by more than 10 but less than 15, etc.
That score difference is recorded after each quarter, giving us a sequence of 4 numbers.
The most common sequences were:
(1) Leading by 5+ -> leading by 20+ -> leading by 20+ -> Won by 20+
(2) Leading by 10+ -> leading by 15+ -> leading by 20+ -> Won by 20+
These had the following share of starters per quarter, expressed as a fraction. The team that led after Q1 is listed first.
(1)
Q1: 0.83 0.82
Q2: 0.56 0.52
Q3: 0.70 0.79
Q4: 0.24 0.21
(2)
Q1: 0.82 0.85
Q2: 0.55 0.51
Q3: 0.75 0.79
Q4: 0.25 0.22
The distributions are very similar because the score progressions are similar, too.
For comparison, here's the distribution for the sequence
Leading by more than 0 -> Behind more than 0 (but less than 5) -> Teams even -> Won by 5+
Q1: 0.83 0.81
Q2: 0.51 0.53
Q3: 0.83 0.75
Q4: 0.56 0.59
I'm going to use those distributions to create better fake matchup files.
I fear that a new problem will arise: teams that get badly blown out in Q1 start to bring in bench players earlier. RAPM might unfairly punish the bench players for being on the court more often in first quarters that went badly. Hopefully it's less of a problem than the one I had before (bench players of really good teams being rated too high). Bench players of teams that get blown out in first quarters probably aren't great anyway.
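The binning step described above could be sketched roughly as follows. This is only an illustration, not the actual code used for the matchup files: the function names, the bin labels ("+5", "even", ...) and the exact bin boundaries are assumptions, and the quarter scores are made up. It also assumes that the starter share is the fraction of the five on-court spots filled by starters.

```python
def margin_bin(margin, size=5, cap=20):
    """Label a signed point margin by the lower edge of its bin (assumed boundaries)."""
    if margin == 0:
        return "even"
    edge = min(cap, size * (abs(margin) // size))
    return ("+" if margin > 0 else "-") + str(edge)


def margin_sequence(team_by_quarter, opp_by_quarter):
    """Return the bin labels of the running margin after each quarter."""
    labels, margin = [], 0
    for team_pts, opp_pts in zip(team_by_quarter, opp_by_quarter):
        margin += team_pts - opp_pts
        labels.append(margin_bin(margin))
    return tuple(labels)


# Starter shares per quarter quoted in the post for sequence (1):
# (team that led after Q1, opponent).
STARTER_SHARE = {
    ("+5", "+20", "+20", "+20"): [
        (0.83, 0.82), (0.56, 0.52), (0.70, 0.79), (0.24, 0.21),
    ],
}

# Made-up quarter scores that produce sequence (1).
seq = margin_sequence([30, 35, 28, 24], [24, 20, 27, 20])
for quarter, (leader, trailer) in enumerate(STARTER_SHARE[seq], start=1):
    # Assumption: share * 5 is the expected number of starters on the court.
    print(f"Q{quarter}: {5 * leader:.2f} vs {5 * trailer:.2f} starters on court")
```

A fake matchup file could then draw lineups for each quarter so that the number of starters matches these expectations on average.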
Re: Computing fake RAPM for players of the 90s
This is so fun!
Re: Computing fake RAPM for players of the 90s
I have now worked in the scores of each quarter, together with the expected distribution of starters and non-starters for each quarter, according to the current point differential of the game. The results look reasonable. Here's an example for 95-96 through 97-98 (the Bulls' 2nd three-peat).
Two remaining problems are
a) players with low minutes are too close to 0, and
b) some starters on very good teams, who just happened to play with all-time greats, get too high a rating.
I should be able to somewhat fix both problems by using BoxScore-informed priors.
Leaders in Off.On-Def.On
Bill Wennington
Randy Brown
Kukoc
Pippen
Ron Harper
Jordan
Longley
Kerr
Rodman
Leaders in (Off.On-Def.On)*MP
Jordan
Pippen
Harper
Rodman
Malone
Stockton
Longley
Hornacek
Kukoc
Payton
Leaders in fake RAPM
Pippen
Robinson
Shaq
Stockton
Jordan
Mourning
Kukoc
Eddie Jones
Ron Harper
Malone
Horace Grant
Re: Computing fake RAPM for players of the 90s
Appreciate this effort, J.E.
J.E. wrote: The results look reasonable.
Assume you find "reasonable" the 3rd list. The first one has some barely-replacement players, Wennington and Randy Brown, on top. All are Bulls. The next one is all Bulls and Jazz, plus Payton at #10.
When Jordan faced Stockton for 12 games in the Finals, it was easy to see Jordan >> Stockton. Could it be otherwise against other opponents?
It's almost believable that Pippen was more indispensable than Jordan in those years. In '98, Pip missed 38 games. The Bulls were 36-8 when he played, and 26-12 when he didn't -- that's 1.75 times as likely to lose.
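The ratio quoted above can be checked from the two records. A quick sketch (the function name is just for illustration):

```python
from fractions import Fraction


def loss_rate(wins, losses):
    """Fraction of games lost for a given win-loss record."""
    return Fraction(losses, wins + losses)


with_pippen = loss_rate(36, 8)      # 8 losses in 44 games
without_pippen = loss_rate(26, 12)  # 12 losses in 38 games
ratio = without_pippen / with_pippen

print(f"loss rate with Pippen:    {float(with_pippen):.3f}")
print(f"loss rate without Pippen: {float(without_pippen):.3f}")
print(f"ratio: {float(ratio):.2f}")
```

The exact ratio is 33/19, or about 1.74, which is in line with the "1.75 times" figure.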
If Ron Harper (or Toni Kukoc) could have gotten himself ejected along with Karl Malone, that's advantage: Bulls.
Re: Computing fake RAPM for players of the 90s
J.E., when you end up revising this and adding the box priors, I suggest that you consider redoing the spm regression and taking out all 3 pointer stats (3pm, 3pa, 3p%) and using the coefficients you find in the redo for years before 1995. The fact is that the 3-pointer was totally different before 1995 - I'm confident its value will be skewed for that era if we're basing the value on today's results. Accordingly, you'll get odd results for players because a few guys take WAY more threes than the rest of the league.
Re: Computing fake RAPM for players of the 90s
This is seriously the coolest thread.
Re: Computing fake RAPM for players of the 90s
Mike G wrote: Assume you find "reasonable" the 3rd list.
Yes, I meant the 3rd list. The other two are there to show that my simulation algorithm does something sane.
Mike G wrote: When Jordan faced Stockton for 12 games in the Finals, it was easy to see Jordan >> Stockton. Could it be otherwise against other opponents?
It certainly could be otherwise against other opponents, but I wouldn't want to change anyone's opinion when there's that small a difference in rank between the two, especially considering the fact that it's fake RAPM.
jbrocato23 wrote: J.E., when you end up revising this and adding the box priors, I suggest that you consider redoing the spm regression and taking out all 3 pointer stats (3pm, 3pa, 3p%) and using the coefficients you find in the redo for years before 1995. The fact is that the 3-pointer was totally different before 1995 - I'm confident its value will be skewed for that era if we're basing the value on today's results. Accordingly, you'll get odd results for players because a few guys take WAY more threes than the rest of the league.
I'm sorry, but I'm not going to do this. I want to finish this project quickly so I can move on to other things. Besides, removing 3s entirely will very likely skew things just as badly in the other direction.
The fake xRAPM numbers also seem reasonable, but I'll probably put pure fake RAPM of the 90s (10y RAPM) online so people can see a pure +/- analysis of that time period
Re: Computing fake RAPM for players of the 90s
J.E., I agree with you regarding the 3pt shot. It is rather obvious that it has real value for a team, and players who can shoot the ball well from the perimeter usually show a positive overall impact. Also, the timeline jbrocato23 is referring to is not really accurate. In 1994/95 the NBA moved the line in to a uniform 6.70m from the basket, instead of 6.70m in the corners and 7.24m at the top. For the 1997/98 season the line was moved back to the old distance.
For 1994/95 to 1996/97 you can see an increased 3p% and increased shot attempts. That's all.
Also, you are saying the result of the fake looks reasonable. Did you test it with the currently available dataset (if I'm not mistaken, you have one from 2001 to 2012), and can you give an estimate of an "RMSE", meaning how far the fake is from the real RAPM value? Maybe such a comparison shows a systematic error for a specific player type. Also, did you try using a lower prior for the low-minute players to compensate for the described effect?
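To make the suggestion concrete, a comparison like this could be sketched as below. The ratings, player labels and groups are placeholders invented for the example, not real RAPM values, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def rmse(fake, real):
    """Root mean squared error over players present in both rating dicts."""
    common = fake.keys() & real.keys()
    if not common:
        raise ValueError("no players in common")
    return math.sqrt(sum((fake[p] - real[p]) ** 2 for p in common) / len(common))


def mean_error_by_group(fake, real, group_of):
    """Mean signed error (fake - real) per player type, to expose systematic bias."""
    sums, counts = defaultdict(float), defaultdict(int)
    for p in fake.keys() & real.keys():
        group = group_of.get(p, "unknown")
        sums[group] += fake[p] - real[p]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Placeholder numbers only.
fake = {"A": 3.0, "B": -1.0, "C": 2.0}
real = {"A": 2.0, "B": -1.0, "C": 0.0}
groups = {"A": "star", "B": "bench", "C": "role player on good team"}

print(round(rmse(fake, real), 3))
print(mean_error_by_group(fake, real, groups))
```

A clearly positive mean error for one group would point to the kind of systematic overrating discussed in this thread.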
Mike, the results in the first two lists are not really surprising. Given that the Bulls averaged a 9.91 SRS over those 3 seasons, it is to be expected that they will have a lot of players on top of the first list (OnCourt Ortg - OnCourt Drtg) and also of the 2nd list. After that we have the SuperSonics with 6.88 SRS and the Jazz with 6.65 SRS. Thus, we can expect players from those teams in the 2nd list as well, which is just the 1st list multiplied by the minutes played by each respective player.
The 3rd list also shows some players on top that we can expect to be there. Robinson was out for the Spurs' tank season, which might boost his value here while dropping the value of the teammates he played with. Shaq played for the Magic and then for the Lakers. Jordan started the 1998 season with an injury to his index finger and had trouble adjusting to it all season. He started out as roughly a 70% free-throw shooter, but improved on this throughout the season. Pippen came back when Jordan was already shooting better, but I guess the influence of the first games is seen here and will probably make Pippen look better.
The splits for Jordan:
First 18 games: 26.8 ppg on 49 TS% in 39.2 mpg, 4.5 MOV for the Bulls
Next 17 games: 30.8 ppg on 56 TS% in 38.7 mpg, 6.1 MOV for the Bulls
Last 47 games: 28.7 ppg on 54 TS% in 38.7 mpg, 8.5 MOV for the Bulls
I can see Pippen getting some credit for Jordan adjusting to his finger injury here. Nonetheless, the names on the list are not that surprising with the exception of Rodman not being among them, but I guess he is pretty close.
Re: Computing fake RAPM for players of the 90s
mystic wrote: Also, you are saying the result of the fake is looking reasonable. Did you test it with the current available dataset (if I'm not mistaken, you have one from 2001 to 2012) and can give an estimate of an "RMSE", meaning, how far away is the fake from the real RAPM value? Maybe such comparison shows you a systematic error for a specific player type.
I could certainly look up those players that are "off" the most, but
a) I think we already know who they are (role players on very good teams)
b) whoever those might be in the end, what can we do about them? It's not like there's an abundance of data we could still work into the ratings
mystic wrote: Also, did you try using a lower prior for the low minute players to compensate for the described effect?
I'm giving more weight to the BoxScore informed prior than I did in the 2000s, 50/50 instead of 65/35. That will give most lower minute players a lower prior.
Re: Computing fake RAPM for players of the 90s
a) Yes, that sounds like something we can expect.
b) You could assume that they don't get minutes, because they aren't very good. Meaning, assign them to a fixed value which is below average. There might also be the case, that the players are in for the last quarter in blowouts. Maybe it is possible to assume that the majority of the additional point differential in the 4th for the winner was achieved during the early minutes.
c) I'm looking forward to the results.
As bbstats said, that is indeed one really cool thread and some great work you are putting out here.
Re: Computing fake RAPM for players of the 90s
Alright. Year-by-year ratings for the 90s are online now at
http://stats-for-the-nba.appspot.com/ratings/1991.html etc.
The ratings were built using my BoxScore ratings and ratings from season N-1 as prior.
The BoxScore rating doesn't like Jordan too much. The RAPM part helps a bit with this, but never so much to push him into the top spot.
The #1s for each year are extremely boring, it's Robinson, Robinson, Robinson, Robinson, Robinson, Shaq, Robinson, Robinson .. you get the idea
There's no doubt that, in some cases, the ratings can be far from reality. Take them with a grain of salt
I'll post 10year pure fake RAPM ratings later
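For readers wondering how a prior can enter the picture: RAPM is a ridge regression, and one common way to use a prior is to shrink each player's coefficient toward the prior value instead of toward zero. The sketch below shows that idea on toy data. It is an assumption about the general technique, not a description of how the ratings above were actually built; `ridge_with_prior`, `solve`, the penalty `lam` and all numbers are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_with_prior(X, y, prior, lam):
    """Minimize ||y - X b||^2 + lam * ||b - prior||^2 (lam must be > 0)."""
    n = len(prior)
    # Residual of the margin after subtracting what the prior already predicts.
    resid = [yi - sum(x * p for x, p in zip(row, prior)) for row, yi in zip(X, y)]
    # Normal equations for the deviation from the prior:
    # (X'X + lam * I) delta = X' resid
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xtr = [sum(row[i] * r for row, r in zip(X, resid)) for i in range(n)]
    delta = solve(xtx, xtr)
    return [p + d for p, d in zip(prior, delta)]


# Toy stint: player 0 (+1) is on the court against player 1 (-1), margin +10.
X = [[1.0, -1.0]]
y = [10.0]
print(ridge_with_prior(X, y, prior=[0.0, 0.0], lam=1.0))
print(ridge_with_prior(X, y, prior=[2.0, -2.0], lam=1.0))
```

With little data the estimates stay close to the prior, which is why the choice of prior matters so much for low-minute players.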
Re: Computing fake RAPM for players of the 90s
J.E., what did you use for the BoxScore rating? Is it described somewhere?
I'm interested to see what the results would be without mixing in box score stats. If superstars typically are rated very highly and "scrubs" are rated poorly based only on your matchup data and not boxscore stats, that's already a nice validation of this method I think.
Re: Computing fake RAPM for players of the 90s
mystic wrote: b) You could assume that they don't get minutes, because they aren't very good. Meaning, assign them to a fixed value which is below average. There might also be the case, that the players are in for the last quarter in blowouts. Maybe it is possible to assume that the majority of the additional point differential in the 4th for the winner was achieved during the early minutes.
Some of those that will probably end up overrated actually *do* get minutes. I'm thinking of guys like Ron Harper and Luc Longley (not saying they're bad, but probably not elite). Also, players with low minutes already get a lower prior due to the BoxScore rating.
Quote: J.E., what did you use for the BoxScore rating? Is it described somewhere?
viewtopic.php?f=2&t=8025
Quote: I'm interested to see what the results would be without mixing in box score stats
Here's pure fake RAPM from 90-91 to 99-00:
http://stats-for-the-nba.appspot.com/ratings/90s.html
Jordan's far and away #1 on offense and overall
Magic and Bird would very likely have a higher rating if I included some of the 80s
Minor headscratchers are Derek Fisher at #10, Jerome Williams at #15, Jaren Jackson. Robert Horry is at #10, but that doesn't really surprise me
Elton Brand is dead last, being the MP leader of the 99-00 Bulls that had an SRS of -9.2
Re: Computing fake RAPM for players of the 90s
^ Appreciate it. Thanks.
Re: Computing fake RAPM for players of the 90s
I'll concede that MJ's defense might not be as strong as some centers' defense...but to never have him at #1 is beyond suspect. 
