RAPM request thread

bchaikin
Posts: 307
Joined: Thu May 12, 2011 2:09 am

Re: RAPM request thread

Post by bchaikin »

tony mitchell (6-6, 201314, mil, 10 min)
tony mitchell (6-8, 201314, det, 79 min)

marcus williams (6-3, 0607-0910, njn, gsw, mem)
marcus williams (6-7, 0708-0809, san, lac)

chris wright (6-8, 11-12 gsw, 13-14 mil)
chris wright (6-1, 12-13, dal)

dee brown (6-1, 9091-0102, bos, tor, orl)
dee brown (6-0, 0607, 0809, uta, was, pho)
AcrossTheCourt
Posts: 237
Joined: Sat Feb 16, 2013 11:56 am

Re: RAPM request thread

Post by AcrossTheCourt »

Yes, I know those are the players, but I can't figure out who corresponds to which entry in the 15 year age adjusted file.
A Gravity Well
Posts: 24
Joined: Mon Jun 23, 2014 1:38 am

Re: RAPM request thread

Post by A Gravity Well »

How do you split RAPM into offensive and defensive?
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: RAPM request thread

Post by J.E. »

A Gravity Well wrote:How do you split RAPM into offensive and defensive?
I explain it here https://www.youtube.com/watch?v=OuC0YZTADcE
A Gravity Well
Posts: 24
Joined: Mon Jun 23, 2014 1:38 am

Re: RAPM request thread

Post by A Gravity Well »

J.E. wrote:
A Gravity Well wrote:How do you split RAPM into offensive and defensive?
I explain it here https://www.youtube.com/watch?v=OuC0YZTADcE
Lovely presentation.

You don't use "-1" for any dummy variable. As I understand -- wrongly understood? -- it, the home team is assigned +1 and the away team is assigned -1? And the result vector is positive if the home team has outscored the away team, and negative if the away team has outscored the home team?

Code: Select all

P1o P2o P3o P4o P5o   P1d P2d P3d P4d P5d    P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
1   1   1   1   1       0   0   0   0   0     0   0   0   0   0   -1  -1  -1  -1  -1
0   0   0   0   0       1   1   1   1   1    -1  -1  -1  -1  -1   0   0   0   0   0

J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: RAPM request thread

Post by J.E. »

The results field is simply the points scored in that specific possession (assuming ~190 possessions per game, 95 per team, which differs from most people's interpretation). There's no subtracting going on

Whether you assign a +1 or -1 to defensive dummies is just personal preference, although if the results field really was 'home_pts - away_pts' you'd have to assign '-1' to defenders
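
A minimal sketch of the point about sign conventions, in base R. It assumes toy data and a closed-form ridge solve standing in for glmnet; the helper `ridge_fit`, the rosters, the scoring probabilities, and `lambda = 50` are all hypothetical, not J.E.'s actual setup. The response is just the points scored on each possession, and flipping the defensive dummies from +1 to -1 only flips the sign of the defensive coefficients.

```r
# Closed-form ridge with no intercept: beta = (X'X + lambda * I)^-1 X'y
ridge_fit <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

set.seed(1)
n_poss <- 400
team_a <- 1:7                       # hypothetical rosters of 7 players each
team_b <- 8:14
off <- matrix(0, n_poss, 14, dimnames = list(NULL, paste0("P", 1:14, "o")))
def <- matrix(0, n_poss, 14, dimnames = list(NULL, paste0("P", 1:14, "d")))
for (i in seq_len(n_poss)) {
  a_on <- sample(team_a, 5)         # five on the floor per team
  b_on <- sample(team_b, 5)
  if (i %% 2 == 1) {                # teams alternate possessions
    off[i, a_on] <- 1; def[i, b_on] <- 1
  } else {
    off[i, b_on] <- 1; def[i, a_on] <- 1
  }
}

# Result field: points scored on that possession, nothing subtracted
y <- sample(0:3, n_poss, replace = TRUE, prob = c(0.50, 0.05, 0.35, 0.10))

b_plus  <- ridge_fit(cbind(off,  def), y, lambda = 50)   # defenders coded +1
b_minus <- ridge_fit(cbind(off, -def), y, lambda = 50)   # defenders coded -1

# Offensive coefficients agree; defensive ones differ only in sign
all.equal(unname(b_plus[1:14]),  unname(b_minus[1:14]))
all.equal(unname(b_plus[15:28]), -unname(b_minus[15:28]))
```

With defenders coded +1, a negative defensive coefficient means fewer points allowed; with -1, the same defender shows up as positive.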
Nate
Posts: 132
Joined: Tue Feb 24, 2015 2:35 pm

Re: RAPM request thread

Post by Nate »

Thanks for responding to people's requests.

Is the data set that you're using available to the public in a computer-friendly format somewhere?
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: RAPM request thread

Post by J.E. »

Nate wrote:Is the data set that you're using available to the public in a computer-friendly format somewhere?
I used to put matchup data online, but the site where I got the play-by-play from asked me to take it down, so I did

Basketballvalue http://basketballvalue.com/downloads.php at least has some data that you can play around with (goes through 2012)
A Gravity Well
Posts: 24
Joined: Mon Jun 23, 2014 1:38 am

Re: RAPM request thread

Post by A Gravity Well »

J.E. wrote:The results field is simply the points scored in that specific possession (assuming ~190 possessions per game, 95 per team, which differs from most people's interpretation). There's no subtracting going on

Whether you assign a +1 or -1 to defensive dummies is just personal preference, although if the results field really was 'home_pts - away_pts' you'd have to assign '-1' to defenders
I suppose I have a conceptual hurdle to clear, then.

Code: Select all

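library(glmnet)

# Load the CSV; possessions is the row weight, home_point_margin the response
thebeast <- read.csv("~/Desktop/beast.csv", header=TRUE)
poss <- as.numeric(thebeast$possessions)
margin <- as.numeric(thebeast$home_point_margin)

# Drop the weight and response columns so only the player dummies remain
thebeast$possessions <- NULL
thebeast$home_point_margin <- NULL
mbeast <- data.matrix(thebeast)

# alpha=0 selects ridge; set it in the CV step too so lambda.min matches the final fit
lambda <- cv.glmnet(mbeast,margin,weights=poss,alpha=0,nfolds=5)
lambda.min <- lambda$lambda.min
ridge <- glmnet(mbeast,margin,family="gaussian",weights=poss,alpha=0,lambda=lambda.min)
coef(ridge,s=lambda.min)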
That's my code.

My CSV is such that:

Code: Select all

possessions  home_point_margin  P1o P2o P3o P4o P5o   P1d P2d P3d P4d P5d    P6o P7o P8o P9o P10o   P6d P7d P8d P9d P10d
     5            100            1   1   1   1   1      0   0   0   0   0     0   0   0   0   0    -1  -1  -1  -1  -1
     5            -75            0   0   0   0   0      1   1   1   1   1    -1  -1  -1  -1  -1     0   0   0   0   0
Row 1 is the home team on offense, row 2 is the away team on offense, and home point margin is per 100 possessions. I thought doing it like this would have the calculation be:

Code: Select all

home team on offense: Home_Margin (will be positive or zero) = P1o + P2o + P3o + P4o + P5o - (P6d + P7d + P8d +P9d + P10d)
home team on defense: Home_Margin (will be negative or zero) = P1d + P2d + P3d + P4d + P5d - (P6o + P7o + P8o +P9o + P10o)
Where have I gone off the rails? Wouldn't it have to be structured like this to account for HCA (adding a dummy variable of 1 for HCA in each row)?
Nate
Posts: 132
Joined: Tue Feb 24, 2015 2:35 pm

Re: RAPM request thread

Post by Nate »

A Gravity Well wrote:...

Where have I gone off the rails? Wouldn't it have to be structured like this to account for HCA (adding a dummy variable of 1 for HCA in each row)?
Your notation seems a bit confused. Do you have any experience with regression models?
A Gravity Well
Posts: 24
Joined: Mon Jun 23, 2014 1:38 am

Re: RAPM request thread

Post by A Gravity Well »

Ignore above. Had moment of clarity.

Code: Select all

possessions      resultv        P1o P2o P3o P4o P5o   P1d P2d P3d P4d P5d    P6o P7o P8o P9o P10o   P6d P7d P8d P9d P10d
     1              3            1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
     1              2            0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
     1              0            1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
     1              2            0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
     1              0            1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
     1              0            0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
     1              2            1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
     1              0            0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
How does one then incorporate HCA -- or score differential at time of possession? How does one denote which team is the home team/which team is the away team? If I want to add "this team is up x with y minutes left" or "this team is down x with y minutes left"...

What I had been doing was believing +1/-1 was sufficient for offsetting home and away. From there I added a HCA column that was always set to 1. I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc), but since my earlier way of doing this was bonkers busted, I am unsure of how to proceed.

EDIT:

...is it just as simple as splitting HCA into HCA_offense and HCA_defense?
Last edited by A Gravity Well on Fri Oct 16, 2015 7:26 pm, edited 1 time in total.
A Gravity Well
Posts: 24
Joined: Mon Jun 23, 2014 1:38 am

Re: RAPM request thread

Post by A Gravity Well »

Nate wrote:
A Gravity Well wrote:...

Where have I gone off the rails? Wouldn't it have to be structured like this to account for HCA (adding a dummy variable of 1 for HCA in each row)?
Your notation seems a bit confused. Do you have any experience with regression models?
No. I have experience, but not as much of it carries over from my field as I'd originally have liked.
Nate
Posts: 132
Joined: Tue Feb 24, 2015 2:35 pm

Re: RAPM request thread

Post by Nate »

A Gravity Well wrote:...

How does one then incorporate HCA -- or score differential at time of possession? How does one denote which team is the home team/which team is the away team? If I want to add "this team is up x with y minutes left" or "this team is down x with y minutes left"...

What I had been doing was believing +1/-1 was sufficient for offsetting home and away. From there I added a HCA column that was always set to 1. I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc), but since my earlier way of doing this was bonkers busted, I am unsure of how to proceed.

EDIT:

...is it just as simple as splitting HCA into HCA_offense and HCA_defense?
I would be inclined to handle home court by having the same kind of split as with offense and defense - P1oh, P1dh, P1or, P1dr.

To add a linear time left or scoring differential factor per player, you could have P1oh-diff, P1dh-diff, P1or-diff, P1dr-diff *in addition to* the other independent variables, and have the score differential in them.

You can obviously run into sample size issues if you split things too much, and assumptions of linearity aren't always appropriate.
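
A rough sketch of that split for a single player, in base R. The toy table, the `p1_home` flag, and the underscore naming (`P1oh_diff` rather than `P1oh-diff`, since hyphens are awkward in R column names) are assumptions for illustration only.

```r
# Toy possession table for one player, P1 (all values hypothetical)
poss <- data.frame(
  P1o     = c(1, 0, 1, 0, 1, 0),    # P1 on court, his team on offense
  P1d     = c(0, 1, 0, 1, 0, 1),    # P1 on court, his team on defense
  p1_home = c(1, 1, 1, 0, 0, 0),    # 1 if P1's team is the home team
  diff    = c(0, 3, -2, 5, -7, 1)   # score differential from P1's team's view
)

# Home/road split of the offense and defense dummies
split_cols <- with(poss, cbind(
  P1oh = P1o * p1_home,
  P1or = P1o * (1 - p1_home),
  P1dh = P1d * p1_home,
  P1dr = P1d * (1 - p1_home)
))

# Linear score-differential terms: each dummy multiplied by that row's diff
# (the vector recycles down the columns, so row i is scaled by diff[i])
diff_cols <- split_cols * poss$diff
colnames(diff_cols) <- paste0(colnames(split_cols), "_diff")

# The diff columns are added to the plain dummies, not substituted for them
X <- cbind(split_cols, diff_cols)
X
```

Every player contributes eight columns this way, which is where the sample size concern comes from.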
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: RAPM request thread

Post by J.E. »

A Gravity Well wrote:I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc)
This should work. One column for each possible score differential, although you might want to put all situations with e.g. score_diff>30 into one bucket

For HCA one column should be enough, which is simply switched on during home team offensive possessions

Alternatively you could adjust the results vector Y by 'average_home_eff_per_possession' and 'average_away_eff_per_possession'.
So, if the home team scores 3 points, the result turns into, say, 3-1.07 = 1.93.
If the away team scores 3 points, the result turns into 3-1.04 = 1.96

A centered Y is good to have anyway
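
A small base R sketch of those three ideas, under stated assumptions: the data frame and its column names are hypothetical, the `pts` values are borrowed from the toy table earlier in the thread, and pooling both tails at +/-30 is one reading of "score_diff>30 into one bucket".

```r
# Hypothetical per-possession data
poss <- data.frame(
  pts        = c(3, 2, 0, 2, 0, 0, 2, 0),
  home_off   = c(1, 0, 1, 0, 1, 0, 1, 0),       # 1 = home team has the ball
  score_diff = c(0, -3, 1, 35, -42, 4, 4, -1)   # offense minus defense
)

# One 0/1 column per score differential, with blowouts pooled at +/-30
capped    <- pmax(pmin(poss$score_diff, 30), -30)
bucket    <- factor(capped)
diff_cols <- model.matrix(~ bucket - 1)

# Single HCA column, switched on only for home-team offensive possessions
hca <- poss$home_off

# Alternative to the HCA column: subtract the average points per possession
# of home offenses from home rows and of away offenses from away rows
y_centered <- poss$pts - ave(poss$pts, poss$home_off)

X <- cbind(HCA = hca, diff_cols)
```

Here the group averages come from the toy data itself; in practice they would be the league-wide home and away efficiencies, like the 1.07 and 1.04 used as examples above.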
A Gravity Well
Posts: 24
Joined: Mon Jun 23, 2014 1:38 am

Re: RAPM request thread

Post by A Gravity Well »

Nate wrote:I would be inclined to handle home court by having the same kind of split as with offense and defense - P1oh, P1dh, P1or, P1dr.

To add a linear time left or scoring differential factor per player, you could have P1oh-diff, P1dh-diff, P1or-diff, P1dr-diff *in addition to* the other independent variables, and have the score differential in them.

You can obviously run into sample size issues if you split things too much, and assumptions of linearity aren't always appropriate.
Wouldn't doing it like this allow the values of HCA or anything else added to be more knowable?

Code: Select all

resultv   HCo HCd   P1o P2o P3o P4o P5o   P1d P2d P3d P4d P5d    P6o P7o P8o P9o P10o   P6d P7d P8d P9d P10d
   3       1   0     1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
   2       0   1     0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
   0       1   0     1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
   2       0   1     0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
   0       1   0     1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
   0       0   1     0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
   2       1   0     1   1   1   1   1      0   0   0   0   0     0   0   0   0   0     1   1   1   1   1
   0       0   1     0   0   0   0   0      1   1   1   1   1     1   1   1   1   1     0   0   0   0   0
EDIT: [DISREGARD] First five lines of HomeTeamUpXo or HomeTeamDownXo et al.

Code: Select all

Tie  HTu1o  HTu2o  HTu3o  HTd1o  HTd2o  HTu1d  HTu2d  HTu3d
 1     0      0      0      0      0      0      0      0
 0     0      0      0      0      0      0      0      1
 0     1      0      0      0      0      0      0      0
 0     0      0      0      0      0      1      0      0
 0     0      0      0      1      0      0      0      0
[/DISREGARD] Status of the offensive team is enough; home effects are taken into account elsewhere.

Or is there a way to do (Player1offence*HomeTeam), a sort of interaction term which could be solved for, instead of making homecourt advantage its own column? Could incorporate multiple interaction terms (right phrase?) easily that way.
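
For what it's worth, "interaction term" is the usual phrase, and base R's formula interface can build such a column directly. A minimal sketch, assuming a hypothetical two-column table where `home` is 1 when the offense is the home team:

```r
# Hypothetical data: P1 on offense, and whether the offense is at home
poss <- data.frame(
  P1o  = c(1, 1, 0, 1, 0),
  home = c(1, 0, 1, 1, 0)
)

# P1o is the plain dummy; P1o:home is on only when P1 is on offense at home
X <- model.matrix(~ P1o + P1o:home - 1, data = poss)
X

# The interaction column is just the elementwise product
all(X[, "P1o:home"] == poss$P1o * poss$home)
```

In this coding the `P1o` coefficient reads as P1's road offensive value and `P1o:home` as his home-minus-road shift, so home court becomes a per-player effect rather than one shared column.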
Last edited by A Gravity Well on Sat Oct 17, 2015 3:50 am, edited 1 time in total.