RAPM request thread
Re: RAPM request thread
tony mitchell (6-6, 201314, mil, 10 min)
tony mitchell (6-8, 201314, det, 79 min)
marcus williams (6-3, 0607-0910, njn, gsw, mem)
marcus williams (6-7, 0708-0809, san, lac)
chris wright (6-8, 11-12 gsw, 13-14 mil)
chris wright (6-1, 12-13, dal)
dee brown (6-1, 9091-0102, bos, tor, orl)
dee brown (6-0, 0607, 0809, uta, was, pho)
-
- Posts: 237
- Joined: Sat Feb 16, 2013 11:56 am
Re: RAPM request thread
Yes, I know those are the players, but I can't figure out which player corresponds to which entry in the 15-year age-adjusted file.
-
- Posts: 24
- Joined: Mon Jun 23, 2014 1:38 am
Re: RAPM request thread
How do you split RAPM into offensive and defensive?
Re: RAPM request thread
A Gravity Well wrote: How do you split RAPM into offensive and defensive?
I explain it here: https://www.youtube.com/watch?v=OuC0YZTADcE
-
- Posts: 24
- Joined: Mon Jun 23, 2014 1:38 am
Re: RAPM request thread
J.E. wrote: I explain it here: https://www.youtube.com/watch?v=OuC0YZTADcE
Lovely presentation.
You don't use "-1" for any dummy variable. As I understand it (perhaps wrongly), the home team is assigned +1 and the away team is assigned -1? And the result vector is positive if the home team has outscored the away team, and negative if the away team has outscored the home team?
Code: Select all
P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 -1 -1 -1 -1 -1
0 0 0 0 0 1 1 1 1 1 -1 -1 -1 -1 -1 0 0 0 0 0
Re: RAPM request thread
The results field is simply the points scored in that specific possession (assuming ~190 possessions per game, 95 per team, which differs from most people's interpretation). There's no subtracting going on.
Whether you assign a +1 or -1 to defensive dummies is just personal preference, although if the results field really were 'home_pts - away_pts' you'd have to assign '-1' to defenders.
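To make the sign point concrete, here is a minimal sketch in base R, assuming a made-up four-possession matrix and an arbitrary penalty. The helper `ridge_fit` is hypothetical and fits a closed-form ridge without an intercept; it is not the setup used for the posted RAPM numbers. It only shows that coding defenders as -1 instead of +1 flips the sign of the defensive coefficients and leaves everything else alone.

```r
# Toy possession-level design matrix: 4 possessions, 2 offensive and
# 2 defensive dummies. Column names and data are made up for illustration.
X <- matrix(c(1, 0, 0, 1,
              0, 1, 1, 0,
              1, 0, 1, 0,
              0, 1, 0, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("P1o", "P2o", "P1d", "P2d")))
y <- c(3, 0, 2, 1)  # points scored on each possession, no subtraction

# Closed-form ridge without an intercept: (X'X + lambda * I)^-1 X'y
ridge_fit <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

lambda <- 1  # arbitrary penalty for the example
beta_plus <- ridge_fit(X, y, lambda)

# Same data, but defenders coded as -1 instead of +1
X_minus <- X
X_minus[, c("P1d", "P2d")] <- -X_minus[, c("P1d", "P2d")]
beta_minus <- ridge_fit(X_minus, y, lambda)

# Offensive coefficients match; defensive coefficients only change sign
print(rbind(beta_plus, beta_minus))
```

With +1 coding, a good defender gets a negative defensive coefficient (fewer points allowed); with -1 coding the same defender gets a positive one. The fit itself is the same either way.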
Re: RAPM request thread
Thanks for responding to people's requests.
Is the data set that you're using available to the public in a computer-friendly format somewhere?
Re: RAPM request thread
Nate wrote: Is the data set that you're using available to the public in a computer-friendly format somewhere?
I used to put matchup data online, but the site where I got the play-by-play from asked me to take it down, so I did.
Basketballvalue (http://basketballvalue.com/downloads.php) at least has some data that you can play around with (goes through 2012).
-
- Posts: 24
- Joined: Mon Jun 23, 2014 1:38 am
Re: RAPM request thread
J.E. wrote: The results field is simply the points scored in that specific possession (assuming ~190 possessions per game, 95 per team, which differs from most people's interpretation). There's no subtracting going on. Whether you assign a +1 or -1 to defensive dummies is just personal preference, although if the results field really was 'home_pts - away_pts' you'd have to assign '-1' to defenders
I suppose I have a conceptual hurdle to clear, then.
Code: Select all
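library(glmnet)
# Read the stint-level data (one row per stint, one column per player dummy)
thebeast <- read.csv("~/Desktop/beast.csv", header=TRUE)
poss <- as.numeric(thebeast$possessions)
margin <- as.numeric(thebeast$home_point_margin)
# Drop the weight and response columns so only the predictors remain
thebeast$possessions <- NULL
thebeast$home_point_margin <- NULL
mbeast <- data.matrix(thebeast)
# alpha=0 here too, otherwise lambda is tuned for the default lasso penalty
cvfit <- cv.glmnet(mbeast, margin, weights=poss, alpha=0, nfolds=5)
lambda.min <- cvfit$lambda.min
# Ridge fit at the cross-validated lambda, weighted by possessions
ridge <- glmnet(mbeast, margin, family="gaussian", weights=poss, alpha=0, lambda=lambda.min)
coef(ridge, s=lambda.min)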
My CSV is such that:
Code: Select all
possessions home_point_margin P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
5 100 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 -1 -1 -1 -1 -1
5 -75 0 0 0 0 0 1 1 1 1 1 -1 -1 -1 -1 -1 0 0 0 0 0
Code: Select all
home team on offense: Home_Margin (will be positive or zero) = P1o + P2o + P3o + P4o + P5o - (P6d + P7d + P8d +P9d + P10d)
home team on defense: Home_Margin (will be negative or zero) = P1d + P2d + P3d + P4d + P5d - (P6o + P7o + P8o +P9o + P10o)
Re: RAPM request thread
A Gravity Well wrote: ... Where have I gone off the rails? Wouldn't it have to be structured like this to account for HCA (adding a dummy variable of 1 for HCA in each row)?
Your notation seems a bit confused. Do you have any experience with regression models?
-
- Posts: 24
- Joined: Mon Jun 23, 2014 1:38 am
Re: RAPM request thread
Ignore above. Had moment of clarity.
How does one then incorporate HCA -- or score differential at time of possession? How does one denote which team is the home team/which team is the away team? If I want to add "this team is up x with y minutes left" or "this team is down x with y minutes left"...
What I had been doing was believing +1/-1 was sufficient for offsetting home and away. From there I added a HCA column that was always set to 1. I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc), but since my earlier way of doing this was bonkers busted, I am unsure of how to proceed.
EDIT:
...is it just as simple as splitting HCA into HCA_offense and HCA_defense?
Code: Select all
possessions resultv P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
1 3 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1 2 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
Last edited by A Gravity Well on Fri Oct 16, 2015 7:26 pm, edited 1 time in total.
-
- Posts: 24
- Joined: Mon Jun 23, 2014 1:38 am
Re: RAPM request thread
Nate wrote: Your notation seems a bit confused. Do you have any experience with regression models?
No. Experience, but not as much carryover as I'd have liked originally from my field.
Re: RAPM request thread
A Gravity Well wrote: ... How does one then incorporate HCA -- or score differential at time of possession? How does one denote which team is the home team/which team is the away team? If I want to add "this team is up x with y minutes left" or "this team is down x with y minutes left"... What I had been doing was believing +1/-1 was sufficient for offsetting home and away. From there I added a HCA column that was always set to 1. I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc), but since my earlier way of doing this was bonkers busted, I am unsure of how to proceed. EDIT: ...is it just as simple as splitting HCA into HCA_offense and HCA_defense?
I would be inclined to handle home court by having the same kind of split as with offense and defense: P1oh, P1dh, P1or, P1dr.
To add a linear time-left or score-differential factor per player, you could have P1oh-diff, P1dh-diff, P1or-diff, P1dr-diff *in addition to* the other independent variables, and put the score differential in them.
You can obviously run into sample size issues if you split things too much, and assumptions of linearity aren't always appropriate.
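A rough sketch of that split in base R, for one player only. The data frame `poss` and all of its column names and values are made up for illustration, and the `_diff` suffix stands in for the "-diff" names above because a hyphen is awkward in R column names.

```r
# Hypothetical possession log: every name and value here is an assumption
poss <- data.frame(
  P1_on_offense = c(1, 0, 1, 0),   # 1 if player 1 is on the floor for the offense
  P1_on_defense = c(0, 1, 0, 1),   # 1 if player 1 is on the floor for the defense
  P1_at_home    = c(1, 1, 0, 0),   # 1 if player 1's team is the home team
  score_diff    = c(4, 4, -7, -7)  # player 1's team's lead when the possession starts
)

# Split each offense/defense dummy into a home and a road version
X <- with(poss, cbind(
  P1oh = P1_on_offense * P1_at_home,
  P1dh = P1_on_defense * P1_at_home,
  P1or = P1_on_offense * (1 - P1_at_home),
  P1dr = P1_on_defense * (1 - P1_at_home)
))

# Linear score-differential terms: the same dummies times the differential
# (the vector recycles down the columns, so row i is scaled by score_diff[i])
X_diff <- X * poss$score_diff
colnames(X_diff) <- paste0(colnames(X), "_diff")

design <- cbind(X, X_diff)
print(design)
```

Every player then costs eight columns instead of two, which is where the sample size warning starts to bite.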
Re: RAPM request thread
A Gravity Well wrote: I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc)
This should work. Use one column for each possible score differential, although you might want to put all situations with e.g. score_diff>30 into one bucket.
For HCA, one column should be enough; it is simply switched on during home-team offensive possessions.
Alternatively, you could adjust the results vector Y by 'average_home_eff_per_possession' and 'average_away_eff_per_possession'.
So, if the home team scores 3 points, the result turns into, say, 3-1.07 = 1.93.
If the away team scores 3 points, the result turns into 3-1.04 = 1.96.
A centered Y is good to have anyway.
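A small base R sketch of those three ideas, assuming eight made-up possessions. The `score_diff` values are hypothetical, and using 31 and -31 as stand-ins for the "beyond 30" buckets is my own choice for the example, not something prescribed in this thread.

```r
# Made-up possessions: points scored and whether the home team had the ball
pts          <- c(3, 2, 0, 2, 0, 0, 2, 0)
home_offense <- c(1, 0, 1, 0, 1, 0, 1, 0)

# Option 1: a single HCA dummy, switched on during home-team offensive possessions
HCA <- home_offense

# Option 2: subtract the average efficiency of the matching side from each result
avg_home <- mean(pts[home_offense == 1])
avg_away <- mean(pts[home_offense == 0])
y_centered <- pts - ifelse(home_offense == 1, avg_home, avg_away)

# Score-differential buckets, with everything beyond +/-30 lumped together
score_diff <- c(0, 4, -12, 31, 45, -38, 9, -2)  # hypothetical values
capped <- pmax(pmin(score_diff, 31), -31)       # 31 means ">30", -31 means "<-30"

# One dummy column per bucket that actually occurs in the data
buckets <- model.matrix(~ factor(capped) - 1)

print(y_centered)
print(colnames(buckets))
```

After centering, both the home and the away possessions average zero, so the HCA dummy from option 1 would have nothing left to pick up.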
-
- Posts: 24
- Joined: Mon Jun 23, 2014 1:38 am
Re: RAPM request thread
Nate wrote: I would be inclined to handle home court by having the same kind of split as with offense and defense - P1oh, P1dh, P1or, P1dr. To add a linear time left or scoring differential factor per player, you could have P1oh-diff, P1dh-diff, P1or-diff, P1dr-diff *in addition to* the other independent variables, and have the score differential in them. You can obviously run into sample size issues if you split things too much, and assumptions of linearity aren't always appropriate.
Wouldn't doing it like this allow the values of HCA or anything else added to be more knowable?
Code: Select all
resultv HCo HCd P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
3 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
2 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
2 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
2 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
Code: Select all
Tie HTu1o HTu2o HTu3o HTd1o HTd2o HTu1d HTu2d HTu3d
1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0
Or is there a way to do (Player1offence*HomeTeam), a sort of interaction term which could be solved for, instead of making home-court advantage its own column? Could incorporate multiple interaction terms (right phrase?) easily that way.
Last edited by A Gravity Well on Sat Oct 17, 2015 3:50 am, edited 1 time in total.
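On the interaction question, a minimal sketch of what such a term looks like in R. The data frame `d` and its two columns are made up for illustration, and nobody in the thread has said this is how the posted numbers are built. In an R formula, `P1o * home` expands to `P1o + home + P1o:home`, where `P1o:home` is just the product of the two columns.

```r
# Hypothetical columns for illustration
d <- data.frame(
  P1o  = c(1, 1, 0, 0),  # player 1 on the floor for the offense
  home = c(1, 0, 1, 0)   # the offense is the home team
)

# Build the design matrix without an intercept column
X <- model.matrix(~ P1o * home - 1, data = d)
print(X)

# The interaction column equals the elementwise product of its parents
print(all.equal(unname(X[, "P1o:home"]), d$P1o * d$home))
```

Under this coding the `home` column plays the role of the single HCA column, and the `P1o:home` coefficient would be how much player 1's offensive value shifts at home relative to the road.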