Re: RAPM request thread
Posted: Wed Sep 09, 2015 2:22 am
by bchaikin
Tony Mitchell (6-6, 2013-14, MIL, 10 min)
Tony Mitchell (6-8, 2013-14, DET, 79 min)
Marcus Williams (6-3, 2006-07 to 2009-10, NJN, GSW, MEM)
Marcus Williams (6-7, 2007-08 to 2008-09, SAN, LAC)
Chris Wright (6-8, 2011-12 GSW, 2013-14 MIL)
Chris Wright (6-1, 2012-13, DAL)
Dee Brown (6-1, 1990-91 to 2001-02, BOS, TOR, ORL)
Dee Brown (6-0, 2006-07, 2008-09, UTA, WAS, PHO)
Re: RAPM request thread
Posted: Sat Sep 12, 2015 4:30 am
by AcrossTheCourt
Yes, I know those are the players, but I can't figure out which of them corresponds to which entry in the 15-year age-adjusted file.
Re: RAPM request thread
Posted: Fri Oct 16, 2015 3:34 am
by A Gravity Well
How do you split RAPM into offensive and defensive?
Re: RAPM request thread
Posted: Fri Oct 16, 2015 8:15 am
by J.E.
A Gravity Well wrote: How do you split RAPM into offensive and defensive?
I explain it here:
https://www.youtube.com/watch?v=OuC0YZTADcE
Re: RAPM request thread
Posted: Fri Oct 16, 2015 9:53 am
by A Gravity Well
Lovely presentation.
You don't use "-1" for any dummy variable. As I understand it (or have I misunderstood?), the home team is assigned +1 and the away team -1? And the result vector is positive if the home team outscored the away team, and negative if the away team outscored the home team?
Code:
P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 -1 -1 -1 -1 -1
0 0 0 0 0 1 1 1 1 1 -1 -1 -1 -1 -1 0 0 0 0 0
Re: RAPM request thread
Posted: Fri Oct 16, 2015 10:38 am
by J.E.
The results field is simply the points scored on that specific possession (assuming ~190 possessions per game, 95 per team, which differs from most people's interpretation). There's no subtraction going on.
Whether you assign +1 or -1 to the defensive dummies is just personal preference, although if the results field really were 'home_pts - away_pts' you'd have to assign '-1' to defenders.
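A minimal sketch of why the sign on the defensive dummies is a matter of preference when Y is points scored per possession. It uses a hand-rolled closed-form ridge solve in base R; the helper `ridge_fit` and the random toy data are hypothetical (this is not glmnet and not J.E.'s code), and the intercept is omitted for brevity. Recoding the defensive columns from +1 to -1 leaves the offensive coefficients unchanged and only flips the sign of the defensive ones, because the ridge penalty treats b and -b identically.

```r
# Hypothetical helper: weighted ridge regression in closed form,
# beta = (X'WX + lambda*I)^-1 X'Wy, no intercept (kept short on purpose)
ridge_fit <- function(X, y, lambda, w = rep(1, nrow(X))) {
  XtW <- t(X * w)  # X * w scales row i of X by w[i]; transposing gives X'W
  solve(XtW %*% X + lambda * diag(ncol(X)), XtW %*% y)
}

set.seed(1)
n <- 200
# Toy data, not real lineups: 6 offensive and 6 defensive player dummies
X_off <- matrix(rbinom(n * 6, 1, 0.5), n, 6)
X_def <- matrix(rbinom(n * 6, 1, 0.5), n, 6)
y <- rpois(n, 1.05)  # points scored on each possession

b_plus  <- ridge_fit(cbind(X_off,  X_def), y, lambda = 50)  # defenders coded +1
b_minus <- ridge_fit(cbind(X_off, -X_def), y, lambda = 50)  # defenders coded -1

# Offensive coefficients agree; defensive coefficients only change sign
all.equal(b_plus[1:6], b_minus[1:6])
all.equal(b_plus[7:12], -b_minus[7:12])
```

With the +1 coding and Y = points scored, a good defender ends up with a negative defensive coefficient (fewer points allowed).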
Re: RAPM request thread
Posted: Fri Oct 16, 2015 12:47 pm
by Nate
Thanks for responding to people's requests.
Is the data set that you're using available to the public in a computer-friendly format somewhere?
Re: RAPM request thread
Posted: Fri Oct 16, 2015 2:50 pm
by J.E.
Nate wrote: Is the data set that you're using available to the public in a computer-friendly format somewhere?
I used to put matchup data online, but the site I got the play-by-play from asked me to take it down, so I did.
Basketballvalue (http://basketballvalue.com/downloads.php) at least has some data you can play around with (it goes through 2012).
Re: RAPM request thread
Posted: Fri Oct 16, 2015 4:42 pm
by A Gravity Well
J.E. wrote: The results field is simply the points scored in that specific possession (assuming ~190 possessions per game, 95 per team, which differs from most people's interpretation). There's no subtracting going on
Whether you assign a +1 or -1 to defensive dummies is just personal preference, although if the results field really was 'home_pts - away_pts' you'd have to assign '-1' to defenders
I suppose I have a conceptual hurdle to clear, then.
Code:
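library(glmnet)
# One row per stint: possessions, home_point_margin, then the player dummies
thebeast <- read.csv("~/Desktop/beast.csv", header = TRUE)
poss   <- as.numeric(thebeast$possessions)        # observation weights
margin <- as.numeric(thebeast$home_point_margin)  # response (per 100 possessions)
thebeast$possessions <- NULL
thebeast$home_point_margin <- NULL
mbeast <- data.matrix(thebeast)                   # player dummy columns only
# alpha = 0 is ridge; cv.glmnet defaults to alpha = 1 (lasso), so set it here too
cvfit <- cv.glmnet(mbeast, margin, weights = poss, alpha = 0, nfolds = 5)
lambda.min <- cvfit$lambda.min
# Fit the whole ridge path and read the coefficients off at lambda.min
ridge <- glmnet(mbeast, margin, family = "gaussian", weights = poss, alpha = 0)
coef(ridge, s = lambda.min)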
That's my code.
My CSV looks like this:
Code:
possessions home_point_margin P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
5 100 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 -1 -1 -1 -1 -1
5 -75 0 0 0 0 0 1 1 1 1 1 -1 -1 -1 -1 -1 0 0 0 0 0
Row 1 is the home team on offense, row 2 is the away team on offense, and home_point_margin is per 100 possessions. I thought structuring it this way would make the calculation:
Code:
home team on offense: Home_Margin (will be positive or zero) = P1o + P2o + P3o + P4o + P5o - (P6d + P7d + P8d + P9d + P10d)
home team on defense: Home_Margin (will be negative or zero) = P1d + P2d + P3d + P4d + P5d - (P6o + P7o + P8o + P9o + P10o)
Where have I gone off the rails? Wouldn't it have to be structured like this to account for home-court advantage (HCA), by adding an HCA dummy variable set to 1 in each row?
Re: RAPM request thread
Posted: Fri Oct 16, 2015 7:06 pm
by Nate
A Gravity Well wrote: ...
Where have I gone off the rails? Wouldn't it have to be structured like this to account for HCA (adding a dummy variable of 1 for HCA in each row)?
Your notation seems a bit confused. Do you have any experience with regression models?
Re: RAPM request thread
Posted: Fri Oct 16, 2015 7:21 pm
by A Gravity Well
Ignore the above. I had a moment of clarity.
Code:
possessions resultv P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
1 3 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1 2 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
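The layout above can be generated from a possession log rather than typed by hand. A minimal base-R sketch, assuming each possession is recorded as the five offensive player IDs, the five defensive player IDs, and the points scored; `build_design` and the P1..P10 IDs are hypothetical. The columns come out as all the `o` columns followed by all the `d` columns, a different order from the table above, which makes no difference to the regression.

```r
# Hypothetical helper: one row per possession, 1 = player on the floor
build_design <- function(off, def, pts, players) {
  cols <- c(paste0(players, "o"), paste0(players, "d"))
  X <- matrix(0, nrow = length(pts), ncol = length(cols),
              dimnames = list(NULL, cols))
  for (i in seq_along(pts)) {
    X[i, paste0(off[[i]], "o")] <- 1  # the five players on offense
    X[i, paste0(def[[i]], "d")] <- 1  # the five players on defense
  }
  data.frame(possessions = 1, resultv = pts, X, check.names = FALSE)
}

players <- paste0("P", 1:10)
home <- paste0("P", 1:5)
away <- paste0("P", 6:10)
# First four possessions of the table: home, away, home, away on offense
beast <- build_design(off = list(home, away, home, away),
                      def = list(away, home, away, home),
                      pts = c(3, 2, 0, 2), players = players)
```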
How does one then incorporate HCA, or the score differential at the time of the possession? How does one denote which team is home and which is away? What if I want to add "this team is up x with y minutes left" or "this team is down x with y minutes left"?
I had been assuming that +1/-1 was sufficient to offset home and away. On top of that, I added an HCA column that was always set to 1. I then used a number of columns representing different score differentials (hometeamup4, hometeamup9, etc.), but since my earlier way of doing this was bonkers busted, I am unsure how to proceed.
EDIT:
...is it just as simple as splitting HCA into HCA_offense and HCA_defense?
Re: RAPM request thread
Posted: Fri Oct 16, 2015 7:22 pm
by A Gravity Well
Nate wrote: A Gravity Well wrote: ...
Where have I gone off the rails? Wouldn't it have to be structured like this to account for HCA (adding a dummy variable of 1 for HCA in each row)?
Your notation seems a bit confused. Do you have any experience with regression models?
No. I have some related experience, but less of it carries over from my field than I'd originally hoped.
Re: RAPM request thread
Posted: Fri Oct 16, 2015 9:47 pm
by Nate
A Gravity Well wrote: ...
How does one then incorporate HCA -- or score differential at time of possession? How does one denote which team is the home team/which team is the away team? If I want to add "this team is up x with y minutes left" or "this team is down x with y minutes left"...
What I had been doing was believing +1/-1 was sufficient for offsetting home and away. From there I added a HCA column that was always set to 1. I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc), but since my earlier way of doing this was bonkers busted, I am unsure of how to proceed.
EDIT:
...is it just as simple as splitting HCA into HCA_offense and HCA_defense?
I would be inclined to handle home court with the same kind of split as offense and defense: P1oh, P1dh, P1or, P1dr (offense/defense crossed with home/road).
To add a linear time-left or score-differential factor per player, you could have P1oh-diff, P1dh-diff, P1or-diff, and P1dr-diff columns *in addition to* the other independent variables, holding the score differential instead of a 1.
You can obviously run into sample size issues if you split things too much, and assumptions of linearity aren't always appropriate.
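A rough sketch of Nate's suggestion for a single possession, assuming the score differential is measured from each player's own team's perspective (the post doesn't say which). Each player gets an indicator column such as P1oh plus a matching P1oh-diff column that holds the score differential instead of 1. `player_cols` is a hypothetical helper, and the 80-column layout (10 players, times oh/dh/or/dr, times indicator/diff) simply follows the naming in the post.

```r
# Hypothetical helper: indicator + score-differential columns for a group
# of players. side is "o" or "d"; venue is "h" (home) or "r" (road).
player_cols <- function(ids, side, venue, diff) {
  base <- paste0(ids, side, venue)                             # e.g. "P1oh"
  c(setNames(rep(1, length(ids)), base),                       # on-floor indicators
    setNames(rep(diff, length(ids)), paste0(base, "-diff")))   # linear diff term
}

# Every possible column, all zero to start
ind_cols <- as.vector(outer(paste0("P", 1:10), c("oh", "dh", "or", "dr"), paste0))
all_cols <- c(ind_cols, paste0(ind_cols, "-diff"))
x <- setNames(numeric(length(all_cols)), all_cols)

# Home team (P1-P5) on offense, up 4; road team (P6-P10) defending, down 4
on_floor <- c(player_cols(paste0("P", 1:5), "o", "h", 4),
              player_cols(paste0("P", 6:10), "d", "r", -4))
x[names(on_floor)] <- on_floor
```

Every player-situation pair adds two columns, which is where the sample-size concern above comes from.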
Re: RAPM request thread
Posted: Fri Oct 16, 2015 10:17 pm
by J.E.
A Gravity Well wrote: I had then used a number of different columns representing different score differentials (hometeamup4, hometeamup9, etc)
This should work. Use one column for each possible score differential, although you might want to put all situations with, e.g., score_diff > 30 into one bucket.
For HCA, one column should be enough; it is simply switched on during home-team offensive possessions.
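A small sketch of the bucketing and the single HCA column, assuming the score differential is taken from the offensive team's perspective and capped at ±30 so that every blowout shares the end buckets. `situation_cols` is a hypothetical helper; its output would be bound column-wise onto the player dummies.

```r
# Hypothetical helper: one-hot score-differential buckets plus one HCA column
situation_cols <- function(score_diff, home_offense, cap = 30) {
  capped <- pmax(pmin(score_diff, cap), -cap)  # |diff| > cap shares the end bucket
  lev <- seq(-cap, cap)
  buckets <- outer(capped, lev, "==") * 1      # n x (2*cap + 1) indicator matrix
  colnames(buckets) <- paste0("diff_", lev)
  # HCA is 1 only on home-team offensive possessions
  cbind(HCA = as.numeric(home_offense), buckets)
}

S <- situation_cols(score_diff = c(0, 4, -35, 12),
                    home_offense = c(TRUE, FALSE, TRUE, FALSE))
```

Because exactly one bucket is on in every row, the bucket columns sum to the intercept column. Ridge's penalty still gives a unique solution, but with plain least squares you would drop one bucket (say diff_0) as the reference.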
Alternatively, you could adjust the results vector Y by 'average_home_eff_per_possession' and 'average_away_eff_per_possession'.
So, if the home team scores 3 points, the result turns into, say, 3 - 1.07 = 1.93.
If the away team scores 3 points, the result turns into 3 - 1.04 = 1.96.
Having a centered Y is good anyway.
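A sketch of that centering, assuming a logical `home_offense` flag per possession. The 1.07 and 1.04 above were illustrative ("say"), so by default the hypothetical `center_result` helper computes the two averages from the data; they can also be passed in explicitly.

```r
# Hypothetical helper: subtract the venue-appropriate average points per possession
center_result <- function(pts, home_offense,
                          avg_home = mean(pts[home_offense]),
                          avg_away = mean(pts[!home_offense])) {
  pts - ifelse(home_offense, avg_home, avg_away)
}

# The worked example: 3 - 1.07 and 3 - 1.04
center_result(c(3, 3), c(TRUE, FALSE), avg_home = 1.07, avg_away = 1.04)  # 1.93 1.96

# Results column from the eight-possession table earlier in the thread
pts <- c(3, 2, 0, 2, 0, 0, 2, 0)
home_offense <- rep(c(TRUE, FALSE), 4)
y_c <- center_result(pts, home_offense)
```

Once the averages are removed, home-team offensive possessions and away-team offensive possessions both have mean zero, so a separate HCA column is not needed for that shift.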
Re: RAPM request thread
Posted: Fri Oct 16, 2015 10:29 pm
by A Gravity Well
Nate wrote: I would be inclined to handle home court by having the same kind of split as with offense and defense - P1oh, P1dh, P1or, P1dr.
To add a linear time left or scoring differential factor per player, you could have P1oh-diff, P1dh-diff, P1or-diff, P1d-diff *in addition to* the other independent variables, and have the score differential in them.
You can obviously run into sample size issues if you split things too much, and assumptions of linearity aren't always appropriate.
Wouldn't doing it like this make the values of HCA, or anything else I add, easier to pin down?
Code:
resultv HCo HCd P1o P2o P3o P4o P5o P1d P2d P3d P4d P5d P6o P7o P8o P9o P10o P6d P7d P8d P9d P10d
3 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
2 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
2 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
2 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
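One thing worth checking with this layout: in every row exactly one of HCo and HCd is 1, so the two columns add up to a constant column. If the model also has an intercept (glmnet fits one by default), the data can only pin down the difference between the two home-court terms, and the ridge penalty decides how the rest is split. A quick base-R check on the HCo/HCd values from the table above:

```r
# HCo / HCd columns copied from the eight rows above
HCo <- c(1, 0, 1, 0, 1, 0, 1, 0)
HCd <- c(0, 1, 0, 1, 0, 1, 0, 1)
X <- cbind(intercept = 1, HCo, HCd)

all(HCo + HCd == 1)  # TRUE: the two columns sum to the intercept column
qr(X)$rank           # 2 with 3 columns, so one direction is not identified
```

A single HCA column that is switched on only for home-team offensive possessions is not collinear with the intercept in this way.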
EDIT: [DISREGARD] First five lines of HomeTeamUpXo, HomeTeamDownXo, et al.:
Code:
Tie HTu1o HTu2o HTu3o HTd1o HTd2o HTu1d HTu2d HTu3d
1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 1 0 0 0 0
[/DISREGARD] The status of the offensive team is enough; home effects are taken into account elsewhere.
Or is there a way to do (Player1offence*HomeTeam), a sort of interaction term that could be solved for, instead of making home-court advantage its own column? I could easily incorporate multiple interaction terms (right phrase?) that way.
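In R, interaction columns like this don't have to be built by hand: base R's `model.matrix` expands a formula, where `a:b` is the product column and `a*b` expands to `a + b + a:b` (so `P1o*HomeTeam` would also add a HomeTeam main effect). A toy sketch; the data frame and the `HomeTeam` flag are hypothetical, and the intercept is dropped with `- 1` because glmnet fits its own.

```r
# Toy data: two player offense dummies and a hypothetical home flag
df <- data.frame(P1o = c(1, 0, 1, 0),
                 P2o = c(1, 1, 0, 0),
                 HomeTeam = c(1, 1, 0, 0))

# Main effect for each player plus a player-by-home interaction column
M <- model.matrix(~ (P1o + P2o) + (P1o + P2o):HomeTeam - 1, data = df)
colnames(M)
```

Each interaction column is just the elementwise product of the two inputs, so its coefficient is the player's extra effect when HomeTeam is 1.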