
Re: Demystifying Ridge Regression

Posted: Tue May 07, 2013 3:12 pm
by talkingpractice
Fwiw, a kid who used to intern for us did 2 different intro R classes as well as a game theory class at Coursera, and said they were all the nuts.

Re: Demystifying Ridge Regression

Posted: Tue May 07, 2013 9:48 pm
by AcrossTheCourt
I've heard of using -3 or -2 for rookies as the standard. But there are clearly rookies who have an impact right away (guys who spent a while in school like Duncan or, oddly enough, Rubio), and there are clearly rookies who are over their heads (Austin Rivers, Morrison.) So it doesn't seem ideal to use the same prior for everyone.

What about a basic statistical plus minus model for rookies for their priors? Would using their +/- from the NPI model be too kooky?

Edit: I just realized it'd be hard to apply a statistical model because rookies will generally have less of an impact than their stats suggest.

Re: Demystifying Ridge Regression

Posted: Tue May 07, 2013 9:59 pm
by Mike G
Aren't high draft picks expected to be better than later picks?
Why not give a #12 pick the average value (prior) of a #12 pick?

Re: Demystifying Ridge Regression

Posted: Wed May 08, 2013 5:21 am
by v-zero
As both of you suggest, there is plenty more you can do to improve rookie priors, but I don't want to get bogged down in something like that in this thread. None of your suggestions are bad, except perhaps the use of NPI values, as that creates a confirmation bias for the model.

Re: Demystifying Ridge Regression

Posted: Tue May 14, 2013 8:44 pm
by DSMok1
Since V-Zero started this thread to try to explain ridge regression in layman's terms, I'll honor his request to keep this thread for that purpose.

For greater details on the ridge regression and how it differs from OLS, please see Mystic's post at Contrasting Ridge Regression and OLS



A few notes from Mystic & V-Zero's exchange which has been deleted:
v-zero wrote:The maths really doesn't matter like you suggest, and understanding matrix algebra is good, but not everything. I do understand all of that, as do a great many, but many more don't, and lots of people don't have time to re-learn or newly learn a whole new mess of maths. This thread isn't about formalities, they are an impediment to grasping an idea in so many instances for so many people.
mystic wrote:At a certain point the math has to be at least briefly introduced in order to really understand an issue. And I felt that your explanation so far is not sufficient, let alone that your first post gives the impression that the prior is per se part of what is called ridge regression.
Both v-zero and mystic had valid points here: over-simplifying a complex topic can be useful for relative novices, but such over-simplification may also hinder comprehensive understanding as the learner progresses. I personally try to write so that novices can understand, because most folks aren't interested in the technicalities, but if such a person wants to do the math themselves, they need to be able to find out how to do it rigorously.

Re: Demystifying Ridge Regression

Posted: Tue May 14, 2013 8:55 pm
by v-zero
DSMok1 wrote:Since V-Zero started this thread to try to explain ridge regression in layman's terms, I'll honor his request to keep this thread for that purpose.

For greater details on the ridge regression and how it differs from OLS, please see Mystic's post at Contrasting Ridge Regression and OLS
Thanks. I have deleted my posts from that thread as they now offer nothing useful, apologies for my frustrated reaction. I have also posted a link and short blurb in my opening post to guide those who desire to know more of the mathematics.

Re: Demystifying Ridge Regression

Posted: Wed May 15, 2013 7:38 pm
by TheSpiceWeasel
v-zero wrote:I wish you hadn't posted that. That is a good explanation, but the maths really doesn't matter like you suggest, and understanding matrix algebra is good, but not everything. I do understand all of that, as do a great many, but many more don't, and lots of people don't have time to re-learn or newly learn a whole new mess of maths. This thread isn't about formalities, they are an impediment to grasping an idea in so many instances for so many people.
Actually, you should probably wish you hadn't posted what you did. Because it implies you don't really know regression analysis, just enough to be dangerous.

I would hardly consider the mathematical derivation a "formality". It's the basis upon which the results can be interpreted. In other words, if you don't understand what's going into it, how can you possibly expect to understand what's coming out of it?
v-zero wrote:Also, the notion that you need to grasp the abstraction of an idea in order to grasp the reality is rather odd, and seems to be an obsession of Mathematicians at the undergrad level (not that I'm suggesting that you're at the undergrad level). Maths is a tool, a wonderful, beautiful tool; but it is not reality. Nobody should think in abstractions, that's for once you're done with the thinking.
"Ah, thanks for the clarification" said nobody. Again, if you don't know anything about the "maths" that go into a procedure, then do you really think you can be an expert on what comes out of it?

Re: Demystifying Ridge Regression

Posted: Thu May 16, 2013 5:32 am
by v-zero
The formal derivation is just that, formal. I was never looking to give people a deep understanding of Bayesian MAP estimation or anything like that via this thread, but rather to relate, in a far less technical and more readily understandable way, what the 'machines' are doing in the background. I fully endorse trying to gain a deeper understanding; it's always best to get a good grip on that stuff, but often that is much easier if you are first comfortable with what's going on in a less formal sense, the motion of the numbers, so to speak.

If you want to assume that I don't understand regression analysis then that's fine, but please don't barge in here and mistake the intention of this thread for something that it is not. I'm not trying to take anybody to expert level in this thread; I'm trying to offer a foot in the door.

Re: Demystifying Ridge Regression

Posted: Thu May 16, 2013 8:39 am
by kpascual
Is ridge regression "better" than a relatively simpler model? And by "better", I'm thinking not just in terms of predictive power, but also explain-ability. I'm sure you could prove that a ridge-regressed model has lower error/is more predictive than a vanilla OLS or even simpler model. But how much better, and is it worth the added predictive power if it's a lot harder to explain, especially to a non-stat-sy decision maker?

I personally have trouble with the RAPM stat, partially because of its name. When I think of ridge regression, I start thinking about mountains, and when my mind hears RAPM, it thinks of dinosaurs or Chris Bosh because it looks like the word "raptor". This could just be an education thing, kind of like how I'm personally more comfortable with OPS and FIP nowadays. But in general acronyms are mystifying when you're not in on the lingo, and I think that contributes to the mystifying-ness of the stat, in addition to its statistical interpretation and implementation.

Re: Demystifying Ridge Regression

Posted: Thu May 16, 2013 10:11 am
by mystic
kpascual wrote:Is ridge regression "better" than a relatively simpler model? And by "better", I'm thinking not just in terms of predictive power, but also explain-ability. I'm sure you could prove that a ridge-regressed model has lower error/is more predictive than a vanilla OLS or even simpler model. But how much better, and is it worth the added predictive power if it's a lot harder to explain, especially to a non-stat-sy decision maker?
Maybe a look at the thread which was separated from this one helps with that. First, there is the existence theorem, which says that you can always find a lambda such that the MSE will be smaller than for OLS. And then you have to understand that ridge regression merely means an introduced bias, where you add the term lambda times the identity matrix to the X'X part of the normal OLS solution. It is really not that difficult in the end.
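For reference, in standard textbook notation (X being the design matrix of who is on the court, y the point margins; nothing specific to any particular RAPM implementation), that added term is the only difference between the two estimators:

\hat{\beta}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}y, \qquad \hat{\beta}_{\mathrm{ridge}} = (X^{\top}X + \lambda I)^{-1}X^{\top}y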

The advantages are especially big if you have a sample with multicollinearity (usually present in ill-posed problems). That is the case for basketball, and a normal OLS can't handle the issue adequately. As an example we may look at Kobe Bryant and Pau Gasol during the 2010-11 season. The normal OLS gives us -8.5 for Bryant and +11.9 for Gasol (according to Barzilai's basketballvalue.com). Now, the main issue is that the two played 2400 minutes together. The remaining minutes without each other are used to determine which one of them gets the positive and which the negative value. But small changes in that much smaller without-each-other sample could completely switch the values for those two, so that Bryant could have the +10 and Gasol the -8 value. That's how strongly multicollinearity influences the results in OLS.

To help with that issue, a stabilization of the regression is necessary, and ridge regression does just that. Adding the bias to the regression introduces a constraint on how much a value can differ from the mean; in our case the mean is 0. The lambda as the constraint does not allow huge positive and negative values, so all players basically get pulled together. With such a method we get Bryant and Gasol much closer together, with Gasol being (depending on the weighting scheme) at about +3 and Bryant at about +1.5. In essence, when the sample without each other is small, the players can't be separated much. While OLS would use any little scrap of information to separate two players in a big way, ridge regression does not do that.
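To make that point concrete without real lineup data, here is a small, purely illustrative R sketch: made-up numbers, two artificial "players" who share the court most of the time and are both truly worth +3.

n <- 500
together <- rbinom(n, 1, 0.9)               # on the court together 90% of the time
x1 <- ifelse(together == 1, 1, rbinom(n, 1, 0.5))
x2 <- ifelse(together == 1, 1, 1 - x1)      # apart, exactly one of them plays
X <- cbind(x1, x2)
y <- 3 * x1 + 3 * x2 + rnorm(n, sd = 12)    # both truly worth +3, plus noise

beta_ols   <- solve(t(X) %*% X) %*% t(X) %*% y                     # plain OLS
beta_ridge <- solve(t(X) %*% X + 200 * diag(2)) %*% t(X) %*% y     # ridge, lambda picked by hand
cbind(OLS = as.vector(beta_ols), Ridge = as.vector(beta_ridge))

Re-run it a few times: the two OLS values swing around a lot, and in opposite directions, while the ridge values stay close together, just shrunk toward 0. The lambda of 200 here is arbitrary and only for illustration; in practice it would come from cross-validation.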
kpascual wrote: I personally have trouble with the RAPM stat, partially because of its name. When I think of ridge regression, I start thinking about mountains, and when my mind hears RAPM, it thinks of dinosaurs or Chris Bosh because it looks like the word "raptor". This could just be an education thing, kind of like how I'm personally more comfortable with OPS and FIP nowadays. But in general acronyms are mystifying when you're not in on the lingo, and I think that contributes to the mystifying-ness of the stat, in addition to its statistical interpretation and implementation.
Ignore the name; a name is just a label, and the meaning comes with the definition. The name "ridge" comes from Hoerl and Kennard, 1970. Hoerl had used a method called "ridge analysis" before, and that's how the name comes into play. What you have there is basically a graphical solution which presents the dependencies between responses and factors, and that indeed looks like a ridge. But really, forget about that, because, as you mentioned, it doesn't really help.
Ridge regression is also a form of what is called "regularization", and, as I previously said, a stabilization of the regression. Just imagine that RAPM simply means "improved APM", because that's basically what it is.

The implementation is rather easy. The only thing to consider is determining the ridge factor (lambda) via cross-validation, but essentially for similarly sized samples the lambda shouldn't differ much. And by using statistical tools like R, where you can find various regression scripts, the cross-validation process to determine the lambda is not that difficult. Once you have the lambda, it is basically the same as an OLS. Using matrix algebra helps to understand that part easily, at least to my taste. You can interpret this as using some sort of prior, from which the end result for each player can't differ much; in the case of non-prior-informed ridge regression that prior value is 0, the mean. So now you may just jump to the point where you think of ridge regression as a method which regresses all player values to the mean, which helps the predictive power, and where we come back to the previously mentioned existence theorem, which was later proven by Chawla. This means that, with a properly chosen lambda, the predictive power is better for ridge than for normal OLS.
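As a sketch of how that looks in R, assuming you already have a stint matrix X and a margin vector y (the toy X and y above will do): the glmnet package is one option among several (MASS::lm.ridge is another), and note that glmnet reports lambda on its own internal scale, so it is not directly comparable to the hand-picked 200 above.

library(glmnet)
cv  <- cv.glmnet(X, y, alpha = 0)                      # alpha = 0 means pure ridge
fit <- glmnet(X, y, alpha = 0, lambda = cv$lambda.min) # refit at the cross-validated lambda
coef(fit)                                              # shrunken player values (first entry is the intercept)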

Overall, the next time you see the term "RAPM", just think of it as a value derived from a regression with regression to the mean built in. And you can easily interpret the value as the number of points by which a single player influences a team's result per 100 possessions. On average a team had 92 possessions (last season) and played about 48.3 minutes per game. If you then multiply the player's individual RAPM value by 0.92 and then by his mpg divided by 48.3, you get a value for how much a player influenced the scoring margin per game for a specific team with his presence on the court. What exactly he did can't be said with that value, just whether he helped or not in comparison to an artificial average player, who per se has the value 0. And at that point we would dive into the discussion about replacement value versus value over average, but that is a different topic.
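A quick worked example of that conversion with made-up numbers, say a player with a +4.0 RAPM who averages 36 minutes per game:

rapm <- 4.0                   # points per 100 possessions
mpg  <- 36
rapm * 0.92 * (mpg / 48.3)    # roughly +2.7 points of scoring margin per game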

Hope that at least helped a bit.

Re: Demystifying Ridge Regression

Posted: Thu May 16, 2013 11:18 am
by DSMok1
RAPM = linear regression with regression to the mean (or some other value) built in. It works way better at dealing with the issues of plain APM than running a plain APM and regressing to the mean afterwards, because it greatly helps with the collinearity [why does my spell check always flag that?] problem in APM. Properly done, the amount of regression, and to what mean or prior, can be found using out-of-sample validation.
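One common way to write the "some other value built in" part is to shrink toward a prior vector b0 (a box-score estimate, a rookie prior, whatever) instead of toward zero: minimizing ||y - Xb||^2 + lambda*||b - b0||^2 gives the closed form below. This is a rough R sketch of the idea, not a claim about how any particular published RAPM actually implements it.

ridge_with_prior <- function(X, y, lambda, b0) {
  p <- ncol(X)
  # solves (X'X + lambda*I) b = X'y + lambda*b0
  solve(t(X) %*% X + lambda * diag(p), t(X) %*% y + lambda * b0)
}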

Re: Demystifying Ridge Regression

Posted: Thu May 16, 2013 8:29 pm
by kpascual
My question wasn't about the stat or the statistical method itself, I understand how regularized regression works. It's really about model selection within the context of communicating it to another person so that they do something with it. Applied statistics, I suppose.

It's my opinion that a simpler model that's easily explained is usually better than a more complex model that's harder to explain, because the end user has a greater probability of listening and doing something with it. This is obviously context-dependent: Mark Cuban would probably be fine with understanding and using RAPM, but Jim Buss or the Maloofs might be less comfortable.

My point is that communicability matters, IMO more than the model itself, and to someone less statistically-inclined, it might not be worth the effort to understand RAPM. I'm sure this thread exists to lower those barriers, but I wanted to point out why it could be hard for someone to accept RAPM beyond the statistical aspects and matrix algebra.

Re: Demystifying Ridge Regression

Posted: Thu May 16, 2013 8:55 pm
by mystic
Oh, tbh, your second post sounds completely different from your first. But anyway, I think that this is indeed worth the effort, because the differences in terms of math aren't that huge to begin with. And RAPM is so much better than APM that it shouldn't even be a question at all. We have an ill-posed problem, that should be clear, and OLS is not able to handle the multicollinearity.

If I needed to explain ridge to someone who can handle the results from an OLS, I would just say what DSMok1 wrote: linear regression with regression to the mean built in, resulting overall in much better reliability. Then either the person trusts me on that or not; that is the only question here, if the person can't handle the math anyway.

Re: Demystifying Ridge Regression

Posted: Thu May 16, 2013 10:32 pm
by EvanZ
To paraphrase that girl in the AT&T commercial, WE WANT LESS FITTING, WE WANT LESS FITTING. :lol:

Re: Demystifying Ridge Regression

Posted: Fri May 17, 2013 12:45 am
by ed küpfer
I would like to see a worked out example, with R code if possible.