modeling player compatibility

italia13calcio
Posts: 98
Joined: Sun Dec 08, 2013 2:54 am

modeling player compatibility

Post by italia13calcio » Sat Aug 04, 2018 1:53 am

Hi all!

Wanted to share some work I've been doing, which I think is best described as an attempt to model player compatibility. I had a poster at CASSIS but wanted to make it more widely available.

Here is a link to a GitHub page with links to both the poster and much of the code used for the project.

https://hwchase17.github.io/sports/

Would love to get some feedback - not sure how much more work I will do on it, but at the very least would love to know what people find interesting/not interesting/surprising/scary.
https://hwchase17.github.io/sports/

Follow me @aabsstats - I follow back ;)

Crow
Posts: 5286
Joined: Thu Apr 14, 2011 11:10 pm

Re: modeling player compatibility

Post by Crow » Sat Aug 04, 2018 2:17 am

Good project. I can't view the data on my phone right now, but I will look at it in the next few days.

DSMok1
Posts: 830
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine

Re: modeling player compatibility

Post by DSMok1 » Mon Aug 06, 2018 1:26 pm

This is really impressive work! The results in the LeBron case study certainly make sense intuitively.

My biggest question on a model like this, with a huge number of features, is how the estimates were cross-validated: how was random noise accounted for?
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
GodismyJudgeOK.com/DStats/
Twitter.com/DSMok1

Crow
Posts: 5286
Joined: Thu Apr 14, 2011 11:10 pm

Re: modeling player compatibility

Post by Crow » Wed Aug 08, 2018 8:07 pm

Looked at the poster. Could you say something to explain the data in the bottom left corner of the poster? I would like to understand it, but it is not obvious.

The LBJ example whets the appetite to see more examples.

I don't understand how to use the links to the notebooks, beyond seeing the code. How do I see the data that would let me review other player cases? What file format do I pick, and what software do I need? Does using any of this absolutely require Python? Is there any way to make it available in Excel? Help would be appreciated.

italia13calcio
Posts: 98
Joined: Sun Dec 08, 2013 2:54 am

Re: modeling player compatibility

Post by italia13calcio » Sat Aug 11, 2018 12:10 am

DSMok1 wrote:
Mon Aug 06, 2018 1:26 pm
This is really impressive work! The results in the LeBron case study certainly make sense intuitively.

My biggest question on a model like this, with a huge number of features, is how the estimates were cross-validated: how was random noise accounted for?

Thanks! I'm not entirely sure what you mean, but I'll try my best to answer. I performed cross-validation on the regularization parameter for each model (just using the standard sklearn LogisticRegressionCV) and then used the best value to fit a model on the full data. Is that what you were asking?
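For anyone curious what that looks like, here is a minimal sketch of the procedure, assuming scikit-learn and made-up data. The 0/1 indicator features, the grid of 10 candidate values (Cs=10) and the 5 folds (cv=5) are illustrative assumptions, not the settings used in the project. With refit=True (the default), LogisticRegressionCV picks the C with the best cross-validated score and then refits on all of the data with that C.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical synthetic data standing in for the real shot data:
# each row is a shot, each column a 0/1 indicator feature, y = 1 if the shot was made.
rng = np.random.default_rng(0)
n_shots, n_features = 2000, 50
X = rng.integers(0, 2, size=(n_shots, n_features)).astype(float)
true_coef = rng.normal(0, 0.5, size=n_features)
logit = (X - 0.5) @ true_coef
p_make = 1 / (1 + np.exp(-logit))
y = (rng.random(n_shots) < p_make).astype(int)

# Cross-validate the inverse regularization strength C over a grid of 10 values
# (5 folds, scored by log loss); refit=True then refits on ALL rows with the best C.
clf = LogisticRegressionCV(Cs=10, cv=5, scoring="neg_log_loss",
                           refit=True, max_iter=1000)
clf.fit(X, y)

print("candidate C values:", clf.Cs_)
print("chosen C:", clf.C_[0])
```

Note that this cross-validation only tunes the amount of regularization; it does not by itself tell you how stable the final comparisons between models are.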
Crow wrote:
Wed Aug 08, 2018 8:07 pm
Looked at the poster. Could you say something to explain the data in the bottom left corner of the poster? I would like to understand it, but it is not obvious.

The data in the bottom left corner compares the out-of-sample performance of my model against several other models. For example, I chose to first model shot location, and then whether the shot went in or not. Is that a 'good' decision? If you define 'good' as whatever performs best at predicting out of sample, using log loss as the metric, then those metrics seem to suggest that it was a good choice. Was including position coefficients a 'good' choice? That is less clear, as the metrics aren't much better.
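To make that kind of comparison concrete, here is a hypothetical sketch of a two-stage model (shot zone first, then make/miss given the zone) scored by out-of-sample log loss against a single model over the joint (zone, make) outcomes. The synthetic data, the three zones, and the chronological 80/20 split are assumptions for illustration only; the project's actual features and comparison models are not reproduced here. Because P(zone, make) = P(zone) * P(make | zone), both models assign a probability to the same observed outcome, so their log losses are directly comparable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: features X, a shot zone (0, 1, 2, e.g. rim /
# midrange / three) and whether the shot was made.
rng = np.random.default_rng(2)
n, d, n_zones = 4000, 10, 3
X = rng.normal(size=(n, d))
zone_logits = X @ rng.normal(0, 0.7, size=(d, n_zones))
zone_p = np.exp(zone_logits) / np.exp(zone_logits).sum(axis=1, keepdims=True)
u = rng.random(n)[:, None]
zone = np.minimum((u > zone_p.cumsum(axis=1)).sum(axis=1), n_zones - 1)
base_rate_logit = np.array([0.4, -0.4, -0.6])  # hypothetical per-zone make rates
make_logit = base_rate_logit[zone] + X @ rng.normal(0, 0.3, size=d)
make = (rng.random(n) < 1 / (1 + np.exp(-make_logit))).astype(int)

# Hold out the last 20% of rows as the test set.
split = int(0.8 * n)
tr, te = slice(None, split), slice(split, None)
rows = np.arange(n - split)

def one_hot(z):
    """Encode zone labels as 0/1 indicator columns."""
    return np.eye(n_zones)[z]

# Two-stage model. Stage 1: P(zone | X). Stage 2: P(make | X, zone).
zone_model = LogisticRegression(max_iter=1000).fit(X[tr], zone[tr])
make_model = LogisticRegression(max_iter=1000).fit(
    np.hstack([X[tr], one_hot(zone[tr])]), make[tr])
p_zone = zone_model.predict_proba(X[te])[rows, zone[te]]
p_make1 = make_model.predict_proba(np.hstack([X[te], one_hot(zone[te])]))[:, 1]
p_make_obs = np.where(make[te] == 1, p_make1, 1 - p_make1)
two_stage_ll = -np.mean(np.log(p_zone * p_make_obs))

# Single-stage baseline: one multinomial model over the 6 (zone, make) outcomes.
joint = zone * 2 + make
joint_model = LogisticRegression(max_iter=1000).fit(X[tr], joint[tr])
cols = np.searchsorted(joint_model.classes_, joint[te])
p_joint = joint_model.predict_proba(X[te])[rows, cols]
joint_ll = -np.mean(np.log(p_joint))

print(f"two-stage log loss: {two_stage_ll:.4f}, single-stage: {joint_ll:.4f}")
```

Lower is better; as noted above, a single split like this gives only one draw of each number.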

The big caveat, and perhaps this is what DSMok1 was getting at, is that those metrics come from only a single out-of-sample prediction (I used the last 130 games as the test set). Ideally, I would run this 100 or so times, each time holding out a different random subset of games as the test set, and get 100 different sets of out-of-sample metrics. I could then see how many times out of 100 the decision to use my model framework was 'good' compared to the others.
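The repeated-holdout idea can be sketched with scikit-learn's GroupShuffleSplit, which holds out whole games rather than individual shots, so shots from one game never land in both the training and test sets. Everything here is an assumption for illustration: synthetic data, 300 games, 30 held-out games per split, and a "with vs. without some extra features" comparison standing in for the position-coefficient question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical synthetic data: shots grouped by game. The last N_EXTRA columns
# stand in for extra features (e.g. position indicators) whose value we test.
rng = np.random.default_rng(1)
n_games, shots_per_game, n_base, N_EXTRA = 300, 20, 15, 5
game_id = np.repeat(np.arange(n_games), shots_per_game)
X = rng.normal(size=(game_id.size, n_base + N_EXTRA))
coef = np.concatenate([rng.normal(0, 0.5, n_base), rng.normal(0, 0.1, N_EXTRA)])
y = (rng.random(game_id.size) < 1 / (1 + np.exp(-X @ coef))).astype(int)

def holdout_log_loss(cols, train_idx, test_idx):
    """Fit on the training games using columns `cols`; return log loss on held-out games."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx][:, cols], y[train_idx])
    proba = model.predict_proba(X[test_idx][:, cols])[:, 1]
    return log_loss(y[test_idx], proba, labels=[0, 1])

full_cols = np.arange(n_base + N_EXTRA)  # model "with" the extra features
base_cols = np.arange(n_base)            # model "without" them

# 100 random splits; each holds out 30 whole games (groups), not individual shots.
splitter = GroupShuffleSplit(n_splits=100, test_size=30, random_state=0)
results = []
for train_idx, test_idx in splitter.split(X, y, groups=game_id):
    results.append((holdout_log_loss(full_cols, train_idx, test_idx),
                    holdout_log_loss(base_cols, train_idx, test_idx)))
results = np.array(results)

wins = int((results[:, 0] < results[:, 1]).sum())
print(f"extra features gave lower out-of-sample log loss in {wins} of 100 splits")
```

The win count (or the distribution of the per-split differences in `results`) shows how often one modeling choice beats the other, rather than relying on one holdout.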
Crow wrote:
Wed Aug 08, 2018 8:07 pm
I don't understand how to use the links to the notebooks, beyond seeing the code. How do I see the data that would let me review other player cases? What file format do I pick, and what software do I need? Does using any of this absolutely require Python? Is there any way to make it available in Excel? Help would be appreciated.

I'm actually working on a small Java site/app that would let you see how any two players would interact. Once that is up, its output will be on GitHub in some form, and I can update this thread then.
