Talking about data acquisition
Posted: Mon Apr 22, 2013 4:17 am
by andylarsen
To me, the hardest part of doing basketball analytics right now is data manipulation. I have enough of a stats/math background that there are various tools I'd love to run numbers through, but actually getting the data into a format I can work with (generally a simple CSV will do) is hard.
For example, the data at basketball-reference is useful, and it's fantastic that I can format individual tables as CSVs. However, it's difficult to get it into a format where you can compare players in any meaningful way; you have to download each player's table one at a time, then do some Excel work to make it usable. It would be great if there were some sort of downloadable database with player statistics. One does exist (databasebasketball.com), but it's only updated through 2009.
Ditto with the +/- numbers. There's a real need right now for someone to keep public, updated RAPM numbers, and I have the software/knowhow to do that... except the play-by-play data is so scattered. With a play-by-play parser, or a database along the lines of the old basketballvalue one, this would be a much easier project to tackle.
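To show why the data is the bottleneck rather than the math, here is a minimal sketch of the fitting step in Python with NumPy. It assumes the play-by-play has already been parsed into stints (stretches with no substitutions), and it uses one common formulation of RAPM: a possession-weighted ridge regression of point margin per 100 possessions on who was on the floor. The function name `fit_rapm`, the stint tuple layout, and the penalty `lam` are all hypothetical choices for illustration, not anyone's published method.

```python
import numpy as np

def fit_rapm(stints, players, lam=1.0):
    """Fit ridge-regularized plus-minus ratings.

    stints:  list of (home_players, away_players, possessions, home_margin)
             where home_margin is home points minus away points in the stint.
             (Hypothetical layout; a real parser would have to produce this.)
    players: list of every player name that appears in the stints.
    lam:     ridge penalty; larger values shrink ratings toward zero.
    """
    index = {name: i for i, name in enumerate(players)}
    usable = [s for s in stints if s[2] > 0]  # skip zero-possession stints

    X = np.zeros((len(usable), len(players)))  # +1 home, -1 away, 0 on bench
    y = np.zeros(len(usable))                  # margin per 100 possessions
    w = np.zeros(len(usable))                  # weight = possessions

    for row, (home, away, poss, margin) in enumerate(usable):
        for name in home:
            X[row, index[name]] = 1.0
        for name in away:
            X[row, index[name]] = -1.0
        y[row] = 100.0 * margin / poss
        w[row] = poss

    # Weighted ridge normal equations: (X'WX + lam*I) beta = X'Wy
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X + lam * np.eye(len(players)), XtW @ y)
    return dict(zip(players, beta))

if __name__ == "__main__":
    # Toy one-on-one stints with placeholder players, not real data.
    toy = [
        (["A"], ["B"], 10, 5),
        (["A"], ["C"], 8, -2),
        (["B"], ["C"], 12, 1),
    ]
    for name, rating in sorted(fit_rapm(toy, ["A", "B", "C"], lam=5.0).items()):
        print(name, round(float(rating), 2))
```

The regression itself is a dozen lines; building clean stints from scattered play-by-play is where nearly all of the work goes.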
Another example, while we're at it: Synergy. The Synergy Silverlight app makes it relatively easy to see how a player is doing, but difficult to compare players, or do any sort of real research into the Synergy numbers league-wide. While this is maybe the hardest task (because the numbers are all presented via Silverlight and not in HTML like the other sources), it would be of real value to the basketball community.
So, maybe you guys know some tricks I don't: how do you acquire your data, and get it into workable formats? Any tips to share? I figure I can't be the only one who has this same roadblock on a regular basis.
Re: Talking about data acquisition
Posted: Mon Apr 22, 2013 6:00 am
by Statman
andylarsen wrote:
So, maybe you guys know some tricks I don't: how do you acquire your data, and get it into workable formats? Any tips to share? I figure I can't be the only one who has this same roadblock on a regular basis.
Believe me - you aren't the only one with this problem. I've actually been very college b-ball focused for a while (as many on here know) - and college data is MUCH more difficult to compile than NBA data: 347 teams, inconsistencies between sources, trying to get an accurate class year for each player (frosh, soph, etc.), not to mention player ages.
I have so much work to do before the NBA draft - and roadblocks like these soak up a lot of my time.
I feel your pain - and welcome ANY ideas - with hopes I can find something that can help.
Re: Talking about data acquisition
Posted: Mon Apr 22, 2013 2:39 pm
by Jon Nichols
andylarsen wrote:
For example, the data at basketball-reference is useful, and it's fantastic that I can format individual tables as CSVs. However, it's difficult to get in a format where you're looking at differences between players in any sort of meaningful way; you have to download each player's table one at a time, then do some excel work to make it workable. It would be great if there were some sort of downloadable database with player statistics. Indeed, one existed, but it's only updated through 2009 (databasebasketball.com).
I would suggest one thing: go to the league summary pages at B-R, such as here:
http://www.basketball-reference.com/lea ... otals.html
If you hover your mouse over the "Stats" tab at the top, you can toggle between Totals, Per 36, Per Game, etc. That's a lot faster than going player by player, and everything on those pages can also be grabbed in CSV form. If all you're looking for is player data over the last 20 years, these pages will get you there much faster. You can also establish data connections through Excel.
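If you'd rather script the stitching than do it in Excel, here is a minimal sketch in Python using only the standard library. It assumes you've already saved one league-summary CSV per season, that each has a "Player" column, and that header rows may repeat partway through the export; check both assumptions against your own files. The function name `combine_seasons` and the added "Season" column are hypothetical.

```python
import csv
import io

def combine_seasons(csv_by_season):
    """Merge per-season CSV exports into one list of rows.

    csv_by_season: dict mapping a season label (e.g. "2013") to the CSV
                   text saved for that season.
    Returns a list of dicts, one per player-season, each tagged with a
    "Season" key so seasons can be compared in a single table.
    """
    rows = []
    for season, text in sorted(csv_by_season.items()):
        for row in csv.DictReader(io.StringIO(text)):
            # Assumption: repeated header rows show up as data rows
            # whose "Player" value is literally "Player". Skip them.
            if row.get("Player") == "Player":
                continue
            row["Season"] = season
            rows.append(row)
    return rows

if __name__ == "__main__":
    # Placeholder data to show the shape, not real statistics.
    sample = {
        "2012": "Player,G,PTS\nPlayer A,60,900\nPlayer,G,PTS\nPlayer B,55,400\n",
        "2013": "Player,G,PTS\nPlayer A,70,1100\n",
    }
    for row in combine_seasons(sample):
        print(row["Season"], row["Player"], row["PTS"])
```

Every value stays a string here, so convert the numeric columns before doing any math on them.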
Re: Talking about data acquisition
Posted: Mon Apr 22, 2013 3:52 pm
by DSMok1
I have run up against the same hurdles, and still use BBref as my primary data source (with Excel macros/data connections to download large quantities of data).
Re: Talking about data acquisition
Posted: Mon Apr 22, 2013 5:49 pm
by Kevin Pelton
I'd note that some of what you're finding is by design. For example, the ability to sort and manage Synergy data is part of what teams/media corporations are paying for with the full version.
Re: Talking about data acquisition
Posted: Thu Apr 25, 2013 5:11 am
by AcrossTheCourt
It'd be great to have an NBA data warehouse for stuff like this that isn't one person's website, abandoned once the researcher is hired by a team. Just some shared site for important data like play-by-play CSVs and yearly adjusted/regularized plus-minus numbers.
Has anyone parsed the play-by-play data from the late '90s? That's a project we should focus on. It covers the Jordan Bulls era, and you could get great estimates for prime Shaq.
Re: Talking about data acquisition
Posted: Thu Apr 25, 2013 6:02 pm
by Ori
AcrossTheCourt wrote:
It'd be great to have an NBA data warehouse for stuff like this that isn't someone's website where the place is abandoned once the researcher is bought by a team. Just some shared site for important data like play by play csv's and yearly adjusted/regularized plus minus numbers.
Has anyone parsed that data from the late 90's? That's a project we should focus on. It's play by play data from the Jordan Bulls era and you can get great estimates for prime Shaq.
How can you expect someone to be interested enough in basketball to focus on creating public data, but not jump at the opportunity to work for a team? From what I understand, generally the team doesn't give them a choice and that is why the updates stop occurring.
I guess I'm just saying, you can't blame them.

Re: Talking about data acquisition
Posted: Thu Apr 25, 2013 8:28 pm
by AcrossTheCourt
No, not blaming them for joining a team. The problem is that once they do, the website is toast. So you need a website run by a large group of people, or some system where roles can be refilled once a person leaves.
Re: Talking about data acquisition
Posted: Fri Apr 26, 2013 12:49 am
by EvanZ
AcrossTheCourt wrote:
No, not blaming them for joining a team. The problem is once they do the website is toast. So you need a website with a large group of people or some system where certain roles can be filled once the person is gone.
You need someone like me who is happy with his current job and not looking to be hired by the NBA (although I can't say the opportunity hasn't been presented).
And if I did take a job, I'd make it a condition of being hired that the site would have to stay up.
FWIW, I'm actually planning to get the 90's data (as far back as I can go) this summer and put it on nbawowy. Look for it.
Re: Talking about data acquisition
Posted: Fri Apr 26, 2013 2:08 am
by DSMok1
I'm with Evan here--I am not looking to get hired either. Unfortunately, my data analysis/programming skills aren't what Evan's are!
Hopefully some of the data will be pulled together for easier usage sometime soon--more and more people have the ability to compile it.
Re: Talking about data acquisition
Posted: Fri Apr 26, 2013 10:55 pm
by AcrossTheCourt
EvanZ wrote:
AcrossTheCourt wrote:
No, not blaming them for joining a team. The problem is once they do the website is toast. So you need a website with a large group of people or some system where certain roles can be filled once the person is gone.
You need someone like me who is happy with his current job and not looking to be hired by the NBA (although I can't say the opportunity hasn't been presented).
And if I did take a job, I'd make it a condition of being hired that the site would have to stay up.
FWIW, I'm actually planning to get the 90's data (as far back as I can go) this summer and put it on nbawowy. Look for it.
I'd love to see that 90's data in any form, though it'd be fun to play around with the raw data. Thanks for that.
Re: Talking about data acquisition
Posted: Sat Apr 27, 2013 4:16 pm
by kpascual
I've been thinking for a while about building an API on top of my site, so that my play-by-play, shot chart, and five-man unit data are publicly accessible to you all. Actually it's already 70% built.
But I hesitate to do so mainly because I would incur all the bandwidth costs, and would have to deal with all the complaints about data quality, too. Plus I spent countless hours building all the scripts, and man is it a lot of ***** work.
I would also like to point out that my NBA data scraping scripts are open-sourced, so you can get the data on your own computer.
https://github.com/kpascual/nbascrape
Re: Talking about data acquisition
Posted: Sat Apr 27, 2013 5:38 pm
by EvanZ
Are you running everything off your own personal server?
Re: Talking about data acquisition
Posted: Sun Apr 28, 2013 2:07 am
by Ori
kpascual wrote:
I've been thinking for a while about building an API on top of my site, so that my play by play, shot chart, and fiveman datas can be publicly accessible to you all. Actually it's already 70% built.
But I hesitate to do so mainly because I would incur all the bandwidth costs, and would have to deal with all the complaints about data quality, too. Plus I spent countless hours building all the scripts, and man is it a lot of fucking work.
I would also like to point out that my NBA data scraping scripts are open-sourced, so you can get the data on your own computer.
https://github.com/kpascual/nbascrape
I'm really excited to look through this, thank you.
Re: Talking about data acquisition
Posted: Sun Apr 28, 2013 5:09 pm
by kpascual
EvanZ wrote:
Are you running everything off your own personal server?
Shared web hosting. So I don't own the box per se, but I'm paying for it. I was thinking of moving everything to Amazon (AWS) to scale, but again, bandwidth costs.
You know what, **** it, I'll expose some of my data. I'll post a link when it's complete.