For the hobbyists out there...
This season I'll be posting structured play-by-play data from every regular-season and postseason NBA game at:
http://www.cs.umd.edu/hcil/eventflow/NBA/nbaData.shtml
The datasets are scraped from ESPN.com. I started generating them to test the software that my research group is working on, but I figured that other people might be able to make use of them as well. Most notably, I'll be posting an XML breakdown of each possession, including the events that occurred and the players on the court. More details can be found on the site.
Feel free to use the datasets as you please, and keep me posted if you do anything "cool" with them (drop me a line at madeyjay@umd.edu). Certainly my advisor would be thrilled to know that the weekend I spent building the parser was a worthwhile venture.
Cheers,
Megan
NBA Datasets 2013-14 Season
Re: NBA Datasets 2013-14 Season
Awesome, awesome gift to the community! Can I ask how you went about determining who was on the court during each possession? On that note, what sort of error rate do you think these have on that?
Re: NBA Datasets 2013-14 Season
The line-ups were no peach. It's essentially done in two phases: the first phase constructs an initial 5-player line-up based on the player associated with each event (if there is one). The second phase uses the substitution events to swap players in and out of that line-up. The whole process resets at the start of each period.
Actually, I was surprised at how accurate the line-up construction is, or at least appears to be. The only indication I get that something has gone wrong is when a player subs in, but the player who subs out isn't in the line-up. I've run the line-up constructor on about 100 games now, though, and only three of them have had this problem. In most cases, it's due to an anomalous series of substitutions early in the period (when the initial line-up is being constructed), involving a player with a weird name (thanks for nothing, Luc Richard Mbah a Moute). So not terribly often.
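For anyone who wants to try something similar, here is a rough Python sketch of the two-phase idea described above. It is not the actual parser: the event fields (`period`, `team`, `type`, `player`, `player_in`, `player_out`) and all function names are assumptions made up for illustration, and treating the outgoing player of a substitution as a starter candidate is one possible reading of "the player associated with each event".

```python
from collections import defaultdict

LINEUP_SIZE = 5


def initial_lineups(events):
    """Phase 1: guess each team's first five for one period.

    Hypothetical event format: dicts with "team" and "type"; non-sub events
    may carry "player", sub events carry "player_in" and "player_out".
    """
    starters = defaultdict(list)
    entered = defaultdict(set)  # players who came on via a substitution
    for ev in events:
        team_starters = starters[ev["team"]]
        team_entered = entered[ev["team"]]
        if ev["type"] == "sub":
            team_entered.add(ev["player_in"])
            candidate = ev["player_out"]  # must have been on court already
        else:
            candidate = ev.get("player")
        if (candidate is not None
                and candidate not in team_entered
                and candidate not in team_starters
                and len(team_starters) < LINEUP_SIZE):
            team_starters.append(candidate)
    return {team: set(players) for team, players in starters.items()}


def apply_substitutions(events, lineups):
    """Phase 2: replay the period, swapping players on each substitution.

    Returns one line-up snapshot per event, plus the indices of substitutions
    whose outgoing player was not in the line-up (the warning sign).
    """
    on_court = {team: set(players) for team, players in lineups.items()}
    snapshots, problems = [], []
    for i, ev in enumerate(events):
        if ev["type"] == "sub":
            current = on_court[ev["team"]]
            if ev["player_out"] not in current:
                problems.append(i)
            current.discard(ev["player_out"])
            current.add(ev["player_in"])
        snapshots.append({t: frozenset(p) for t, p in on_court.items()})
    return snapshots, problems


def build_lineups(events):
    """Run both phases separately for each period, so everything resets."""
    by_period = defaultdict(list)
    for ev in events:
        by_period[ev["period"]].append(ev)
    return {
        period: apply_substitutions(evs, initial_lineups(evs))
        for period, evs in by_period.items()
    }
```

The `problems` list is the only built-in sanity check here, mirroring the "sub out of a player who isn't in the line-up" signal; a real parser would likely also need name normalization before comparing players.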
Re: NBA Datasets 2013-14 Season
Yeah, that's the hard bit in parsing play-by-play. I have come across a whole host of data-gimping in the thirteen years of data I have pushed through my code, such that it now has a special function which *just* sorts out weird lineup situations, using a combination of multiple passes and cross-referencing. Amusingly enough, I also spent a whole weekend writing my parser.
Re: NBA Datasets 2013-14 Season
Having written parsers for all the major sites, I would recommend using NBC's play-by-play. It's the cleanest data source I found.
Re: NBA Datasets 2013-14 Season
I'm getting an error when I click on the XML link. Anybody else?
Thanks for putting this together and making it public!
Re: NBA Datasets 2013-14 Season
Whoops. Corrected.
Re: NBA Datasets 2013-14 Season
Sounds fun! Am I missing something, though? I don't see any data.
Re: NBA Datasets 2013-14 Season
Just fixed it. There was a rogue comma in this morning's upload.