Parsing play-by-play data

ethanluo
Posts: 9
Joined: Sat May 16, 2015 8:14 am

Parsing play-by-play data

Post by ethanluo »

Hi, I have been working on basketball analytics for quite a while. Some of the data I need has to be parsed directly from the play-by-play.

A few people have written code to extract the play-by-play from stats.nba.com or ESPN, but I noticed that they do not usually have tools to parse it into a usable CSV for statistics. So I have been implementing my own parser to do that job, and I hope to share the codebase with this community as open source to make the process easier.

I noticed that different sources use different formats for the pbp, so what people usually do is write regular expressions for each site, which I believe can be tedious. Furthermore, there may be some outliers. I hope to implement a universal parser that can be quickly adapted to different websites. To do that, I did some very simple natural language processing and tokenization of the text, and after that I will do classification via machine learning.
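To make the idea concrete, here is a minimal sketch of the tokenization step, not the actual parser. The keyword table, the function names, and the sample descriptions are all made up for illustration, and the keyword lookup is only a stand-in for the machine-learning classifier, which would be trained on labeled data instead.

```python
import re

# Hypothetical keyword-to-label table; a trained classifier would replace this.
EVENT_KEYWORDS = {
    "makes": "shot_made",
    "misses": "shot_missed",
    "rebound": "rebound",
    "turnover": "turnover",
    "foul": "foul",
}

def tokenize(description):
    """Lowercase a play description and split it into word and number tokens."""
    return re.findall(r"[a-z]+|\d+", description.lower())

def featurize(tokens):
    """Count tokens into a bag-of-words dict that a classifier could consume."""
    features = {}
    for tok in tokens:
        features[tok] = features.get(tok, 0) + 1
    return features

def guess_event(tokens):
    """Stand-in for the classifier: label of the first known keyword, if any."""
    for tok in tokens:
        if tok in EVENT_KEYWORDS:
            return EVENT_KEYWORDS[tok]
    return "unknown"

tokens = tokenize("Stephen Curry makes 26-foot three point jumper")
print(tokens)
print(guess_event(tokens))
```

Because the tokens do not depend on where the clock or the player name sits in the line, the same features could in principle be reused across sites, which is the point of avoiding one regex per source.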

It works okay at this moment, but I definitely need some help. To complete the parser and assess its reliability, I need prepared (labeled) data. I noticed that NBAStuffer has the kind of data I want to use to train the parser. But to complete the parser for websites such as ESPN, I will probably need someone to manually prepare the data in a format similar to NBAStuffer's. I am not sure whether someone already has it.

Does anyone have any idea how I should proceed from here?
browning
Posts: 19
Joined: Sat Jun 02, 2012 4:09 pm

Re: Parsing play-by-play data

Post by browning »

Hey, sounds like a good cause. I'm curious how effective NLP will be compared with regexes, though, because each site has a pretty standard format, which makes regexes work well.
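For comparison, a per-site regex might look roughly like the sketch below. The line format ("clock, player, makes or misses, shot description") is an assumed example, not the real layout of ESPN or stats.nba.com, and `SHOT_RE` and `parse_shot` are hypothetical names.

```python
import re

# Assumed line format: "<mm:ss> <player> makes|misses <shot description>".
SHOT_RE = re.compile(
    r"^(?P<clock>\d{1,2}:\d{2})\s+"
    r"(?P<player>.+?)\s+"
    r"(?P<result>makes|misses)\s+"
    r"(?P<shot>.+)$"
)

def parse_shot(line):
    """Return a dict of fields for a shot line, or None if the line does not match."""
    match = SHOT_RE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    # Replace the raw "makes"/"misses" word with a boolean column.
    row["made"] = row.pop("result") == "makes"
    return row

print(parse_shot("10:42 Stephen Curry makes 26-foot three point jumper"))
print(parse_shot("Official timeout"))
```

A pattern like this is easy to write when the format is fixed, but each event type and each site needs its own pattern, and any line that deviates from the format simply returns None.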

Anyway, I'm happy to help you get the data. I have parsers for both the ESPN and stats.nba.com play-by-play pages.

You can email me at bwbrowning@gmail.com and we can talk about the format you would like the data in.

Cheers