Womens Play By Play Data?

Home for all your discussion of basketball statistical analysis.
RowRowFan
Posts: 11
Joined: Wed Jan 03, 2024 5:06 am

Womens Play By Play Data?

Post by RowRowFan »

Hello, I want to try to parse play-by-play data for college women's basketball, in a format similar to https://www.bigdataball.com/datasets/wnba/historical/

As of right now, the only code I have found on GitHub has everything except the players on the court, which is important because I want to run RAPM on it. I was thinking of building it manually, but I was wondering if anyone knew of a resource or place where it is already available, either as code or as a direct download like BigDataBall. Thanks!
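
For context on why the on-court players matter: RAPM is usually set up as a ridge regression of stint point margin on lineup indicators (+1 for each home player on the floor, -1 for each away player). The following is a minimal sketch under that assumption, in pure Python. The stint tuple layout, the function names, and the `ridge_lambda` value are all illustrative choices, not taken from any particular dataset or package.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_rapm(stints, ridge_lambda=1.0):
    """Possession-weighted ridge regression of margin on lineup indicators.

    stints: list of (home_players, away_players, home_margin_per_100, possessions).
    This tuple layout is hypothetical. Returns {player: coefficient}.
    """
    players = sorted({p for home, away, _, _ in stints
                      for p in list(home) + list(away)})
    idx = {p: i for i, p in enumerate(players)}
    n = len(players)
    # Accumulate the normal equations (X'WX + lambda*I) beta = X'Wy.
    A = [[ridge_lambda if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for home, away, margin_per_100, possessions in stints:
        row = [0.0] * n
        for p in home:
            row[idx[p]] = 1.0
        for p in away:
            row[idx[p]] = -1.0
        for i in range(n):
            if row[i] == 0.0:
                continue
            b[i] += possessions * row[i] * margin_per_100
            for j in range(n):
                A[i][j] += possessions * row[i] * row[j]
    return dict(zip(players, solve_linear(A, b)))
```

Without the on-court players there is no way to fill in the indicator rows, which is why a play-by-play feed lacking lineups is not enough for this kind of model.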
Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: Womens Play By Play Data?

Post by Crow »

Does BigDataBall have what you want?

Have you checked with herhoopstats.com?

Can you scrape from the official NCAA women's site, or possibly request a download? Are you doing basic research to share with the public, or is this for a commercial purpose?
RowRowFan
Posts: 11
Joined: Wed Jan 03, 2024 5:06 am

Re: Womens Play By Play Data?

Post by RowRowFan »

Crow wrote: Thu Feb 08, 2024 12:36 am Does BigDataBall have what you want?

Have you checked with herhoopstats.com?

Can you scrape from the official NCAA women's site, or possibly request a download? Are you doing basic research to share with the public, or is this for a commercial purpose?
BigDataBall doesn't have NCAA women's data, but its WNBA and NBA data is essentially in the format I'm looking for. I want to get net rating and play around with lineup data and things like that, just for fun. herhoopstats.com didn't have much either.

I was thinking of figuring out how to scrape the data, but first I wanted to see if there was a simple option to purchase it commercially, or a tutorial on scraping it into the same format as BigDataBall. I originally used the wehoop package, but it lacked the players on the floor.
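
As an illustration of the lineup work described above, net rating for a lineup is 100 times the point differential divided by possessions. The sketch below aggregates stints per five-player lineup and applies that formula. The field names (`lineup`, `points_for`, `points_against`, `possessions`) are hypothetical and are not BigDataBall's or wehoop's actual column names.

```python
from collections import defaultdict


def lineup_net_ratings(stints):
    """Aggregate stints by lineup and return {lineup: net rating per 100 possessions}.

    stints: iterable of dicts with hypothetical keys
    'lineup', 'points_for', 'points_against', 'possessions'.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # points for, points against, possessions
    for stint in stints:
        key = frozenset(stint["lineup"])  # player order should not matter
        totals[key][0] += stint["points_for"]
        totals[key][1] += stint["points_against"]
        totals[key][2] += stint["possessions"]
    ratings = {}
    for key, (pts_for, pts_against, poss) in totals.items():
        if poss > 0:  # skip lineups with no recorded possessions
            ratings[key] = 100.0 * (pts_for - pts_against) / poss
    return ratings
```

Using a `frozenset` as the key means the same five players are grouped together regardless of the order in which they are listed in each stint.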
rjb2
Posts: 14
Joined: Fri Jan 26, 2024 5:47 am

Re: Womens Play By Play Data?

Post by rjb2 »

This will be sort of a solution. There is an R package called bigballR that scrapes pbp from stats.ncaa. It has a function called get_play_by_play, which scrapes the pbp, and a function called get_possessions, which parses it and returns information that includes the players on the court. The problem is that it's built for men's basketball, so it may be difficult to scrape the women's games efficiently. If you are able to scrape the pbp IDs for women's games, then you should be good.

https://github.com/jflancer/bigballR
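
To make the "players on the court" idea concrete, here is a minimal Python sketch of tracking lineups from substitution events. It is not bigballR's implementation; the event layout (`team`, `player`, `type` with values `sub_in` and `sub_out`) and the assumption that starters are known up front are both hypothetical simplifications.

```python
def attach_lineups(events, starters):
    """Return a copy of each event with an 'on_court' snapshot added.

    events: list of dicts with hypothetical keys 'team', 'player', 'type'.
    starters: dict mapping team name to its starting players.
    """
    on_court = {team: set(players) for team, players in starters.items()}
    annotated = []
    for event in events:
        if event["type"] == "sub_in":
            on_court[event["team"]].add(event["player"])
        elif event["type"] == "sub_out":
            on_court[event["team"]].discard(event["player"])
        # Snapshot after applying the event, so later changes do not leak back.
        snapshot = {team: sorted(players) for team, players in on_court.items()}
        annotated.append({**event, "on_court": snapshot})
    return annotated
```

Because each substitution is logged as separate in and out rows here, a snapshot taken between the two rows can briefly show four or six players. Real play-by-play logs may also omit some substitutions, so a production parser would likely need extra checks that this sketch leaves out.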
RowRowFan
Posts: 11
Joined: Wed Jan 03, 2024 5:06 am

Re: Womens Play By Play Data?

Post by RowRowFan »

rjb2 wrote: Thu Feb 08, 2024 2:44 am This will be sort of a solution. There is an R package called bigballR that scrapes pbp from stats.ncaa. It has a function called get_play_by_play, which scrapes the pbp, and a function called get_possessions, which parses it and returns information that includes the players on the court. The problem is that it's built for men's basketball, so it may be difficult to scrape the women's games efficiently. If you are able to scrape the pbp IDs for women's games, then you should be good.

https://github.com/jflancer/bigballR
Thank you! The season ID is what ties it to men's seasons, so I can just parse that and go through every date, and it should work. You're a lifesaver!
RowRowFan
Posts: 11
Joined: Wed Jan 03, 2024 5:06 am

Re: Womens Play By Play Data?

Post by RowRowFan »

So it's working, but it's incredibly slow (it's going to take around 2-3 days to get the data I need). Is that just how long it would be expected to take? It gets the box score ID and then the play-by-play ID, since it can't get the play-by-play ID directly.
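
A rough way to sanity-check a runtime like this is to multiply games by requests per game by seconds per request. The sketch below does only that arithmetic; the function name is made up, and any inputs passed to it are placeholders rather than measured values.

```python
def estimate_scrape_hours(n_games, requests_per_game, seconds_per_request):
    """Back-of-the-envelope scrape time in hours for a sequential scraper."""
    return n_games * requests_per_game * seconds_per_request / 3600.0
```

With two lookups per game (box score ID, then play-by-play ID), `requests_per_game` would be at least 2, so halving either the request count or the per-request wait roughly halves the total.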
rjb2
Posts: 14
Joined: Fri Jan 26, 2024 5:47 am

Re: Womens Play By Play Data?

Post by rjb2 »

RowRowFan wrote: Fri Feb 09, 2024 1:01 am So it's working, but it's incredibly slow (it's going to take around 2-3 days to get the data I need). Is that just how long it would be expected to take? It gets the box score ID and then the play-by-play ID, since it can't get the play-by-play ID directly.
Yeah, scraping a lot of games takes a while. First it compiles all the IDs, then it scrapes each game individually.
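
Since a scrape like this can run for days, one common safeguard is to cache each game on disk so an interrupted run can pick up where it left off. Below is a minimal Python sketch of that idea. It is not part of bigballR; `fetch_game` is a hypothetical callable standing in for whatever actually downloads one game, and the one-file-per-game JSON layout is just one possible choice.

```python
import json
import time
from pathlib import Path


def scrape_games(game_ids, fetch_game, cache_dir, delay_seconds=1.0):
    """Fetch each game once, caching results so a re-run skips finished games.

    fetch_game: hypothetical callable taking a game ID and returning
    JSON-serializable data for that game.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for game_id in game_ids:
        path = cache_dir / f"{game_id}.json"
        if path.exists():
            # Already scraped in an earlier run: load from disk, no request.
            results[game_id] = json.loads(path.read_text())
            continue
        data = fetch_game(game_id)
        path.write_text(json.dumps(data))
        results[game_id] = data
        time.sleep(delay_seconds)  # pause between requests to be polite
    return results
```

Re-running `scrape_games` with the same `cache_dir` only calls `fetch_game` for IDs that have no cached file yet, so a crash late in a multi-day run does not cost the games already collected.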