Can anyone recommend a stats data formatting program?
Can anyone recommend a stats data formatting program?
I have a 4 year degree in Computer Science and a strong interest in statistics, specifically basketball statistics. I’ll get straight to the point. I was wonder if someone could recommend me a statistics program that takes game statistics or play by play data and converts/imports them into a text file with only box score data in it. I could then create a program that uses that text file and basically do anything that I want with it.
I used to have a C++ program that did something similar but there were a lot of problems with it. The part that causes the most problems is where I parse box score data, which is why I'm wondering if someone knows of a program that does that.
I used to have a C++ program that did something similar but there were a lot of problems with it. The part that causes the most problems is where I parse box score data, which is why I'm wondering if someone knows of a program that does that.
Re: Can anyone recommend a stats data formatting program?
I like to use Python for scraping/parsing purposes to create text files that I then feed to Excel or R. I know others use PERL for at least part of that initial process.
Re: Can anyone recommend a stats data formatting program?
Yeah, I'm not as proficient with Python or Pearl to pull that off.
I'm trying to maybe get something done using C++ using play by play data to create my own file with just team box scores (and it would also have the dates of the games)... it would then just be a matter of updating it every time I want the newest info. And then I could make a second program that takes that file and I could basically just make a bunch of functions to sort the data into teams, etc and it would be able to calculate weird shit like how well a team does on 0 days rest or which team is playing the best team defense in the last 30 days. The parsing is the hardest part though, after that its just 'for' loops and all that.
I have play by play data from basketballvalue.com, but does anyone know if there's anything more programmer friendly on another website?
I'm trying to maybe get something done using C++ using play by play data to create my own file with just team box scores (and it would also have the dates of the games)... it would then just be a matter of updating it every time I want the newest info. And then I could make a second program that takes that file and I could basically just make a bunch of functions to sort the data into teams, etc and it would be able to calculate weird shit like how well a team does on 0 days rest or which team is playing the best team defense in the last 30 days. The parsing is the hardest part though, after that its just 'for' loops and all that.
I have play by play data from basketballvalue.com, but does anyone know if there's anything more programmer friendly on another website?
Re: Can anyone recommend a stats data formatting program?
I think EvanZ can help you--he's got some (PERL?) code. I have Excel Macros that import box scores, also, if you prefer that.
Re: Can anyone recommend a stats data formatting program?
I've setup a GitHub account and posted my code (written in Ruby) for parsing the PBP data from bbv.com. It wasn't originally written for anyone but me, so you will have to slog through it on your own. It might be helpful for giving you some ideas, but my guess is you will want to re-write your own C++ code to do whatever it is you want to do with the data.
https://github.com/EvanZ/BBV_PBP_PARSER ... _output.rb
https://github.com/EvanZ/BBV_PBP_PARSER ... _output.rb
Re: Can anyone recommend a stats data formatting program?
Hmmm... if it parses through the data and puts it into a usable format... that's perfect! Thank you.
Also... DSM, I'm curious how you used Excel Macros to import box scores. I used to do something like that but it was a giant mess. I'm wondering if you do something similar or if you thought of a way to do it I wasn't thinking of.
Also... DSM, I'm curious how you used Excel Macros to import box scores. I used to do something like that but it was a giant mess. I'm wondering if you do something similar or if you thought of a way to do it I wasn't thinking of.
Re: Can anyone recommend a stats data formatting program?
Well, it's not very pretty, but it can be done. Basketball Reference works pretty well with Excel. This code reads from a given URL and imports the Four Factor Box Score, and Advanced Box Score for each team.
Obviously, tweaks can be made to put it inside a loop and grab a bunch of box scores and place them sequentially on a page (with a game ID for each boxscore). I've did that a while back, but I don't have the code in front of me.
Obviously, tweaks can be made to put it inside a loop and grab a bunch of box scores and place them sequentially on a page (with a game ID for each boxscore). I've did that a while back, but I don't have the code in front of me.
Code: Select all
Sub Single_Gm_DLoad()
'
' Single_Gm_DLoad Macro
'
'
Dim URLgame As String
Dim QueryBox As QueryTable
Dim QueryT1 As QueryTable
Dim QueryT2 As QueryTable
Dim Team1 As String
Dim Team2 As String
Dim TableTm1 As String
Dim TableTm2 As String
Dim GameDate As Date
Dim Today As Date
Today = DateValue(Now)
GameDate = Sheets("Single Game Analysis").Range("S5")
If Not Today > GameDate Then
MsgBox ("Game has not yet been completed and box score posted")
Exit Sub
End If
URLgame = "URL;" & Sheets("Single Game Analysis").Cells(5, 14)
Sheets("Single Game Analysis").Range("B8:H11").ClearContents
Sheets("Single Game Analysis").Range("B16:O34").ClearContents
Sheets("Single Game Analysis").Range("B39:O57").ClearContents
Set QueryBox = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$8"))
With QueryBox
.Name = "Box_Score"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = False
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = False
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = """four_factors"""
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = True
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
Sheets("Single Game Analysis").QueryTables(1).Delete
Team1 = Sheets("Single Game Analysis").Cells(10, 2)
Team2 = Sheets("Single Game Analysis").Cells(11, 2)
TableTm1 = Team1 & "_advanced"
TableTm2 = Team2 & "_advanced"
Set QueryT1 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$16"))
With QueryT1
.Name = "Team_1_Stats"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = False
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = False
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
' .WebTables = """CLE_advanced"""
.WebTables = TableTm1
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = True
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
Sheets("Single Game Analysis").QueryTables(1).Delete
Set QueryT2 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$39"))
With QueryT2
.Name = "Team_2_Stats"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = False
.RefreshStyle = xlOverwriteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = False
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
' .WebTables = """OKC_advanced"""
.WebTables = TableTm2
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = True
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
Sheets("Single Game Analysis").QueryTables(1).Delete
End Sub
Re: Can anyone recommend a stats data formatting program?
I thought the issue with Basketball Reference was that they don't have like up to the day updates?DSMok1 wrote:Well, it's not very pretty, but it can be done. Basketball Reference works pretty well with Excel. This code reads from a given URL and imports the Four Factor Box Score, and Advanced Box Score for each team.
Obviously, tweaks can be made to put it inside a loop and grab a bunch of box scores and place them sequentially on a page (with a game ID for each boxscore). I've did that a while back, but I don't have the code in front of me.
Code: Select all
Sub Single_Gm_DLoad() ' ' Single_Gm_DLoad Macro ' ' Dim URLgame As String Dim QueryBox As QueryTable Dim QueryT1 As QueryTable Dim QueryT2 As QueryTable Dim Team1 As String Dim Team2 As String Dim TableTm1 As String Dim TableTm2 As String Dim GameDate As Date Dim Today As Date Today = DateValue(Now) GameDate = Sheets("Single Game Analysis").Range("S5") If Not Today > GameDate Then MsgBox ("Game has not yet been completed and box score posted") Exit Sub End If URLgame = "URL;" & Sheets("Single Game Analysis").Cells(5, 14) Sheets("Single Game Analysis").Range("B8:H11").ClearContents Sheets("Single Game Analysis").Range("B16:O34").ClearContents Sheets("Single Game Analysis").Range("B39:O57").ClearContents Set QueryBox = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$8")) With QueryBox .Name = "Box_Score" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = False .RefreshStyle = xlOverwriteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = False .RefreshPeriod = 0 .WebSelectionType = xlSpecifiedTables .WebFormatting = xlWebFormattingNone .WebTables = """four_factors""" .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = True .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With Sheets("Single Game Analysis").QueryTables(1).Delete Team1 = Sheets("Single Game Analysis").Cells(10, 2) Team2 = Sheets("Single Game Analysis").Cells(11, 2) TableTm1 = Team1 & "_advanced" TableTm2 = Team2 & "_advanced" Set QueryT1 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$16")) With QueryT1 .Name = "Team_1_Stats" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = False .RefreshStyle = xlOverwriteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = False .RefreshPeriod = 0 .WebSelectionType = xlSpecifiedTables .WebFormatting = xlWebFormattingNone ' .WebTables = """CLE_advanced""" .WebTables = TableTm1 .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = True .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With Sheets("Single Game Analysis").QueryTables(1).Delete Set QueryT2 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$39")) With QueryT2 .Name = "Team_2_Stats" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = False .RefreshStyle = xlOverwriteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = False .RefreshPeriod = 0 .WebSelectionType = xlSpecifiedTables .WebFormatting = xlWebFormattingNone ' .WebTables = """OKC_advanced""" .WebTables = TableTm2 .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = True .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With Sheets("Single Game Analysis").QueryTables(1).Delete End Sub
Re: Can anyone recommend a stats data formatting program?
It's 6:45 AM in the East, and b-r.com has last night's games updated.
Re: Can anyone recommend a stats data formatting program?
Evan, how do you handle stuff like two "Johnsons"s being on the court at the same time, and bbv just calling the player in question "Johnson"?
I got PBP data in bbv format dating back to 2002 if someone wants them
I got PBP data in bbv format dating back to 2002 if someone wants them
Re: Can anyone recommend a stats data formatting program?
I think this would only be a problem if there were two Johnsons with the same first initial playing for the same team on the court at the same time.J.E. wrote:Evan, how do you handle stuff like two "Johnsons"s being on the court at the same time, and bbv just calling the player in question "Johnson"?
I got PBP data in bbv format dating back to 2002 if someone wants them
Re: Can anyone recommend a stats data formatting program?
bbv definitely doesn't always list initials when it would have been necessary..EvanZ wrote:I think this would only be a problem if there were two Johnsons with the same first initial playing for the same team on the court at the same time.J.E. wrote:Evan, how do you handle stuff like two "Johnsons"s being on the court at the same time, and bbv just calling the player in question "Johnson"?
I got PBP data in bbv format dating back to 2002 if someone wants them
Re: Can anyone recommend a stats data formatting program?
The Nets have 3 or possibly 4 Williamses, including Shelden and Shawne.
Re: Can anyone recommend a stats data formatting program?
As far as I can tell, Shawne is in the pbp data as "Sha. Williams", and Shelden is in there as "She. Williams". This season, anyway.