Can anyone recommend a stats data formatting program?

Home for all your discussion of basketball statistical analysis.
Post Reply
agentkirb
Posts: 21
Joined: Thu Dec 29, 2011 8:56 pm

Can anyone recommend a stats data formatting program?

Post by agentkirb »

I have a 4 year degree in Computer Science and a strong interest in statistics, specifically basketball statistics. I’ll get straight to the point. I was wonder if someone could recommend me a statistics program that takes game statistics or play by play data and converts/imports them into a text file with only box score data in it. I could then create a program that uses that text file and basically do anything that I want with it.

I used to have a C++ program that did something similar but there were a lot of problems with it. The part that causes the most problems is where I parse box score data, which is why I'm wondering if someone knows of a program that does that.
xkonk
Posts: 307
Joined: Fri Apr 15, 2011 12:37 am

Re: Can anyone recommend a stats data formatting program?

Post by xkonk »

I like to use Python for scraping/parsing purposes to create text files that I then feed to Excel or R. I know others use PERL for at least part of that initial process.
agentkirb
Posts: 21
Joined: Thu Dec 29, 2011 8:56 pm

Re: Can anyone recommend a stats data formatting program?

Post by agentkirb »

Yeah, I'm not as proficient with Python or Pearl to pull that off.

I'm trying to maybe get something done using C++ using play by play data to create my own file with just team box scores (and it would also have the dates of the games)... it would then just be a matter of updating it every time I want the newest info. And then I could make a second program that takes that file and I could basically just make a bunch of functions to sort the data into teams, etc and it would be able to calculate weird shit like how well a team does on 0 days rest or which team is playing the best team defense in the last 30 days. The parsing is the hardest part though, after that its just 'for' loops and all that.

I have play by play data from basketballvalue.com, but does anyone know if there's anything more programmer friendly on another website?
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: Can anyone recommend a stats data formatting program?

Post by DSMok1 »

I think EvanZ can help you--he's got some (PERL?) code. I have Excel Macros that import box scores, also, if you prefer that.
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City
Contact:

Re: Can anyone recommend a stats data formatting program?

Post by EvanZ »

I've setup a GitHub account and posted my code (written in Ruby) for parsing the PBP data from bbv.com. It wasn't originally written for anyone but me, so you will have to slog through it on your own. It might be helpful for giving you some ideas, but my guess is you will want to re-write your own C++ code to do whatever it is you want to do with the data.

https://github.com/EvanZ/BBV_PBP_PARSER ... _output.rb
agentkirb
Posts: 21
Joined: Thu Dec 29, 2011 8:56 pm

Re: Can anyone recommend a stats data formatting program?

Post by agentkirb »

Hmmm... if it parses through the data and puts it into a usable format... that's perfect! Thank you.

Also... DSM, I'm curious how you used Excel Macros to import box scores. I used to do something like that but it was a giant mess. I'm wondering if you do something similar or if you thought of a way to do it I wasn't thinking of.
DSMok1
Posts: 1119
Joined: Thu Apr 14, 2011 11:18 pm
Location: Maine
Contact:

Re: Can anyone recommend a stats data formatting program?

Post by DSMok1 »

Well, it's not very pretty, but it can be done. Basketball Reference works pretty well with Excel. This code reads from a given URL and imports the Four Factor Box Score, and Advanced Box Score for each team.

Obviously, tweaks can be made to put it inside a loop and grab a bunch of box scores and place them sequentially on a page (with a game ID for each boxscore). I've did that a while back, but I don't have the code in front of me.

Code: Select all

Sub Single_Gm_DLoad()
'
' Single_Gm_DLoad Macro
'

'
    Dim URLgame As String
    Dim QueryBox As QueryTable
    Dim QueryT1 As QueryTable
    Dim QueryT2 As QueryTable
    Dim Team1 As String
    Dim Team2 As String
    Dim TableTm1 As String
    Dim TableTm2 As String
    Dim GameDate As Date
    Dim Today As Date
    
    Today = DateValue(Now)
    GameDate = Sheets("Single Game Analysis").Range("S5")
    
    If Not Today > GameDate Then
    MsgBox ("Game has not yet been completed and box score posted")
    Exit Sub
    End If
    
    URLgame = "URL;" & Sheets("Single Game Analysis").Cells(5, 14)

    Sheets("Single Game Analysis").Range("B8:H11").ClearContents
    Sheets("Single Game Analysis").Range("B16:O34").ClearContents
    Sheets("Single Game Analysis").Range("B39:O57").ClearContents
    
    Set QueryBox = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$8"))
        
    With QueryBox
        .Name = "Box_Score"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlOverwriteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = False
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
        .WebTables = """four_factors"""
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = True
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    Sheets("Single Game Analysis").QueryTables(1).Delete
    
    Team1 = Sheets("Single Game Analysis").Cells(10, 2)
    Team2 = Sheets("Single Game Analysis").Cells(11, 2)
    
    TableTm1 = Team1 & "_advanced"
    TableTm2 = Team2 & "_advanced"
    
    Set QueryT1 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$16"))
    
    With QueryT1
        .Name = "Team_1_Stats"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlOverwriteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = False
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
'        .WebTables = """CLE_advanced"""
        .WebTables = TableTm1
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = True
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    Sheets("Single Game Analysis").QueryTables(1).Delete

    Set QueryT2 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$39"))

    With QueryT2
        .Name = "Team_2_Stats"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlOverwriteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = False
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
'        .WebTables = """OKC_advanced"""
        .WebTables = TableTm2
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = True
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    Sheets("Single Game Analysis").QueryTables(1).Delete

End Sub
Developer of Box Plus/Minus
APBRmetrics Forum Administrator
Twitter.com/DSMok1
agentkirb
Posts: 21
Joined: Thu Dec 29, 2011 8:56 pm

Re: Can anyone recommend a stats data formatting program?

Post by agentkirb »

DSMok1 wrote:Well, it's not very pretty, but it can be done. Basketball Reference works pretty well with Excel. This code reads from a given URL and imports the Four Factor Box Score, and Advanced Box Score for each team.

Obviously, tweaks can be made to put it inside a loop and grab a bunch of box scores and place them sequentially on a page (with a game ID for each boxscore). I've did that a while back, but I don't have the code in front of me.

Code: Select all

Sub Single_Gm_DLoad()
'
' Single_Gm_DLoad Macro
'

'
    Dim URLgame As String
    Dim QueryBox As QueryTable
    Dim QueryT1 As QueryTable
    Dim QueryT2 As QueryTable
    Dim Team1 As String
    Dim Team2 As String
    Dim TableTm1 As String
    Dim TableTm2 As String
    Dim GameDate As Date
    Dim Today As Date
    
    Today = DateValue(Now)
    GameDate = Sheets("Single Game Analysis").Range("S5")
    
    If Not Today > GameDate Then
    MsgBox ("Game has not yet been completed and box score posted")
    Exit Sub
    End If
    
    URLgame = "URL;" & Sheets("Single Game Analysis").Cells(5, 14)

    Sheets("Single Game Analysis").Range("B8:H11").ClearContents
    Sheets("Single Game Analysis").Range("B16:O34").ClearContents
    Sheets("Single Game Analysis").Range("B39:O57").ClearContents
    
    Set QueryBox = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$8"))
        
    With QueryBox
        .Name = "Box_Score"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlOverwriteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = False
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
        .WebTables = """four_factors"""
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = True
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    Sheets("Single Game Analysis").QueryTables(1).Delete
    
    Team1 = Sheets("Single Game Analysis").Cells(10, 2)
    Team2 = Sheets("Single Game Analysis").Cells(11, 2)
    
    TableTm1 = Team1 & "_advanced"
    TableTm2 = Team2 & "_advanced"
    
    Set QueryT1 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$16"))
    
    With QueryT1
        .Name = "Team_1_Stats"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlOverwriteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = False
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
'        .WebTables = """CLE_advanced"""
        .WebTables = TableTm1
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = True
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    Sheets("Single Game Analysis").QueryTables(1).Delete

    Set QueryT2 = Sheets("Single Game Analysis").QueryTables.Add(Connection:=URLgame, Destination:=Sheets("Single Game Analysis").Range("$B$39"))

    With QueryT2
        .Name = "Team_2_Stats"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlOverwriteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = False
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
'        .WebTables = """OKC_advanced"""
        .WebTables = TableTm2
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = True
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
    
    Sheets("Single Game Analysis").QueryTables(1).Delete

End Sub
I thought the issue with Basketball Reference was that they don't have like up to the day updates?
Mike G
Posts: 6154
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Can anyone recommend a stats data formatting program?

Post by Mike G »

It's 6:45 AM in the East, and b-r.com has last night's games updated.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Can anyone recommend a stats data formatting program?

Post by J.E. »

Evan, how do you handle stuff like two "Johnsons"s being on the court at the same time, and bbv just calling the player in question "Johnson"?
I got PBP data in bbv format dating back to 2002 if someone wants them
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City
Contact:

Re: Can anyone recommend a stats data formatting program?

Post by EvanZ »

J.E. wrote:Evan, how do you handle stuff like two "Johnsons"s being on the court at the same time, and bbv just calling the player in question "Johnson"?
I got PBP data in bbv format dating back to 2002 if someone wants them
I think this would only be a problem if there were two Johnsons with the same first initial playing for the same team on the court at the same time.
J.E.
Posts: 852
Joined: Fri Apr 15, 2011 8:28 am

Re: Can anyone recommend a stats data formatting program?

Post by J.E. »

EvanZ wrote:
J.E. wrote:Evan, how do you handle stuff like two "Johnsons"s being on the court at the same time, and bbv just calling the player in question "Johnson"?
I got PBP data in bbv format dating back to 2002 if someone wants them
I think this would only be a problem if there were two Johnsons with the same first initial playing for the same team on the court at the same time.
bbv definitely doesn't always list initials when it would have been necessary..
Mike G
Posts: 6154
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Can anyone recommend a stats data formatting program?

Post by Mike G »

The Nets have 3 or possibly 4 Williamses, including Shelden and Shawne.
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City
Contact:

Re: Can anyone recommend a stats data formatting program?

Post by EvanZ »

As far as I can tell, Shawne is in the pbp data as "Sha. Williams", and Shelden is in there as "She. Williams". This season, anyway.
Post Reply