
Re: Garbage Time

Posted: Thu Dec 31, 2020 7:46 pm
by DSMok1
vzografos wrote: Thu Dec 31, 2020 5:31 pm My question was how can you evaluate your prediction at any time during the game based only on the game outcome?
You can't.

Re: Garbage Time

Posted: Thu Dec 31, 2020 11:45 pm
by vzografos
DSMok1 wrote: Thu Dec 31, 2020 7:46 pm You can't.
Yes....and No ;)

So I was thinking that you cannot evaluate the accuracy (since ground truth exists only for the end result, not for every time point t), but maybe you can at least evaluate how much better your probabilities are calibrated relative to another prediction.

So imagine a game that the Away team won, where we are predicting the Away team's chance of winning at different times during the game.
At a given time t your model predicts, say, 60% for team Away, and a competing model predicts 57%. Knowing that team Away won, we can calculate the Brier score of our prediction and that of the competing model at time t. We can do so again at times t+1, t+2, etc.

At the end of the game the Brier score should approach 0 for both methods (lower Brier is better) and, assuming no last-minute upsets, we should get a sequence of Brier scores for each method, one per time point.

If we average the Brier scores over time, we get an average Brier score for each of the two methods. The one with the lower score should be the better one.
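A minimal sketch of that comparison, assuming made-up win probabilities for the Away team at five time points (the numbers and the helper names `brier` and `mean_brier_over_time` are hypothetical, for illustration only):

```python
def brier(p, outcome):
    """Squared error between a forecast probability p and the 0/1 outcome."""
    return (p - outcome) ** 2


def mean_brier_over_time(probs, outcome):
    """Average the per-time-point Brier scores of one game's forecasts."""
    return sum(brier(p, outcome) for p in probs) / len(probs)


# Hypothetical Away-win probabilities at times t, t+1, ..., t+4.
model_a = [0.60, 0.65, 0.72, 0.85, 0.97]
model_b = [0.57, 0.60, 0.66, 0.80, 0.95]
away_won = 1  # the Away team won this game

score_a = mean_brier_over_time(model_a, away_won)
score_b = mean_brier_over_time(model_b, away_won)

# Lower is better.
print(f"model A: {score_a:.4f}  model B: {score_b:.4f}")
```

Within a single game, whichever model leaned harder toward the eventual winner will always score better, so the average would have to be taken over many games before it says much about either model.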

Of course, last-minute upsets, where there is huge volatility in the win probability, might make the whole evaluation idea problematic.

But I cannot think of a better way for evaluating live time prediction methods.


You can definitely train them from past data, as rainmantrail said, but evaluating them is, I think, much trickier.

Re: Garbage Time

Posted: Fri Jan 01, 2021 3:12 am
by rainmantrail
vzografos wrote: Thu Dec 31, 2020 11:45 pm But I cannot think of a better way for evaluating live time prediction methods.

You can definitely train them from past data, as rainmantrail said, but evaluating them is, I think, much trickier.
I would be more interested in evaluating the performance of model A vs model B. This is very easy to evaluate. Just run an ANOVA between them. There are other ways to evaluate performance as well, but it's a fairly straightforward process. It is not an ill-posed problem either. We can predict at time t=37 what the outcome of the game will be at time t=48, and the outcome is well-defined (win or loss). Competing models that predict these outcomes will have varying degrees of performance. A model that is more performant than another (yields more accurate predictions) will have more area under the ROC curve than the competing model. It is in this sense that I would state with confidence that one model's predictions are "better" than the other's. An outcome for time t=37 is not necessary. We are only interested in the outcome at time t=48.
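As a rough illustration of the ROC comparison, here is a sketch that scores two models' t=37 predictions against final outcomes. The six games, both sets of probabilities, and the helper `roc_auc` are hypothetical; the AUC is computed directly as the fraction of (winner, loser) pairs that the model ranks correctly, with ties counted as half.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one win and one loss")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


# Final outcomes (1 = win at t=48) for six hypothetical games.
outcomes = [1, 1, 0, 1, 0, 0]

# Each model's predicted win probability at t=37 for the same games.
model_a = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
model_b = [0.8, 0.5, 0.6, 0.7, 0.3, 0.4]

auc_a = roc_auc(model_a, outcomes)
auc_b = roc_auc(model_b, outcomes)
print(f"AUC at t=37  model A: {auc_a:.3f}  model B: {auc_b:.3f}")
```

AUC only measures how well a model ranks eventual winners above eventual losers; it does not check whether the predicted probabilities themselves are calibrated.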

Re: Garbage Time

Posted: Fri Jan 01, 2021 3:44 am
by vzografos
rainmantrail wrote: Fri Jan 01, 2021 3:12 am
vzografos wrote: Thu Dec 31, 2020 11:45 pm But I cannot think of a better way for evaluating live time prediction methods.

You can definitely train them from past data, as rainmantrail said, but evaluating them is, I think, much trickier.
I would be more interested in evaluating the performance of model A vs model B. This is very easy to evaluate. Just run an ANOVA between them. There are other ways to evaluate performance as well, but it's a fairly straightforward process. It is not an ill-posed problem either. We can predict at time t=37 what the outcome of the game will be at time t=48, and the outcome is well-defined (win or loss). Competing models that predict these outcomes will have varying degrees of performance. A model that is more performant than another (yields more accurate predictions) will have more area under the ROC curve than the competing model. It is in this sense that I would state with confidence that one model's predictions are "better" than the other's. An outcome for time t=37 is not necessary. We are only interested in the outcome at time t=48.

I really disagree with this way of thinking (i.e. evaluating temporal performance only w.r.t. the final outcome). You can always construct counterexamples of models that, according to your evaluation logic, can have a high prediction accuracy at t=48. Imagine a trivial model that predicts 0.5 from t=0 until t=46 and at t=47 predicts the outcome purely from the point differential in the last second.

Or take any example of a game with an outcome that flips in the last 2 seconds (high volatility of the prediction probabilities in those last few seconds). How is the prediction of your model up until that point even relevant?

Re: Garbage Time

Posted: Fri Jan 01, 2021 4:27 am
by rainmantrail
vzografos wrote: Fri Jan 01, 2021 3:44 am
I really disagree with this way of thinking (i.e. evaluating temporal performance only w.r.t. the final outcome). You can always construct counterexamples of models that, according to your evaluation logic, can have a high prediction accuracy at t=48. Imagine a trivial model that predicts 0.5 from t=0 until t=46 and at t=47 predicts the outcome purely from the point differential in the last second.

Or take any example of a game with an outcome that flips in the last 2 seconds (high volatility of the prediction probabilities in those last few seconds). How is the prediction of your model up until that point even relevant?
It seems to me as though you are interested in solving a different problem than the one I'm stating. The question that I'm aiming to answer with my models is "what is team A's probability of winning the game, given the current state?". By definition, I am interested in the outcome of the game.

What question are you trying to answer, or what problem are you aiming to solve? It seems like we're talking past one another on this topic.

Re: Garbage Time

Posted: Fri Jan 01, 2021 4:35 am
by rainmantrail
Perhaps you're just pointing out that there is a better way to approach building a model that predicts game outcomes than the approach I've taken. I would certainly agree with that. However, my use case is simply to help me flag which possessions occur during "garbage time", so directional accuracy is all that's needed. If I were trying to beat the live in-game Vegas lines, that would be a different story. I'd have to build a much more robust model than what I'm building if that were my goal.

Re: Garbage Time

Posted: Fri Jan 01, 2021 4:40 am
by vzografos
rainmantrail wrote: Fri Jan 01, 2021 4:35 am directional accuracy
Define that please.



No, I am also interested in a part of that problem. Maybe a different aspect of it. But I have second thoughts about the determination of accuracy at a specific point in time, given the absence of ground truth data (i.e. imagine, if you like, a ground truth temporal curve, which we don't have, to compare with at every time t).

In any case, to avoid this thread dragging on I'll stop here, but I will think about this problem and maybe come back to it in the future.

Re: Garbage Time

Posted: Fri Jan 01, 2021 5:35 am
by rainmantrail
rainmantrail wrote: Fri Jan 01, 2021 4:35 am directional accuracy
vzografos wrote: Fri Jan 01, 2021 4:40 am Define that please.
I just mean that I want my model's output to be somewhat close to what has happened historically for other games with the same number of minutes remaining and the same point margin. So, if the model sees tonight's game between HOU and SAC, where HOU is ahead by 3 points with 1:30 remaining, and it predicts that HOU is 86% to win, then I'd want to know that if I were to look up all previous games in my database where a team had a 3 point lead with 1:30 remaining, approximately 86% of them (+/- some small margin of error) indeed won the game. If it turned out that only 55% ended up winning, I'd consider my model to be a failure. If it turned out that 83.7% ended up winning, I'd say it's directionally accurate, and good enough for my use case. If I were betting on games with it though, then I'd probably want something closer than that 83.7% figure. But that isn't the goal of this model.
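The lookup described above could be sketched roughly as follows. The record layout (`lead`, `seconds_left`, `won`), the tiny `history` list, and the 0.10 tolerance are all assumptions made for illustration, not an actual database or threshold.

```python
def empirical_win_rate(games, lead, seconds_left):
    """Share of past games in the given state that the leading team won."""
    outcomes = [
        g["won"]
        for g in games
        if g["lead"] == lead and g["seconds_left"] == seconds_left
    ]
    if not outcomes:
        return None  # no historical games in this state
    return sum(outcomes) / len(outcomes)


def is_calibrated(predicted, observed, tolerance):
    """True if the prediction is within `tolerance` of the observed rate."""
    return abs(predicted - observed) <= tolerance


# Hypothetical history: 10 games with a 3-point lead at 1:30, 8 of them won.
history = (
    [{"lead": 3, "seconds_left": 90, "won": 1}] * 8
    + [{"lead": 3, "seconds_left": 90, "won": 0}] * 2
    + [{"lead": 10, "seconds_left": 90, "won": 1}] * 5
)

observed = empirical_win_rate(history, lead=3, seconds_left=90)
predicted = 0.86  # the model's win probability for the leading team
print(observed, is_calibrated(predicted, observed, tolerance=0.10))
```

In practice, exact matches on lead and time remaining may be sparse, so states would likely need to be grouped into bins; the tolerance itself is a judgment call that depends on the use case.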
vzografos wrote: Fri Jan 01, 2021 4:40 am
No, I am also interested in a part of that problem. Maybe a different aspect of it. But I have second thoughts about the determination of accuracy at a specific point in time, given the absence of ground truth data (i.e. imagine, if you like, a ground truth temporal curve, which we don't have, to compare with at every time t).

In any case, to avoid this thread dragging on I'll stop here, but I will think about this problem and maybe come back to it in the future.
OK, sounds good. We can discuss this topic via PM so that we don't force everyone else to read through the congestion.

Re: Garbage Time

Posted: Fri Jan 01, 2021 5:44 am
by vzografos
rainmantrail wrote: Fri Jan 01, 2021 5:35 am
I just mean that I want my model's output to be somewhat close to what has happened historically for other games with the same number of minutes remaining and the same point margin. So, if the model sees tonight's game between HOU and SAC, where HOU is ahead by 3 points with 1:30 remaining, and it predicts that HOU is 86% to win, then I'd want to know that if I were to look up all previous games in my database where a team had a 3 point lead with 1:30 remaining, approximately 86% of them (+/- some small margin of error) indeed won the game. If it turned out that only 55% ended up winning, I'd consider my model to be a failure. If it turned out that 83.7% ended up winning, I'd say it's directionally accurate, and good enough for my use case. If I were betting on games with it though, then I'd probably want something closer than that 83.7% figure. But that isn't the goal of this model.
OK, you meant calibrated. Understood.