Stabilize all-in-one metrics
Posted: Sun Feb 25, 2024 9:50 am
I want to predict a player's contribution in a season that has not started yet, using data from completed seasons.
When players with few minutes post extremely good or bad stats, their predicted ratings become extremely good or bad as well.
So I made a method in which each stat is first regressed to the mean, and the all-in-one metric is then calculated from the regressed stats. I call this method 'linear padding'.
Linear padding is implemented with linear regression models. As an example, consider predicting 2022-23 season EFF (efficiency), the simplest all-in-one metric, from older data. In this example, stats are adjusted to per 36 minutes and EFF is then calculated from them.
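To make the per 36 minutes step concrete, here is a minimal sketch. It assumes the commonly used EFF formula (points + rebounds + assists + steals + blocks - missed field goals - missed free throws - turnovers), which is not spelled out in this post. The function names and the season totals are made up for illustration.

```python
def per36(total, minutes):
    """Scale a season total to a per-36-minutes rate."""
    return total * 36.0 / minutes


def eff(pts, reb, ast, stl, blk, fga, fgm, fta, ftm, tov):
    """EFF under the commonly used formula (an assumption, see above)."""
    return pts + reb + ast + stl + blk - (fga - fgm) - (fta - ftm) - tov


# Hypothetical season totals for one player (not real B-League data).
totals = {
    "pts": 600, "reb": 200, "ast": 100, "stl": 40, "blk": 20,
    "fga": 500, "fgm": 230, "fta": 150, "ftm": 110, "tov": 80,
}
minutes = 1200

# Adjust every stat to per 36 minutes, then calculate EFF from the rates.
rates = {name: per36(value, minutes) for name, value in totals.items()}
player_eff = eff(**rates)
```

Because this formula is linear in the stats, converting to per 36 minutes before or after calculating EFF gives the same number.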
To calculate linear padded pts, each player's pts in the 2021-22 season are regressed on the same player's pts in the 2020-21 season. The same regression is performed for every stat used to calculate EFF. These regression models are then used to predict each stat for the 2022-23 season from the 2021-22 season stats, and the predicted 2022-23 stats are used to calculate EFF.
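The procedure above can be sketched roughly as follows. This is only an illustration with NumPy, not my exact code: the helper names `fit_padding` and `linear_pad` and all numbers are hypothetical, and it assumes the arrays are aligned so that index i is the same player in both seasons.

```python
import numpy as np


def fit_padding(prev_season, next_season):
    """Fit next_season ~ slope * prev_season + intercept for one stat (OLS)."""
    slope, intercept = np.polyfit(prev_season, next_season, 1)
    return slope, intercept


def linear_pad(models, stats):
    """Apply each stat's fitted line to that stat's values."""
    padded = {}
    for name, values in stats.items():
        slope, intercept = models[name]
        padded[name] = slope * np.asarray(values, dtype=float) + intercept
    return padded


# Hypothetical per-36 stats for three players (not real B-League data).
s2021 = {"pts": [10.0, 20.0, 30.0], "reb": [4.0, 8.0, 12.0]}  # 2020-21
s2122 = {"pts": [14.0, 20.0, 26.0], "reb": [6.0, 8.0, 10.0]}  # 2021-22

# Step 1: regress 2021-22 on 2020-21, one model per stat.
models = {name: fit_padding(s2021[name], s2122[name]) for name in s2021}

# Step 2: feed 2021-22 stats into the models to predict 2022-23 stats.
padded_2223 = linear_pad(models, s2122)
```

When the fitted slope is below 1, extreme values are pulled toward the middle of the distribution, which is the regression to the mean that linear padding relies on. The padded stats would then go into the EFF calculation in place of the raw ones.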
Using Japanese professional basketball league (B-League) data, I compared linear padded EFF and raw EFF from the 2021-22 season by their correlation coefficient with 2022-23 season EFF. I also calculated RMSE. As a result, linear padding gave a higher correlation (0.869 vs. 0.854, p<0.01) and a smaller RMSE (2.41 vs. 2.53).
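For anyone who wants to run the same kind of comparison, the two measures can be computed as in the sketch below. The EFF values here are invented just to show the calculation and are not the B-League data. The sketch does not cover the significance test for the difference between the two correlations.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def rmse(predicted, actual):
    """Root mean squared error of the predictions."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


# Hypothetical EFF values for four players (same player order in each list).
actual_2223 = [10.0, 15.0, 20.0, 12.0]   # EFF actually recorded in 2022-23
raw_2122 = [4.0, 16.0, 27.0, 13.0]       # raw 2021-22 EFF used as the prediction
padded_2122 = [9.0, 15.5, 21.0, 12.5]    # linear padded EFF used as the prediction

r_raw = pearson_r(raw_2122, actual_2223)
r_padded = pearson_r(padded_2122, actual_2223)
rmse_raw = rmse(raw_2122, actual_2223)
rmse_padded = rmse(padded_2122, actual_2223)
```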
With linear padding, the between-season correlation of EFF is higher and the prediction error is smaller than with raw stats. So I guess that not only EFF but also more sophisticated metrics would become more stable with linear padding.
Please tell me about other methods that improve the stability of all-in-one metrics. Thanks.