RAPM Ideas/Questions
Posted: Wed Apr 15, 2026 6:55 am
I'm currently (sporadically) working on an implementation of a soccer RAPM model. On the back burner, I also want to make some improvements to the RAPM script I have for the NBA. So here is a list of questions and ideas I've had during the process, some probably not very relevant to the NBA.
__________________________________________________
Part 1: The Basics & Targets
Targets
What makes a good target for RAPM? My understanding is that it has to be countable, occur frequently, and have credit distributed amongst the players on the court. Given that, what are some good and bad targets for RAPM that have not been explored? I know there is four factor RAPM and wonder how this is done. I've had an idea of running a rim FGA RAPM, for example, where on defense a lower rim FGA impact should be a proxy for deterrence. Another idea is to look at rim FG% or shot quality and see player effects. Multi-target RAPM seems unexplored.
On-Off vs RAPM Spectrum
On-Off has no adjustment for teammate help, while RAPM looks to adjust for everything possible. What's the in-between of these two frameworks, outside of SPMs and raw APM?
Weighting Choices for Stints
Possession weighting, minute weighting, and raw stint weighting all give different results. Possession weighting is most common, but the choice isn't obvious. Short stints of 1-2 possessions are almost pure noise. Is there a minimum stint length threshold that helps, or is downweighting better than filtering?
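To make the weighting question concrete, here is a minimal sketch of a closed-form weighted ridge fit, comparing possession weighting, raw stint weighting, and filtering out short stints. The toy stints, the lambda of 10, and the 5-possession cutoff are all made up for illustration, and `weighted_ridge` is just a name I picked, not any package's API.

```python
import numpy as np

def weighted_ridge(X, y, weights, lam):
    """Closed-form solution of min_b sum_i w_i * (y_i - x_i.b)^2 + lam * ||b||^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w                                  # same as X.T @ diag(w)
    A = XtW @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, XtW @ y)

# Toy data (assumed): 4 stints, 3 players.
# +1 = on the floor for the home side, -1 = away side, 0 = off the floor.
X = np.array([[1, -1, 0],
              [1, 0, -1],
              [0, 1, -1],
              [1, -1, 0]])
y = np.array([10.0, -5.0, 20.0, 200.0])   # margin per 100 possessions
poss = np.array([30, 25, 20, 1])          # the last stint is a single possession

by_poss = weighted_ridge(X, y, poss, lam=10.0)               # possession weighting
by_stint = weighted_ridge(X, y, np.ones(len(y)), lam=10.0)   # every stint counts equally
keep = poss >= 5                                             # hypothetical cutoff
filtered = weighted_ridge(X[keep], y[keep], poss[keep], lam=10.0)

print(by_poss, by_stint, filtered)
```

In this toy case the one-possession stint with a +200 margin drags player 0's estimate up much more under raw stint weighting than under possession weighting, and possession weighting already lands fairly close to the filtered fit. Weighting a stint by 2 is equivalent to including it twice, which is one way to think about what the weights do.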
__________________________________________________
Part 2: Model Architecture & Math
Model
What makes a good prior, and how exactly do priors affect runs and convergence? How else can we model the effects of players; do nonlinear models work? I've seen Bayesian RAPMs, but running one is infeasible (at least for me). What difference does using summed stints vs individual stints make? What about differences in programming language or package? How do neural network models handle RAPM? How much do data sources and the way you define stints matter? Can we ever compare RAPM models from different providers? Are two different RAPM runs with the same conditions guaranteed to give the same output? Confidence intervals, and uncertainty quantification in general, are important as well.
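On the mechanics of a prior, a sketch under the assumption that the prior enters as the point ridge shrinks toward (instead of zero): fit ridge on the residual `y - X @ prior` and add the prior back. The function name, toy stints, and prior values are all hypothetical.

```python
import numpy as np

def ridge_with_prior(X, y, lam, prior):
    """Ridge that shrinks each coefficient toward `prior` instead of toward zero."""
    X = np.asarray(X, dtype=float)
    prior = np.asarray(prior, dtype=float)
    resid = np.asarray(y, dtype=float) - X @ prior     # what the prior fails to explain
    A = X.T @ X + lam * np.eye(X.shape[1])
    return prior + np.linalg.solve(A, X.T @ resid)

# Toy data (assumed): 3 stints, 4 players; the 4th player never appears.
X = np.array([[1, -1, 0, 0],
              [1, 0, -1, 0],
              [0, 1, -1, 0]])
y = np.array([4.0, 2.0, -1.0])
prior = np.array([1.0, 0.0, -1.0, 2.5])    # e.g. from a box-score metric

est = ridge_with_prior(X, y, lam=5.0, prior=prior)
print(est)    # the 4th entry stays at its prior of 2.5, since there is no data on him
```

Two things fall out of this form. A player with no stints keeps exactly his prior, and as lambda grows every player moves back toward his prior, so the prior matters most where the data is thinnest. Also, this closed-form solve has no random component, so rerunning it on identical inputs in the same environment reproduces the same numbers; differences between runs would have to come from things like randomized cross-validation folds, iterative solvers with loose tolerances, or different stint definitions.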
Multicollinearity and Separability
A core statistical challenge with RAPM is that certain players almost never separate in the data. Starters who always share the court produce highly collinear columns in the design matrix, making it hard to attribute credit individually. How do you diagnose this, and beyond regularization what can be done about it? Related: regularization path analysis, plotting how each player's coefficient changes as lambda moves from large to small, is a useful diagnostic for seeing which signals are real and which only appear when regularization is weak. Which players emerge first, and what does that tell us?
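A minimal sketch of both diagnostics, on made-up stints where players 0 and 1 always share the floor: the column correlation matrix flags the inseparable pair, and the regularization path is just the ridge solution recomputed over a grid of lambdas. The data and the lambda grid are assumptions for illustration.

```python
import numpy as np

def ridge_path(X, y, lambdas):
    """Ridge coefficients for each lambda; one row per lambda, one column per player."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX, Xty = X.T @ X, X.T @ y
    eye = np.eye(X.shape[1])
    return np.array([np.linalg.solve(XtX + lam * eye, Xty) for lam in lambdas])

# Toy data (assumed): players 0 and 1 are always on the floor together.
X = np.array([[1, 1, -1, 0],
              [1, 1, 0, -1],
              [1, 1, -1, 0],
              [0, 0, 1, -1]])
y = np.array([6.0, 4.0, 8.0, -2.0])

# Diagnostic 1: correlation between design-matrix columns.
corr = np.corrcoef(X, rowvar=False)
print("corr(player 0, player 1) =", corr[0, 1])

# Diagnostic 2: coefficients as lambda moves from large to small.
lambdas = np.logspace(3, -2, 6)
path = ridge_path(X, y, lambdas)
for lam, coefs in zip(lambdas, path):
    print(f"lambda={lam:8.2f}", np.round(coefs, 3))
```

With perfectly identical columns, ridge has no way to tell the two players apart and gives them the same coefficient at every lambda, so the pair's credit is split evenly by construction. In real data the columns are only nearly identical, and the path is where you would look to see the two coefficients start to diverge as lambda shrinks.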
Penalty Type
Ridge is standard for RAPM, but there's a question of whether elastic net or LASSO has a place. LASSO introduces sparsity, effectively saying some players have zero marginal impact. Elastic net blends the two. Is there a case for sparsity in RAPM, or does ridge's assumption that everyone has some nonzero effect better match reality?
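For reference, the three penalties can be written as one objective. This is one common parametrization; the symbols are my own labels (w_i is the stint weight, y_i the stint target, x_i the row of player indicators), and packages differ in how they scale the terms.

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i} w_i \left( y_i - x_i^{\top} \beta \right)^2
  + \lambda \left[ \alpha \lVert \beta \rVert_1 + (1 - \alpha) \lVert \beta \rVert_2^2 \right]
```

Setting alpha = 0 gives ridge, alpha = 1 gives LASSO, and anything in between is elastic net. Only the L1 term can set coefficients exactly to zero.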
Synergy
I've seen conflicting findings on the effects of synergy in basketball. The implementation should be grouping players together and using the groups as your variables. You can do 2, 3 and 4 man groupings. I would be interested to see player role and synergy effects. What does RAPM say about double big lineups? Then, how does one perform lineup optimization given an RAPM model?
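A sketch of the 2-man version of that design matrix, assuming each same-team pair simply gets its own indicator column next to the individual player columns. The function names and the 2v2 toy stints are hypothetical.

```python
from itertools import combinations

def build_columns(stints):
    """Assign a column index to every player and every same-team pair seen in `stints`."""
    players = sorted({p for home, away in stints for p in home + away})
    pairs = sorted({pair
                    for home, away in stints
                    for side in (home, away)
                    for pair in combinations(sorted(side), 2)})
    return {key: i for i, key in enumerate(players + pairs)}

def encode(stint, columns):
    """One design row: +1 for home players and pairs, -1 for away, 0 otherwise."""
    home, away = stint
    row = [0] * len(columns)
    for side, sign in ((home, 1), (away, -1)):
        for player in side:
            row[columns[player]] = sign
        for pair in combinations(sorted(side), 2):
            row[columns[pair]] = sign
    return row

# Toy 2v2 stints (assumed): (home players, away players).
stints = [(("A", "B"), ("C", "D")),
          (("A", "C"), ("B", "D"))]

columns = build_columns(stints)
rows = [encode(s, columns) for s in stints]
print(columns)
print(rows)
```

The column count grows fast: a 5-man unit has 10 pairs, 10 triples and 5 four-man groups, and most of those combinations appear in very few stints, so the interaction columns would likely need a heavier penalty than the individual player columns.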
Design Weighting Choices
Standard RAPM uses dummy values for each player and lets the model figure it out from there, but do we get better convergence/values from hand-picking weightings? One idea is guards getting a higher coefficient than centers on offense, but how you choose these weights could make or break your model.
__________________________________________________
Part 3: Time, Aging & Trajectories
Convergence
Common knowledge states that RAPM takes a long time to converge. How can we quantify it, and what does it mean for a given target? Aside from priors, a fix for convergence is luck adjustment. Is this ever useful, and how does its effect differ as your RAPM length increases?
Age Adjustment and Time Decay
I have some idea of how these are technically implemented, but there seem to be lots of ways to do this. What's the right way? On time decay, how do you handle breaks in the schedule? I think, in theory, with these two one can create a model that looks at every stint ever played in the NBA and keeps a current estimate of a player's value at time X, using that in the model rather than a fixed-window value. One could potentially query this model for a player's value at time X, peak value and maybe career value.
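One of the many ways to do time decay, sketched here as an exponential half-life weight on each stint. The break question then becomes a choice of clock: calendar days (the break counts as elapsed time) or games elapsed (the break adds no decay). The half-lives, dates, and game count below are made-up values.

```python
from datetime import date

def decay_weight(age, half_life):
    """Weight in (0, 1] that halves every `half_life` units of `age`."""
    if age < 0:
        return 0.0    # stint is in the future relative to the query time
    return 0.5 ** (age / half_life)

as_of = date(2026, 1, 1)       # "time X" the estimate is for
stint_day = date(2025, 1, 1)

# Clock 1: calendar days, so any break in the schedule counts as elapsed time.
w_calendar = decay_weight((as_of - stint_day).days, half_life=365)

# Clock 2: team games elapsed, so breaks in the schedule add no decay.
games_since_stint = 45         # hypothetical count taken from the schedule
w_games = decay_weight(games_since_stint, half_life=82)

# The decay factor would multiply the stint's usual possession weight.
possessions = 12
row_weight = possessions * w_calendar

print(w_calendar, w_games, row_weight)
```

Refitting with weights computed for a different `as_of` date is one way to query a player's value at time X from the full stint history. Returning zero for future stints keeps the estimate one-sided; for a retrospective peak or career value you might instead use a symmetric kernel around X.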
Career Trajectory
How is an RAPM career trajectory graph done (similar to DARKO graphs)? My idea is to do one run for each month, year, game window, etc., and fit a curve to the results. But that seems naive.
Infinite RAPM
More of a fun question, but what happens if you take current RAPM values and use them as the prior for a new run? How many times can you do this before degradation? Is this a better model?
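For the version of this where every rerun uses the same stints, the algebra gives a hint. If the prior enters as the shrinkage target, each refit is beta_new = beta_old + (X'X + lambda I)^-1 X'(y - X beta_old), and repeating that walks the estimate toward the unregularized least-squares fit, i.e. raw APM. A small simulation on random made-up data, with `refit_with_prior` as a hypothetical helper name:

```python
import numpy as np

def refit_with_prior(X, y, lam, prior):
    """One ridge run that shrinks toward `prior` instead of toward zero."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return prior + np.linalg.solve(A, X.T @ (y - X @ prior))

# Random toy data (assumed): 40 stints, 6 players.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0, 1.0], size=(40, 6))
y = rng.normal(size=40)

one_run = refit_with_prior(X, y, lam=5.0, prior=np.zeros(6))   # ordinary ridge

beta = np.zeros(6)
for _ in range(200):                      # feed each output back in as the prior
    beta = refit_with_prior(X, y, lam=5.0, prior=beta)

raw_apm = np.linalg.lstsq(X, y, rcond=None)[0]   # unregularized least squares
print(np.round(one_run, 3))
print(np.round(beta, 3))
print(np.round(raw_apm, 3))
```

So on fixed data the "degradation" is the regularization being undone one run at a time, ending at the noisy raw APM values. If each new run uses fresh stints instead (last season's values as the prior for this season), this argument does not apply and the chain behaves more like a running update.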
Modeling Changing Roles
A player's role can shift over time as they get more minutes, take on a larger share of shots, or change positions. How do we capture this in RAPM? The same player in a 15 minute bench role vs a 35 minute starting role might have different per-possession impacts. Can we model these regime changes, or does RAPM just average over them?
__________________________________________________
Part 4: Environmental Adjustments & Other Ideas
Adjustments / Improvements
There are lots of ways one can change which adjustments are considered in RAPM models; which are good and which are bad? The most common good ideas I've seen are adjusting for the rubber band effect, home advantage, additional agents/players like coaches, rest days, garbage time/low leverage filtering, priors, replacement level definitions, age adjusting and time decay. Lambda selection is also important, and using different lambdas for offense and defense is an idea. Adjusting for referees might be another.
I've seen intra-game fatigue adjustments as well. It would be interesting to see if foul trouble effects could be modeled. What are the things that make RAPM better (or worse)? More importantly, given two runs of an RAPM model, how does one test which performs better? Another question is how to handle injuries. One idea is a learned coefficient per injury which fades out once a player is back. It would be cool to see how much playing with an injury affects player performance and which injuries are worse.
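One candidate answer to the "which run is better" question is out-of-sample prediction: fit each variant on earlier stints and score it on later stints it has not seen. A minimal sketch on synthetic 2v2 stints, where the two "runs" differ only in lambda; the data generator, lambdas, and split point are all assumptions for illustration.

```python
import numpy as np

def fit_ridge(X, y, w, lam):
    """Possession-weighted ridge in closed form."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X + lam * np.eye(X.shape[1]), XtW @ y)

def weighted_rmse(X, y, w, beta):
    """Possession-weighted RMSE of predicted vs actual stint margins."""
    err = y - X @ beta
    return float(np.sqrt(np.sum(w * err ** 2) / np.sum(w)))

# Synthetic stints so the example runs without real data.
rng = np.random.default_rng(1)
n_stints, n_players = 400, 12
X = np.zeros((n_stints, n_players))
for i in range(n_stints):
    on = rng.choice(n_players, size=4, replace=False)
    X[i, on[:2]], X[i, on[2:]] = 1.0, -1.0             # two players per side
w = rng.integers(1, 30, size=n_stints).astype(float)   # possessions per stint
true_beta = rng.normal(scale=2.0, size=n_players)
y = X @ true_beta + rng.normal(scale=30.0, size=n_stints)

# Chronological split: fit on the first 300 stints, score on the last 100.
train, test = slice(0, 300), slice(300, None)
scores = {}
for name, lam in {"run_a": 50.0, "run_b": 2000.0}.items():
    beta = fit_ridge(X[train], y[train], w[train], lam)
    scores[name] = weighted_rmse(X[test], y[test], w[test], beta)

print(scores)    # lower held-out error = better by this criterion
```

The same harness works for comparing any two variants (with or without an adjustment, different priors, different stint definitions) as long as both are scored on the same held-out stints. Single-stint margins are extremely noisy, so differences between runs will be small and are probably worth checking across several splits.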
Playoffs vs Regular Season
Playoff basketball is different: tighter rotations, more preparation, different effort levels. How do we capture this? Separate models, a playoff indicator variable, or re-weighting? And how much do player RAPM values actually shift between regular season and playoffs?
Other Ideas
Does how a stint starts or ends matter for the model? What can we infer from this? Asymmetrical personnel: in soccer you can have 11 v 10 man stints; how is this accounted for? Physical environment (altitude, travel distance, pitch conditions, number of fans). Use of spatial or tracking data in RAPM; maybe get a spacing metric from these.
__________________________________________________
Anyways, these are way more ideas about RAPM than anyone should have, but they could serve as a starting point for experimentation. Let me know if I missed anything.