Positions in 2D
As part of my work on revising Box Plus/Minus, I am investigating using a two-dimensional position spectrum rather than the conventional positions.
I started down this road with BPM 2.0 and I am moving further in that direction with the next revision of BPM.
The two dimensions I am proposing are generally Size and Creation.
These generally line up with the first two components of various principal component analyses I have done or seen when evaluating types and roles of basketball players.
I am defining the size dimension based on the percentage of the team's rebounds and blocks the player accumulates when they are on the floor. Defining everything in the context of the team allows the focus to be on the role of the player, and hence the position of the player on that team. It also allows this approach to be flexible across any league.
I am defining the creation dimension based on the percentage of the team's points and assists the player accumulates when they are on the floor, with an added bonus for the points being efficient relative to the team's average true shooting percentage. A high value on this creation dimension does generally indicate a player is good on offense.
For both of these dimensions I am adjusting for the variance of the two metrics being averaged. Assists and blocks both have their variance divided by two because there is a much greater spread in those percentages. (This basically means I'm using z-scores of the two components, but informally.)
As always for percentages of team production, the league average is 20%, since five players share the floor.
I am then transforming the resultant percentages for each of these two dimensions into a one-to-five scale, capping outliers at 1.0 and 5.0 in that dimension. I have not nailed down the exact capping bounds yet.
Additionally, I am converting the resultant creation position to a letter for discussion purposes. A pure creator would be an A, with a secondary creator as a B, continuing on to E for the players with no creation ability.
Note that because of the way I am evaluating these, neither of these dimensions has players evenly distributed on it. There are far fewer A creators than D or E creators. Similarly, there are far fewer pure 5s in the size dimension than 1s and 2s.
Here is a visualization showing all of the players in the NBA over the last 43 years and where they would fall on this two-dimensional position spectrum:
https://public.tableau.com/views/ofNBAS ... share_link
Thoughts?
Re: Positions in 2D
What are the scales of the axes in the visualization (and why are they not listed)?
Is the data normalized? If so, what is the brief explanation of why and how?
What would you think of a 3rd dimension being the context of % time mainly with starters to time with mainly bench? Is that already a factor simply off % minutes and average utilization patterns?
Size & creation by position overall... versus % of time at each different position with different relative size and possibly different creation levels and then an average position and size / creation description based on the specific positions?
Whose position list are you starting with?
Still declining to use pbp and tracking data to facilitate application of formula to non-NBA contexts? How available / acceptable / usable to you are pbp and tracking data for NCAA? Is anyone using BPM beyond NBA / G League / NCAA? Could you still have a standard baseline BPM and bolt on pbp and tracking data for a BPM+ for NBA?
D-BPM, and the underlying shot defense component, is really weak because it is currently based on all team minutes and not exclusively on when the player is on court.
Tracking data and possibly pbp could get at player "position / location" on the court, especially at time of possession final action.
Your definition of size is exclusively determined by defensive markers? Why not size classification separately by behavior on both sides of the court? Cases where the two are not the same are important / interesting.
"I am defining the size dimension based on on the percentage of the team's rebounds and blocks the player accumulates when they are on the floor." So are you using pbp then or is it still share based on total team time?
Dividing players into big / small on size and more / less on creation, which quads gain or lose on average in New BPM vs. current? Are the hybrids "hurt" / "more fairly scored"?
Now or later, what other aspects of BPM formula are under review for possible change / possible discussion?
Re: Positions in 2D
DSMok1 wrote: ↑Sat Jan 06, 2024 4:21 pm
As part of my work on revising Box Plus/Minus, I am investigating using a two-dimensional position spectrum rather than the conventional positions.
I started down this road with BPM 2.0 and I am moving further that direction with the next revision of BPM.
The two dimensions I am proposing are generally Size and Creation.
These generally line up with the first two components of various principal component analyzes I have done or seen when evaluating types and roles of basketball players.
I am defining the size dimension based on on the percentage of the team's rebounds and blocks the player accumulates when they are on the floor. Defining everything in the context of the team allows the focus to be on the role of the player, hence the position of the player on that team. It also allows this approach to be flexible across any league.
I am defining the creation dimension based on the percentage of the team's points and assists the player accumulates when they are on the floor--with an added bonus for the points being efficient relative to the team's average true shooting percentage. This creation dimension does generally indicate a player is good on offense, in general.
For both of these dimensions I am adjusting for the variance of the two metrics being averaged. Assists and blocks are both having their variance divided by two because there is a much greater spread in those percentages. (This basically means I'm using z-scores of the two components, but informally.)
As always for percentages of team production, the league average is 20%.
I am then transforming the resultant percentages for each of these two dimensions into a one to five scale, capping outliers at 1.0 and 5.0 in that dimension. The exact capping bounds I have not nailed down yet.
Additionally, I am converting the resultant creation position to a letter for discussion purposes. A pure creator would be an A, with a secondary creator as a B, continuing on to E for the players with no creation ability.
Note that because of the way I am evaluating these, both of these dimensions do not have players evenly distributed on them. There are much fewer A creators then D or E creators. Similarly there are much fewer pure 5s in the size dimension than 1s and 2s.
Here is a visualization showing all of the players in the NBA over the last 43 years and where they would fall on this two-dimensional position spectrum:
https://public.tableau.com/views/ofNBAS ... share_link
Thoughts?
In the context of roles, it would of course not work before this data existed, but maybe you could incorporate offensive Synergy play-type frequency to go even more in depth.
IIRC BBI uses Second Spectrum for some of their defensive roles too (I'm not sure how else they would define a "Chaser", for example), and maybe that's something to look into as well.
Re: Positions in 2D
How will these meta-positions be used in a new version of BPM?
Is there already some 'positional' influence in the current BPM?
LeBron is shown at b-r.com as playing all 5 positions in his career. In 6 seasons as a Laker, he's shown at 4 positions. In order of BPM:
Code: Select all
BPM    yr   pos.  WS/48   PER
8.4   2020   PG   .204   25.5
8.1   2021   PG   .179   24.2
8.0   2019   SF   .179   25.6
7.7   2022    C   .172   26.2
7.5   2024   PF   .159   23.9
6.1   2023   PF   .138   23.9
WS falls in the same order, while PER does not quite.
I don't remember LeBron playing C at all. That season was his highest 3PAr, Blk%, and Off/Def ratio in both BPM and WS. It was also his lowest Reb% and Ast% in LA.
Can you anticipate and describe briefly how a position designation would affect his BPM over this span?
EDIT -- From totaling positions as designated at b-r.com, I get these disparities in "wins" rates, per 48 min. Showing fraction of total minutes (.200 being avg) this season:
Code: Select all
pos   %min   per   WS/48   bpm
C    .176   .140   .152   .129
PF   .203   .102   .098   .100
SF   .217   .078   .080   .078
SG   .215   .086   .081   .091
PG   .189   .112   .106   .118
A quarter-million player minutes here.
While BPM is less disparate, it still begs the question of why the positions with higher proficiency get fewer minutes on avg? Are PG and C just elite and not enough of their minutes available?
Or is it just rare to have 2 C or 2 PG in the game at one time? while we see lineups of 4 wings and a PG (or a big) pretty often.
Re: Positions in 2D
The scale is vertically average(% TRB, % BLK_normalized), where normalized means the spread is divided by 2 so the stdev of % TRB and % BLK are similar.
The scale horizontally is average(% Pts_Adj, % AST_normalized), where normalized means the spread is divided by 2 so the stdev of % Pts and % AST are similar.
I reduced assists and blocks so that they do not dominate the combined metric.
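As a rough illustration of the averaging described above, here is a minimal Python sketch. It assumes that "dividing the spread by 2" means halving each share's distance from the 20% league average; the post does not state the centering point, so that choice and the function names are assumptions for illustration only.

```python
# Hypothetical sketch of the first-pass axis values described above.
# Assumption: the spread is compressed around the 20% league-average share.

LEAGUE_AVG_SHARE = 20.0  # five players on the floor, so an average share is 20%


def compress_spread(pct, center=LEAGUE_AVG_SHARE, factor=2.0):
    """Shrink a share's distance from `center` by `factor`."""
    return center + (pct - center) / factor


def size_axis_v1(trb_pct, blk_pct):
    """Average of rebound share and spread-compressed block share."""
    return (trb_pct + compress_spread(blk_pct)) / 2.0


def creation_axis_v1(pts_adj_pct, ast_pct):
    """Average of adjusted points share and spread-compressed assist share."""
    return (pts_adj_pct + compress_spread(ast_pct)) / 2.0
```

With this reading, a very high block share moves the size value only half as far as the same change in rebound share would, which is the stated goal of keeping blocks and assists from dominating the combined metric.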
Crow wrote: ↑Sun Jan 07, 2024 12:09 am
What would you think of a 3rd dimension being the context of % time mainly with starters to time with mainly bench? Is that already a factor simply off % minutes and average utilization patterns?
Size & creation by position overall... versus % of time at each different position with different relative size and possibly different creation levels and then an average position and size / creation description based on the specific positions?
The 3rd principal component of team role would be shooting, measured by FT% and 3 pointers. But that's at least as much quality as role on the team. I'm not super concerned about time with or against starters, although I understand that could influence things somewhat.
Basketball Reference
Crow wrote: ↑Sun Jan 07, 2024 12:09 am
Still declining to use pbp and tracking data to facilitate application of formula to non-NBA contexts? How available / acceptable /usable to you are pbp and tracking data for NCAA? Is anyone using BPM beyond NBA / G league/ NCAA? Could you still have a standard baseline BPM and bolt on pbp and tracking data for a BPM+ for NBA?
D-BPM and the underlying shot defense component is really weak as currently based on all of team minutes and not exclusively when player is on court.
Tracking data and possibly pbp could get at player "position / location" on the court, especially at time of possession final action.
Correct. BPM is and will remain purely a box-score metric. Others can and have used additional information to build more accurate models exclusively targeted at the modern NBA. My goal is for this to be as robust and flexible a model as possible when data is limited. BPM has been used for a number of other leagues around the world, some of which I am aware of.
Crow wrote: ↑Sun Jan 07, 2024 12:09 am
Your definition of size is exclusively determined by defensive markers? Why not size classification separately by behavior on both sides of the court? Cases where the 2 are not the same are important / interesting.
"I am defining the size dimension based on on the percentage of the team's rebounds and blocks the player accumulates when they are on the floor." So are you using pbp then or is it still share based on total team time?
Dividing player into big / small on size and more / less on creation, which quads gain or lose on average in New BPM vs. current? Are the hybrids "hurt" / "more fairly scored"?
Now or later, what other aspects of BPM formula are under review for possible change / possible discussion?
I did look at other statistics as part of these regressions but found little benefit. Size as defined by TRB and BLK really does a good job of capturing a player's functional size on court.
This is simply a refinement of the 2D position spectrum that was being used for BPM 2.0. I don't know yet how it will impact the final BPM regression. I am open to tweaking anything about BPM other than its basic definition of what it should be as a statistic.
Re: Positions in 2D
Mike G wrote: ↑Sun Jan 07, 2024 2:49 pm
How will these meta-positions be used in a new version of BPM?
Is there already some 'positional' influence in the current BPM?
LeBron is shown at b-r.com as playing all 5 positions in his career. In 6 seasons as a Laker, he's shown at 4 positions. In order of BPM:
Code: Select all
BPM    yr   pos.  WS/48   PER
8.4   2020   PG   .204   25.5
8.1   2021   PG   .179   24.2
8.0   2019   SF   .179   25.6
7.7   2022    C   .172   26.2
7.5   2024   PF   .159   23.9
6.1   2023   PF   .138   23.9
WS falls in the same order, while PER does not quite.
I don't remember LeBron playing C at all. That season was his highest 3PAr, Blk%, and Off/Def ratio in both BPM and WS. It was also his lowest Reb% and Ast% in LA.
Can you anticipate and describe briefly how a position designation would affect his BPM over this span?
I have never used position designations in BPM, but rather a position spectrum estimated from the box score.
Interestingly, LeBron has never really shifted around much in the 2D position spectrum:
https://public.tableau.com/shared/QSRGF ... share_link
He's squarely a 3A player, no matter what he's called. He always plays about the same role.
Mike G wrote: ↑Sun Jan 07, 2024 2:49 pm
EDIT -- From totaling positions as designated at b-r.com, I get these disparities in "wins" rates, per 48 min. Showing fraction of total minutes (.200 being avg) this season:
Code: Select all
pos   %min   per   WS/48   bpm
C    .176   .140   .152   .129
PF   .203   .102   .098   .100
SF   .217   .078   .080   .078
SG   .215   .086   .081   .091
PG   .189   .112   .106   .118
A quarter-million player minutes here.
While BPM is less disparate, it still begs the question of why the positions with higher proficiency get fewer minutes on avg? Are PG and C just elite and not enough of their minutes available?
Or is it just rare to have 2 C or 2 PG in the game at one time? while we see lineups of 4 wings and a PG (or a big) pretty often.
When looking at the distributions, certain stats are not in a bell curve. Assists and blocks are not--there is a strong right skew. These skills seem to be more rare and valuable. It feels that players without defined roles (and perhaps not great skills) end up in the SF and SG category.
Thoughts?
Re: Positions in 2D
The basic question is how (and why) does 'position spectrum' influence once and future BPM?
Are rebounds/assists/ etc. more or less influential for 5A, 1C etc?
Re: Positions in 2D
In light of the discussion here and at RealGM, I have revised my methodology significantly.
- For the Size dimension, the new methodology is to take % of team's (TRB + 3*BLK). In other words, blocks are worth 3x rebounds. This was found by regressing on actual player size (height and wingspan), based on a measurement dataset. Using an additive approach makes this setup far more robust. Hassan Whiteside's 2015 season is still an outlier, but that can't be helped...
- For the Offensive Creation dimension, I revamped the methodology and the regression basis. I compiled a large dataset from PBPstats.com to accurately measure creation: location of assists, self creation, and shooting vs. location were all included, along with team context. Using this superior basis (incidentally, Steve Nash had 4 of the top 6 seasons), I found a completely different approach to the offensive creation dimension was better.
  The new methodology is to take % of team's (AST + Pts scored above 0.85*TmTS%). In other words, the baseline is 85% of the team's true shooting percentage, and points scored above this threshold are indicative of creation. Anything less is just...somebody shooting. This really highlights the importance of efficient scoring.
Using these revamped position dimensions, I plotted again the 2-Dimensional position spectrum. As before, the distribution is right-skewed, particularly for the size dimension.
Therefore, I set the position designations accordingly. I selected 10% intervals for the boundaries between positions. 10%, 20% (which is by definition average), 30%, and 40% are the separating points between the position designations (1,2,3,4,5) and (E,D,C,B,A).
For the continuous spectrum, the position dimensions will be bounded by 5% and 45%, which correspond to the range 1.0 to 5.0.
https://public.tableau.com/views/ofNBAS ... zHome=no#3
Features of the visualization:
- Clicking on any point will provide a tooltip about that player-season and also highlight all other seasons by that player between 1990 and 2023.
- Filter by year, team, and by minutes played.
- Colors indicate the player's BPM, or at least their BPM as currently formulated.
Interestingly, Magic Johnson (1991) is also in the top 5 Creation seasons. Recall this dataset only goes back to 1990.
Think about this "Offensive Creation" dimension as who the defense will prioritize to stop in their game plan: who they are most concerned about. Interestingly, Rudy Gobert's best seasons show up above league average creation (dimension 5C), because his efficient rim scoring was a significant offensive weapon for his team.
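To make the size formula and the position boundaries above concrete, here is a minimal Python sketch. The minutes-based proration of team totals is an assumption (a common box-score estimate of what the team did while the player was on the floor), all function names are hypothetical, and putting a share that sits exactly on a boundary into the higher bucket is also an assumption.

```python
# Hypothetical sketch: size share from (TRB + 3*BLK) and mapping shares
# to the 1-5 / E-A designations. Not the actual BPM implementation.


def on_floor_share(player_total, team_total, player_min, team_min):
    """Percent of the team's total credited to the player while on the floor.

    Assumption: team totals are prorated by the fraction of available
    minutes the player was on the court (team_min counts all five players).
    """
    floor_fraction = player_min / (team_min / 5.0)
    return 100.0 * player_total / (team_total * floor_fraction)


def size_share(trb, blk, tm_trb, tm_blk, mp, tm_mp):
    """% of team's (TRB + 3*BLK), with blocks weighted 3x rebounds."""
    return on_floor_share(trb + 3 * blk, tm_trb + 3 * tm_blk, mp, tm_mp)


def continuous_position(share_pct):
    """Map a share to the 1.0-5.0 spectrum, bounded at 5% and 45%."""
    clipped = min(max(share_pct, 5.0), 45.0)
    return 1.0 + (clipped - 5.0) / 10.0


def _bucket(share_pct):
    """Index 0..4 using the 10%, 20%, 30%, 40% separating points."""
    return min(max(int(share_pct // 10), 0), 4)


def size_label(share_pct):
    """Size designation 1 (smallest role) to 5 (biggest role)."""
    return _bucket(share_pct) + 1


def creation_label(share_pct):
    """Creation designation E (least) to A (most)."""
    return "EDCBA"[_bucket(share_pct)]
```

Under these assumptions an exactly average 20% share maps to 2.5 on the continuous scale, which sits on the boundary between the 2 and 3 designations (or D and C), consistent with 20% being one of the separating points.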
Re: Positions in 2D
In BPM 2.0, a simple version of adjusting by position was already used for the coefficients. It makes sense to me, at least, that different roles on a team will lead to some of the box score statistics being more or less indicative of actual impact on team performance. See the writeup here: https://www.basketball-reference.com/about/bpm2.html
Re: Positions in 2D
I pulled up the Tableau chart for the 2023 Thunder. The 2D positions for some players on the chart do not match the pull-up screen.
Giddey, Bazley, JRE off by a letter. Joe off by a number.
Re: Positions in 2D
Ah good point.  I forgot to update the tool tip variable. I'll fix that tomorrow.
EDIT: Fixed
nbacouchside
Posts: 151
Joined: Sun Jul 14, 2013 4:58 am
Re: Positions in 2D
DSMok1 wrote: ↑Tue Jan 16, 2024 12:43 am
The new methodology is to take % of team's (AST + Pts scored above 0.85*TmTS%). In other words, the baseline is 85% of the team's true shooting percentage--points scored above this threshold are indicative of creation. Anything less is just...somebody shooting. This really highlights the importance of efficient scoring.
Using these revamped position dimensions, I plotted again the 2-Dimensional position spectrum. As before, the distribution is right-skewed, particularly for the size dimension.
Isn't the use of Pts scored above baseline introducing too much of a quality component to this? For instance, a player like Scoot Henderson this year is clearly charged with a lot of creation, but he is bad at it, so his scoring is below the .85*TmTS% threshold so he gets no credit for creation despite clearly having that role on the team. Why not use True Shot attempts with Assists? Or why not use Usage and AST?
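For readers following the threshold argument, here is a minimal Python sketch of one way to read "Pts scored above 0.85*TmTS%". It assumes the usual definitions TSA = FGA + 0.44*FTA and TS% = PTS / (2*TSA); the function names and the sample numbers are hypothetical, and how a negative result is treated in the actual formula is not stated in the thread.

```python
# Hypothetical sketch of the efficiency threshold under discussion.
# Assumes TSA = FGA + 0.44*FTA and TS% = PTS / (2*TSA).


def true_shooting_attempts(fga, fta):
    """Estimated scoring attempts, counting free throws at 0.44 each."""
    return fga + 0.44 * fta


def points_above_baseline(pts, fga, fta, team_ts_pct, baseline=0.85):
    """Points scored beyond what `baseline` * team TS% would yield."""
    tsa = true_shooting_attempts(fga, fta)
    baseline_points = 2.0 * tsa * baseline * team_ts_pct
    return pts - baseline_points


# Made-up season lines on a team with a .580 TS%:
efficient = points_above_baseline(1200, 900, 200, 0.58)
inefficient = points_above_baseline(800, 900, 150, 0.58)
print(round(efficient, 1), round(inefficient, 1))
```

In this reading, a high-volume but inefficient scorer ends up at or below zero on the scoring term, which is the situation described in the objection above.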
Re: Positions in 2D
nbacouchside wrote: ↑Wed Jan 17, 2024 6:23 am
DSMok1 wrote: ↑Tue Jan 16, 2024 12:43 am
The new methodology is to take % of team's (AST + Pts scored above 0.85*TmTS%). In other words, the baseline is 85% of the team's true shooting percentage--points scored above this threshold are indicative of creation. Anything less is just...somebody shooting. This really highlights the importance of efficient scoring.
Using these revamped position dimensions, I plotted again the 2-Dimensional position spectrum. As before, the distribution is right-skewed, particularly for the size dimension.
Isn't the use of Pts scored above baseline introducing too much of a quality component to this? For instance, a player like Scoot Henderson this year is clearly charged with a lot of creation, but he is bad at it, so his scoring is below the .85*TmTS% threshold so he gets no credit for creation despite clearly having that role on the team. Why not use True Shot attempts with Assists? Or why not use Usage and AST?
That is a very fair point.
When I just ran this regression, the fit was definitely a lot better with this efficiency based approach. But my regression basis was actual creation, actual advantages created. Not attempted creation. So Scoot is currently attempting to create advantages but not necessarily succeeding.
We are obviously in the same situation with assists, where assists are measuring some sort of success in attempted creation.
What would be the regression basis for attempted creation versus actual creation?
Actually, I think the data set I already got from pbpstats.com would work pretty well. Assisted field goal attempts should give most of the credit to the assister... Remove shots off of offensive rebounds... And I still think I don't want to give much credit for mid-range shots. That's not much creation if you're getting a mid-range shot. You don't have to create an advantage at all to shoot a mid-ranger....
Re: Positions in 2D
It looks like a formulation of % of team's (TSA + 5*AST) works well for offensive load/attempted creation.
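A minimal Python sketch of this offensive load share follows. It assumes TSA = FGA + 0.44*FTA and prorates team totals by the player's share of available minutes to approximate time on the floor; that proration and the function name are assumptions for illustration, not the confirmed BPM implementation.

```python
# Hypothetical sketch: % of team's (TSA + 5*AST) while the player is on the floor.
# Assumes TSA = FGA + 0.44*FTA and a minutes-based proration of team totals.


def offensive_load_share(fga, fta, ast, tm_fga, tm_fta, tm_ast, mp, tm_mp):
    """Percent of the team's offensive load carried by the player."""
    player_load = (fga + 0.44 * fta) + 5.0 * ast
    team_load = (tm_fga + 0.44 * tm_fta) + 5.0 * tm_ast
    floor_fraction = mp / (tm_mp / 5.0)  # tm_mp counts all five players
    return 100.0 * player_load / (team_load * floor_fraction)
```

Because this counts attempts rather than points above a threshold, a player who shoots and passes a lot but inefficiently still registers a high load, which addresses the attempted-versus-successful creation distinction raised earlier in the thread.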
Here is the Visualization with the revision made:
https://public.tableau.com/views/ofNBAS ... zHome=no#2
Thanks for helping steer me in the right direction, Kevin!
Re: Positions in 2D
I dunno if there was another navigation option (should be), but to look at one team, instead of the pre-selection of all teams you have to know to click downward triangle icon at top right of team click and switch to "single value list".
A really non-obvious pathway likely to evade and annoy many.
I had to hunt to figure it out.