Clustering players (Ed Küpfer, 2006)
Posted: Mon Apr 18, 2011 5:51 am
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
Posted: Sun Apr 09, 2006 4:38 pm    Post subject: Clustering Players
One of my hobby horses is classifying players by their roles, in contrast to their positions. I have long suspected that traditional position designations were not very useful, a relic of an earlier game much different than the one we watch today. I believe that we can do better, that we can come up with player classifications that are more useful in the context of the modern game, the same way we dumped FG% in favour of measures that better reflect what we see on the court today.
To that end, I've been playing around with cluster analysis. CA is a family of algorithms that cluster observations together automatically based solely on their stats, without regard to how we would cluster them -- that is, there is no dependent variable in CA. I won't describe here how cluster analysis works, but I will say that the logic and the math behind it are very simple, and anyone who wants to read up on it will probably grok the concepts very easily.
There are different types of cluster analysis, and one can break them into two kinds: hierarchical (HCA) and non-hierarchical (NHCA). The former returns results in the form of a tree diagram, which is really helpful. If you went to a large family reunion and ran an HCA, it would probably begin by clustering siblings together, and then clustering those siblings with their parents, and then clustering the parents/sibs cluster with other parents/sibs clusters based on the proximity of familial relations. You would end up with something like a family tree diagram. You can also run an HCA on a group of players, using their stats as input variables, and seeing how the "family tree" of players looks. I've done this many times, and essentially the tree looks like this:
Code:
                              |
            +-----------------+------------------+
            |                                    |
       Frontcourt                            Backcourt
            |                                    |
   +--------+---------+                 +--------+---------+
   |                  |                 |                  |
Primary offensive  non-offensive     Primary offensive  non-offensive
options            players           options            players
Kirilenko          Griffin           Ford               RBowen
Slater             Kaman             Cassell            Augmon
Yao                Voskuhl           Hudson             BBowen
That is a very, very rough tree, based on only three seasons of data. It's still very useful, and maybe I'll talk more about the HCA tree later.
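For anyone who wants to see the mechanics, here is a minimal sketch of bottom-up (agglomerative) hierarchical clustering in Python. The four "players" and their two stats are made up for illustration, and single linkage with Euclidean distance is an assumption; this is not the software or the settings behind the tree above.

```python
from itertools import combinations
from math import dist

# Hypothetical toy stat lines (rebound rate, assist rate); not real data.
players = {
    "big_scorer":   (0.90, 0.20),
    "big_defender": (0.80, 0.10),
    "guard_scorer": (0.20, 0.70),
    "guard_passer": (0.10, 0.90),
}

def cluster_distance(a, b):
    """Single linkage: distance between the closest pair of members."""
    return min(dist(players[p], players[q]) for p in a for q in b)

def agglomerate(names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b))
    return merges

merges = agglomerate(sorted(players))
for left, right in merges:
    print(left, "+", right)
```

Reading the merge history from first to last gives the tree from the leaves up: the two bigs pair off, the two guards pair off, and the last merge joins the two groups. Every pass compares every pair of clusters, which hints at why this approach gets slow on a large dataset.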
But what I really wanted to look at is if players' classifications changed throughout their careers. To do that, I needed to use more data. The problem is that running a hierarchical cluster analysis is very computationally intensive -- it would take forever to run the HCA on the dataset I wanted to use, which included every player season from 1978 on. Luckily, statisticians have developed other algorithms which don't require as much computation. I file these under the heading non-hierarchical cluster analysis. The most well known is called k-means, where k is the number of clusters you want the computer to return. Unlike HCA, k-means doesn't settle on an optimal number of clusters -- it wants you to tell it how many clusters there are. I don't really want to use this, because we don't really have an idea of how many "natural" clusters there are.
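To make the "you have to tell it k" point concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in Python. The points and starting centroids are made-up toy values, and the number of starting centroids passed in is what fixes k.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    # groups holds the assignments from the final pass.
    return centroids, groups

# Hypothetical toy data: (rebound rate, assist rate) for four player seasons.
points = [(0.9, 0.2), (0.8, 0.1), (0.2, 0.7), (0.1, 0.9)]
centroids, groups = kmeans(points, centroids=[points[0], points[2]])  # k = 2
print(groups)
```

Each pass only compares points to k centroids, which is why it scales to far more player seasons than the hierarchical approach. The price is that k has to be picked up front.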
Fortunately, there are other options. SPSS has an NHCA called two-step cluster analysis. I don't really know what the two steps are, but the algorithm settles automatically on the number of "natural" clusters, which makes it very useful for me.
Right. The data. Here are the stats I used:
HT: Player height
WT: Player weight
2Att: 2-point attempts per min
3Att: 3-point attempts per min
FTA: FT attempts per min
PF: Personal fouls per min
USAGE: Usage rate
OReb: Offensive rebounding percentage
DReb: Defensive rebounding percentage
TO: Turnover percentage
AST: Percentage of teammate attempts assisted
BLK: Percentage of opponent shots blocked
STL: Steals per opponent possessions
qAST: Percentage of own attempts assisted
All stats pace adjusted to team/league averages. For players who played for more than one team in a season I used the average, weighted by the minutes played.
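The minutes-weighted average for multi-team seasons can be sketched like this. The function name, field names and numbers are hypothetical, chosen only to show the arithmetic.

```python
def minutes_weighted(stints, stat):
    """Average a rate stat across team stints, weighted by minutes played."""
    total_minutes = sum(s["min"] for s in stints)
    return sum(s[stat] * s["min"] for s in stints) / total_minutes

# Hypothetical season split between two teams (numbers are made up).
stints = [
    {"team": "AAA", "min": 1500, "usage": 0.22},
    {"team": "BBB", "min": 500, "usage": 0.18},
]
season_usage = minutes_weighted(stints, "usage")
print(round(season_usage, 3))  # 0.21
```

The stint with three times the minutes pulls the season figure three times as hard, so the result lands at 0.21 and not at the simple midpoint of 0.20.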
The cluster analysis settled on 7 clusters. I've named these clusters Post Players, Driving Swingmen, Human Victory Cigars, Miscellaneous Role Players, Defensive Specialists, Backcourt Ballhandlers, and Outside Shooters. These names are just convenient titles, capturing what I see as the clusters' most salient characteristics along with the things that separate them most from the other clusters.
Each cluster has a stats "profile." For example, the POST PLAYER cluster is characterised by high totals in defensive rebounding, two-point attempt rate, and FT attempt rate, average totals in PFs, turnovers, assists, and few 3-point attempts. Like this:
POST PLAYERS
Code:
High: DReb, 2Att, FTA, WT, HT, USAGE, OReb, qAST, BLK
Avg: PF, TO, STL, AST
Low: 3Att
Eddy Curry, Dirk Nowitzki, Drew Gooden, Stromile Swift, Juwan Howard, Zendon Hamilton, Rasheed Wallace, Patrick Ewing, Elton Brand, Lamar Odom
The ten players at the end are drawn randomly from the POST cluster. This cluster is probably the most intuitively satisfying one. Here are the remaining clusters.
DRIVING SWINGMEN
Code:
High: 2Att, USAGE, FTA
Avg: STL, qAST, AST, 3Att, OReb, HT
Low: DReb, WT, PF, BLK, TO
Lebron James, Jeryl Sasser, Allen Iverson, Ronald Murray, Ricky Davis, Richard Jefferson, James Cotton, Isiah Rider, Kobe Bryant, Ron Mercer
You won't see Jeryl Sasser's name appear next to Iverson's too often, but I think we can visualise this type of player easily enough.
HUMAN VICTORY CIGARS
Code:
High: PF, TO, STL, FTA, USAGE
Avg: OReb, qAST, 2Att, 3Att, HT, WT, DReb, BLK
Low: AST
Lawrence Funderburke, Tim James, Rusty LaRue, Terry Mills, Jermaine Jackson, Rashard Lewis, Tierre Brown, Damone Brown, Jason Hart, Jerome James
This is the most diffuse cluster, containing players who you'd think have very little in common. In fact, at the end of this post, I'll show a map of all these clusters, and while the others have pretty well defined borders and territories, the CIGARS are in fact all over the place. The one thing they share unambiguously is a high games played/minutes per game ratio, a diagnostic I've used before to flag garbage time players. The players in this cluster played an average of 5 minutes per game, far lower than the second lowest (DEFENSIVE SPECIALISTS - 15 mpg). It's important to remember that MPG was not a stat I used as an input. The clustering algorithm classified these garbage time players without knowing their playing time ahead of time. To me, this is a good external verification of the existence of this cluster.
I know what you're thinking: what the hell is Rashard Lewis doing on a list of garbage time players? All I can say is that Lewis is represented here by his 1999 season, when he only played 145 minutes. His full career trajectory looks like this: CIGAR, ROLE PLAYER, ROLE PLAYER, SHOOTER, DRIVER, SHOOTER, SHOOTER.
MISC ROLE PLAYERS
Code:
High: qAST
Avg: HT, OReb, WT, PF, DReb, STL, TO, BLK, 3Att, 2Att, FTA
Low: USAGE, AST
Michael Curry, Marcus Haislip, Kenny Thomas, Detlef Schrempf, Jonathan Bender, Kevin Edwards, Robert Horry, Carlos Rogers, Ansu Sesay, Vincent Yarbrough
The "miscellaneous" is apt, I think. These players are defined by their inability to create much in the way of offense, but are otherwise average in other stats. Other than the cigars, this is the cluster that contains the greatest range of player positions.
Proportion of ROLE PLAYERS from each traditional player position:
Code:
PG 1%
G 5%
SG 6%
GF 16%
SF 14%
F 28%
PF 16%
FC 13%
C 2%
The characteristic that separates the players in this cluster from the CIGARS is that these players get much more action. These players obviously have some ability, although it doesn't show up much in the stats I used.
DEFENSIVE SPECIALISTS
Code:
High: WT, HT, BLK, DReb, OReb, PF, qAST
Avg: TO, FTA
Low: USAGE, 3Att, AST, STL, 2Att
Charles Oakley, Jahidi White, Clarence Weatherspoon, Dennis Rodman, Hakeem Olajuwon, Jackson Vroman, Joe Kleine, Rasho Nesterovic, Maciej Lampe, Reggie Slater
I could also have called this cluster DEFENSIVE BIG MEN. The similarities between these players are pretty obvious: PFs and centers, lots of rebounds, lots of fouls, few assists and field goal attempts. Very straightforward. That said, Reggie Slater? I loved him on Saved By The Bell, but from his days with the Raptors, I don't remember him playing much defense.
BACKCOURT BALLHANDLERS
Code:
High: AST, STL, TO
Avg: 3Att, USAGE, FTA, 2Att
Low: qAST, HT, WT, OReb, DReb, BLK, PF
Chris Childs, Kevin Ollie, Allen Iverson, Keyon Dooling, Charlie Ward, Will Avery, Speedy Claxton, Tony Parker, Mike James, Kenny Anderson
As you'll see in the map below, the BALLHANDLERS are closely related to the DRIVERS, separated mostly by their assists and turnovers. This is the cluster most similar to a traditional position: point guards.
OUTSIDE SHOOTERS
Code:
High: 3Att
Avg: AST, STL, qAST, USAGE
Low: OReb, 2Att, PF, DReb, BLK, FTA, HT, WT, TO
Bobby Phills, James Robinson, Glen Rice, Sean Elliott, Hubert Davis, Jim Jackson, Rasual Butler, Pat Garrity, Matt Bullard, Johnny Newman
The SHOOTERS. Defined mostly by their predilection for outside shooting, and by the low numbers in virtually every other stat category. Mostly smaller players, despite the presence of Garrity above:
Code:
PG 12%
G 19%
SG 28%
GF 21%
SF 9%
F 8%
PF 2%
FC 1%
C 1%
* * * * * * * * * * * * * * * * * * * * * *
The relationship between these clusters can be displayed on a 2-D "map", by plotting the first two discriminant functions. I love ascii graphics, so here you go:
Code:
+-------------------------------------------------------------+
|_)('.)('.)('.)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
|/ )('.)('.)('.)('.)('.)('.)('.)( ____|_____|_____|_____|___|
| \_\ )('.)('.)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
|__ _ )('.)('.)('.)('.)('.)('.)( ____|_____|_____|_____|___|
|/ \/ /('.)('.)('.'.'.'.'.'('.)('.)( |_____| ___ |_____|_____||
| \_\/ \)('.)('.'.DRIVING '.)('.)( ____|__.POST..__|_____|___|
|__ ___ )('.)( SWINGMEN )('.)('.)( |____.PLAYERS.____|_____||
|/ \/ / \/ )('.)'.'.'.'.'..)('.)( ____|_____ _____|_____|___|
| \_\/ \_\/('.)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
|__ ___ ___('.)('.)('.)('.)('.)( ____|_____|_____|_____|___|
|/ \/ / \/ / \)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
| \_\/ \_\/ \_\ )('.)('.)('.)('.)( ____|_____|_____|_____|___|
|__ ___ ___ __ )('.)('...'.)('.)(_|_____|_____|_____|_____||
|/ \/ / \/ / \/ /('.)('.).2.)('.)( ____ ___|_____|_____|___|
| \.\/ \.\/ \.\/ \)('.)('...'.)('.)(_|__ 1 |_____|_____|_____||
|_..BACKCOURT..___ )('.)('.)('.)(|_____ _____|_____|____|_.-.|
|/ BALLHANDLERS / \ )('.)('.)...)( |_____|___.-._'-._,-'_.-.|
| \_\/ \_\/ \_./ \_\)('.)('.)( .3.******|._,-'_.-._'-._,-'_.-.|
|__ ___ ___.6.__ __)('.)('.)-..*******._,-'_.-._'-._,-'_.-.|
|/ \/ / \/ / \. / \/ /***********.4:*****._,-'_.-._'-._,-'_.-.|
| \_\/ \_\/ \_\/ \_\/ ***********..******._,....-._'-._,-'_.-.|
|__ ___ ___ ___ _*****'*'.ROLE.******._,.5.;-._'-._,-'_.-.|
|/ \/ / \/ / \/ / \/ _****:PLAYERS.***-._,-._.-._'-._,-'_.-.|
| \_\/ \_\/ \_\/ \_\ 7. ***'''''''''***-._,-.;.;.;.;._,-'_.-.|
|__ ___ ___ ___ _.-**************-._,.DEFENSIVE.;-'_.-.|
|/ \/ / \/ / \/ /-._,-' *************'-._.SPECIALISTS.;'_.-.|
| \_\/ \_\/ \_\/ _.-.************'-._,..;.;.;.;.;,-'_.-.|
|__ ___ ___ '-._,-' ***********'-._,-'_.-._'-._,-'_.-.|
|/ \/ / \/ /.-._OUTSIDE_.-._***********'-._,-'_.-._'-._,-'_.-.|
| \_\/ \_\/ SHOOTERS '*********_'-._,-'_.-._'-._,-'_.-.|
|__ ___ _.-._ _.-._ ********_'-._,-'_.-._'-._,-'_.-.|
|/ \/ /_,-' '-._,-' '-********_'-._,-'_.-._'-._,-'_.-.|
| \_\/ _.-._ _.-._ ******._'-._,-'_.-._'-._,-'_.-.|
| '-._,-' '-._,-' '-._*****._'-._,-'_.-._'-._,-'_.-.|
| _ _.-._ _.-._ **'**._'-._,-'_.-._'-._,-'_.-.|
| '-._,-' '-._,-' '-._,****._'-._,-'_.-._'-._,-'_.-.|
|-._ _.-._ _.-._ **-._'-._,-'_.-._'-._,-'_.-.|
+-------------------------------------------------------------+
Symbol Label
------ --------------------
1 POST PLAYERS
2 DRIVING SWINGMEN
3 HUMAN CIGARS
4 MISC ROLE PLAYERS
5 DEFENSIVE SPECIALISTS
6 BACKCOURT BALLHANDLERS
7 OUTSIDE SHOOTERS
Group centroid displayed by group#
The numbers on the map show the cluster centroids. The territories for each cluster are well defined, except for the HUMAN CIGARS. That is because those players are not well defined themselves, except for the garbage time quality. You can see in the following chart how spread out they are:
[chart not preserved]
One last thing before I go. Here are some charts showing the relative diagnostic value of each stat used in determining cluster membership:
[charts not preserved]
_________________
ed
Mark
Joined: 20 Aug 2005
Posts: 807
Posted: Sun Apr 09, 2006 7:44 pm    Post subject: Cluster analysis
This is great Ed. Learning more about players compared to their peers by role is very important.
One definitional question:
AST: Percentage of teammate attempts assisted
Is that the % of teammates' attempts assisted by that player, or assisted at all by anyone? And is it for all teammates and all time, or just teammates on the floor concurrently with the player being studied?
Height and weight displays of each cluster would have value and then even more so various key performance metrics displayed at their physical location of the map to see if height and weight are positively correlated and how they are for different metrics and different clusters.
I’d also be curious to see % of total team time on court by these clusters, revealing team player type biases and weaknesses and then look at W-L records by these and note the patterns and think about how much meaning they have.
I wonder in how many cases key “misc. role players” actually counterbalance / address the weaknesses their team would have without them. Is he the right role player for that team, or just a role player with enough total quality points to contribute regardless of category and need? Most are forwards. I assume almost all teams have one in at least their top 7 players, but it would be interesting to note if any teams eliminate this type of player and what type they substitute.
It is not surprising that in some cases who your teammates are and how strong they are on certain metrics can affect the clarity of your role cluster assignment. A two guard set evenly sharing the responsibilities might really share the ball handler and shooter cluster assignments and have lower than average ties to each. A PF/C combo with closer than normal relative post scoring ability might essentially share the post and defender roles which are usually divided. Shooter/ wing slasher can be shared as well.
Misc. role players may be misc. role players by nature or the other guys just may have taken most of the stats and left them that way, even though they could fill other roles if given that role opportunity.
Perhaps some of the best of the misc. role player lot may be undervalued.
Or the other way: The mid 90s Sonics definitely seemed to get a lot of value from Schrempf (who went from Indiana, where he was a little more of a post player, to more strongly a point forward alongside Kemp), as the current Suns do from Diaw, as Horry's various championship teams did, etc. A misc. role player who can and does give you what you need game by game to win (shooting inside/outside, passing, rebounding, stops, steals, etc.) is a very valuable thing.
A ball handler who can take and make three-pointers at a good clip (above their cluster average) is quite valuable if you want the three-point game to play a larger than average role for the team. It is either that, higher-volume 3-pt shooting forwards, or both.
There must not be many average-size guard defensive specialists, as the cluster average height is over 6’ 8.5”. Ball handling and shooting needs usually trump? I assume of course some ball handlers and shooters are also good defensive players but just don't cluster as defensive specialists, as their other attributes direct them more strongly to those clusters. A versatile 2/3, at least 6-7, is the main defensive specialist in the perimeter subset, and some of the best like Bowen and Artest can cover 4 positions.
(With this work, as with Mike G.'s EWin work, I would want to recommend use of an eFG% allowed and/or points allowed number to address the missing one-on-one shot defense (and use of adjusted team def. rating on/off as a proxy for help defense), but I know many do not use the current 82games product because of concerns about quality, and I won't dwell on it. Some day it would be good if we all got over that hump satisfactorily by full and careful use of video. Until then I'll use the best available data and homemade meta-ratings.)
gabefarkas
Joined: 31 Dec 2004
Posts: 1291
Location: Durham, NC
Posted: Sun Apr 09, 2006 9:26 pm
Ed, I heart you. This is truly phenomenal and thought-provoking.
My first instinct: instead of trying to entirely label players with only one category, do you think it would be possible to dole out "cluster credits" or something like that, where a player has a total of 100 points that are distributed by how well they fit each cluster? For example, a player in the bottom center of your ascii graph (love it, btw) would probably be something like:
Outside Shooter = 27
Role Player = 20
Defensive Specialist = 25
Cigar = 5
Backcourt Handler = 12
Driving Swingman = 6
Post Player = 5
Does that make sense?
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
Posted: Sun Apr 09, 2006 9:29 pm    Post subject: Re: Cluster analysis
Mark wrote:
One definitional question:
AST: Percentage of teammate attempts assisted
Is that the % of teammates' attempts assisted by that player, or assisted at all by anyone? And is it for all teammates and all time, or just teammates on the floor concurrently with the player being studied?
AST% = AST / ((TeamFGM - PlayerFGM)/(TeamMinutes/5))
IOW the proportion of teammates' made field goals (scaled to player's minutes) assisted by said player.
Quote:
Height and weight displays of each cluster would have value and then even more so various key performance metrics displayed at their physical location of the map to see if height and weight are positively correlated and how they are for different metrics and different clusters.
Among the mass of verbiage, I actually posted a chart of heights and weights by cluster. You must have missed it. Here's height:
[height chart not preserved]
Quote:
I wonder in how many cases key “misc. role players” actually counterbalance / address the weaknesses their team would have without them.
One thing to keep in mind is the Role player cluster sits in the middle of the map, sharing "borders" with all the other clusters. This suggests to me that Role Players are drawn from every other cluster, presumably depending on the needs of the team, in addition to the changing abilities of the player. Just like a PG may shift over to the off-guard, or even the 3, based on the needs of the team (teammate injuries, matchups to be exploited, strategic surprise, etc) players of all types may move in and out of the Role Player cluster. This may or may not represent a change in the player's ability.
Keep in mind that the clusters are descriptive. The computer looked at a player's stats and said, oh, you were a role player last year, but this year you were an outside shooter. But once classified as a shooter, the player need not feel any impulse to remain in that role. The clusters were simply an after-the-fact description of what took place. If it turned out that winning teams had, say, more than the usual number of shooters, that does not suggest that teams should be looking to stock up on shooters. What it means is that winning teams tended to have players who played like shooters -- it doesn't mean those players were shooters.
I think the most interesting thing that can come of this is the study of player interactions at the game level. I don't think you can do too much by looking at teams at the season-level.
Quote:
Misc. role players may be misc. role players by nature or the other guys just may have taken most of the stats and left them that way, even though they could fill other roles if given that role opportunity.
Crap. You already said what I wrote.
Quote:
A ball handler who can take and make three-pointers at a good clip (above their cluster average) is quite valuable if you want the three-point game to play a larger than average role for the team. It is either that, higher-volume 3-pt shooting forwards, or both.
I just want to make clear, so there's no confusion, that the stats I used, at least the shooting stats, were not efficiency stats, in that they don't say anything about how well a player shot, only how much he shot.
_________________
ed
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
Posted: Sun Apr 09, 2006 9:43 pm
gabefarkas wrote:
My first instinct: instead of trying to entirely label players with only one category, do you think it would be possible to dole out "cluster credits" or something like that, where a player has a total of 100 points that are distributed by how well they fit each cluster?
There is, in fact, a clustering algorithm known as fuzzy k-means which does exactly what you're talking about, although it will reinvent its own clusters, which probably won't match the ones I have above.
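As a rough illustration of the "cluster credits" idea, here is the membership step from fuzzy k-means (also called fuzzy c-means) in Python, scaled to 100 points. The centroids, the player's position and the fuzziness parameter m=2 are all hypothetical. A full fuzzy k-means run would also re-estimate the centroids; this sketch only scores one player against fixed centroids.

```python
from math import dist

def cluster_credits(player, centroids, m=2.0, total=100):
    """Fuzzy c-means style memberships, scaled so they sum to `total`."""
    d = {name: dist(player, c) for name, c in centroids.items()}
    # A player sitting exactly on a centroid gets all the credit.
    for name, value in d.items():
        if value == 0:
            return {n: (total if n == name else 0.0) for n in d}
    power = 2.0 / (m - 1.0)
    return {
        name: total / sum((d[name] / d[other]) ** power for other in d)
        for name in d
    }

# Hypothetical 2-D centroids and player position (made-up coordinates).
centroids = {"SHOOTER": (0.0, 0.0), "ROLE": (2.0, 0.0), "POST": (0.0, 4.0)}
credits = cluster_credits((1.0, 0.0), centroids)
print({name: round(value, 1) for name, value in credits.items()})
```

A player halfway between two centroids splits most of his credit evenly between them, and the distant cluster gets only a sliver, which is the behaviour gabefarkas was describing.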
In any case, before I get to the point where I can hand out cluster credits, I have to figure out a way to classify players into clusters based on their stats. What I did above was simply point out the existence of the clusters, but while I really like the clusters the computer discovered, the way the computer classified the players leaves much room for improvement. I'll probably end up going with a discriminant analysis approach, but a workable approach may also be to go the other way: just to classify players intuitively, based on the high-avg-low stats profiles I displayed above. I'm growing to like this approach -- it kinda makes sense, in that players are rarely classified into positions based solely on their stats. We don't look at a player's numbers and conclude that he is a small forward -- we use other information to classify his position. This is probably what we should do for clusters as well.
_________________
ed
Mark
Joined: 20 Aug 2005
Posts: 807
Posted: Mon Apr 10, 2006 1:20 am    Post subject: Re: Cluster analysis
Ed Küpfer wrote:
I actually posted a chart of heights and weights by cluster. You must have missed it.
I read what you provided. I should have said a cluster-specific height and weight chart with individual player names shown would be interesting in addition to your second chart of just dots (hard to react to that other than to say there is substantial variation) or the cluster average charts (which I appreciate, as now you can say whether a player is bigger/smaller than role average and think about that side by side with their production variances). General ones (not cluster specific) have been produced before, I think by you and/or Kevin P., so that itself isn't a big deal.
But as I said, showing distributions of names along with key performance metrics (central to that cluster's main role) still seems like it could have value, especially if you got into FG% or TS% or certainly rebounding and some others where player size is a key input variable. You have summarized the averages for the clusters for many variables; I was just saying I would also have interest in the level of detail below that, but of course then you are getting into many more charts. As much as you care to share is welcome. And in addition to sharing it here I could see an article in the 82games/SI series or a sports journal if you care to publish anywhere else.
Ed Küpfer wrote:
I just want to make clear, so there's no confusion, that the stats I used, at least the shooting stats, were not efficiency stats, in that they don't say anything about how well a player shot, only how much he shot.
I know I was projecting beyond that, but taking a quantity of three-pointers and making a good percentage are both important. I was looking at the first characteristic on your chart but then would of course check the percentage made beyond what you have provided so far.
Thanks for your fine work and time responding.
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
Posted: Mon Apr 10, 2006 12:32 pm
I just want to add some thoughts to my first post here, about my intentions. What I wanted is to come up with an alternative to traditional positions, something that may prove more useful. The statistics used above are only there to confirm the existence of player clusters, but I could have just as easily come up with similar groupings just by thinking about it for a while. For example, I tend to group players like Shaq, Bosh, Duncan into a category separate from Battie, the Collins twins, Mihm -- even though those players are traditionally categorised as centers or power forwards. In my head, I have a separate group for Kobe, Vince, Carmelo than I do for Mo Peterson, Adrian Griffin, Bruce Bowen. The small forward/off guard positional designations simply do not capture the diversity of these players. Isn't it more useful to think of the first group as scoring swingmen, and the second as perimeter defenders?
So I came up with some clusters, seven in fact, that I think represent natural player types. These aren't the only possible player types, but 7 is a reasonable number of categories to use. What I want is these clusters to be an alternative to traditional positions when analysing players. For example, ranking players within each cluster seems to me to be more reasonable than ranking players by position -- of course Vince is more productive than Bowen, but why are we comparing those two players? Shouldn't we compare Vince to other scorers, and Bowen to other defenders?
Different roles. It's important to keep these in mind. I don't want to be dogmatic about membership in each cluster. I spent a few hours yesterday trying to come up with methods of determining cluster membership for each player, but why should I? We don't spend much time looking at player stats to determine what position they play -- this is just something we know. That's how I want us to look at player clusters, as something we just naturally know. To that end, I'm not going to add to the confusion by posting a membership test. All you should really need to determine what cluster a player belongs to is your intuition (having a good prior knowledge of the definition of each cluster, of course). If you're still in doubt, use the player stats profiles I posted above.
POST PLAYERS
Code:
High: DReb, 2Att, FTA, WT, HT, USAGE, OReb, qAST, BLK
Avg: PF, TO, STL, AST
Low: 3Att
DRIVING SWINGMEN
Code:
High: 2Att, USAGE, FTA
Avg: STL, qAST, AST, 3Att, OReb, HT
Low: DReb, WT, PF, BLK, TO
ROLE PLAYERS -- PERIMETER DEFENDER
Code:
High: qAST
Avg: HT, OReb, WT, PF, DReb, STL, TO, BLK, 3Att, 2Att, FTA
Low: USAGE, AST
DEFENSIVE SPECIALISTS -- REBOUNDERS
Code:
High: WT, HT, BLK, DReb, OReb, PF, qAST
Avg: TO, FTA
Low: USAGE, 3Att, AST, STL, 2Att
BACKCOURT BALLHANDLERS -- DISTRIBUTORS
Code:
High: AST, STL, TO
Avg: 3Att, USAGE, FTA, 2Att
Low: qAST, HT, WT, OReb, DReb, BLK, PF
OUTSIDE SHOOTERS
Code:
High: 3Att
Avg: AST, STL, qAST, USAGE
Low: OReb, 2Att, PF, DReb, BLK, FTA, HT, WT, TO
HUMAN VICTORY CIGARS
Code:
High: PF, TO, STL, FTA, USAGE
Avg: OReb, qAST, 2Att, 3Att, HT, WT, DReb, BLK
Low: AST
These clusters came from statistical analysis. But there's no reason the definitions have to remain static. Where, for example, are the perimeter defenders? The stats for these players don't really capture the nature of their ability, unfortunately, so the cluster analysis didn't "find" them. I believe they would be split among the OUTSIDE SHOOTER and ROLE PLAYER clusters. But perimeter defenders are easily conceptualized, even if the stats don't see them. I'm going to change the ROLE PLAYER cluster to include them. I'm also going to take a suggestion from DeanO and rename the DEFENSIVE SPECIALISTS cluster to REBOUNDERS, and the BALLHANDLERS to DISTRIBUTORS.
The last thing is the CIGARS. This is, almost by definition, a garbage can cluster, consisting mostly of players who don't fit into the other clusters. Essentially, there are six clusters, plus one for the players who don't get much playing time. I think in most analyses we can ignore the CIGARS, which would mean 6 categories of players.
_________________
ed
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
Posted: Mon Apr 10, 2006 11:19 pm
Just in case someone is dying to have an automatic way of classifying these players, here's a relatively painless method. To calculate the probability of each player belonging to any particular cluster, you'll need the following:
Code:
HEIGHT = Player height in inches minus 60
2ATT = 2-point attempts per minute
3ATT = 3-point attempts per minute
FTA = Free throw attempts per minute
OR = offensive rebounds per minute
DR = defensive rebounds per minute
TO = turnovers per minute
AST = assists per minute
BLK = blocks per minute
The probability of a player belonging to a cluster is calculated by the logistic equation
probability of belonging to cluster C = 1/(1 + EXP(-B))
The variable coefficients for each player cluster follow.
Code:
             CONSTANT  HEIGHT   2ATT   3ATT    FTA     OR     DR     TO    AST    BLK
CIGAR            -2.9  -0.035   -4.9    5.2    8.3    9.5   -9.0   36.2  -15.4  -23.4
DISTRIBUTOR       4.1  -0.514   -8.3   -9.6   -4.4  -17.1   -1.4    6.3   52.3  -19.9
DRIVER           -4.6  -0.005   21.2  -19.4    7.2  -14.6  -22.2   -8.1   -9.8  -34.9
PERIMETERD        2.5   0.075   -4.7  -13.4   -7.9   -1.1   -6.2  -12.6  -10.2  -34.3
POST            -16.8   0.324   14.6  -21.4   12.9    2.6   18.2   -6.1   -2.5   -8.0
REBOUNDER        -6.3   0.450  -17.7  -27.3  -15.6   17.7   13.8    3.0  -27.8   25.1
SHOOTER           1.1   0.157   -4.2   45.0   -8.5  -36.9  -15.0  -36.2  -15.7  -24.8
For example, here is Ilgauskas (2005):
Code:
PLAYER       HEIGHT   2ATT   3ATT    FTA     OR     DR     TO    AST    BLK
Z Ilgauskas      27  0.374  0.003  0.192  0.114  0.143  0.073  0.038  0.063
To calculate his probability of belonging to the CIGARS cluster, multiply each of those numbers by the CIGARS coefficients from the table above, and then add them all together (including the constant). I get -3.73 as my sum. This sum is the B that goes into the logistic equation:
probability of belonging to CIGARS cluster = 1/(1 + EXP(-(-3.73))) = 2%
Now go through that procedure for each of the clusters.
Code:
CLUSTER           B  p(Cluster)
CIGAR         -3.73          2%
DISTRIBUTOR  -14.70          0%
DRIVER        -3.46          3%
PERIMETERD    -3.27          4%
POST           1.62         84%
REBOUNDER      0.81         69%
SHOOTER       -8.97          0%
And there you go. That's the easy method -- trust me, you don't want to see the complicated way. This way does an excellent job (80%+ accuracy) of identifying all clusters except CIGARS, where it doesn't do a good job at all. But that's okay. The other cluster that maybe needs some improvement is PERIMETER-D, where my short method has an accuracy of only 60%. This is the cluster that most requires subjective judgment on the part of the observer to identify, since most of the things these players do well don't show up as stats.
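The steps above can be sketched in Python as follows. The coefficients and the Ilgauskas stat line are copied from the tables in this post; the function and variable names are made up for the example. Expect the sums to differ from the posted B values in the second decimal place, presumably because the stat line shown is rounded to three decimals.

```python
from math import exp

STATS = ["HEIGHT", "2ATT", "3ATT", "FTA", "OR", "DR", "TO", "AST", "BLK"]

# Constant first, then one coefficient per entry in STATS (from the table above).
COEFFS = {
    "CIGAR":       (-2.9, -0.035, -4.9, 5.2, 8.3, 9.5, -9.0, 36.2, -15.4, -23.4),
    "DISTRIBUTOR": (4.1, -0.514, -8.3, -9.6, -4.4, -17.1, -1.4, 6.3, 52.3, -19.9),
    "DRIVER":      (-4.6, -0.005, 21.2, -19.4, 7.2, -14.6, -22.2, -8.1, -9.8, -34.9),
    "PERIMETERD":  (2.5, 0.075, -4.7, -13.4, -7.9, -1.1, -6.2, -12.6, -10.2, -34.3),
    "POST":        (-16.8, 0.324, 14.6, -21.4, 12.9, 2.6, 18.2, -6.1, -2.5, -8.0),
    "REBOUNDER":   (-6.3, 0.450, -17.7, -27.3, -15.6, 17.7, 13.8, 3.0, -27.8, 25.1),
    "SHOOTER":     (1.1, 0.157, -4.2, 45.0, -8.5, -36.9, -15.0, -36.2, -15.7, -24.8),
}

def cluster_scores(stat_line):
    """Return {cluster: (B, probability)} using one logistic equation per cluster."""
    scores = {}
    for cluster, (constant, *weights) in COEFFS.items():
        b = constant + sum(w * stat_line[s] for w, s in zip(weights, STATS))
        scores[cluster] = (b, 1.0 / (1.0 + exp(-b)))
    return scores

# Ilgauskas (2005); HEIGHT is inches minus 60.
ilgauskas_2005 = dict(zip(
    STATS, (27, 0.374, 0.003, 0.192, 0.114, 0.143, 0.073, 0.038, 0.063)))

scores = cluster_scores(ilgauskas_2005)
best = max(scores, key=lambda c: scores[c][1])
for cluster, (b, p) in scores.items():
    print(f"{cluster:<12}{b:8.2f}{p:8.0%}")
print("Most likely cluster:", best)
```

Each cluster gets its own independent equation, so the seven probabilities are not forced to add up to 100%. That is why POST and REBOUNDER can both come out high for the same player.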
_________________
ed
Ben F.
Joined: 07 Mar 2005
Posts: 391
Posted: Tue Apr 11, 2006 3:20 am
Ed,
You say that you hope these role designations are more "useful". Does that mean you are trying to come up with a better description of what the players actually do, or that you are trying to come up with something that could be used to help a team?
If it's the latter, I'd like to see how you could use this analysis (which I think is incredible, by the way) to answer what I call the "Boris Diaw Dilemma". (Maybe this should be a separate thread if it can't be answered by this analysis.)
The prevailing theory of Boris Diaw's incredible improvement this year is not just simple work ethic - it's a matter of roles. The Hawks obviously did not know how to use Diaw effectively. They had him at PG (or "distributor") often. Now, D'Antoni has him in the forward and center positions, and it's entirely changed his game. So the question is twofold:
1) How can we identify players who are used in the wrong roles?
2) How can we identify what role they'd be most effective in?
The idea that Diaw can make this incredible leap makes me wonder how many diamonds in the rough there are out there, who are just being misused - especially in this era of players that can do everything.
jeffpotts77
Joined: 18 Feb 2005
Posts: 150
Location: Cambridge, MA
PostPosted: Tue Apr 11, 2006 10:30 am Post subject: Reply with quote
I have nothing to add except to say that this is brilliant! Thanks for your hard work, Ed! Your point about comparing Vince Carter to Bruce Bowen is spot-on. Will you be publishing this anywhere else for non-apbr members to see?
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
PostPosted: Tue Apr 11, 2006 1:55 pm Post subject: Reply with quote
Using the method outlined above, I classified the players for three different 2006 teams. Each player has a CLUSTER membership, which is simply the identity of the cluster with the highest probability. Those probabilities are shown in the 6 rightmost columns, as percentages. For example:
Code:
PLAYER TEAM PS MPG CLUSTER DIST DRIV PERI POST REB SHOOT
Boris Diaw PHO SF 35 DISTRIBUTOR 19 0 6 2 1 0
Eddie House PHO SG 17 SHOOTER 12 0 2 0 0 100
Diaw is classified as a DISTRIBUTOR, because of his 19% probability of belonging to that cluster. But 19% is fairly low -- it would be more accurate to say that none of the clusters captures his stats profile very well at all (in his two previous seasons, he was classified as a PERIMETER-D and DISTRIBUTOR, again with very low probability scores). Eddie House on the other hand is classified unambiguously as an OUTSIDE SHOOTER, with a small nod to DISTRIBUTOR. I think this matches up with reality fairly closely.
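The rule behind the CLUSTER column is simply "take the largest of the six percentages." A minimal Python sketch using the two rows above follows. The function name classify and the 50% cutoff used to flag a weak fit are assumptions made for this illustration, not part of the method itself.

```python
# Column order from the table: DIST DRIV PERI POST REB SHOOT.
CLUSTERS = ["DISTRIBUTOR", "DRIVER", "PERIMETER-D", "POST", "REBOUNDER", "SHOOTER"]

PLAYERS = {
    "Boris Diaw": (19, 0, 6, 2, 1, 0),
    "Eddie House": (12, 0, 2, 0, 0, 100),
}

WEAK_FIT_CUTOFF = 50  # hypothetical threshold, chosen only for the example


def classify(percentages, cutoff=WEAK_FIT_CUTOFF):
    """Return (cluster label, its percentage, True if the best fit is weak)."""
    best = max(range(len(CLUSTERS)), key=lambda i: percentages[i])
    return CLUSTERS[best], percentages[best], percentages[best] < cutoff


for name, row in PLAYERS.items():
    label, pct, weak = classify(row)
    note = " (weak fit)" if weak else ""
    print(f"{name}: {label} at {pct}%{note}")
```

Flagging the weak fits keeps a case like Diaw's 19% from being read with the same confidence as House's 100%.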
Code:
PLAYER CLUSTER DIST DRIV PERI POST REB SHOOT
Boris Diaw DISTRIBUTOR 19 0 6 2 1 0
Eddie House SHOOTER 12 0 2 0 0 100
James Jones SHOOTER 0 0 15 0 2 100
Kurt Thomas REBOUNDER 0 0 19 17 81 0
Leandrinho Barbosa SHOOTER 10 1 11 0 0 84
Pat Burke REBOUNDER 0 2 9 16 33 0
Raja Bell SHOOTER 1 0 24 0 0 100
Shawn Marion REBOUNDER 0 0 4 4 5 4
Steve Nash DISTRIBUTOR 100 0 1 0 0 3
Tim Thomas SHOOTER 0 0 10 0 1 97
Code:
PLAYER CLUSTER DIST DRIV PERI POST REB SHOOT
Beno Udrih DISTRIBUTOR 57 1 4 0 0 46
Brent Barry SHOOTER 3 0 12 0 1 98
Bruce Bowen SHOOTER 0 0 56 0 12 97
Emmanuel Ginobili SHOOTER 5 1 2 0 0 34
Fabricio Oberto REBOUNDER 0 0 43 1 92 0
Michael Finley SHOOTER 0 0 26 0 0 99
Nazr Mohammed REBOUNDER 0 0 17 19 89 0
Nick Van Exel SHOOTER 62 0 8 0 0 97
Rasho Nesterovic REBOUNDER 0 0 42 2 99 2
Robert Horry SHOOTER 0 0 12 0 28 77
Tim Duncan POST 0 1 1 90 4 0
Tony Parker DISTRIBUTOR 86 55 5 1 0 0
Code:
PLAYER CLUSTER DIST DRIV PERI POST REB SHOOT
Andre Barrett DISTRIBUTOR 100 6 8 0 0 2
Antonio Davis REBOUNDER 0 1 57 2 80 1
Charlie Villanueva SHOOTER 0 0 13 2 10 22
Chris Bosh POST 0 15 6 78 2 0
Darrick Martin DISTRIBUTOR 99 0 11 0 0 76
Eric Williams PERIMETER-D 0 0 45 0 7 42
Joey Graham PERIMETER-D 0 1 36 0 7 23
Jose Calderon DISTRIBUTOR 99 0 19 0 0 2
Loren Woods REBOUNDER 0 0 23 4 100 0
Matt Bonner SHOOTER 0 0 33 0 22 99
Mike James DISTRIBUTOR 76 1 3 0 0 57
Morris Peterson SHOOTER 0 0 17 0 0 98
Pape Sow REBOUNDER 0 0 37 3 95 0
Rafael Araujo REBOUNDER 0 0 50 5 91 0
This method doesn't like to find POST players -- it only classified 20 players that way in 2006. By tweaking the constant in the equation for POST, you can increase this number, or you can simply choose to see POST players as ELITE POST players, and 20 as a reasonable number.
Toronto is the team I know best, and looking at the results, I am very happy with the classifications. Villanueva's game is hard to classify, and so I'm content with the low scores he registered as a SHOOTER, PERIMETER DEFENDER, and REBOUNDER. Bosh is of course an elite POST player, and Calderon is a DISTRIBUTOR in the old-school PG style. Mike James gets high scores as a DISTRIBUTOR and SHOOTER, which is just right.
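On tweaking the POST constant: adding a fixed amount to the constant adds the same amount to every player's B for POST, which pushes every POST probability up. A small Python sketch of that effect follows. The shift values are arbitrary numbers picked for illustration, and 1.62 is the POST sum for Ilgauskas from the earlier example.

```python
import math


def logistic(b):
    return 1.0 / (1.0 + math.exp(-b))


def post_probability(b, constant_shift=0.0):
    """POST probability after adding constant_shift to the equation's constant."""
    return logistic(b + constant_shift)


ILGAUSKAS_POST_B = 1.62

for shift in (0.0, 0.5, 1.0):  # illustrative shifts only
    print(shift, round(post_probability(ILGAUSKAS_POST_B, shift), 3))
```

A larger shift lifts borderline players over whatever threshold is used, so more of them end up labelled POST.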
_________________
ed
94by50
Joined: 01 Jan 2006
Posts: 499
Location: Phoenix
PostPosted: Tue Apr 11, 2006 3:17 pm Post subject: Reply with quote
How useful would all this knowledge be in developing a method for judging how similar two players are? Strict statistical similarity scores are helpful, but perhaps this could be another step in that direction.
THWilson
Joined: 19 Jul 2005
Posts: 164
Location: phoenix
PostPosted: Tue Apr 11, 2006 4:12 pm Post subject: Reply with quote
Ed Küpfer wrote:
This method doesn't like to find POST players -- it only classified 20 players that way in 2006. By tweaking the constant in the equation for POST, you can increase this number, or you can simply choose to see POST players as ELITE POST players, and 20 as a reasonable number.
It also seems to have trouble with Drivers. These are the two high-usage groups, and usage isn't in the linear weights...any connection? I was really surprised to see Manu only get a 1 for driver...
Commendable work, btw.
ziller
Joined: 30 Jun 2005
Posts: 126
Location: Sac Metro
PostPosted: Tue Apr 11, 2006 5:20 pm Post subject: Reply with quote
I took the liberty of running another team through Ed's magnificent gauntlet.

(I can't for the life of me get columns lined up properly. So you get an image. Sorry.)
The various "perimeter-d" cluster members worry me - it makes more sense in this instance for the original role-player tag. The only other problem would be the low marks for "driver" for Kevin Martin and Bonzi Wells. Martin gets to the line plenty, so that's not hurting him. Perhaps here it's a relative lack of two-point field goals - Martin is either taking a three or slashing, a la Joe Johnson 2005. Bonzi doesn't get to the line consistently, however, so that could be his low driver reason.
_________________
SactownRoyalty.com
tziller@gmail.com
Mark
Joined: 20 Aug 2005
Posts: 807
PostPosted: Tue Apr 11, 2006 5:34 pm Post subject: Reply with quote
The point distribution detail is interesting (almost essential to see, really, to me).
"There are 6 clusters". OK, if you are already set on that based on what the analysis is telling you. You've indicated in the post that follows that you essentially don't want a hybrid system, even though the clusters are a mix of role and physical attributes (aren't they pretty powerful membership tests?). Wouldn't a pure "role system" remove height and weight? And the clusters would then just be distribute, outside shoot, drive, defend, rebound, post, without qualifiers by size or location on the court? Have you run the data to produce clusters without height and weight? I assume it is messier to look at, compared to traditional position classification, but isn't that good? I recall in other threads you wrestled with these questions, so I would welcome any further explanations you care to add about your thinking on this issue now.
I'll still use some form of position/role dual way of thinking about players. Position isn't enough alone, case closed. Role using 6 clusters (found with weight/height) certainly is valuable. It still seems like a hybrid system to me, and distance from the cluster dividing line seems to carry a lot of importance, which suggests the main cluster label isn't doing enough alone. A system with more clusters is a reach for a unified system. Wouldn't they be tighter to the cluster means that way? I started some discussion of options for an 8-, 9- or 12-cluster hybrid direction, but if that moves too far from what you are doing I will hold off on that here and now.
Last edited by Mark on Wed Apr 12, 2006 10:43 am; edited 8 times in total
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
PostPosted: Sun Apr 09, 2006 4:38 pm Post subject: Clustering Players Reply with quote
One of my hobby horses is classifying players by their roles, in contrast to their positions. I have long suspected that traditional position designations were not very useful, a relic of an earlier game much different than the one we watch today. I believe that we can do better, that we can come up with player classifications that are more useful in the context of the modern game, the same way we dumped FG% in favour of measures that better reflect what we see on the court today.
To that end, I've been playing around with cluster analysis. CA is a family of algorithms that cluster observations together automatically based solely on their stats, without regard to how we would cluster them -- that is, there is no dependent variable in CA. I won't describe here how cluster analysis works, but I will say that the logic and the math behind it are very simple, and anyone who wants to read up on it will probably grok the concepts very easily.
There are different types of cluster analysis, and one can break them into two kinds: hierarchical (HCA) and non-hierarchical (NHCA). The former returns results in the form of a tree diagram, which is really helpful. If you went to a large family reunion and ran an HCA, it would probably begin by clustering siblings together, and then clustering those siblings with their parents, and then clustering the parents/sibs cluster with other parents/sibs clusters based on the proximity of familial relations. You would end up with something like a family tree diagram. You can also run an HCA on a group of players, using their stats as input variables, and see how the "family tree" of players looks. I've done this many times, and essentially the tree looks like this:
Code:
|
+--------------------+-------------------+
| |
Frontcourt Backcourt
| |
+---------+----------+ +---------+----------+
| | | |
Primary offensive non-offensive Primary offensive non-offensive
options players options players
Kirilenko Griffin Ford RBowen
Slater Kaman Cassell Augmon
Yao Voskuhl Hudson BBowen
That is a very, very rough tree, based on only three seasons of data. It's still very useful, and maybe I'll talk more about the HCA tree later.
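To make the family-reunion picture concrete, here is a toy bottom-up (agglomerative) clustering in Python. It is a sketch under stated assumptions, not the procedure used for the tree above: it uses single linkage on one made-up number per "player", and the function name and data are invented. Every round compares every pair of clusters, which gives a feel for how the work piles up as the number of observations grows.

```python
def single_linkage(values):
    """Repeatedly merge the two closest clusters; return the merge history.

    Each history entry is (cluster_a, cluster_b, distance), where a cluster
    is a tuple of indices into `values`.
    """
    clusters = [(i,) for i in range(len(values))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(values[i] - values[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history


TOY_VALUES = [1.0, 1.2, 5.0, 5.1, 9.0]  # invented numbers, one per "player"

for left, right, distance in single_linkage(TOY_VALUES):
    print(left, "+", right, "at distance", round(distance, 2))
```

Reading the merge history from top to bottom gives the tree from the leaves up: the closest "siblings" pair off first, and the last merge is the root.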
But what I really wanted to look at is whether players' classifications changed throughout their careers. To do that, I needed to use more data. The problem is that running a hierarchical cluster analysis is very computationally intensive -- it would take forever to run the HCA on the dataset I wanted to use, which included every player season from 1978 on. Luckily, statisticians have developed other algorithms which don't require as much computation. I file these under the heading non-hierarchical cluster analysis. The most well known is called k-means, where k is the number of clusters you want the computer to return. Unlike HCA, k-means doesn't settle on an optimal number of clusters -- it wants you to tell it how many clusters there are. I don't really want to use this, because we don't really have an idea of how many "natural" clusters there are.
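For the curious, a bare-bones k-means loop looks roughly like this in Python. The points and starting centroids are toy values invented for the example, and the function name is arbitrary. Note that the caller has to supply the starting centroids, and therefore k, which is exactly the drawback described above.

```python
from math import dist


def kmeans(points, start_centroids, max_iter=100):
    """Plain k-means; k is implied by the number of starting centroids."""
    centroids = [tuple(c) for c in start_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Invented 2-D toy data: two obvious clumps.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

labels_k2, _ = kmeans(POINTS, [POINTS[0], POINTS[3]])
labels_k3, _ = kmeans(POINTS, [POINTS[0], POINTS[1], POINTS[3]])
print("k=2:", labels_k2)
print("k=3:", labels_k3)
```

The same data is happily carved into two or three groups depending on what it is told, with no indication of which is more "natural".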
Fortunately, there are other options. SPSS has a NHCA called two-step cluster analysis. I don't really know what the two steps are, but the algorithm settles automatically on the number of "natural" clusters, which makes it very useful for me.
Right. The data. Here are the stats I used:
HT: Player height
WT: Player weight
2Att: 2-point attempts per min
3Att: 3-point attempts per min
FTA: FT attempts per min
PF: Personal fouls per min
USAGE: Usage rate
OReb : Offensive rebounding percentage
DReb : Defensive rebounding percentage
TO: Turnover percentage
AST: Percentage of teammate attempts assisted
BLK: Percentage of opponent shots blocked
STL: Steals per opponent possessions
qAST: Percentage of own attempts assisted
All stats pace adjusted to team/league averages. For players who played for more than one team in a season I used the average, weighted by the minutes played.
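The minutes weighting for multi-team players is a plain weighted average. A minimal Python sketch, with made-up stint numbers and a hypothetical helper name:

```python
def minutes_weighted(stints):
    """stints: list of (minutes, stat_value) pairs, one per team in the season."""
    total_minutes = sum(minutes for minutes, _ in stints)
    if total_minutes == 0:
        raise ValueError("player has no minutes")
    return sum(minutes * value for minutes, value in stints) / total_minutes


# Invented example: 3-point attempts per minute with two teams in one season.
season_value = minutes_weighted([(1200, 0.10), (400, 0.18)])
print(round(season_value, 3))
```

The longer stint dominates, so the season figure sits closer to 0.10 than to 0.18.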
The cluster analysis settled on 7 clusters. I've named these clusters Post Players, Driving Swingmen, Human Victory Cigars, Miscellaneous Role Players, Defensive Specialists, Backcourt Ballhandlers, and Outside Shooters. These names are just convenient titles, capturing what I see as the clusters' most salient characteristics along with the things that separate them most from the other clusters.
Each cluster has a stats "profile." For example, the POST PLAYER cluster is characterised by high totals in defensive rebounding, two-point attempt rate, and FT attempt rate, average totals in PFs, turnovers, assists, and few 3-point attempts. Like this:
POST PLAYERS
Code:
High: DReb, 2Att, FTA, WT, HT, USAGE, OReb, qAST, BLK
Avg: PF, TO, STL, AST
Low: 3Att
Eddy Curry, Dirk Nowitzki, Drew Gooden, Stromile Swift, Juwan Howard, Zendon Hamilton, Rasheed Wallace, Patrick Ewing, Elton Brand, Lamar Odom
The ten players at the end are drawn randomly from the POST cluster. This cluster is probably the most intuitively satisfying one. Here are the remaining clusters.
DRIVING SWINGMEN
Code:
High: 2Att, USAGE, FTA
Avg: STL, qAST, AST, 3Att, OReb, HT
Low: DReb, WT, PF, BLK, TO
Lebron James, Jeryl Sasser, Allen Iverson, Ronald Murray, Ricky Davis, Richard Jefferson, James Cotton, Isiah Rider, Kobe Bryant, Ron Mercer
You won't see Jeryl Sasser's name appear next to Iverson's too often, but I think we can visualise this type of player easily enough.
HUMAN VICTORY CIGARS
Code:
High: PF, TO, STL, FTA, USAGE
Avg: OReb, qAST, 2Att, 3Att, HT, WT, DReb, BLK
Low: AST
Lawrence Funderburke, Tim James, Rusty LaRue, Terry Mills, Jermaine Jackson, Rashard Lewis, Tierre Brown, Damone Brown, Jason Hart, Jerome James
This is the most diffuse cluster, containing players who you'd think have very little in common. In fact, at the end of this post, I'll show a map of all these clusters, and while the others have pretty well defined borders and territories, the CIGARS are in fact all over the place. The one thing they share unambiguously is a high games played/minutes per game ratio, a diagnostic I've used before to flag garbage time players. The players in this cluster played an average of 5 minutes per game, far lower than the second lowest (DEFENSIVE SPECIALISTS - 15 mpg). It's important to remember that MPG was not a stat I used as an input. The clustering algorithm classified these garbage time players without knowing their playing time ahead of time. To me, this is a good external verification of the existence of this cluster.
I know what you're thinking: what the hell is Rashard Lewis doing on a list of garbage time players? All I can say is that Lewis is represented here by his 1999 season, when he only played 145 minutes. His full career trajectory looks like this: CIGAR, ROLE PLAYER, ROLE PLAYER, SHOOTER, DRIVER, SHOOTER, SHOOTER.
MISC ROLE PLAYERS
Code:
High: qAST
Avg: HT, OReb, WT, PF, DReb, STL, TO, BLK, 3Att, 2Att, FTA
Low: USAGE, AST
Michael Curry, Marcus Haislip, Kenny Thomas, Detlef Schrempf, Jonathan Bender, Kevin Edwards, Robert Horry, Carlos Rogers, Ansu Sesay, Vincent Yarbrough
The "miscellaneous" is apt, I think. These players are defined by their inability to create much in the way of offense, but are otherwise average in other stats. Other than the cigars, this is the cluster that contains the greatest range of player positions.
Proportion of ROLE PLAYERS from each traditional player position:
Code:
PG 1%
G 5%
SG 6%
GF 16%
SF 14%
F 28%
PF 16%
FC 13%
C 2%
The characteristic that separates the players in this cluster from the CIGARS is that these players get much more action. These players obviously have some ability, although it doesn't show up much in the stats I used.
DEFENSIVE SPECIALISTS
Code:
High: WT, HT, BLK, DReb, OReb, PF, qAST
Avg: TO, FTA
Low: USAGE, 3Att, AST, STL, 2Att
Charles Oakley, Jahidi White, Clarence Weatherspoon, Dennis Rodman, Hakeem Olajuwon, Jackson Vroman, Joe Kleine, Rasho Nesterovic, Maciej Lampe, Reggie Slater
I could also have called this cluster DEFENSIVE BIG MEN. The similarities between these players are pretty obvious: PFs and centers, lots of rebounds, lots of fouls, few assists and field goal attempts. Very straightforward. That said, Reggie Slater? I loved him on Saved By The Bell, but from his days with the Raptors, I don't remember him playing much defense.
BACKCOURT BALLHANDLERS
Code:
High: AST, STL, TO
Avg: 3Att, USAGE, FTA, 2Att
Low: qAST, HT, WT, OReb, DReb, BLK, PF
Chris Childs, Kevin Ollie, Allen Iverson, Keyon Dooling, Charlie Ward, Will Avery, Speedy Claxton, Tony Parker, Mike James, Kenny Anderson
As you'll see in the map below, the BALLHANDLERS are closely related to the DRIVERS, separated mostly by their assists and turnovers. This is the cluster most similar to a traditional position: point guards.
OUTSIDE SHOOTERS
Code:
High: 3Att
Avg: AST, STL, qAST, USAGE
Low: OReb, 2Att, PF, DReb, BLK, FTA, HT, WT, TO
Bobby Phills, James Robinson, Glen Rice, Sean Elliott, Hubert Davis, Jim Jackson, Rasual Butler, Pat Garrity, Matt Bullard, Johnny Newman
The SHOOTERS. Defined mostly by their predilection for outside shooting, and by the low numbers in virtually every other stat category. Mostly smaller players, despite the presence of Garrity above:
Code:
PG 12%
G 19%
SG 28%
GF 21%
SF 9%
F 8%
PF 2%
FC 1%
C 1%
* * * * * * * * * * * * * * * * * * * * * *
The relationship between these clusters can be displayed on a 2-D "map", by plotting the first two discriminant functions. I love ascii graphics, so here you go:
Code:
+-------------------------------------------------------------+
|_)('.)('.)('.)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
|/ )('.)('.)('.)('.)('.)('.)('.)( ____|_____|_____|_____|___|
| \_\ )('.)('.)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
|__ _ )('.)('.)('.)('.)('.)('.)( ____|_____|_____|_____|___|
|/ \/ /('.)('.)('.'.'.'.'.'('.)('.)( |_____| ___ |_____|_____||
| \_\/ \)('.)('.'.DRIVING '.)('.)( ____|__.POST..__|_____|___|
|__ ___ )('.)( SWINGMEN )('.)('.)( |____.PLAYERS.____|_____||
|/ \/ / \/ )('.)'.'.'.'.'..)('.)( ____|_____ _____|_____|___|
| \_\/ \_\/('.)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
|__ ___ ___('.)('.)('.)('.)('.)( ____|_____|_____|_____|___|
|/ \/ / \/ / \)('.)('.)('.)('.)('.)( |_____|_____|_____|_____||
| \_\/ \_\/ \_\ )('.)('.)('.)('.)( ____|_____|_____|_____|___|
|__ ___ ___ __ )('.)('...'.)('.)(_|_____|_____|_____|_____||
|/ \/ / \/ / \/ /('.)('.).2.)('.)( ____ ___|_____|_____|___|
| \.\/ \.\/ \.\/ \)('.)('...'.)('.)(_|__ 1 |_____|_____|_____||
|_..BACKCOURT..___ )('.)('.)('.)(|_____ _____|_____|____|_.-.|
|/ BALLHANDLERS / \ )('.)('.)...)( |_____|___.-._'-._,-'_.-.|
| \_\/ \_\/ \_./ \_\)('.)('.)( .3.******|._,-'_.-._'-._,-'_.-.|
|__ ___ ___.6.__ __)('.)('.)-..*******._,-'_.-._'-._,-'_.-.|
|/ \/ / \/ / \. / \/ /***********.4:*****._,-'_.-._'-._,-'_.-.|
| \_\/ \_\/ \_\/ \_\/ ***********..******._,....-._'-._,-'_.-.|
|__ ___ ___ ___ _*****'*'.ROLE.******._,.5.;-._'-._,-'_.-.|
|/ \/ / \/ / \/ / \/ _****:PLAYERS.***-._,-._.-._'-._,-'_.-.|
| \_\/ \_\/ \_\/ \_\ 7. ***'''''''''***-._,-.;.;.;.;._,-'_.-.|
|__ ___ ___ ___ _.-**************-._,.DEFENSIVE.;-'_.-.|
|/ \/ / \/ / \/ /-._,-' *************'-._.SPECIALISTS.;'_.-.|
| \_\/ \_\/ \_\/ _.-.************'-._,..;.;.;.;.;,-'_.-.|
|__ ___ ___ '-._,-' ***********'-._,-'_.-._'-._,-'_.-.|
|/ \/ / \/ /.-._OUTSIDE_.-._***********'-._,-'_.-._'-._,-'_.-.|
| \_\/ \_\/ SHOOTERS '*********_'-._,-'_.-._'-._,-'_.-.|
|__ ___ _.-._ _.-._ ********_'-._,-'_.-._'-._,-'_.-.|
|/ \/ /_,-' '-._,-' '-********_'-._,-'_.-._'-._,-'_.-.|
| \_\/ _.-._ _.-._ ******._'-._,-'_.-._'-._,-'_.-.|
| '-._,-' '-._,-' '-._*****._'-._,-'_.-._'-._,-'_.-.|
| _ _.-._ _.-._ **'**._'-._,-'_.-._'-._,-'_.-.|
| '-._,-' '-._,-' '-._,****._'-._,-'_.-._'-._,-'_.-.|
|-._ _.-._ _.-._ **-._'-._,-'_.-._'-._,-'_.-.|
+-------------------------------------------------------------+
Symbol Label
------ --------------------
1 POST PLAYERS
2 DRIVING SWINGMEN
3 HUMAN CIGARS
4 MISC ROLE PLAYERS
5 DEFENSIVE SPECIALISTS
6 BACKCOURT BALLHANDLERS
7 OUTSIDE SHOOTERS
Group centroid displayed by group#
The numbers on the map show the cluster centroids. The territories for each cluster are well defined, except for the HUMAN CIGARS. That is because those players are not well defined themselves, except for the garbage time quality. You can see in the following chart how spread out they are:

One last thing before I go. Here are some charts showing the relative diagnostic value of each stats used in determining cluster membership:














_________________
ed
Mark
Joined: 20 Aug 2005
Posts: 807
PostPosted: Sun Apr 09, 2006 7:44 pm Post subject: Cluster analysis Reply with quote
This is great Ed. Learning more about players compared to their peers by role is very important.
One definitional question:
AST: Percentage of teammate attempts assisted
Is that % of teammates attempts assisted by that player or assisted at all by anyone and is for all teammates and all time or just teammates on the floor concurrently with the player being studied?
Height and weight displays of each cluster would have value and then even more so various key performance metrics displayed at their physical location of the map to see if height and weight are positively correlated and how they are for different metrics and different clusters.
I’d also be curious to see % of total team time on court by these clusters, revealing team player type biases and weaknesses and then look at W-L records by these and note the patterns and think about how much meaning they have.
I wonder in how many cases key “misc. role players” actually counterbalance / address team minus them weaknesses. Is he the right role player for that team or just a role player with enough total quality points to contribute regardless of category and need. Most are forwards, I assume almost all teams have one in at least top 7 players but it would be interesting to note if any teams eliminate this type player and what type they substitute.
It is not surprising that in some cases who your teammates are and how strong they are on certain metrics can affect the clarity of your role cluster assignment. A two guard set evenly sharing the responsibilities might really share the ball handler and shooter cluster assignments and have lower than average ties to each. A PF/C combo with closer than normal relative post scoring ability might essentially share the post and defender roles which are usually divided. Shooter/ wing slasher can be shared as well.
Misc. role players may be misc. role players by nature or the other guys just may have taken most of the stats and left them that way, even though they could fill other roles if given that role opportunity.
Perhaps some of the best of the misc. role player lot may be undervalued.
Or the other way: The mid 90s Sonics definitely seemed to get a lot of value from Schrempf (went from Indiana where he was a little more of a post player to more strongly a point forward alongside Kemp) as the current Suns do Diaw, Horry on his various championship teams, etc. A misc. role player who can and does give you what you need game by game to win (shooting inside/outside, passing, rebounding, stops steals, etc.) is a very valuable thing.
A ball handler who can take and make three-pointers at a good clip (above their cluster average) is quite valuable because of that, if you want the three-point game to play a larger than average role for the team. It is either that, higher-volume three-point shooting forwards, or both.
There must not be many average-size guard defensive specialists, as the cluster average height is over 6’ 8.5”. Ball handling and shooting needs usually trump? I assume of course some ball handlers and shooters are also good defensive players but just don't cluster as defensive specialists, as their other attributes direct them more strongly to those clusters. A versatile 2/3, at least 6-7, is the main defensive specialist in the perimeter subset, and some of the best, like Bowen and Artest, can cover 4 positions.
(With this work, as with Mike G.'s EWin work, I would want to recommend use of an eFG% allowed and/or points allowed number to address the missing one-on-one shot defense (and use of adjusted team def. rating on/off as a proxy for help defense), but I know many do not use the current 82games product because of concerns about quality, and I won't dwell on it. Some day it would be good if we all got over that hump satisfactorily by full and careful use of video. Until then I'll use the best available data and homemade meta-ratings.)
gabefarkas
Joined: 31 Dec 2004
Posts: 1291
Location: Durham, NC
PostPosted: Sun Apr 09, 2006 9:26 pm Post subject: Reply with quote
Ed, I heart you. This is truly phenomenal and thought-provoking.
My first instinct: instead of trying to entirely label players with only one category, do you think it would be possible to dole out "cluster credits" or something like that, where a player has a total of 100 points that are distributed by how they fit each criteria. For example, a player in the bottom center of your ascii graph (love it, btw), would probably be something like:
Outside Shooter = 27
Role Player = 20
Defensive Specialist = 25
Cigar = 5
Backcourt Handler = 12
Driving Swingman = 6
Post Player = 5
Does that make sense?
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
PostPosted: Sun Apr 09, 2006 9:29 pm Post subject: Re: Cluster analysis Reply with quote
Mark wrote:
One definitional question:
AST: Percentage of teammate attempts assisted
Is that % of teammates attempts assisted by that player or assisted at all by anyone and is for all teammates and all time or just teammates on the floor concurrently with the player being studied?
AST% = AST / ((TeamFGM - PlayerFGM)/(TeamMinutes/5))
IOW the proportion of teammates' made field goals (scaled to player's minutes) assisted by said player.
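As a rough Python sketch of that calculation: the formula as posted has no term for the player's own minutes, even though the description says the teammates' field goals are scaled to them. The version below therefore assumes AST is taken per player minute, which is the same as multiplying the denominator by the player's share of the team's game minutes. That reading, the function name ast_pct, and the sample numbers are all assumptions for illustration and may not match the exact calculation behind the clusters.

```python
def ast_pct(ast, player_min, player_fgm, team_fgm, team_min):
    """Share of teammates' made field goals assisted by the player.

    Assumption: teammates' makes are scaled by the fraction of game time the
    player was on the floor, i.e. player_min / (team_min / 5).
    """
    share_of_game = player_min / (team_min / 5)
    teammate_fgm_while_on = (team_fgm - player_fgm) * share_of_game
    if teammate_fgm_while_on <= 0:
        raise ValueError("no teammate field goals to assist")
    return ast / teammate_fgm_while_on


# Invented season line: 600 assists in 2800 minutes, 500 own makes,
# on a team with 3000 makes over 82 games x 240 player-minutes.
print(round(ast_pct(600, 2800, 500, 3000, 82 * 240), 3))
```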
Quote:
Height and weight displays of each cluster would have value and then even more so various key performance metrics displayed at their physical location of the map to see if height and weight are positively correlated and how they are for different metrics and different clusters.
Among the mass of verbiage, I actually posted a chart of heights and weights by cluster. You must have missed it. Here's height:

Quote:
I wonder in how many cases key “misc. role players” actually counterbalance / address team minus them weaknesses.
One thing to keep in mind is the Role player cluster sits in the middle of the map, sharing "borders" with all the other clusters. This suggests to me that Role Players are drawn from every other cluster, presumably depending on the needs of the team, in addition to the changing abilities of the player. Just like a PG may shift over to the off-guard, or even the 3, based on the needs of the team (teammate injuries, matchups to be exploited, strategic surprise, etc) players of all types may move in and out of the Role Player cluster. This may or may not represent a change in the player's ability.
Keep in mind that the clusters are descriptive. The computer looked at a player's stats and said, oh, you were a role player last year, but this year you were an outside shooter. But once classified as a shooter, the player need not feel any impulse to remain in that role. The clusters were simply an after-the-fact description of what took place. If it turned out that winning teams had, say, more than the usual number of shooters, that does not suggest that teams should be looking to stock up on shooters. What it means is that winning teams tended to have players who played like shooters -- it doesn't mean those players were shooters.
I think the most interesting thing that can come of this is the study of player interactions at the game level. I don't think you can do too much by looking at teams at the season-level.
Quote:
Misc. role players may be misc. role players by nature or the other guys just may have taken most of the stats and left them that way, even though they could fill other roles if given that role opportunity.
Crap. You already said what I wrote.
Quote:
A ball handler who can take and make three-pointers at a good clip (above their cluster average) is quite valuable because of that, if you want the three-point game to play a larger than average role for the team. It is either that, higher-volume three-point shooting forwards, or both.
I just want to make clear, so there's no confusion, that the stats I used, at least the shooting stats, were not efficiency stats, in that they don't say anything about how well a player shot, only how much he shot.
_________________
ed
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
PostPosted: Sun Apr 09, 2006 9:43 pm Post subject: Reply with quote
gabefarkas wrote:
My first instinct: instead of trying to entirely label players with only one category, do you think it would be possible to dole out "cluster credits" or something like that, where a player has a total of 100 points that are distributed by how they fit each criteria.
There is, in fact, a clustering algorithm known as fuzzy k-means which does exactly what you're talking about, although it will reinvent its own clusters, which probably won't match the ones I have above.
In any case, before I get to the point where I can hand out cluster credits, I have to figure out a way to classify players into clusters based on their stats. What I did above was simply point out the existence of the clusters, but while I really like the clusters the computer discovered, the way the computer classified the players leaves much room for improvement. I'll probably end up going with a discriminant analysis approach, but a workable approach may also be to go the other way: just to classify players intuitively, based on the high-avg-low stats profiles I displayed above. I'm growing to like this approach -- it kinda makes sense, in that players are rarely classified into positions based solely on their stats. We don't look at a player's numbers and conclude that he is a small forward -- we use other information to classify his position. This is probably what we should do for clusters as well.
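For a feel of how fuzzy k-means would hand out "cluster credits": given a player's distance to each cluster centroid, the standard fuzzy c-means membership formula turns those distances into shares that sum to 1. The Python sketch below applies only that membership step, with made-up distances and the common fuzzifier m = 2. It does not run the full algorithm, and the function name is hypothetical.

```python
def fuzzy_memberships(distances, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), where d_j is the distance
    from the point to centroid j and m > 1 is the fuzzifier.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be greater than 1")
    zero = [d == 0 for d in distances]
    if any(zero):
        # Sitting exactly on a centroid: all membership goes there.
        hits = sum(zero)
        return [1.0 / hits if z else 0.0 for z in zero]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in distances) for dj in distances]


# Invented distances from one player to three cluster centroids.
credits = [round(100 * u, 1) for u in fuzzy_memberships([1.0, 2.0, 2.0])]
print(credits)
```

Multiplying the memberships by 100 gives the kind of 100-point split gabefarkas describes, with the nearest cluster taking the largest share.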
_________________
ed
Mark
Joined: 20 Aug 2005
Posts: 807
PostPosted: Mon Apr 10, 2006 1:20 am Post subject: Re: Cluster analysis Reply with quote
Ed Küpfer wrote:
I actually posted a chart of heights and weights by cluster. You must have missed it.
I read what you provided. I should have said a cluster-specific height and weight chart with individual player names shown would be interesting, in addition to your second chart of just dots (hard to react to that other than to say there is substantial variation) or the cluster average charts (which I appreciate, as now you can say whether a player is bigger/smaller than role average and think about that side by side with their production variances). General ones (not cluster specific) have been produced before, I think by you and/or Kevin P., so that itself isn't a big deal.
But as I said, showing distributions of names along with key performance metrics (central to that cluster's main role) still seems like it could have value, especially if you got into FG% or TS% or certainly rebounding and some others where player size is a key input variable. You have summarized the averages for the clusters for many variables; I was just saying I would also have interest in the level of detail below that, but of course then you are getting into many more charts. As much as you care to share is welcome. And in addition to sharing it here I could see an article in the 82games/SI series or a sports journal if you care to publish anywhere else.
"I just want to make clear, so there's no confusion, that the stats I used, at least the shooting stats, were not efficiency stats, in that they don't say anything about how well a player shot, only how much he shot."
I know I was projecting beyond that but taking a quantity of three pointers and making a good percentage are both important. I was looking at the first characteristic on your chart but then would of course check the percentage made beyond what you have provided so far.
Thanks for your fine work and time responding.
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
PostPosted: Mon Apr 10, 2006 12:32 pm Post subject: Reply with quote
I just want to add some thoughts to my first post here, about my intentions. What I wanted is to come up with an alternative to traditional positions, something that may prove more useful. The statistics used above are only there to confirm the existence of player clusters, but I could have just as easily come up with similar groupings just by thinking about it for a while. For example, I tend to group players like Shaq, Bosh, Duncan into a category separate from Battie, the Collins twins, Mihm -- even though those players are traditionally categorised as centers or power forwards. In my head, I have a different group for Kobe, Vince, Carmelo than I do for Mo Peterson, Adrian Griffin, Bruce Bowen. The small forward/off guard positional designations simply do not capture the diversity of these players. Isn't it more useful to think of the first group as scoring swingmen, and the second as perimeter defenders?
So I came up with some clusters, seven in fact, that I think represent natural player types. These aren't the only possible player types, but 7 is a reasonable number of categories to use. What I want is for these clusters to be an alternative to traditional positions when analysing players. For example, ranking players within each cluster seems to me to be more reasonable than ranking players by position -- of course Vince is more productive than Bowen, but why are we comparing those two players? Shouldn't we compare Vince to other scorers, and Bowen to other defenders?
Different roles. It's important to keep these in mind. I don't want to be dogmatic about membership in each cluster. I spent a few hours yesterday trying to come up with methods of determining cluster membership for each player, but why should I? We don't spend much time looking at player stats to determine what position they play -- this is just something we know. That's how I want us to look at player clusters, as something we just naturally know. To that end, I'm not going to add to the confusion by posting a membership test. All you should really need to determine what cluster a player belongs to is your intuition (having a good prior knowledge of the definition of each cluster, of course). If you're still in doubt, use the player stats profiles I posted above.
POST PLAYERS
Code:
High: DReb, 2Att, FTA, WT, HT, USAGE, OReb, qAST, BLK
Avg: PF, TO, STL, AST
Low: 3Att
DRIVING SWINGMEN
Code:
High: 2Att, USAGE, FTA
Avg: STL, qAST, AST, 3Att, OReb, HT
Low: DReb, WT, PF, BLK, TO
ROLE PLAYERS -- PERIMETER DEFENDER
Code:
High: qAST
Avg: HT, OReb, WT, PF, DReb, STL, TO, BLK, 3Att, 2Att, FTA
Low: USAGE, AST
DEFENSIVE SPECIALISTS -- REBOUNDERS
Code:
High: WT, HT, BLK, DReb, OReb, PF, qAST
Avg: TO, FTA
Low: USAGE, 3Att, AST, STL, 2Att
BACKCOURT BALLHANDLERS -- DISTRIBUTORS
Code:
High: AST, STL, TO
Avg: 3Att, USAGE, FTA, 2Att
Low: qAST, HT, WT, OReb, DReb, BLK, PF
OUTSIDE SHOOTERS
Code:
High: 3Att
Avg: AST, STL, qAST, USAGE
Low: OReb, 2Att, PF, DReb, BLK, FTA, HT, WT, TO
HUMAN VICTORY CIGARS
Code:
High: PF, TO, STL, FTA, USAGE
Avg: OReb, qAST, 2Att, 3Att, HT, WT, DReb, BLK
Low: AST
These clusters came from statistical analysis. But there's no reason the definitions have to remain static. Where, for example, are the perimeter defenders? The stats for these players don't really capture the nature of their ability, unfortunately, so the cluster analysis didn't "find" them. I believe they would be split among the OUTSIDE SHOOTER and ROLE PLAYER clusters. But perimeter defenders are easily conceptualized, even if the stats don't see them. I'm going to change the ROLE PLAYER cluster to include them. I'm also going to take a suggestion from DeanO and rename the DEFENSIVE SPECIALISTS cluster to REBOUNDERS, and the BALLHANDLERS to DISTRIBUTORS.
The last thing is the CIGARS. This is, almost by definition, a garbage can cluster, composed mostly of players who don't fit into the other clusters. Essentially, there are six clusters, plus one for the players who don't get much playing time. I think in most analyses we can ignore the CIGARS, which would mean 6 categories of players.
_________________
ed
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
PostPosted: Mon Apr 10, 2006 11:19 pm Post subject: Reply with quote
Just in case someone is dying to have an automatic way of classifying these players, here's a relatively painless method. To calculate the probability of each player belonging to any particular cluster, you'll need the following:
Code:
HEIGHT = Player height in inches minus 60
2ATT = 2-point attempts per minute
3ATT = 3-point attempts per minute
FTA = Free throw attempts per minute
OR = offensive rebounds per minute
DR = defensive rebounds per minute
TO = turnovers per minute
AST = assists per minute
BLK = blocks per minute
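A minimal Python sketch of turning height and season totals into these nine inputs. The function name, argument names, and the sample totals are all invented for illustration; they are not any real player's line.

```python
def cluster_inputs(height_in, minutes, fg2a, fg3a, fta, orb, drb, tov, ast, blk):
    """Convert height (inches) and season totals into the nine per-minute inputs."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return {
        "HEIGHT": height_in - 60,   # height in inches minus 60
        "2ATT": fg2a / minutes,     # 2-point attempts per minute
        "3ATT": fg3a / minutes,     # 3-point attempts per minute
        "FTA": fta / minutes,       # free throw attempts per minute
        "OR": orb / minutes,        # offensive rebounds per minute
        "DR": drb / minutes,        # defensive rebounds per minute
        "TO": tov / minutes,        # turnovers per minute
        "AST": ast / minutes,       # assists per minute
        "BLK": blk / minutes,       # blocks per minute
    }


# Invented totals for a hypothetical 87-inch center who played 2000 minutes.
example = cluster_inputs(87, 2000, 700, 10, 400, 220, 300, 150, 80, 120)
print(example)
```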
The probability of a player belonging to a cluster is calculated by the logistic equation
probability of belonging to cluster C = 1/(1 + EXP(-B))
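Spelled out, B is the cluster's constant plus the coefficient-weighted sum of the nine inputs, which is how the worked example further down this post computes it. The symbols \beta and x_i are notation introduced here, not the author's:

```latex
B_C = \beta_{C,0} + \sum_{i=1}^{9} \beta_{C,i}\, x_i ,
\qquad
p(C) = \frac{1}{1 + e^{-B_C}}
```

Here x_1, ..., x_9 are HEIGHT, 2ATT, 3ATT, FTA, OR, DR, TO, AST and BLK, and \beta_{C,0} is the CONSTANT for cluster C.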
The variable coefficients for each player cluster follow.
Code:
CLUSTER      CONSTANT  HEIGHT   2ATT   3ATT    FTA     OR     DR     TO    AST    BLK
CIGAR            -2.9  -0.035   -4.9    5.2    8.3    9.5   -9.0   36.2  -15.4  -23.4
DISTRIBUTOR       4.1  -0.514   -8.3   -9.6   -4.4  -17.1   -1.4    6.3   52.3  -19.9
DRIVER           -4.6  -0.005   21.2  -19.4    7.2  -14.6  -22.2   -8.1   -9.8  -34.9
PERIMETERD        2.5   0.075   -4.7  -13.4   -7.9   -1.1   -6.2  -12.6  -10.2  -34.3
POST            -16.8   0.324   14.6  -21.4   12.9    2.6   18.2   -6.1   -2.5   -8.0
REBOUNDER        -6.3   0.450  -17.7  -27.3  -15.6   17.7   13.8    3.0  -27.8   25.1
SHOOTER           1.1   0.157   -4.2   45.0   -8.5  -36.9  -15.0  -36.2  -15.7  -24.8
For example, here is Ilgauskas (2005):
Code:
PLAYER       HEIGHT   2ATT   3ATT    FTA     OR     DR     TO    AST    BLK
Z Ilgauskas      27  0.374  0.003  0.192  0.114  0.143  0.073  0.038  0.063
To calculate his probability of belonging to the CIGARS cluster, multiply each of those numbers by the CIGARS coefficients from the table above, and then add them all together (including the constant). I get -3.73 as my sum. This sum is the B that goes into the logistic equation:
probability of belonging to CIGARS cluster = 1/(1 + EXP(-(-3.73))) = 2%
Now go through that procedure for each of the clusters.
Code:
CLUSTER           B  p(Cluster)
CIGAR         -3.73          2%
DISTRIBUTOR  -14.70          0%
DRIVER        -3.46          3%
PERIMETERD    -3.27          4%
POST           1.62         84%
REBOUNDER      0.81         69%
SHOOTER       -8.97          0%
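The whole procedure can be sketched in a few lines of Python. The coefficients and the Ilgauskas line are copied from the tables above; the function and variable names are invented for illustration. With the coefficients rounded as posted, the sums come out within about 0.1 of the B column (roughly -3.69 rather than -3.73 for CIGAR), presumably because the posted coefficients are rounded. Each cluster has its own equation, so the seven probabilities need not add up to 100%.

```python
import math

STATS = ["HEIGHT", "2ATT", "3ATT", "FTA", "OR", "DR", "TO", "AST", "BLK"]

# Constant first, then one coefficient per entry in STATS (from the posted table).
COEFFICIENTS = {
    "CIGAR":       [-2.9, -0.035, -4.9, 5.2, 8.3, 9.5, -9.0, 36.2, -15.4, -23.4],
    "DISTRIBUTOR": [4.1, -0.514, -8.3, -9.6, -4.4, -17.1, -1.4, 6.3, 52.3, -19.9],
    "DRIVER":      [-4.6, -0.005, 21.2, -19.4, 7.2, -14.6, -22.2, -8.1, -9.8, -34.9],
    "PERIMETERD":  [2.5, 0.075, -4.7, -13.4, -7.9, -1.1, -6.2, -12.6, -10.2, -34.3],
    "POST":        [-16.8, 0.324, 14.6, -21.4, 12.9, 2.6, 18.2, -6.1, -2.5, -8.0],
    "REBOUNDER":   [-6.3, 0.450, -17.7, -27.3, -15.6, 17.7, 13.8, 3.0, -27.8, 25.1],
    "SHOOTER":     [1.1, 0.157, -4.2, 45.0, -8.5, -36.9, -15.0, -36.2, -15.7, -24.8],
}

ILGAUSKAS_2005 = dict(zip(
    STATS, [27, 0.374, 0.003, 0.192, 0.114, 0.143, 0.073, 0.038, 0.063]))


def linear_sum(cluster, player):
    """B: the constant plus each coefficient times the matching player stat."""
    constant, *weights = COEFFICIENTS[cluster]
    return constant + sum(w * player[s] for w, s in zip(weights, STATS))


def probability(b):
    """Logistic equation: 1 / (1 + EXP(-B))."""
    return 1.0 / (1.0 + math.exp(-b))


def cluster_probabilities(player):
    """Probability of membership for every cluster, each computed independently."""
    return {c: probability(linear_sum(c, player)) for c in COEFFICIENTS}


for cluster, p in cluster_probabilities(ILGAUSKAS_2005).items():
    b = linear_sum(cluster, ILGAUSKAS_2005)
    print(f"{cluster:<12}{b:>8.2f}{100 * p:>6.0f}%")
```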
And there you go. That's the easy method -- trust me, you don't want to see the complicated way. This way does an excellent job (80%+ accuracy) of identifying all clusters except CIGARS, where it doesn't do a good job at all. But that's okay. The other cluster that maybe needs some improvement is PERIMETER-D, where my short method has an accuracy of only 60%. This is the cluster that most requires subjective judgment on the part of the observer to identify, since most of the things these players do well don't show up as stats.
_________________
ed
Ben F.
Joined: 07 Mar 2005
Posts: 391
PostPosted: Tue Apr 11, 2006 3:20 am Post subject: Reply with quote
Ed,
You say that you hope these role designations are more "useful". Does that mean you are trying to come up with a better description of what the players actually do, or that you are trying to come up with something that could be used to help a team?
If it's the latter, I'd like to see how you could use this analysis (which I think is incredible, by the way) to answer what I call the "Boris Diaw Dilemma". (Maybe this should be a separate thread if it can't be answered by this analysis.)
The prevailing theory of Boris Diaw's incredible improvement this year is not just simple work ethic - it's a matter of roles. The Hawks obviously did not know how to use Diaw effectively. They had him at PG (or "distributor") often. Now, D'Antoni has him in the forward and center positions, and it's entirely changed his game. So the question is twofold:
1) How can we identify players who are used in the wrong roles?
2) How can we identify what role they'd be most effective in?
The idea that Diaw can make this incredible leap makes me wonder how many diamonds in the rough there are out there, who are just being misused - especially in this era of players that can do everything.
jeffpotts77
Joined: 18 Feb 2005
Posts: 150
Location: Cambridge, MA
PostPosted: Tue Apr 11, 2006 10:30 am Post subject: Reply with quote
I have nothing to add except to say that this is brilliant! Thanks for your hard work, Ed! Your point about comparing Vince Carter to Bruce Bowen is spot-on. Will you be publishing this anywhere else for non-apbr members to see?
Ed Küpfer
Joined: 30 Dec 2004
Posts: 764
Location: Toronto
PostPosted: Tue Apr 11, 2006 1:55 pm Post subject: Reply with quote
Using the method outlined above, I classified the players for three different 2006 teams. Each player has a CLUSTER membership, which is simply the identity of the cluster with the highest probability. Those probabilities are shown in the 6 rightmost columns, as percentages. For example:
Code:
PLAYER       TEAM  PS  MPG  CLUSTER      DIST  DRIV  PERI  POST  REB  SHOOT
Boris Diaw   PHO   SF   35  DISTRIBUTOR    19     0     6     2    1      0
Eddie House  PHO   SG   17  SHOOTER        12     0     2     0    0    100
Diaw is classified as a DISTRIBUTOR, because of his 19% probability of belonging to that cluster. But 19% is fairly low -- it would be more accurate to say that none of the clusters captures his stats profile very well at all (in his two previous seasons, he was classified as a PERIMETER-D and DISTRIBUTOR, again with very low probability scores). Eddie House on the other hand is classified unambiguously as an OUTSIDE SHOOTER, with a small nod to DISTRIBUTOR. I think this matches up with reality fairly closely.
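The "highest probability wins" rule, plus a flag for weak fits like Diaw's, can be sketched in Python. The 50% cut-off for calling a classification clear is an assumption made here for illustration, not part of the method described in this thread; the scores are the two rows from the table above.

```python
CLUSTERS = ["DIST", "DRIV", "PERI", "POST", "REB", "SHOOT"]


def classify(scores, min_score=50):
    """Return (best_cluster, is_clear) for a dict of percentage scores.

    min_score is a hypothetical threshold below which no cluster
    is considered to capture the player's profile well.
    """
    best = max(scores, key=scores.get)
    return best, scores[best] >= min_score


diaw = dict(zip(CLUSTERS, [19, 0, 6, 2, 1, 0]))
house = dict(zip(CLUSTERS, [12, 0, 2, 0, 0, 100]))

print(classify(diaw))   # ('DIST', False)
print(classify(house))  # ('SHOOT', True)
```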
Code:
PLAYER              CLUSTER      DIST  DRIV  PERI  POST  REB  SHOOT
Boris Diaw          DISTRIBUTOR    19     0     6     2    1      0
Eddie House         SHOOTER        12     0     2     0    0    100
James Jones         SHOOTER         0     0    15     0    2    100
Kurt Thomas         REBOUNDER       0     0    19    17   81      0
Leandrinho Barbosa  SHOOTER        10     1    11     0    0     84
Pat Burke           REBOUNDER       0     2     9    16   33      0
Raja Bell           SHOOTER         1     0    24     0    0    100
Shawn Marion        REBOUNDER       0     0     4     4    5      4
Steve Nash          DISTRIBUTOR   100     0     1     0    0      3
Tim Thomas          SHOOTER         0     0    10     0    1     97
Code:
PLAYER              CLUSTER      DIST  DRIV  PERI  POST  REB  SHOOT
Beno Udrih          DISTRIBUTOR    57     1     4     0    0     46
Brent Barry         SHOOTER         3     0    12     0    1     98
Bruce Bowen         SHOOTER         0     0    56     0   12     97
Emmanuel Ginobili   SHOOTER         5     1     2     0    0     34
Fabricio Oberto     REBOUNDER       0     0    43     1   92      0
Michael Finley      SHOOTER         0     0    26     0    0     99
Nazr Mohammed       REBOUNDER       0     0    17    19   89      0
Nick Van Exel       SHOOTER        62     0     8     0    0     97
Rasho Nesterovic    REBOUNDER       0     0    42     2   99      2
Robert Horry        SHOOTER         0     0    12     0   28     77
Tim Duncan          POST            0     1     1    90    4      0
Tony Parker         DISTRIBUTOR    86    55     5     1    0      0
Code:
PLAYER              CLUSTER      DIST  DRIV  PERI  POST  REB  SHOOT
Andre Barrett       DISTRIBUTOR   100     6     8     0    0      2
Antonio Davis       REBOUNDER       0     1    57     2   80      1
Charlie Villanueva  SHOOTER         0     0    13     2   10     22
Chris Bosh          POST            0    15     6    78    2      0
Darrick Martin      DISTRIBUTOR    99     0    11     0    0     76
Eric Williams       PERIMETER-D     0     0    45     0    7     42
Joey Graham         PERIMETER-D     0     1    36     0    7     23
Jose Calderon       DISTRIBUTOR    99     0    19     0    0      2
Loren Woods         REBOUNDER       0     0    23     4  100      0
Matt Bonner         SHOOTER         0     0    33     0   22     99
Mike James          DISTRIBUTOR    76     1     3     0    0     57
Morris Peterson     SHOOTER         0     0    17     0    0     98
Pape Sow            REBOUNDER       0     0    37     3   95      0
Rafael Araujo       REBOUNDER       0     0    50     5   91      0
This method doesn't like to find POST players -- it only classified 20 players that way in 2006. By tweaking the constant in the equation for POST, you can increase this number, or you can simply choose to see POST players as ELITE POST players, and 20 as a reasonable number.
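Because the constant is added directly to B, raising it by some amount d moves a player's probability from p to 1/(1 + EXP(-(LN(p/(1-p)) + d))). A small Python sketch of that effect, using rounded POST scores from the tables above. The shift of 1.5 is an arbitrary value picked for illustration, not a recommended setting, and the function name is invented.

```python
import math


def shift_probability(p, delta):
    """Probability after adding `delta` to the constant of a logistic equation."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    b = math.log(p / (1.0 - p))                  # recover B from the probability
    return 1.0 / (1.0 + math.exp(-(b + delta)))  # re-apply the logistic equation


# Rounded POST percentages from the team tables; results are approximate.
for name, post_pct in [("Kurt Thomas", 17), ("Nazr Mohammed", 19), ("Chris Bosh", 78)]:
    shifted = shift_probability(post_pct / 100, 1.5)
    print(f"{name}: {post_pct}% -> {100 * shifted:.0f}%")
```

A player's CLUSTER label changes only if the shifted POST score overtakes his highest score in the other clusters.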
Toronto is the team I know best, and looking at the results, I am very happy with the classifications. Villanueva's game is hard to classify, and so I'm content with the low scores he registered as a SHOOTER, PERIMETER DEFENDER, and REBOUNDER. Bosh is of course an elite POST player, and Calderon is a DISTRIBUTOR in the old-school PG style. Mike James gets high scores as a DISTRIBUTOR and SHOOTER, which is just right.
_________________
ed
94by50
Joined: 01 Jan 2006
Posts: 499
Location: Phoenix
PostPosted: Tue Apr 11, 2006 3:17 pm Post subject: Reply with quote
How useful would all this knowledge be in developing a method for judging how similar two players are? Strict statistical similarity scores are helpful, but perhaps this could be another step in that direction.
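One possible way to act on this, sketched in Python: treat each player's six cluster scores as a vector and measure the distance between vectors. Euclidean distance is an arbitrary choice made here, not something proposed in the thread, and the names in the code are invented. The scores are copied from the team tables above.

```python
import math

# Score order: DIST, DRIV, PERI, POST, REB, SHOOT (percentages from the tables).
PROFILES = {
    "Mike James":  [76, 1, 3, 0, 0, 57],
    "Beno Udrih":  [57, 1, 4, 0, 0, 46],
    "Loren Woods": [0, 0, 23, 4, 100, 0],
}


def profile_distance(a, b):
    """Euclidean distance between two players' cluster-score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(PROFILES[a], PROFILES[b])))


print(profile_distance("Mike James", "Beno Udrih"))
print(profile_distance("Mike James", "Loren Woods"))
```

A smaller distance means the two players' score profiles are more alike, so by this measure Mike James sits much closer to Beno Udrih than to Loren Woods.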
THWilson
Joined: 19 Jul 2005
Posts: 164
Location: phoenix
PostPosted: Tue Apr 11, 2006 4:12 pm Post subject: Reply with quote
Ed Küpfer wrote:
This method doesn't like to find POST players -- it only classified 20 players that way in 2006. By tweaking the constant in the equation for POST, you can increase this number, or you can simply choose to see POST players as ELITE POST players, and 20 as a reasonable number.
It also seems to have trouble with Drivers. These are the two high-usage groups, and usage isn't in the linear weights...any connection? I was really surprised to see Manu only get a 1 for driver...
Commendable work, btw.
ziller
Joined: 30 Jun 2005
Posts: 126
Location: Sac Metro
PostPosted: Tue Apr 11, 2006 5:20 pm Post subject: Reply with quote
I took the liberty of running another team through Ed's magnificent gauntlet.

(I can't for the life of me get columns lined up properly. So you get an image. Sorry.)
The various "perimeter-d" cluster members worry me - it makes more sense in this instance for the original role-player tag. The only other problem would be the low marks for "driver" for Kevin Martin and Bonzi Wells. Martin gets to the line plenty, so that's not hurting him. Perhaps here it's a relative lack of two-point field goals - Martin is either taking a three or slashing, a la Joe Johnson 2005. Bonzi doesn't get to the line consistently, however, so that could be his low driver reason.
_________________
SactownRoyalty.com
tziller@gmail.com
Mark
Joined: 20 Aug 2005
Posts: 807
PostPosted: Tue Apr 11, 2006 5:34 pm Post subject: Reply with quote
The point distribution detail is interesting (almost essential to see, really, to me).
"There are 6 clusters". OK, if you are already set on that based on what the analysis is telling you. You've indicated in the post that follows that you essentially don't want a hybrid system, even though the clusters are a mix of role and physical attributes (aren't they pretty powerful membership tests?). Wouldn't a pure "role system" remove height and weight? And the clusters would then just be distribute, outside shoot, drive, defend, rebound, post, without qualifiers by size or location on the court? Have you run the data to produce clusters without height and weight? I assume it is messier to look at -- compared to traditional position classification -- but isn't that good? I recall in other threads you wrestled with these questions, so I would welcome any further explanations you care to add about your thinking on this issue now.
I'll still use some form of position/role dual way of thinking about players. Position isn't enough alone, case closed. Role using 6 clusters (found with weight/height) certainly is valuable. It still seems like a hybrid system to me, and distance from the cluster dividing line seems to carry a lot of importance, which suggests the main cluster label isn't doing enough alone. A system with more clusters is a reach for a unified system. Wouldn't they be tighter to the cluster means that way? I started some discussion of options for an 8, 9, or 12 cluster hybrid direction, but if that moves too far from what you are doing I will hold off on that here and now.
Last edited by Mark on Wed Apr 12, 2006 10:43 am; edited 8 times in total