Using Spark to calculate RAPM
Thought folks here might be interested in this:
http://nyloncalculus.com/2015/07/29/gue ... -a-how-to/
Re: Using Spark to calculate RAPM
Thank you. I hope to eventually get around to trying it, if I can assemble all the necessary tools. Mimicking it with slight adjustments seems potentially less intimidating than starting from scratch.
RAM requirements for this level of standalone processing? Is there free access to cluster computing? Exactly where?
Re: Using Spark to calculate RAPM
Because it's using sparse vectors, you actually shouldn't need that much RAM. It will probably work easily on a computer with at least 4 GB.
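A rough illustration of why sparse vectors keep memory low, as a minimal Python sketch rather than the article's Spark code. It assumes the usual RAPM stint encoding (+1 for each of the five home players on the floor, -1 for each of the five away players, 0 for everyone else), so each row has only ten nonzero entries no matter how many players are in the league. The function names and the row/player counts below are hypothetical. Spark MLlib has its own sparse vector type (e.g. `Vectors.sparse(size, indices, values)`) that stores the same kind of index/value pairs.

```python
def stint_to_sparse(home_ids, away_ids, player_index):
    """Encode one stint as sorted (column, value) pairs: +1 home, -1 away."""
    entries = [(player_index[p], 1.0) for p in home_ids]
    entries += [(player_index[p], -1.0) for p in away_ids]
    return sorted(entries)


def dense_vs_sparse_bytes(n_rows, n_players, nnz_per_row=10,
                          bytes_per_value=8, bytes_per_index=4):
    """Rough storage estimate: dense float64 matrix vs. index/value pairs."""
    dense = n_rows * n_players * bytes_per_value
    sparse = n_rows * nnz_per_row * (bytes_per_value + bytes_per_index)
    return dense, sparse


if __name__ == "__main__":
    # Ten hypothetical players, five per side.
    names = [f"P{i}" for i in range(10)]
    index = {name: i for i, name in enumerate(names)}
    print(stint_to_sparse(names[:5], names[5:], index))

    # Hypothetical sizes: 200,000 stints and 500 players.
    dense, sparse = dense_vs_sparse_bytes(200_000, 500)
    print(f"dense ~ {dense / 1e9:.1f} GB, sparse ~ {sparse / 1e6:.0f} MB")
```

With those made-up sizes the dense matrix would need about 0.8 GB while the sparse form needs about 24 MB: the sparse storage grows with the ten players on the floor per stint, not with the number of players in the league.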
I think AWS gives something like 750 free computer hours to new users. You could try that out, if you haven't already used them up.

Re: Using Spark to calculate RAPM
A new computer might handle it, but not my present one.
I see that AWS means Amazon Web Services http://aws.amazon.com/free/
I haven't gone that far yet.
Re: Using Spark to calculate RAPM
I was curious how many nodes you had in the cluster. Also, in this example what is the performance when running the code on a single node versus against the whole cluster?
I think this is a good demonstration for organizations, but I agree that the machine learning labs on AWS or Azure are a good place for the enthusiast who probably doesn't have their own personal cluster. I just started using what Azure has for a Kaggle project and have enjoyed it thus far.
Re: Using Spark to calculate RAPM
As I said in the article, I'm not running this on a cluster, so I don't have those benchmarks. For what I'm doing here, almost certainly it would take longer on AWS to start the cluster than to run the analysis.
Re: Using Spark to calculate RAPM
I haven't had the time to read the article yet, but if it is what I think it is, then this is one of the most important contributions to APBR in a very long time.
This has the potential to make a big impact on our community. Thank you so incredibly much Evan!
Re: Using Spark to calculate RAPM
Thanks! I should probably tell you not to read it so you can be left with that sparkling impression (pun intended). 

Re: Using Spark to calculate RAPM
Definitely thanks to Evan for writing that up for us!
Re: Using Spark to calculate RAPM
Oh, dear. Kyle Korver is better than LeBron James.
Re: Using Spark to calculate RAPM
Great write up! Thanks Evan!

Mike G wrote:
Oh, dear. Kyle Korver is better than LeBron James.

Oh dear, that's not how the results should be interpreted.

Re: Using Spark to calculate RAPM
Well please enlighten us on how to interpret the rankings?
Saying "that's not it" doesn't add anything to the understanding of it.
Re: Using Spark to calculate RAPM
Mike, the way to interpret the rankings is that they are the result of a calculation with very explicit (unbiased) methodology.
If you would take the time to understand the methodology and view the results as simply that, maybe you could make peace with RAPM.
tl;dr version
It is what it is.
Perhaps you should be more concerned about why your metric did so poorly in predicting wins this season.
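For readers wondering what the "very explicit methodology" looks like: RAPM is generally described as a ridge regression of each stint's point margin on the +1/-1 player indicators, minimizing Σ w_s (y_s − x_s·β)² + λ Σ β_j², where the penalty λ pulls every player's rating toward zero unless the data say otherwise. Below is a small pure-Python sketch of that idea, not the article's Spark implementation; weighting stints by possessions, the value of λ, and the helper names are assumptions. It solves the normal equations (XᵀWX + λI)β = XᵀWy directly, which is fine for a toy problem; a full league-wide fit would use a library solver.

```python
def ridge_rapm(rows, y, weights, n_players, lam):
    """Solve (X^T W X + lam*I) beta = X^T W y.

    rows: one sparse row per stint, as a list of (column, value) pairs.
    y: stint outcome (e.g. point margin per 100 possessions).
    weights: per-stint weights (assumed here to be possessions).
    """
    A = [[0.0] * n_players for _ in range(n_players)]
    b = [0.0] * n_players
    # Only the nonzero entries of each row touch the normal equations.
    for row, target, w in zip(rows, y, weights):
        for i, xi in row:
            b[i] += w * xi * target
            for j, xj in row:
                A[i][j] += w * xi * xj
    # The ridge penalty adds lam to the diagonal.
    for i in range(n_players):
        A[i][i] += lam
    return solve_linear(A, b)


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting on an augmented copy."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - tail) / M[r][r]
    return beta


if __name__ == "__main__":
    # Toy data chosen so that lam = 0 recovers ratings of exactly 3 and 1.
    rows = [[(0, 1.0)], [(1, 1.0)], [(0, 1.0), (1, -1.0)]]
    y = [3.0, 1.0, 2.0]
    w = [1.0, 1.0, 1.0]
    for lam in (0.0, 1.0, 10.0):
        print(lam, [round(v, 3) for v in ridge_rapm(rows, y, w, 2, lam)])
```

Running it shows the ratings shrinking toward zero as λ grows; with λ = 0 the toy data give back exactly 3 and 1.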
Re: Using Spark to calculate RAPM
This statement was made:
Oh, dear. Kyle Korver is better than LeBron James.
And the reply was:
Oh dear, that's not how the results should be interpreted.
Followed by this:
Well please enlighten us on how to interpret the rankings? Saying "that's not it" doesn't add anything to the understanding of it.
Which is an honest, straightforward question: how should the results be interpreted? If someone is going to present a listing, and it is not a listing of who is better or who played better, then what is it a list of?...
Perhaps you should be more concerned about why your metric did so poorly in predicting wins this season.
Was this really necessary? You presented your methodology and results and said:
Thought folks here might be interested in this:
Why did you post this if you did not want a discussion of its results? Not everyone understands the methodology, but can't we discuss the listing as you presented it and ask what it means?...
These numbers associated with these players were shown:
Chris Paul,1.162,11837
LeBron James,1.074,13264
Stephen Curry,1.067,13357
Russell Westbrook,0.990,9028
James Harden,0.985,13393
Kyle Korver,0.985,10931
Kyle Lowry,0.953,10999
Draymond Green,0.917,10709
Kevin Durant,0.848,9736
Dirk Nowitzki,0.839,10432
Zach Randolph,0.826,10529
Kawhi Leonard,0.822,9657
Danny Green,0.813,9113
LaMarcus Aldridge,0.799,11057
Andre Iguodala,0.796,9961
Anthony Davis,0.789,9552
George Hill,0.774,8283
Monta Ellis,0.725,12210
Marcin Gortat,0.713,11289
Klay Thompson,0.686,12735
Dwight Howard,0.673,8776
Kevin Love,0.660,10803
Carmelo Anthony,0.644,8252
Deron Williams,0.619,9145
And these statements were made:
the idea going forward should be pretty clear... Winning basketball will ensue as a result...
So my question is: what do the numbers mean if, as you claim, the results mean winning basketball?...
For example, Anthony Davis just had a season whose combination of rebounding, steals, blocks, and scoring has only ever been duplicated by three other players: Jabbar, Olajuwon, and David Robinson, and Davis did it with better offensive efficiency than any of those three ever had. He was also All-Defensive 2nd team and 4th in DPOY voting...
By how I rate players, I have Anthony Davis' 14-15 season as quite probably the best ever by a PF (on a per-minute basis) over the last four decades. But this listing shows PFs Dirk Nowitzki, LaMarcus Aldridge, Zach Randolph, and Draymond Green above Davis. So if this listing is not saying they are either better players or simply had a better season than Davis, then what do the numbers mean?...
Davis played similar minutes with similar rebounding, but scored more, shot much better, and had fewer turnovers (and thus much better offensive efficiency), with over twice as many steals and almost three times as many blocks as Aldridge. This listing has Aldridge above Davis, so what do those numbers mean?...
Lastly, do these numbers, when normalized to minutes played, add up to either a team's W-L percentage or its average per-game point differential?...
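One way to check that last question, sketched in Python under stated assumptions: the thread doesn't say what units the listed ratings are in, so this assumes they are per 100 possessions and leaves out any intercept or home-court term. Because five players share the floor, player minutes add up to five times the team's minutes, so dividing the minutes-weighted sum of ratings by team minutes gives the average combined rating of the five players on the floor. Converting that to a per-game point differential multiplies by pace (possessions per 48 minutes) / 100. The function names and the example roster are hypothetical.

```python
def team_rating_from_players(players, team_minutes):
    """Minutes-weighted sum of player ratings.

    players: list of (rating_per_100, minutes) pairs (hypothetical inputs).
    team_minutes: minutes the team played, e.g. 48 * games plus overtime.
    Player minutes sum to 5 * team_minutes, so the result is the average
    summed rating of the five players on the floor.
    """
    return sum(rating * minutes for rating, minutes in players) / team_minutes


def per_game_margin(net_per_100, pace):
    """Convert a per-100-possession margin to points per game."""
    return net_per_100 * pace / 100.0


if __name__ == "__main__":
    # Hypothetical five-man roster that played every minute of an 82-game season.
    team_minutes = 48 * 82
    roster = [(2.0, team_minutes), (1.0, team_minutes), (0.0, team_minutes),
              (-1.0, team_minutes), (3.0, team_minutes)]
    net = team_rating_from_players(roster, team_minutes)
    print(net, per_game_margin(net, pace=95.0))
```

Comparing that estimate with each team's actual point differential over a season is a natural check; turning point differential into a W-L percentage takes a further step (for example a Pythagorean-style formula), which is not shown here.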
Re: Using Spark to calculate RAPM
I could spend more time on my own analyses and also spend more time studying others' -- but I'd have to spend less time on everything else in life. Not an option at the moment.
When you point out errors or inconsistencies in ratings, people may then make corrections. Recently I noticed that bk-ref.com had systematically messed up a lot of playoff VORP. They looked into it and fixed it. I assume they appreciate these observations and feedback. I also wonder why nobody else noticed and said anything.