Using Spark to calculate RAPM

Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: Using Spark to calculate RAPM

Post by Crow »

Thank you. I hope to eventually get around to trying it, if I can assemble all the necessary tools. Mimicking with slight adjustments seems potentially less intimidating than starting from scratch.

RAM requirements for this level of standalone processing? Is there free access to cluster computing? Exactly where?
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City

Re: Using Spark to calculate RAPM

Post by EvanZ »

Because it's using sparse vectors, you shouldn't need that much RAM actually. Probably will easily work on a computer with at least 4 GB.
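
To put rough numbers on the sparse-vector point, here is a minimal Python sketch (not the code from the article). It assumes one row per stint with 10 players on the floor, coded +1 for one side and -1 for the other, which is one common convention. The player count, stint count, and byte sizes are made-up illustrative values, and the helper names are hypothetical.

```python
# Rough storage comparison for a RAPM-style design matrix.
# All sizes below are illustrative assumptions, not figures from the article.

N_PLAYERS = 500        # assumed number of distinct players (columns)
N_STINTS = 30_000      # assumed number of stints (rows)
ON_COURT = 10          # players on the floor in any stint
BYTES_PER_VALUE = 8    # 64-bit float
BYTES_PER_INDEX = 4    # 32-bit column index


def dense_bytes(rows, cols):
    """Every cell is stored, including the zeros."""
    return rows * cols * BYTES_PER_VALUE


def sparse_bytes(rows, nonzeros_per_row):
    """Only the nonzero cells are stored, each as an (index, value) pair."""
    return rows * nonzeros_per_row * (BYTES_PER_INDEX + BYTES_PER_VALUE)


def sparse_row(home_ids, away_ids):
    """One stint as a sparse row: +1 for home players, -1 for away players."""
    row = {pid: 1.0 for pid in home_ids}
    row.update({pid: -1.0 for pid in away_ids})
    return row


dense = dense_bytes(N_STINTS, N_PLAYERS)
sparse = sparse_bytes(N_STINTS, ON_COURT)
example = sparse_row([3, 17, 42, 88, 101], [5, 9, 230, 311, 499])

print(f"dense:  {dense / 1e6:.1f} MB")
print(f"sparse: {sparse / 1e6:.1f} MB")
print(f"nonzeros in one row: {len(example)} of {N_PLAYERS}")
```

Under these assumptions only 10 of every 500 cells in a row are nonzero, so the sparse form stores about 2% of the entries. Real storage formats add some overhead on top of this.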

I think AWS gives something like 750 free compute hours to new users. You could try that out, if you haven't already used them up. :)
Crow
Posts: 10536
Joined: Thu Apr 14, 2011 11:10 pm

Re: Using Spark to calculate RAPM

Post by Crow »

A new computer purchase might allow it, but my present one won't.

I see that AWS means Amazon Web Services: http://aws.amazon.com/free/
I haven't gone that far yet.
nileriver
Posts: 63
Joined: Thu Jul 18, 2013 3:24 pm
Location: Vancouver, WA

Re: Using Spark to calculate RAPM

Post by nileriver »

I was curious how many nodes you had in the cluster. Also, in this example what is the performance when running the code on a single node versus against the whole cluster?

I think this is a good demonstration for organizations, but I agree that using the machine learning labs on AWS or Azure is a good place for the enthusiast who probably doesn't have their own personal cluster. I just started using what Azure has for a Kaggle project and have enjoyed using it thus far.
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City

Re: Using Spark to calculate RAPM

Post by EvanZ »

As I said in the article, I'm not running this on a cluster, so I don't have those benchmarks. For what I'm doing here, almost certainly it would take longer on AWS to start the cluster than to run the analysis.
ampersand5
Posts: 262
Joined: Sun Nov 23, 2014 6:18 pm

Re: Using Spark to calculate RAPM

Post by ampersand5 »

I haven't had the time to read the article yet, but if it is what I think it is, then this is one of the most important contributions to APBR in a very long time.

This has the potential to make a big impact on our community. Thank you so incredibly much Evan!
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City

Re: Using Spark to calculate RAPM

Post by EvanZ »

Thanks! I should probably tell you not to read it so you can be left with that sparkling impression (pun intended). ;)
sethypooh21
Posts: 21
Joined: Fri Jan 17, 2014 7:33 pm

Re: Using Spark to calculate RAPM

Post by sethypooh21 »

Definitely thanks to Evan for writing that up for us!
Mike G
Posts: 6144
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Using Spark to calculate RAPM

Post by Mike G »

Oh, dear. Kyle Korver is better than LeBron James.
mystic
Posts: 470
Joined: Mon Apr 18, 2011 10:09 am

Re: Using Spark to calculate RAPM

Post by mystic »

Great write up! Thanks Evan!
Mike G wrote: Oh, dear. Kyle Korver is better than LeBron James.
Oh dear, that's not how the results should be interpreted. ;)
Mike G
Posts: 6144
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Using Spark to calculate RAPM

Post by Mike G »

Well please enlighten us on how to interpret the rankings?

Saying "that's not it" doesn't add anything to the understanding of it.
EvanZ
Posts: 912
Joined: Thu Apr 14, 2011 10:41 pm
Location: The City

Re: Using Spark to calculate RAPM

Post by EvanZ »

Mike, the way to interpret the rankings is that they are the result of a calculation with very explicit (unbiased) methodology.
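
For anyone who wants to see the shape of that calculation, here is a toy Python sketch of a ridge regression in the spirit of RAPM. It is not the Spark code from the article. The four players, three 2-on-2 stints, margins, and the penalty `lam` are all made up for illustration, and `ridge_fit` is a hypothetical helper name.

```python
# Toy ridge regression in the spirit of RAPM. The stints, margins, and
# penalty below are made up for illustration; this is not the article's code.

def ridge_fit(rows, y, n_players, lam):
    """Solve (X^T X + lam * I) beta = X^T y for sparse +1/-1 rows."""
    # Build the normal equations from sparse rows (dicts of player -> value).
    a = [[lam if i == j else 0.0 for j in range(n_players)]
         for i in range(n_players)]
    b = [0.0] * n_players
    for row, target in zip(rows, y):
        for i, xi in row.items():
            b[i] += xi * target
            for j, xj in row.items():
                a[i][j] += xi * xj
    # Gaussian elimination with partial pivoting.
    for col in range(n_players):
        pivot = max(range(col, n_players), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_players):
            factor = a[r][col] / a[col][col]
            for c in range(col, n_players):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * n_players
    for i in reversed(range(n_players)):
        s = sum(a[i][j] * beta[j] for j in range(i + 1, n_players))
        beta[i] = (b[i] - s) / a[i][i]
    return beta


# Each stint: +1 for players on one side, -1 for players on the other.
rows = [
    {0: 1.0, 1: 1.0, 2: -1.0, 3: -1.0},
    {0: 1.0, 2: 1.0, 1: -1.0, 3: -1.0},
    {0: 1.0, 3: 1.0, 1: -1.0, 2: -1.0},
]
margins = [6.0, 4.0, 2.0]  # made-up scoring margin for each stint

beta = ridge_fit(rows, margins, n_players=4, lam=1.0)
print([round(v, 3) for v in beta])
```

In this toy data the ratings come out at roughly 2.4, 0, -0.8, and -1.6. Each number is the estimated effect on the margin of having that player on the floor, given who else was on the floor, shrunk toward zero by the penalty. Without the penalty (`lam = 0`) this particular system has no unique solution, since every player appears in every stint.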

If you would take the time to understand the methodology and view the results as simply that, maybe you could come to peace with RAPM.

tl;dr version

It is what it is.

Perhaps you should be more concerned about why your metric did so poorly in predicting wins this season.
bchaikin
Posts: 307
Joined: Thu May 12, 2011 2:09 am

Re: Using Spark to calculate RAPM

Post by bchaikin »

this statement was made:

"Oh, dear. Kyle Korver is better than LeBron James."

and the reply was:

"Oh dear, that's not how the results should be interpreted."

followed by this:

"Well please enlighten us on how to interpret the rankings? Saying 'that's not it' doesn't add anything to the understanding of it."

which is an honest, straightforward question - how should the results be interpreted? if someone is going to present a listing - and it is not a listing of who is better or who played better - then what is it a list of?...

"Perhaps you should be more concerned about why your metric did so poorly in predicting wins this season."

was this really necessary? you presented your methodology results and said:

"Thought folks here might be interested in this:"

why did you post this if you did not want a discussion of its results? not everyone understands the methodology, but can't we discuss the listing as you presented it and ask what it means?...

these numbers associated with these players were shown:

Chris Paul,1.162,11837
LeBron James,1.074,13264
Stephen Curry,1.067,13357
Russell Westbrook,0.990,9028
James Harden,0.985,13393
Kyle Korver,0.985,10931
Kyle Lowry,0.953,10999
Draymond Green,0.917,10709
Kevin Durant,0.848,9736
Dirk Nowitzki,0.839,10432
Zach Randolph,0.826,10529
Kawhi Leonard,0.822,9657
Danny Green,0.813,9113
LaMarcus Aldridge,0.799,11057
Andre Iguodala,0.796,9961
Anthony Davis,0.789,9552
George Hill,0.774,8283
Monta Ellis,0.725,12210
Marcin Gortat,0.713,11289
Klay Thompson,0.686,12735
Dwight Howard,0.673,8776
Kevin Love,0.660,10803
Carmelo Anthony,0.644,8252
Deron Williams,0.619,9145


and these statements were made:

"the idea going forward should be pretty clear... Winning basketball will ensue as a result..."

so my question is what do the numbers mean if - as you claim - the results mean winning basketball?...

for example anthony davis just had a season where his combination of rebounding, steals, blocks, and scoring has only ever been duplicated by 3 other players - jabbar, olajuwon, and david robinson, and davis did it with better offensive efficiency than any of these 3 ever had. he was also all-D 2nd team and 4th in DPOY voting...

by how i rate players i have anthony davis' 14-15 season as quite probably the best ever by a PF (on a per minute basis) over the last 4 decades. but this listing shows PFs dirk nowitzki, lamarcus aldridge, zach randolph, and draymond green above davis. so if this listing is not saying they are either better players or simply had a better season than davis, then what do the numbers mean?...

davis played similar minutes, with similar rebounding, but scored better, shot much better, with fewer turnovers (and thus much better offensive efficiency), with over twice as many steals and almost 3 times as many blocks as aldridge. this listing has aldridge above davis, so what do those numbers mean?...

lastly, do these numbers when normalized to minutes played add up to either a team's W-L percentage or average per game point differential?...
Mike G
Posts: 6144
Joined: Fri Apr 15, 2011 12:02 am
Location: Asheville, NC

Re: Using Spark to calculate RAPM

Post by Mike G »

I could spend more time on my own analyses and also spend more time studying others' -- but I'd have to spend less time on everything else in life. Not an option at the moment.

When you observe errors or inconsistencies in ratings, people then may make corrections. Recently I noticed that bk-ref.com had systematically messed up a lot of playoff VORP. They looked into it and fixed it. I assume they appreciate these observations and feedback. I also wonder why nobody else noticed and said anything.