How to use custom classes with Apache Spark (pyspark)?
I have written a class implementing a classifier in Python. I would like to use Apache Spark to parallelize classification of a huge number of datapoints using this classifier. I'm set up on Amazon EC2 with a cluster of 10 slaves, based on an AMI that comes with Python's Anaconda distribution installed. The AMI lets me use IPython Notebook remotely. I've defined the class BoTree in a file called BoTree.py on the master, in the folder /root/anaconda/lib/python2.7/, which is where all my Python modules are. I've checked that I can import and use BoTree.py when running command-line Spark from the master.
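Roughly what I want to end up with is something like the sketch below (just to illustrate my intent; the BoTree constructor and the classify method are stand-ins for whatever my actual classifier exposes, and my_datapoints is a placeholder for my real data):

    from pyspark import SparkContext

    sc = SparkContext(appName="BoTreeClassification")

    # Ship the module defining my class to every worker
    sc.addPyFile("/root/anaconda/lib/python2.7/BoTree.py")

    def classify_partition(points):
        # Import inside the function so the import happens on the worker,
        # after addPyFile has distributed BoTree.py
        from BoTree import BoTree
        tree = BoTree()                 # stand-in constructor
        for p in points:
            yield tree.classify(p)      # stand-in classify() method

    my_datapoints = [[0.1, 2.3], [1.5, 0.7]]   # placeholder for my real datapoints
    points = sc.parallelize(my_datapoints)
    labels = points.mapPartitions(classify_partition).collect()

Is this (using sc.addPyFile and importing the class on the workers) the right way to make a custom class available across the cluster, or is there a better approach?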