I wrote a simple Flask app to pass some data to Spark. The script works in IPython Notebook, but not when I try to run it in it\'s own server. I don\'t think that the Spark co
I was able to fix this problem by adding the location of PySpark and py4j to the path in my flaskapp.wsgi file. Here's the full content:
import sys
sys.path.insert(0, '/var/www/html/flaskapp')
sys.path.insert(1, '/usr/local/spark-2.0.2-bin-hadoop2.7/python')
sys.path.insert(2, '/usr/local/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip')
from flaskapp import app as application
Modify your .py file as it is shown in the linked guide 'Using IPython Notebook with Spark' part second point. Insted sys.path.insert use sys.path.append. Try insert this snippet:
import sys
try:
sys.path.append("your/spark/home/python")
from pyspark import context
print ("Successfully imported Spark Modules")
except ImportError as e:
print ("Can not import Spark Modules", e)
Okay, so I'm going to answer my own question in the hope that someone out there won't suffer the same days of frustration! It turns out it was a combination of missing code and bad set up.
Editing the code: I did indeed need to initialise a Spark Context by appending the following in the preamble of my code:
from pyspark import SparkContext
sc = SparkContext('local')
So the full code will be:
from pyspark import SparkContext
sc = SparkContext('local')
from flask import Flask, request
app = Flask(__name__)
@app.route('/whateverYouWant', methods=['POST']) #can set first param to '/'
def toyFunction():
posted_data = sc.parallelize([request.get_data()])
return str(posted_data.collect()[0])
if __name__ == '__main_':
app.run(port=8080) #note set to 8080!
Editing the setup: It is essential that the file (yourrfilename.py) is in the correct directory, namely it must be saved to the folder /home/ubuntu/spark-1.5.0-bin-hadoop2.6.
Then issue the following command within the directory:
./bin/spark-submit yourfilename.py
which initiates the service at 10.0.0.XX:8080/accessFunction/ .
Note that the port must be set to 8080 or 8081: Spark only allows web UI for these ports by default for master and worker respectively
You can test out the service with a restful service or by opening up a new terminal and sending POST requests with cURL commands:
curl --data "DATA YOU WANT TO POST" http://10.0.0.XX/8080/accessFunction/