Access to Spark from Flask app

Backend · Unresolved · 3 answers · 1098 views
深忆病人 asked on 2021-02-02 02:31

I wrote a simple Flask app to pass some data to Spark. The script works in IPython Notebook, but not when I try to run it in its own server. I don't think that the Spark co

3 answers
  • 2021-02-02 02:57

    I was able to fix this problem by adding the location of PySpark and py4j to the path in my flaskapp.wsgi file. Here's the full content:

    import sys
    # Make the Flask app, PySpark and py4j importable by the WSGI process
    sys.path.insert(0, '/var/www/html/flaskapp')
    sys.path.insert(1, '/usr/local/spark-2.0.2-bin-hadoop2.7/python')
    sys.path.insert(2, '/usr/local/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip')

    from flaskapp import app as application
    
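    If hard-coding the Spark paths in the .wsgi file feels brittle, here is a minimal alternative sketch. It assumes the third-party findspark package is installed in the WSGI interpreter; findspark is not part of the original setup.

    import sys
    sys.path.insert(0, '/var/www/html/flaskapp')

    # Hypothetical alternative: findspark locates PySpark and py4j under the
    # given Spark home and appends them to sys.path for you.
    import findspark
    findspark.init('/usr/local/spark-2.0.2-bin-hadoop2.7')

    from flaskapp import app as application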
  • 2021-02-02 02:58

    Modify your .py file as shown in the second point of the linked guide 'Using IPython Notebook with Spark'. Instead of sys.path.insert, use sys.path.append. Try inserting this snippet:

    import sys
    try:
        # Add the PySpark source directory to the import path before importing
        sys.path.append("your/spark/home/python")
        from pyspark import context
        print("Successfully imported Spark Modules")
    except ImportError as e:
        print("Can not import Spark Modules", e)
    
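    If you would rather not hard-code the placeholder path, here is a minimal sketch that builds it at runtime. It assumes the SPARK_HOME environment variable points at your Spark installation and that the py4j zip name matches your Spark version (both are assumptions, not part of the guide):

    import os
    import sys

    # Assumes SPARK_HOME is set, e.g. /usr/local/spark-2.0.2-bin-hadoop2.7
    spark_home = os.environ["SPARK_HOME"]
    sys.path.append(os.path.join(spark_home, "python"))
    # The py4j version varies by Spark release; 0.10.3 ships with Spark 2.0.2
    sys.path.append(os.path.join(spark_home, "python", "lib", "py4j-0.10.3-src.zip"))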
  • 2021-02-02 03:05

    Okay, so I'm going to answer my own question in the hope that someone out there won't suffer the same days of frustration! It turns out it was a combination of missing code and bad setup.

    Editing the code: I did indeed need to initialise a SparkContext by adding the following to the preamble of my code:

    from pyspark import SparkContext
    sc = SparkContext('local')
    

    So the full code will be:

    from pyspark import SparkContext
    sc = SparkContext('local')
    
    from flask import Flask, request
    app = Flask(__name__)
    
    @app.route('/whateverYouWant', methods=['POST'])  # can set first param to '/'
    def toyFunction():
        posted_data = sc.parallelize([request.get_data()])
        return str(posted_data.collect()[0])
    
    if __name__ == '__main__':
        app.run(port=8080)    #note set to 8080!
    
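    Once the SparkContext exists, the posted data can be fed into any Spark job. As an illustrative sketch only (the /wordCount route and the function name are made up for this example, not part of my original app), here is a variant that runs a small word count on whatever is POSTed:

    @app.route('/wordCount', methods=['POST'])
    def wordCount():
        # Split the posted body into words and count them with Spark
        text = request.get_data().decode('utf-8')
        counts = (sc.parallelize(text.split())
                    .map(lambda w: (w, 1))
                    .reduceByKey(lambda a, b: a + b)
                    .collect())
        return str(dict(counts))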

    Editing the setup: It is essential that the file (yourfilename.py) is in the correct directory, namely it must be saved to the folder /home/ubuntu/spark-1.5.0-bin-hadoop2.6.

    Then issue the following command within the directory:

    ./bin/spark-submit yourfilename.py
    

    which starts the service at 10.0.0.XX:8080/accessFunction/.

    Note that the port must be set to 8080 or 8081: by default Spark only allows the web UI on these ports (8080 for the master and 8081 for the worker, respectively).

    You can test the service with a RESTful client, or by opening a new terminal and sending POST requests with cURL:

    curl --data "DATA YOU WANT TO POST" http://10.0.0.XX:8080/accessFunction/
    
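    Equivalently, here is a quick sketch using the Python requests library (an assumption: requests must be installed, and 10.0.0.XX stands in for your server's address as above):

    import requests

    # POST some text to the running service and print Spark's echo of it
    resp = requests.post("http://10.0.0.XX:8080/accessFunction/",
                         data="DATA YOU WANT TO POST")
    print(resp.text)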