Write and run pyspark in IntelliJ IDEA

Happy的楠姐 2021-02-15 16:26

I am trying to work with PySpark in IntelliJ IDEA but I cannot figure out how to correctly install it / set up the project. I can work with Python in IntelliJ and I can use the pyspark

3 Answers
  • 2021-02-15 17:10

    For example, something of this kind:

    from pyspark import SparkContext, SparkConf

    spark_conf = SparkConf().setAppName("scavenge some logs")
    spark_context = SparkContext(conf=spark_conf)
    address = "/path/to/the/log/on/hdfs/*.gz"
    log = spark_context.textFile(address)

    my_result = (log
        # ...here go your actions and transformations...
        ).saveAsTextFile('my_result')
    
  • 2021-02-15 17:12

    Set the environment variables SPARK_HOME and PYTHONPATH in your program's run/debug configuration.

    For instance:

    SPARK_HOME=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4
    PYTHONPATH=/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4/python
    

    See the attached snapshot in IntelliJ IDEA.
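As an alternative to the run/debug configuration, the same variables can be set from the script itself before importing pyspark. This is a minimal sketch; the install path is the hypothetical location from the answer above, so adjust it to your machine.

```python
import os
import sys

# Hypothetical install location (from the answer above); adjust as needed.
spark_home = "/Users/<username>/javalibs/spark-1.5.0-bin-hadoop2.4"

os.environ["SPARK_HOME"] = spark_home
# PYTHONPATH equivalent: make Spark's bundled python package importable.
sys.path.insert(0, os.path.join(spark_home, "python"))

# After this, `from pyspark import SparkContext` should resolve,
# provided the directory actually exists and contains pyspark.
```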

  • 2021-02-15 17:12

    One problem I encountered was the space in 'Program Files\spark' (used for the SPARK_HOME and PYTHONPATH environment variables, as stated above), so I moved the Spark binaries to my user directory instead. Thanks to this answer. Also, make sure you installed the packages for the environment: check that the pyspark package appears under Project Structure -> Platform Settings -> SDKs -> (your Python SDK) -> Packages.
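The space pitfall above can be caught early with a quick sanity check. This is a minimal sketch of my own; the function name and warning strings are not from any Spark API.

```python
import os

def check_spark_env():
    """Return a list of common SPARK_HOME/PYTHONPATH pitfalls (hypothetical helper)."""
    problems = []
    for var in ("SPARK_HOME", "PYTHONPATH"):
        value = os.environ.get(var, "")
        if not value:
            problems.append(f"{var} is not set")
        elif " " in value:
            # Paths like 'C:\Program Files\spark' break some Spark launch scripts.
            problems.append(f"{var} contains a space: {value!r}")
    return problems

# Example with the problematic path from the answer:
os.environ["SPARK_HOME"] = r"C:\Program Files\spark"
print(check_spark_env())
```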
