Apache Spark with Python: error

别跟我提以往 2021-01-28 16:42

New to Spark. Downloaded everything alright but when I run pyspark I get the following errors:

Type \"help\", \"copyright\", \"credits\" or \"license\" for more          


        
8 Answers
  • 2021-01-28 16:49

    I also encountered this issue on Windows 7 with the pre-built Spark 2.2 package. Here is a possible solution for Windows users:

    1. Make sure all the environment variables are set correctly, including SPARK_HOME, HADOOP_HOME, etc. (see the sketch after this answer).

    2. Get the version of winutils.exe that matches your Spark-Hadoop pre-built package.

    3. Then open a command prompt as Administrator and run this command:

      winutils chmod 777 C:\tmp\hive

      Note: the drive might be different depending on where you invoke pyspark or spark-shell.

    Credit should go to this link: see the answer by timesking.
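
    A minimal sketch of step 1 from within Python. The paths C:\spark and C:\hadoop are only examples, and it assumes the pyspark package itself is already importable (e.g. pip-installed):

    import os

    # Hypothetical install locations; adjust to wherever you unpacked Spark and Hadoop.
    os.environ["SPARK_HOME"] = r"C:\spark"
    os.environ["HADOOP_HOME"] = r"C:\hadoop"
    # winutils.exe should live in %HADOOP_HOME%\bin so the chmod step above works.
    os.environ["PATH"] = os.environ["PATH"] + os.pathsep + r"C:\hadoop\bin"

    # With the variables set in this process, a local context can be created.
    from pyspark import SparkContext
    sc = SparkContext("local[*]", "env-check")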

  • 2021-01-28 16:51

    You need a "winutils" competable in the hadoop bin directory.

  • 2021-01-28 16:52

    It looks like you've found the answer to the second part of your question in the answer above. For future users who land here via the 'org.apache.spark.sql.hive.HiveSessionState' error: this class lives in the spark-hive jar, which is not bundled with Spark unless Spark is built with Hive support.

    You can get this jar at:

    http://central.maven.org/maven2/org/apache/spark/spark-hive_${SCALA_VERSION}/${SPARK_VERSION}/spark-hive_${SCALA_VERSION}-${SPARK_VERSION}.jar
    

    You'll have to put it into your SPARK_HOME/jars folder, and then Spark should be able to find all of the Hive classes required.
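
    A minimal sketch of that step, using the URL pattern from this answer and example versions (Scala 2.11, Spark 2.2.0); substitute the versions that match your build, and make sure SPARK_HOME is set:

    import os
    import urllib.request

    # Example versions only; use the Scala/Spark versions that match your installation.
    scala_version = "2.11"
    spark_version = "2.2.0"

    url = ("http://central.maven.org/maven2/org/apache/spark/"
           f"spark-hive_{scala_version}/{spark_version}/"
           f"spark-hive_{scala_version}-{spark_version}.jar")

    dest = os.path.join(os.environ["SPARK_HOME"], "jars",
                        f"spark-hive_{scala_version}-{spark_version}.jar")

    # Drop the jar into SPARK_HOME/jars so Spark picks it up on the next start.
    urllib.request.urlretrieve(url, dest)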

  • 2021-01-28 16:58

    If you are doing this from the pyspark console, it may be because your installation did not work.

    If not, it's because most examples assume you are testing code in the pyspark console, where a default variable 'sc' exists.

    You can create a SparkContext yourself at the beginning of your script using the following code:

    from pyspark import SparkContext, SparkConf
    
    conf = SparkConf()
    sc = SparkContext(conf=conf)
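
    Once 'sc' exists, the usual shell examples work unchanged; a quick smoke test:

    # Parallelize a small list and sum it to confirm the context is working.
    rdd = sc.parallelize(range(10))
    print(rdd.sum())  # 45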
    
  • 2021-01-28 17:00

    If you're on a Mac and you've installed Spark (and possibly Hive) through Homebrew, the answers from @Eric Pettijohn and @user7772046 will not work: the former because Homebrew's Spark already contains the aforementioned jar file, and the latter because, trivially, it is a pure Windows-based solution.

    Inspired by this link and the hint about permission issues, I came up with the following simple solution: launch pyspark using sudo. No more Hive-related errors.

  • 2021-01-28 17:04

    In my case, the problem occurred because I had configured Hadoop in YARN mode, so my solution was to start HDFS and YARN first:

    start-dfs.sh
    start-yarn.sh
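
    For reference, a minimal sketch of pointing a PySpark context at YARN once those daemons are running (assumes HADOOP_CONF_DIR or YARN_CONF_DIR is set so Spark can find the cluster configuration):

    from pyspark import SparkConf, SparkContext

    # Run against the YARN resource manager started above.
    conf = SparkConf().setMaster("yarn").setAppName("yarn-check")
    sc = SparkContext(conf=conf)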
    