Use collect_list and collect_set in Spark SQL

Asked by 再見小時候 on 2020-11-28 12:51

According to the docs, the collect_set and collect_list functions should be available in Spark SQL. However, I cannot get them to work. I'm running

1 Answer
  • Answered 2020-11-28 13:28

    Spark 2.0+:

    SPARK-10605 introduced native collect_list and collect_set implementations. A SparkSession with Hive support or a HiveContext is no longer required.
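
    For illustration, a minimal Scala sketch of calling the native functions through the DataFrame API; the sample data and column names are made up for this example:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{collect_list, collect_set}

    val spark = SparkSession.builder
      .master("local")
      .appName("testing")
      .getOrCreate()  // no Hive support required in 2.0+

    import spark.implicits._

    // Hypothetical sample data
    val df = Seq(("a", 1), ("a", 1), ("b", 2)).toDF("key", "value")

    df.groupBy("key")
      .agg(collect_list("value").alias("values"),
           collect_set("value").alias("distinct_values"))
      .show()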

    Spark 2.0-SNAPSHOT (before 2016-05-03):

    You have to enable Hive support for a given SparkSession:

    In Scala:

    val spark = SparkSession.builder
      .master("local")
      .appName("testing")
      .enableHiveSupport()  // <- enable Hive support.
      .getOrCreate()
    

    In Python:

    spark = (SparkSession.builder
        .enableHiveSupport()
        .getOrCreate())
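
    As a hedged illustration (Scala; the table name and sample rows are made up), once Hive support is enabled on the spark session from the Scala snippet above, collect_set can be called from a plain SQL query:

    import spark.implicits._

    // Hypothetical sample data registered as a temporary table
    // (registerTempTable was later superseded by createOrReplaceTempView)
    Seq(("a", 1), ("a", 1), ("b", 2)).toDF("key", "value").registerTempTable("kv")

    spark.sql("SELECT key, collect_set(value) AS distinct_values FROM kv GROUP BY key").show()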
    

    Spark < 2.0:

    To be able to use Hive UDFs (see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF) you have to use a Spark build with Hive support (this is already covered when you use pre-built binaries, which seems to be the case here) and initialize a HiveContext from your SparkContext.

    In Scala:

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.SQLContext
    
    val sqlContext: SQLContext = new HiveContext(sc) 
    

    In Python:

    from pyspark.sql import HiveContext
    
    sqlContext = HiveContext(sc)
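
    For completeness, a minimal Scala sketch (sample data and table name are hypothetical) of calling collect_set through the HiveContext in Spark < 2.0:

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample data
    val df = sc.parallelize(Seq(("a", 1), ("a", 1), ("b", 2))).toDF("key", "value")
    df.registerTempTable("kv")

    sqlContext.sql("SELECT key, collect_set(value) FROM kv GROUP BY key").show()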
    