Aggregation with Group By date in Spark SQL

时光说笑 2021-01-03 04:43

I have an RDD with a timestamp field named time, of type long:

root
 |-- id: string (nullable = true)
 |-- value1: string (nullable = true)
 |-- time: long (nullable = true)
3 Answers
  •  执念已碎
    2021-01-03 05:34

    I'm using Spark 1.4.0, and DATE appears to have been in the Spark SQL API since 1.2.0 (SPARK-2562). DATE should allow you to group by the time as yyyy-MM-dd.

    I also have a similar data structure, where my created_on is analogous to your time field.

    root
    |-- id: long (nullable = true)
    |-- value1: long (nullable = true)
    |-- created_on: long (nullable = true)
    

    I solved it using FROM_UNIXTIME(created_on,'yyyy-MM-dd') and it works well. Note the lowercase yyyy: in Java date patterns, uppercase YYYY means week-based year and produces wrong dates around the turn of the year.

    val countQuery = "SELECT FROM_UNIXTIME(created_on,'yyyy-MM-dd') AS `date_created`, COUNT(*) AS `count` FROM user GROUP BY FROM_UNIXTIME(created_on,'yyyy-MM-dd')"
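    The yyyy vs. YYYY distinction only bites near New Year, which is why it is easy to miss in testing. A quick pure-Python illustration (an assumption for clarity, not Spark code: strftime's %Y plays the role of Java's yyyy, and the ISO week-year directive %G plays the role of YYYY):

```python
from datetime import datetime, timezone

# 2014-12-29 00:00:00 UTC -- a Monday that falls in ISO week 1 of 2015.
ts = 1419811200
dt = datetime.fromtimestamp(ts, tz=timezone.utc)

calendar_date = dt.strftime("%Y-%m-%d")   # calendar year, like Java's yyyy-MM-dd
week_year_date = dt.strftime("%G-%m-%d")  # ISO week-year, like Java's YYYY-MM-dd

print(calendar_date)   # 2014-12-29
print(week_year_date)  # 2015-12-29 -- wrong date for grouping purposes
```

    Rows from the last days of December can silently land in the next year's groups if the week-year pattern is used.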
    

    From here on you can do the normal operations: execute the query into a DataFrame and so on.
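    To make the semantics of the query concrete, here is a small pure-Python sketch of what the GROUP BY computes (the sample timestamps and the UTC assumption are illustrative; Hive's FROM_UNIXTIME actually formats in the session time zone):

```python
from collections import Counter
from datetime import datetime, timezone

# Toy stand-in for the `user` table: created_on values as unix timestamps.
rows = [1419811200, 1419811260, 1419897600]  # two on 2014-12-29, one on 2014-12-30

def from_unixtime(ts: int) -> str:
    """Rough equivalent of FROM_UNIXTIME(ts, 'yyyy-MM-dd'), assuming UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

# SELECT FROM_UNIXTIME(created_on,'yyyy-MM-dd'), COUNT(*) ... GROUP BY ...
counts = Counter(from_unixtime(ts) for ts in rows)
print(counts)  # Counter({'2014-12-29': 2, '2014-12-30': 1})
```

    Each timestamp is truncated to its day string first, and the count is taken per distinct day, exactly as the SQL groups on the formatted expression.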

    FROM_UNIXTIME probably worked because my Spark installation includes Hive, and it's a Hive UDF. However, it will become part of Spark SQL's native syntax in a future release (SPARK-8175).
