How do I truncate a PySpark dataframe of timestamp type to the day?

悲&欢浪女 · 2021-01-12 13:01

I have a PySpark dataframe that includes timestamps in a column (call the column 'dt'), like this:

2018-04-07 16:46:00
2018-03-06 22:18:00
How can I truncate these down to the day, so that each value becomes e.g. 2018-04-07 00:00:00?
3 Answers

  •  北荒 · 2021-01-12 13:42

    You are using the wrong function. trunc supports only a few, fairly coarse formats (see the short sketch after the quoted docstring):

    Returns date truncated to the unit specified by the format.

    :param format: 'year', 'yyyy', 'yy' or 'month', 'mon', 'mm'
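
    For contrast, here is a minimal sketch of what trunc itself does (the one-row frame and the names df_dates, d and month_start are just illustrative); note that trunc takes the column first and the format second, its finest unit in the version quoted above is the month, and it returns a date rather than a timestamp:

    from pyspark.sql.functions import col, trunc

    # a one-row frame with a date column, just for illustration
    df_dates = spark.createDataFrame(["2018-04-07"], "string").toDF("d").select(col("d").cast("date"))

    # the finest unit trunc understands here is a month
    df_dates.select(trunc("d", "month").alias("month_start")).show()
    # +-----------+
    # |month_start|
    # +-----------+
    # | 2018-04-01|
    # +-----------+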

    Use date_trunc instead:

    Returns timestamp truncated to the unit specified by the format.

    :param format: 'year', 'yyyy', 'yy', 'month', 'mon', 'mm', 'day', 'dd', 'hour', 'minute', 'second', 'week', 'quarter'

    Example:

    from pyspark.sql.functions import col, date_trunc

    # single-row frame with a string cast to a timestamp column 'dt'
    df = spark.createDataFrame(["2018-04-07 23:33:21"], "string").toDF("dt").select(col("dt").cast("timestamp"))

    # truncate to the start of the day; the time-of-day part becomes 00:00:00
    df.select(date_trunc("day", "dt")).show()
    # +-------------------+
    # |date_trunc(day, dt)|
    # +-------------------+
    # |2018-04-07 00:00:00|
    # +-------------------+
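
    A couple of variations that may also be handy (dt_day and dt_date are just illustrative column names, and date_trunc itself needs Spark 2.3 or later): keep the original column by adding the truncated one with withColumn, or cast to date if a midnight timestamp is not actually required:

    from pyspark.sql.functions import col, date_trunc

    # keep the original timestamp and add a day-truncated copy next to it
    df.withColumn("dt_day", date_trunc("day", col("dt"))).show()
    # +-------------------+-------------------+
    # |                 dt|             dt_day|
    # +-------------------+-------------------+
    # |2018-04-07 23:33:21|2018-04-07 00:00:00|
    # +-------------------+-------------------+

    # or drop the time component entirely by casting to a date
    df.withColumn("dt_date", col("dt").cast("date")).show()
    # +-------------------+----------+
    # |                 dt|   dt_date|
    # +-------------------+----------+
    # |2018-04-07 23:33:21|2018-04-07|
    # +-------------------+----------+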
    
