How to change the column type from String to Date in DataFrames?


I have a DataFrame with two columns (C, D) defined as string type, but the data in these columns are actually dates. For example, column C holds dates such as "01-APR-2015". How can I change the column type from String to Date?

1 Answer

    Spark >= 2.2

    You can use to_date:

    import org.apache.spark.sql.functions.{to_date, to_timestamp}
    
    df.select(to_date($"ts", "dd-MMM-yyyy").alias("date"))
    

    or to_timestamp:

    df.select(to_timestamp($"ts", "dd-MMM-yyyy").alias("timestamp"))
    

    Neither form needs an intermediate unix_timestamp call.
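
    For example, a minimal end-to-end sketch on Spark 2.2+ (the sample value "01-APR-2015" is taken from the snippet further down; the spark.implicits._ import is assumed to be available for $ and toDF, and Spark 3.x may additionally need spark.sql.legacy.timeParserPolicy=LEGACY to accept the upper-case month name):

    import org.apache.spark.sql.functions.{to_date, to_timestamp}
    import spark.implicits._
    
    val demo = Seq((1L, "01-APR-2015")).toDF("id", "ts")
    
    demo.select(
      to_date($"ts", "dd-MMM-yyyy").alias("date"),          // 2015-04-01 (DateType)
      to_timestamp($"ts", "dd-MMM-yyyy").alias("timestamp") // 2015-04-01 00:00:00 (TimestampType)
    ).show()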

    Spark < 2.2

    Since Spark 1.5 you can use the unix_timestamp function to parse the string to a long, cast it to timestamp, and truncate it with to_date:

    import org.apache.spark.sql.functions.{unix_timestamp, to_date}
    
    val df = Seq((1L, "01-APR-2015")).toDF("id", "ts")
    
    df.select(to_date(unix_timestamp(
      $"ts", "dd-MMM-yyyy"
    ).cast("timestamp")).alias("date"))
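
    To answer the original question directly, a hedged sketch converting both string columns in place (the column names C and D come from the question; df0 and the format string are assumptions about the questioner's data):

    import org.apache.spark.sql.functions.{col, to_date, unix_timestamp}
    
    // df0 is assumed to be the questioner's DataFrame with string columns C and D
    val converted = Seq("C", "D").foldLeft(df0) { (acc, name) =>
      acc.withColumn(name, to_date(unix_timestamp(col(name), "dd-MMM-yyyy").cast("timestamp")))
    }
    converted.printSchema() // C and D are now DateType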
    

    Note:

    Depending on your Spark version, this may require some adjustments due to SPARK-11724:

    Casting from integer types to timestamp treats the source int as being in millis. Casting from timestamp to integer types creates the result in seconds.

    If you use an unpatched version, the unix_timestamp output requires multiplication by 1000.
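
    On an affected build, a sketch of that adjustment (multiplying the seconds returned by unix_timestamp by 1000 so the millis-based cast described above lands on the right instant):

    df.select(to_date(
      (unix_timestamp($"ts", "dd-MMM-yyyy") * 1000).cast("timestamp")
    ).alias("date"))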
