Better way to convert a string field into timestamp in Spark

Asked 2020-11-27 16:29

I have a CSV in which one field is a datetime in a specific format. I cannot import it directly into my DataFrame because it needs to be a timestamp. So I import it as a string and

7 Answers
  • 2020-11-27 17:21

    I haven't played with Spark SQL yet, but I think this would be more idiomatic Scala (using null is not considered good practice):

    import java.sql.Timestamp
    import java.text.SimpleDateFormat
    import scala.util.{Failure, Success, Try}

    def getTimestamp(s: String): Option[Timestamp] = s match {
      case "" => None
      case _ =>
        val format = new SimpleDateFormat("MM/dd/yyyy' 'HH:mm:ss")
        Try(new Timestamp(format.parse(s).getTime)) match {
          case Success(t) => Some(t)
          case Failure(_) => None
        }
    }
    

    Please note that I assume you know the Row element types beforehand (if you read them from a CSV file, they are all Strings), which is why I use a concrete type like String rather than Any (everything is a subtype of Any).

    It also depends on how you want to handle parsing exceptions. Here, if parsing fails, None is simply returned.
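
    As a quick sanity check of that behavior, here is a minimal, self-contained sketch (the helper is repeated with its imports, and the sample input strings are made up for illustration):

    ```scala
    import java.sql.Timestamp
    import java.text.SimpleDateFormat
    import scala.util.Try

    // Same helper as above, repeated so this snippet compiles on its own.
    def getTimestamp(s: String): Option[Timestamp] = s match {
      case "" => None
      case _ =>
        val format = new SimpleDateFormat("MM/dd/yyyy' 'HH:mm:ss")
        // Try(...).toOption collapses Success/Failure into Some/None
        Try(new Timestamp(format.parse(s).getTime)).toOption
    }

    // A well-formed value parses to Some(...)
    println(getTimestamp("11/27/2020 16:29:00").isDefined)
    // Malformed input and the empty string both yield None
    println(getTimestamp("not a date"))
    println(getTimestamp(""))
    ```

    Using Option this way lets downstream code pattern-match on the result instead of checking for null.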

    You could use it further on with:

    rows.map(row => Row(row(0), row(1), row(2), getTimestamp(row(3))))
    