I have a CSV in which a field is datetime in a specific format. I cannot import it directly in my Dataframe because it needs to be a timestamp. So I import it as string and
I haven't played with Spark SQL yet but I think this would be more idiomatic scala (null usage is not considered a good practice):
def getTimestamp(s: String) : Option[Timestamp] = s match {
case "" => None
case _ => {
val format = new SimpleDateFormat("MM/dd/yyyy' 'HH:mm:ss")
Try(new Timestamp(format.parse(s).getTime)) match {
case Success(t) => Some(t)
case Failure(_) => None
}
}
}
Please notice I assume you know Row
elements types beforehand (if you read it from a csv file, all them are String
), that's why I use a proper type like String
and not Any
(everything is subtype of Any
).
It also depends on how you want to handle parsing exceptions. In this case, if a parsing exception occurs, a None
is simply returned.
You could use it further on with:
rows.map(row => Row(row(0),row(1),row(2), getTimestamp(row(3))