How to validate date format in a dataframe column in spark scala

Asked by 栀梦 on 2021-01-20 05:56 · 3 answers · 1282 views

I have a dataframe with one DateTime column and many other columns.

All I want to do is parse this DateTime column value and check if the format is "yyyy-MM

3 Answers
  •  臣服心动
    2021-01-20 06:29

    You can use filter() to get the valid/invalid records in the dataframe. This code could still be improved from a Scala point of view.

      import org.apache.spark.sql.{Row, SparkSession}
      import java.time.LocalDateTime
      import java.time.format.{DateTimeFormatter, DateTimeParseException}

      val DATE_TIME_FORMAT = "yyyy-MM-dd HH:mm:ss"

      def validateDf(row: Row): Boolean = try {
        // assume row.getString(1) will give the DateTime string
        LocalDateTime.parse(row.getString(1), DateTimeFormatter.ofPattern(DATE_TIME_FORMAT))
        true
      } catch {
        case ex: DateTimeParseException =>
          // handle the exception if you want
          false
      }

      val session = SparkSession.builder
        .appName("Validate Dataframe")
        .getOrCreate

      val df = session. .... // read from any data source

      val validDf = df.filter(validateDf(_)) // Dataset.filter accepts a Row => Boolean predicate
      val inValidDf = df.except(validDf)     // rows present in df but not in validDf
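    One caveat with this kind of parse-based validation: by default, java.time formatters resolve with ResolverStyle.SMART, which quietly accepts impossible calendar dates (e.g. February 30th is adjusted to February 28th) instead of throwing. If you want strictly valid dates, use ResolverStyle.STRICT with the `uuuu` year pattern. A minimal standalone sketch (no Spark needed; `isValid` is a hypothetical helper mirroring the try/catch in the answer above):

    ```scala
    import java.time.LocalDateTime
    import java.time.format.{DateTimeFormatter, DateTimeParseException, ResolverStyle}

    // Returns true if the string parses as a LocalDateTime under the given formatter.
    def isValid(s: String, fmt: DateTimeFormatter): Boolean =
      try { LocalDateTime.parse(s, fmt); true }
      catch { case _: DateTimeParseException => false }

    // Default SMART resolver: lenient about out-of-range days.
    val lenient = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

    // STRICT resolver requires `uuuu` (proleptic year) instead of `yyyy` (year-of-era).
    val strict = DateTimeFormatter
      .ofPattern("uuuu-MM-dd HH:mm:ss")
      .withResolverStyle(ResolverStyle.STRICT)

    println(isValid("2021-02-30 10:00:00", lenient)) // true: SMART adjusts Feb 30 to Feb 28
    println(isValid("2021-02-30 10:00:00", strict))  // false: rejected outright
    println(isValid("2021-01-20 05:56:00", strict))  // true: genuinely valid
    ```

    Plugging the strict formatter into `validateDf` makes the valid/invalid split reflect real calendar dates, not just pattern shape.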
    
