CLOSED!! How can I detect the type from a string in Scala?

空扰寡人 submitted on 2019-12-11 04:04:21

Question


I'm trying to parse CSV files, and I need to determine the type of each field from its string value.

For example:

val row: Array[String] = Array("1/1/06 0:00","3108 OCCIDENTAL DR","3","3C","1115")

This is what I would like to get:

row(0) --> Date
row(1) --> String
row(2) --> Int
etc.

How can I do this?

------------------------------------ SOLUTION ------------------------------------

This is the solution I found to recognize String, Date, Int, Double and Boolean fields. I hope it is useful to someone in the future.

  def typeDetection(x: String): String = {
    x match {
      // Matches: [12], [-22], [0] Non-Matches: [2.2], [3F]
      case int if int.matches("^-?[0-9]+$") => "Int"
      // Matches: [2,2], [-2.3], [0.2232323232332] Non-Matches: [.2], [,2], [2.2.2]
      // The separator is a character class: a bare `.` would match any character.
      case double if double.matches("^-?[0-9]+[,.][0-9]+$") => "Double"
        // Matches: [29/02/2004 20:15:27], [29/2/04 8:9:5], [31/3/2004 9:20:17] Non-Matches: [29/02/2003 20:15:15], [2/29/04 20:15:15], [31/3/4 9:20:17]
      case d1 if d1.matches("^((((31\\/(0?[13578]|1[02]))|((29|30)\\/(0?[1,3-9]|1[0-2])))\\/(1[6-9]|[2-9]\\d)?\\d{2})|(29\\/0?2\\/(((1[6-9]|[2-9]\\d)?(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))|(0?[1-9]|1\\d|2[0-8])\\/((0?[1-9])|(1[0-2]))\\/((1[6-9]|[2-9]\\d)?\\d{2})) *(?:(?:([01]?\\d|2[0-3])(\\-|:|\\.))?([0-5]?\\d)(\\-|:|\\.))?([0-5]?\\d)")
        => "Date"
        // Matches: [01.1.02], [11-30-2001], [2/29/2000] Non-Matches: [02/29/01], [13/01/2002], [11/00/02]
      case d2 if d2.matches("^(?:(?:(?:0?[13578]|1[02])(\\/|-|\\.)31)\\1|(?:(?:0?[1,3-9]|1[0-2])(\\/|-|\\.)(?:29|30)\\2))(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$|^(?:0?2(\\/|-|\\.)29\\3(?:(?:(?:1[6-9]|[2-9]\\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\\/|-|\\.)(?:0?[1-9]|1\\d|2[0-8])\\4(?:(?:1[6-9]|[2-9]\\d)?\\d{2})$")
        => "Date"
        // Matches: [12/01/2002], [12/01/2002 12:32:10] Non-Matches: [32/12/2002], [12/13/2001], [12/02/06]
      case d3 if d3.matches("^(([0-2]\\d|[3][0-1])(\\/|-|\\.)([0]\\d|[1][0-2])(\\/|-|\\.)[2][0]\\d{2})$|^(([0-2]\\d|[3][0-1])(\\/|-|\\.)([0]\\d|[1][0-2])(\\/|-|\\.)[2][0]\\d{2}\\s([0-1]\\d|[2][0-3])\\:[0-5]\\d\\:[0-5]\\d)$")
        => "Date"
      case boolean if boolean.equalsIgnoreCase("true") || boolean.equalsIgnoreCase("false") => "Boolean"
      case _ => "String"
    }
  }
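
A side note on the numeric cases: a digits-only pattern like the Int one also accepts values that are too large for an Int (for example 99999999999). Below is a minimal sketch, assuming you want "Int" to mean "actually fits in a Scala Int". NumericCheck, looksLikeInt and fitsInInt are hypothetical names, not part of the solution above.

```scala
import scala.util.Try

// Hypothetical helper, for illustration only.
object NumericCheck {
  // Same shape as the Int pattern: optional minus sign, then digits.
  private val intPattern = "-?[0-9]+"

  // True for any run of digits, however long.
  def looksLikeInt(s: String): Boolean = s.matches(intPattern)

  // True only if the digits also convert without overflowing Int.
  def fitsInInt(s: String): Boolean =
    looksLikeInt(s) && Try(s.toInt).isSuccess
}
```

With this, a value that passes the regex but overflows could be reported as "Long" or "String" instead, depending on what the downstream code expects.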


Answer 1:


val row: Array[String] = Array("1/1/06 0:00","3108 OCCIDENTAL DR","3","3C","1115")

val types: Array[String] = row.map {
  case string if string.contains("/") => "Date probably"
  case string if string.matches("[0-9]+") => "Int probably"
  case _ => "String probably"
}

types.foreach(println)

Outputs:

Date probably
String probably
Int probably
String probably
Int probably

But in all honesty, I wouldn't use this approach. It is error-prone, and so many things could go wrong that I don't even want to think about it. The simplest example: if an ordinary string contains a /, this small piece of code would classify it as a Date.

I don't know your use case, but in my experience it's always a bad idea to build something that tries to guess types from untrusted data. If you have control over the data, you could introduce an identifier, for example "1/1/06 0:00 %d%", where %d% indicates a date, and then remove it from the string. Even then you'll never be 100% sure that this won't fail.
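
The marker idea could be sketched roughly as follows. This is only an illustration under assumptions: the %d% marker comes from the example above, while %i%, the TaggedField object and its split method are made-up names, and fields without a marker are simply treated as String.

```scala
// Hypothetical sketch of the marker approach; not a definitive implementation.
object TaggedField {
  // Only "%d%" appears in the answer; "%i%" is an assumed extra marker.
  private val markers = Map("%d%" -> "Date", "%i%" -> "Int")

  // Returns (declared type, value with the marker removed).
  // Fields that carry no known marker fall back to "String".
  def split(raw: String): (String, String) = {
    val trimmed = raw.trim
    markers.collectFirst {
      case (marker, typeName) if trimmed.endsWith(marker) =>
        (typeName, trimmed.stripSuffix(marker).trim)
    }.getOrElse(("String", trimmed))
  }
}
```

The declared type still has to be validated by actually parsing the value, which is why this reduces guessing but does not make failure impossible.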




Answer 2:


For each string, try parsing it into each type you want; you'll have to write one parsing function per type. Keep trying them in order until one works. The order is important, since the first parser that succeeds determines the type. You can use your favorite date/time library.

  import java.util.Date
  import scala.util.Try

  // Try each parser in turn; the first one that succeeds wins,
  // and the raw string is the fallback.
  def stringdetect(s: String): Any =
    dateFromString(s).orElse(intFromString(s)).getOrElse(s)

  def arrayDetect(row: Array[String]): Array[Any] = row map stringdetect

  // Map each detected value to the name of its runtime type.
  def arrayTypes(row: Array[String]): Array[String] =
    arrayDetect(row) map {
      case _: Int    => "Int"
      case _: Date   => "Date"
      case _: String => "String"
      case _         => "?"
    }

  // Try catches only non-fatal exceptions, unlike `case _: Throwable`.
  def intFromString(s: String): Option[Int] = Try(s.toInt).toOption

  def dateFromString(s: String): Option[Date] = Try {
    // H is the 24-hour field (0-23), so "0:00" parses without relying on leniency.
    val formatter = new java.text.SimpleDateFormat("d/M/yy H:mm")
    formatter.parse(s)
  }.toOption

From the REPL / worksheet:

  val row: Array[String] = Array("1/1/06 0:00","3108 OCCIDENTAL DR","3","3C","1115")
        //> row  : Array[String] = Array(1/1/06 0:00, 3108 OCCIDENTAL DR, 3, 3C, 1115)
  arrayDetect(row)
        //> res0: Array[Any] = Array(Sun Jan 01 00:00:00 CST 2006, 3108 OCCIDENTAL DR, 3, 3C, 1115)
  arrayTypes(row)
        //> res1: Array[String] = Array(Date, String, Int, String, Int)
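
If you would rather not pass values around as Any, the same try-in-order idea can return a small sealed type. The sketch below is one possible variant, not the method above: Field, DateField, IntField, StringField and FieldDetector are hypothetical names, and java.time is used here only as one example of a date/time library, with the same assumed "d/M/yy H:mm" layout.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical result type: one case per recognised kind of field.
sealed trait Field
final case class DateField(value: LocalDateTime) extends Field
final case class IntField(value: Int) extends Field
final case class StringField(value: String) extends Field

object FieldDetector {
  // Assumed layout, matching the sample row: day/month/2-digit year, 24h time.
  private val dateFormat = DateTimeFormatter.ofPattern("d/M/yy H:mm")

  // Parsers are tried in order: date, then int, then the string fallback.
  def detect(s: String): Field =
    Try[Field](DateField(LocalDateTime.parse(s, dateFormat)))
      .orElse(Try[Field](IntField(s.toInt)))
      .getOrElse(StringField(s))

  def typeName(f: Field): String = f match {
    case _: DateField   => "Date"
    case _: IntField    => "Int"
    case _: StringField => "String"
  }
}
```

Because Field is sealed, the compiler can warn when a match forgets one of the cases, which the Any-based version cannot do.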


Source: https://stackoverflow.com/questions/23656672/closed-how-i-can-detect-the-type-from-a-string-in-scala
