We have a fairly standard Spark job that reads log files from S3 and then does some processing over them. Very basic Spark stuff...
val logs = sc.textFile(somePath)
You could make the parser return an Option[Value] instead of a Value. That way you can use flatMap to map the lines to rows and drop the ones that fail validation in the same pass.
Roughly, something like this:
def parseLog(line: String): Option[Array[String]] = {
  val fields = line.split("\t")
  if (validate(fields)) Some(fields) else None
}
val validRows = logs.flatMap(OurRowObject.parseLog)
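For reference, here is a minimal self-contained sketch of the whole job. The validate rule, the expected column count, the local[*] master, and the LogJob wrapper are all assumptions for illustration, since the question doesn't show them:

import org.apache.spark.{SparkConf, SparkContext}

object OurRowObject {
  // Hypothetical rule: a row is valid when it has the expected number of columns.
  private val ExpectedColumns = 5

  def validate(fields: Array[String]): Boolean =
    fields.length == ExpectedColumns

  def parseLog(line: String): Option[Array[String]] = {
    val fields = line.split("\t")
    if (validate(fields)) Some(fields) else None
  }
}

object LogJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("log-parse").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Placeholder path; in practice this would point at your S3 location.
    val logs = sc.textFile(args(0))

    // flatMap treats Some(fields) as a one-element collection and None as empty,
    // so invalid lines are dropped in the same transformation that parses them.
    val validRows = logs.flatMap(OurRowObject.parseLog)

    println(s"Valid rows: ${validRows.count()}")
    sc.stop()
  }
}

Because the parse and the filter happen in one transformation, the data is only traversed once and lines that fail validation never materialize as rows.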