I am using Spark SQL to clean some data read from a CSV file.
The sample data is pipe-delimited and looks like this:
1000616411022471|1000616415711839|0.10||ksetohouv|2