I did come across a mini tutorial on data preprocessing with Spark here: http://ampcamp.berkeley.edu/big-data-mini-course/featurization.html
However, this discusses on
Look at this link.
Databricks provides the spark-xml library for processing XML data with Spark.
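For illustration, here is a minimal PySpark sketch of reading an XML file through spark-xml. It assumes the spark-xml package is already on the Spark classpath. The helper name `read_xml`, the file name `books.xml`, and the row tag `book` are made up for this example and should be adapted to your data.

```python
def read_xml(spark, path, row_tag):
    """Load an XML file into a DataFrame, one row per <row_tag> element.

    `spark` is an existing SparkSession; `path` and `row_tag` are
    placeholders supplied by the caller (hypothetical values below).
    """
    return (
        spark.read
        .format("com.databricks.spark.xml")  # data source name registered by spark-xml
        .option("rowTag", row_tag)           # XML element that maps to one row
        .load(path)
    )

# Example call, assuming a SparkSession named `spark` and a file whose
# records are wrapped in <book>...</book> elements:
#   df = read_xml(spark, "books.xml", "book")
#   df.printSchema()
```

The package is typically supplied when launching Spark, for example with `--packages com.databricks:spark-xml_2.12:<version>`, where the Scala suffix and version must match your Spark build.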
Thanks.