I tried a simple example like:
data = sqlContext.read.format("csv").option("header", "true").option("inferSchema", "true").load("/databricks-datasets/
I found the issue: some of the column names start with whitespace (for example, " shares" instead of "shares"). So
data = data.select(" timedelta", " shares").map(lambda r: LabeledPoint(r[1], [r[0]])).toDF()
worked. I could detect the whitespace in the column names using
assert " " not in ''.join(data.columns)
Now I am looking for a way to remove the whitespace from the column names. Any ideas are much appreciated!
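
One direction that might work, as a minimal sketch rather than a confirmed solution: strip each name with Python's `str.strip()` and then rename all columns at once. The helper name `strip_column_names` is hypothetical (it is not part of Spark), and the commented-out last line assumes `data` is the DataFrame loaded above and that `DataFrame.toDF(*cols)` is used to apply the new names.

```python
def strip_column_names(columns):
    """Return the column names with leading/trailing whitespace removed.

    Hypothetical helper for illustration; not part of the Spark API.
    """
    return [name.strip() for name in columns]


# Plain-Python check using the two column names from the example above.
cleaned = strip_column_names([" timedelta", " shares"])
print(cleaned)

# With a Spark DataFrame, the cleaned names could then be applied in one call
# (assumes `data` is the DataFrame loaded earlier, before the select/map step):
# data = data.toDF(*strip_column_names(data.columns))
```

Note that `strip()` only removes whitespace at the start and end of a name, so a column with a space in the middle would still fail the assert above.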