Question
I was trying to create a DataFrame from an HDFS file using the spark-csv library, as shown in this tutorial.
But when I tried to get the count of the DataFrame, it showed 0.
Here is what my file looks like:
employee.csv:
empid,empname
1000,Tom
2000,Jerry
I loaded the above file using:
val empDf = sqlContext.read.format("com.databricks.spark.csv").option("header","true").option("delimiter",",").load("hdfs:///user/.../employee.csv");
When I call empDf.printSchema(), it gives the proper schema with empid and empname as string fields, so I can see that the delimiter was read properly.
But when I try to display the DataFrame with empDf.show, I get only the column header and no data, and empDf.count returns 0 records.
Please tell me if I missed a step that is required here.
Answer 1:
Make sure that the Scala version of the spark-csv artifact matches the Scala version your Spark distribution was built with.
For example, if your Spark distro is built with Scala 2.10 (the default Scala version for Databricks prebuilt Spark distros), you will need spark-csv_2.10. The spark-csv_2.11 artifact (shown in the mentioned tutorial) will not work, and will return an empty DataFrame with only column names. See my answer to this SO question for a similar case.
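One way to check for this mismatch is to print the Scala version from inside spark-shell and derive the matching artifact suffix from it. The snippet below is only a sketch: the helper names scalaBinaryVersion and sparkCsvCoordinates are hypothetical, and the release number 1.5.0 is an assumption, so substitute whichever spark-csv release you actually use.

```scala
// Hypothetical helper: reduce a full Scala version such as "2.10.5"
// to its binary version "2.10", which is the suffix used in artifact names.
def scalaBinaryVersion(fullVersion: String): String =
  fullVersion.split('.').take(2).mkString(".")

// Hypothetical helper: build Maven coordinates for spark-csv.
// The default release "1.5.0" is an assumption, not taken from the question.
def sparkCsvCoordinates(fullScalaVersion: String, release: String = "1.5.0"): String =
  s"com.databricks:spark-csv_${scalaBinaryVersion(fullScalaVersion)}:$release"

// When run inside spark-shell, this is the Scala version Spark itself runs on.
val runningScala = scala.util.Properties.versionNumberString

println(s"Scala version in this shell: $runningScala")
println(s"Matching spark-csv package:  ${sparkCsvCoordinates(runningScala)}")
```

The printed coordinates can then be passed when starting the shell, for example with spark-shell --packages followed by the coordinates, instead of copying the _2.11 coordinates from the tutorial.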
Source: https://stackoverflow.com/questions/38846422/dataframe-object-is-not-showing-any-data