Options to read large files (pure text, xml, json, csv) from hdfs in RStudio with SparkR 1.5

Submitted by 余生颓废 on 2019-12-05 10:54:16

Try

    % hadoop fs -put people.json /
    % sparkR
    > people <- read.df(sqlContext, "/people.json", "json")
    > head(people) 

For other file formats you will probably need a parsing library, such as the Databricks spark-csv package:

https://github.com/databricks/spark-csv

Then you would start R with the package loaded, e.g.:

$ sparkR --packages com.databricks:spark-csv_2.10:1.0.3

and load your file like:

> df <- read.df(sqlContext, "cars.csv", source = "com.databricks.spark.csv", inferSchema = "true")

This assumes you have the "cars.csv" test file in your HDFS home directory.
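
Once loaded, the DataFrame can be inspected and queried through SparkR's DataFrame API or plain SQL. A minimal sketch, assuming a running SparkR 1.5 shell where `sqlContext` is already defined (the `header = "true"` option is an assumption that the file has a header row; `COUNT(*)` is used so the query does not depend on any particular column names):

    > df <- read.df(sqlContext, "cars.csv",
    +               source = "com.databricks.spark.csv",
    +               header = "true",       # assumption: first line holds column names
    +               inferSchema = "true")
    > printSchema(df)                      # check the types spark-csv inferred
    > registerTempTable(df, "cars")        # expose the DataFrame to SQL
    > sql(sqlContext, "SELECT COUNT(*) FROM cars")

The same `registerTempTable`/`sql` pattern applies to the JSON DataFrame above, which is handy when the files are too large to pull into local R memory with `collect`.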

hth
