I did come across a mini tutorial for data preprocessing with Spark here: http://ampcamp.berkeley.edu/big-data-mini-course/featurization.html
However, it does not cover reading XML input specifically.
It looks like Databricks has published an XML data source for Apache Spark:
https://github.com/databricks/spark-xml
It supports reading XML files by specifying a row tag, and it can infer the schema, e.g.:
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Each <book> element becomes one row; column names and types
// are inferred from the XML content.
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("books.xml")
You can also use it from spark-shell by adding the package on the command line:
$ bin/spark-shell --packages com.databricks:spark-xml_2.11:0.3.0
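Note that the 0.3.0 artifact above targets the Spark 1.x / SQLContext API. On Spark 2.x and later, where SparkSession is the entry point, the equivalent read would look like this (a sketch; pick a spark-xml version that matches your Spark and Scala versions):

import org.apache.spark.sql.SparkSession

// SparkSession replaces SQLContext as the entry point in Spark 2.x+.
val spark = SparkSession.builder()
  .appName("xml-read-example")
  .getOrCreate()

// Same data source and options: each <book> element becomes a row.
val df = spark.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("books.xml")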