Reading data from SQL Server using Spark SQL

陌清茗 2021-02-04 13:10

Is it possible to read data from Microsoft SQL Server (and Oracle, MySQL, etc.) into an RDD in a Spark application? Or do we need to create an in-memory set and parallelize that i

2 Answers

终归单人心 (original poster)

2021-02-04 13:42

In Spark 1.4.0+ you can use sqlContext.read.jdbc.

That will give you a DataFrame instead of an RDD of Row objects.

The equivalent to the solution you posted above would be

sqlContext.read.jdbc("jdbc:sqlserver://omnimirror;databaseName=moneycorp;integratedSecurity=true;", "TABLE_NAME", "id", 1, 100000, 1000, new java.util.Properties)
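The four arguments after the table name ("id", 1, 100000, 1000) are the partition column, lower bound, upper bound and number of partitions. Spark uses them to split the read into parallel range queries. The bounds only decide the stride; they do not filter rows, so values outside the range still end up in the first or last partition. Here is a rough, simplified sketch of the idea in plain Scala. The object and method names (PartitionSketch, predicates) are made up for illustration, and this is not Spark's actual implementation, just an approximation of how the per-partition WHERE clauses come out.

```scala
// Simplified illustration only; NOT Spark's real code.
// PartitionSketch and predicates are hypothetical names.
object PartitionSketch {

  /** Builds one WHERE-clause predicate per partition for a numeric column. */
  def predicates(column: String,
                 lowerBound: Long,
                 upperBound: Long,
                 numPartitions: Int): Seq[String] = {
    require(numPartitions >= 1, "numPartitions must be at least 1")
    require(lowerBound < upperBound, "lowerBound must be below upperBound")

    // Width of each range; clamped to 1 so the sketch never produces empty ranges.
    val stride = math.max(1L, upperBound / numPartitions - lowerBound / numPartitions)

    (0 until numPartitions).map { i =>
      val start = lowerBound + i * stride
      val end = start + stride
      if (numPartitions == 1) "1 = 1" // single partition: read everything
      else if (i == 0) s"$column < $end OR $column IS NULL" // open below, also takes NULLs
      else if (i == numPartitions - 1) s"$column >= $start" // open above
      else s"$column >= $start AND $column < $end"
    }
  }
}
```

Calling PartitionSketch.predicates("id", 1L, 100000L, 1000) yields 1000 predicates, each covering about 100 ids. That is why the partition column should be numeric and reasonably evenly distributed.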

It should pick up the schema of the table, but if you'd like to force it, you can call the schema method on the reader before jdbc: sqlContext.read.schema(...insert schema here...).jdbc(...rest of the things...)

Note that you won't get an RDD of SomeClass here (which is nicer in my view). Instead you'll get a DataFrame of the relevant fields.

More information can be found here: http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
