Reading data from SQL Server using Spark SQL

前端未结

关注

 2  975

陌清茗 2021-02-04 13:10

Is it possible to read data from Microsoft Sql Server (and oracle, mysql, etc.) into an rdd in a Spark application? Or do we need to create an in memory set and parallize that i

2条回答

予麋鹿 (楼主)

2021-02-04 14:06
Found a solution to this from the mailing list. JdbcRDD can be used to accomplish this. I needed to get the MS Sql Server JDBC driver jar and add it to the lib for my project. I wanted to use integrated security, and so needed to put sqljdbc_auth.dll (available in the same download) in a location that java.library.path can see. Then, the code looks like this:
```
     val rdd = new JdbcRDD[Email](sc,
          () => {DriverManager.getConnection(
 "jdbc:sqlserver://omnimirror;databaseName=moneycorp;integratedSecurity=true;")},
          "SELECT * FROM TABLE_NAME Where ? < X and X < ?",
            1, 100000, 1000,
          (r:ResultSet) => { SomeClass(r.getString("Col1"), 
            r.getString("Col2"), r.getString("Col3")) } )
```
This gives an Rdd of SomeClass.The second, third and fourth parameters are required and are for lower and upper bounds, and number of partitions. In other words, that source data needs to be partitionable by longs for this to work.
0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...