pyspark read format jdbc generates ORA-00903: invalid table name Error

被撕碎了的回忆 2021-01-23 23:41

With PySpark running on a remote server, I am able to connect to an Oracle database on another server with JDBC, but any valid query I run returns an ORA-00903: invalid table name error.

1 Answer
  •  失恋的感觉
    2021-01-24 00:16

    From the documentation for dbtable:

    The JDBC table that should be read from or written into. Note that when using it in the read path anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.

    So in your examples you could do:

    dbtable = '(SELECT owner, table_name FROM ALL_TABLES)'
    

    optionally with an alias:

    dbtable = '(SELECT owner, table_name FROM ALL_TABLES) t'
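
To see why the parentheses matter, here is a minimal sketch of how the dbtable value ends up inside a FROM clause. The helper `schema_probe_sql` is hypothetical and only approximates the statement Spark issues to resolve the schema (commonly of the form `SELECT * FROM <dbtable> WHERE 1=0`); the exact SQL may vary by Spark version and JDBC dialect.

```python
def schema_probe_sql(dbtable: str) -> str:
    """Approximate the schema-resolution query built from a dbtable value.

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    return f"SELECT * FROM {dbtable} WHERE 1=0"


# A bare SELECT lands directly after FROM, where Oracle expects a table
# name; this is plausibly what triggers ORA-00903.
bare = schema_probe_sql("SELECT owner, table_name FROM ALL_TABLES")

# A parenthesized subquery, optionally aliased, is valid in a FROM clause.
wrapped = schema_probe_sql("(SELECT owner, table_name FROM ALL_TABLES) t")

print(bare)
print(wrapped)
```

Comparing the two printed statements shows that only the parenthesized form leaves a well-formed FROM clause.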
    

    As an alternative you could use query instead of (not as well as) dbtable:

    A query that will be used to read data into Spark. The specified query will be parenthesized and used as a subquery in the FROM clause. Spark will also assign an alias to the subquery clause.

    ... so effectively the same thing, but might make your code more understandable (entirely subjective, of course), i.e. something like:

    query = 'SELECT owner, table_name FROM ALL_TABLES'
    

    and then:

      jdbc_df = spark.read.format("jdbc").option("url", url) \
                                         .option("query", query) \
                                         .option("driver", driver) \
                                         ...
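
Since query is meant to replace dbtable rather than accompany it, one way to guard against passing both is a small options builder. This is a sketch only: `jdbc_read_options` is a hypothetical helper, and the URL and driver values are placeholders, not details from the question.

```python
from typing import Dict, Optional


def jdbc_read_options(
    url: str,
    driver: str,
    dbtable: Optional[str] = None,
    query: Optional[str] = None,
) -> Dict[str, str]:
    """Build JDBC reader options, allowing exactly one of dbtable or query.

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    if (dbtable is None) == (query is None):
        raise ValueError("specify exactly one of 'dbtable' or 'query'")
    options = {"url": url, "driver": driver}
    if dbtable is not None:
        options["dbtable"] = dbtable
    else:
        options["query"] = query
    return options


# Placeholder connection details (assumptions, not from the question).
opts = jdbc_read_options(
    url="jdbc:oracle:thin:@//dbhost:1521/service",
    driver="oracle.jdbc.OracleDriver",
    query="SELECT owner, table_name FROM ALL_TABLES",
)
print(opts)

# With a live SparkSession named `spark`, the options could then be applied as:
#   jdbc_df = spark.read.format("jdbc").options(**opts).load()
```

Credentials such as user and password would typically be supplied as further options before calling load().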
    
