With PySpark running on a remote server, I am able to connect to an Oracle database on another server via JDBC, but any valid query I run returns an ORA-00903 ("invalid table name") error.
From the documentation for dbtable:
The JDBC table that should be read from or written into. Note that when using it in the read path anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
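To see why a bare SELECT triggers ORA-00903 (assuming it was passed as dbtable), it helps to look at how that value is used. Spark's default JDBC dialect first probes the result schema with a statement of roughly the form SELECT * FROM <dbtable> WHERE 1=0 (this mirrors JdbcDialect.getSchemaQuery; treat the exact text as an approximation). The hypothetical schema_probe helper below only reproduces that string concatenation in plain Python:

```python
def schema_probe(dbtable: str) -> str:
    # Approximation of the statement Spark's default JDBC dialect sends to
    # discover the result schema: the dbtable value is pasted verbatim after FROM.
    return f"SELECT * FROM {dbtable} WHERE 1=0"


bare = "SELECT owner, table_name FROM ALL_TABLES"
wrapped = f"({bare}) t"

# Bare SELECT after FROM: Oracle expects a table name here and raises ORA-00903.
print(schema_probe(bare))
# prints: SELECT * FROM SELECT owner, table_name FROM ALL_TABLES WHERE 1=0

# Parenthesized subquery with an alias: a valid FROM-clause item.
print(schema_probe(wrapped))
# prints: SELECT * FROM (SELECT owner, table_name FROM ALL_TABLES) t WHERE 1=0
```

With the bare query, Oracle finds the SELECT keyword where it expects a table name, which is exactly what ORA-00903 complains about.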
So in your examples you could do:
dbtable = '(SELECT owner, table_name FROM ALL_TABLES)'
optionally with an alias:
dbtable = '(SELECT owner, table_name FROM ALL_TABLES) t'
As an alternative you could use query instead of (not as well as) dbtable:
A query that will be used to read data into Spark. The specified query will be parenthesized and used as a subquery in the FROM clause. Spark will also assign an alias to the subquery clause.
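The documented behaviour of query can be modelled in a few lines of plain Python. This is a hypothetical sketch, not Spark's implementation: effective_from_clause is an invented name, and the SPARK_GEN_SUBQ_<n> alias only imitates the naming Spark uses internally, so treat it as illustrative. It also encodes the documented rule that dbtable and query cannot be set at the same time:

```python
import itertools
from typing import Dict

# Hypothetical counter standing in for Spark's internal alias generator.
_alias_ids = itertools.count()


def effective_from_clause(options: Dict[str, str]) -> str:
    """Rough model of how the read options become a FROM-clause item.

    Not Spark's code: it mirrors the documented behaviour that "query" is
    parenthesized and given a generated alias, and that "query" and
    "dbtable" are mutually exclusive.
    """
    has_table = "dbtable" in options
    has_query = "query" in options
    if has_table and has_query:
        raise ValueError("Specify either 'dbtable' or 'query', not both")
    if has_query:
        # Parenthesize the query and attach a generated (illustrative) alias.
        return f"({options['query']}) SPARK_GEN_SUBQ_{next(_alias_ids)}"
    if has_table:
        # dbtable is used verbatim, so it must already be a valid FROM item.
        return options["dbtable"]
    raise ValueError("One of 'dbtable' or 'query' is required")
```

Either way the database ends up seeing a parenthesized subquery with an alias; the query option just does the wrapping for you.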
...so it is effectively the same thing, but it might make your code more understandable (entirely subjective, of course), e.g. something like:
query = 'SELECT owner, table_name FROM ALL_TABLES'
and then:
jdbc_df = spark.read.format("jdbc").option("url", url) \
.option("query", query) \
.option("driver", driver) \
...
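Putting the pieces together, a fuller version of the reader above might look like the sketch below. It assumes the Oracle JDBC driver jar is already on the Spark classpath and that credentials are passed through the user and password options; read_oracle_query and every connection value are hypothetical placeholders, not taken from the question:

```python
from typing import Any


def read_oracle_query(spark: Any, url: str, query: str, driver: str,
                      user: str, password: str) -> Any:
    """Read the result of `query` from Oracle into a Spark DataFrame.

    `spark` is expected to be a pyspark.sql.SparkSession; it is typed as Any
    so this sketch does not need to import pyspark itself.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("driver", driver)
        # Hypothetical credentials; substitute your own.
        .option("user", user)
        .option("password", password)
        # Use "query" instead of "dbtable": Spark rejects setting both.
        .option("query", query)
        .load()
    )
```

For example, read_oracle_query(spark, "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1", "SELECT owner, table_name FROM ALL_TABLES", "oracle.jdbc.driver.OracleDriver", "scott", "tiger") would return a DataFrame of owners and table names; the host, service name and credentials here are made up.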