How to set jdbc/partitionColumn type to Date in spark 2.4.1

Asked by 清歌不尽 on 2021-01-05 20:14 · 4 answers · 1908 views

I am trying to retrieve data from Oracle using spark-sql-2.4.1. I tried to set the JDBC options as below:

    .option("lowerBound", "31-MAR-02")


        
4 Answers
  • 2021-01-05 20:20

    If you are using Oracle, see https://github.com/apache/spark/blob/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala#L441

    val df1 = spark.read.format("jdbc")
          .option("url", jdbcUrl)
          .option("dbtable", "datetimePartitionTest")
          .option("partitionColumn", "d")
          .option("lowerBound", "2018-07-06")
          .option("upperBound", "2018-07-20")
          .option("numPartitions", 3)
          // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
          // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail bound evaluation of
          // the partition column. E.g. 2018-07-06 cannot be evaluated as Timestamp, and the error
          // message says: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff].
          .option("oracle.jdbc.mapDateToTimestamp", "false")
          .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
          .load()
    
  • 2021-01-05 20:21

    I stumbled on this question while solving a similar problem. In my case, Spark 2.4.2 was sending dates to Oracle in the format 'yyyy-MM-dd HH:mm:ss.ssss', and Oracle returned "Not a valid month" because it expected 'dd-MMM-yy HH:mm:ss.ssss'. To solve it I followed the Spark GitHub link, which says:

    Override the beforeFetch method in OracleDialect to do the following two things:

    - Set Oracle's NLS_TIMESTAMP_FORMAT to "YYYY-MM-DD HH24:MI:SS.FF" to match the java.sql.Timestamp format.
    - Set Oracle's NLS_DATE_FORMAT to "YYYY-MM-DD" to match the java.sql.Date format.

    And it solved the issue. Hope it helps.
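
    If you would rather not override the dialect, the same two session settings can also be applied through sessionInitStatement, which runs once per JDBC connection. A sketch, not tested against a live Oracle instance; jdbcUrl and the dbtable value are placeholders:

    // Both NLS formats are set in one PL/SQL block. Note the doubled single
    // quotes inside each EXECUTE IMMEDIATE string, which Oracle requires to
    // embed a quoted format literal.
    val df = spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "my_date_table") // placeholder table name
      .option("sessionInitStatement",
        """BEGIN
          |  EXECUTE IMMEDIATE 'ALTER SESSION SET NLS_TIMESTAMP_FORMAT = ''YYYY-MM-DD HH24:MI:SS.FF''';
          |  EXECUTE IMMEDIATE 'ALTER SESSION SET NLS_DATE_FORMAT = ''YYYY-MM-DD''';
          |END;""".stripMargin)
      .load()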

  • 2021-01-05 20:26

    The given parameters have type timestamp, but you're providing only a date. A timestamp has the format yyyy-mm-dd hh:mm:ss, so you need to provide your bounds as 2002-03-31 00:00:00 and 2019-05-01 23:59:59 respectively...
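
    Applied to the options from the question, that looks like the sketch below (jdbcUrl, the table name, and the partition column are placeholders; only the bound format matters here):

    val df = spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "my_table")          // placeholder table name
      .option("partitionColumn", "created_d") // placeholder DATE column
      .option("lowerBound", "2002-03-31 00:00:00")
      .option("upperBound", "2019-05-01 23:59:59")
      .option("numPartitions", 4)
      .load()

    These bound strings are exactly what java.sql.Timestamp.valueOf accepts, which is why a value like "31-MAR-02" fails with "Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]".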

  • 2021-01-05 20:35

    All of the following options must be set this way in order for it to work:

    spark.read
          .option("header", true)
          .option("inferSchema", true)
          .option("timestampFormat", "MM/dd/yyyy h:mm:ss a")
          .csv("PATH_TO_CSV")
    