pandasUDF and pyarrow 0.15.0

無奈伤痛 2021-01-17 11:22

I have recently started getting a bunch of errors on a number of pyspark jobs running on EMR clusters. The errors are

java.lang.IllegalArgumentE         


        
1 answer
  • 2021-01-17 11:46

    It's not a bug. We made an important protocol change in 0.15.0 that makes the default behavior of pyarrow incompatible with older versions of Arrow in Java -- your Spark environment seems to be using an older version.

    Your options are

    • Set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 in the environment where your Python code runs.
    • Downgrade to pyarrow < 0.15.0 for now.
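
For the first option, here is a minimal sketch of one way to set the variable from Python. The helper name `legacy_arrow_ipc_conf` is hypothetical, and the sketch assumes the variable has to reach the executor-side Python workers as well as the driver, which is why it also builds `spark.executorEnv.*` and `spark.yarn.appMasterEnv.*` entries. Whether the YARN entry is needed depends on your deploy mode, so treat it as an assumption to verify on your cluster.

```python
import os

ARROW_COMPAT_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def legacy_arrow_ipc_conf():
    """Set the compatibility variable locally and return matching Spark conf entries.

    Hypothetical helper for illustration; not part of pyarrow or PySpark.
    """
    # Driver-side Python process: set this before any Arrow data is exchanged.
    os.environ[ARROW_COMPAT_VAR] = "1"
    return {
        # Executor-side Python workers are separate processes, so the
        # variable is forwarded through Spark's executor environment.
        "spark.executorEnv." + ARROW_COMPAT_VAR: "1",
        # Assumption: only relevant when running on YARN in cluster mode.
        "spark.yarn.appMasterEnv." + ARROW_COMPAT_VAR: "1",
    }


conf = legacy_arrow_ipc_conf()

# Print the entries in the form accepted by spark-submit.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The same entries could instead be applied one by one with `SparkSession.builder.config(key, value)` before the session is created. Setting the variable cluster-wide (for example in `conf/spark-env.sh`) is an alternative that avoids touching job code.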

    Hopefully the Spark community will be able to upgrade to 0.15.0 in Java soon so this issue goes away.

    This is discussed in http://arrow.apache.org/blog/2019/10/06/0.15.0-release/
