PySpark serialization EOFError

谎友^ 2021-01-01 09:56

I am reading in a CSV as a Spark DataFrame and performing machine learning operations upon it. I keep getting a Python serialization EOFError - any idea why? I thought it mi

3 Answers
  •  囚心锁ツ
    2021-01-01 10:07

    The error appears to occur in PySpark's read_int function. Its source, as published on the Spark site, is:

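    import struct

    def read_int(stream):
        # Read 4 bytes, the size of a serialized 32-bit integer.
        length = stream.read(4)
        if not length:
            # Nothing could be read: the stream has ended.
            raise EOFError
        # "!i" unpacks a big-endian (network order) signed 32-bit int.
        return struct.unpack("!i", length)[0]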
    

    This means that if an attempt to read 4 bytes from the stream returns 0 bytes, EOFError is raised. The Python docs are here.
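
To see that behavior in isolation, here is a minimal sketch that runs the same read_int against in-memory streams. io.BytesIO is only a stand-in for whatever stream PySpark actually passes in (an assumption for illustration), and the value 1046 is arbitrary.

```python
import io
import struct


def read_int(stream):
    length = stream.read(4)
    if not length:
        raise EOFError
    return struct.unpack("!i", length)[0]


# A stream holding one big-endian 32-bit integer, the layout "!i" expects.
# io.BytesIO is a hypothetical stand-in for the real stream.
full = io.BytesIO(struct.pack("!i", 1046))
value = read_int(full)

# An empty stream: read(4) returns b"", so read_int raises EOFError.
try:
    read_int(io.BytesIO(b""))
    hit_eof = False
except EOFError:
    hit_eof = True

print(value, hit_eof)
```

So the EOFError itself only says that the stream had no more data when an integer was expected. It does not say why the other end stopped sending.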
