I am reading in a CSV as a Spark DataFrame and performing machine learning operations upon it. I keep getting a Python serialization EOFError - any idea why? I thought it mi
The error appears to happen in the pySpark read_int function. Code for which is as follows from spark site :
def read_int(stream):
length = stream.read(4)
if not length:
raise EOFError
return struct.unpack("!i", length)[0]
This would mean that when reading 4bytes from the stream, if 0 bytes are read, EOF error is raised. The python docs are here.