Apache Spark OutOfMemoryError (HeapSpace)

I have a dataset with ~5M rows x 20 columns, containing a groupID and a rowID. My goal is to check whether (some) columns contain more than a fixed fraction (say, 50%) of mi