pyarrow 1.0 throws an out-of-memory exception when reading a large number of files with ParquetDataset (works fine with version 0.13)


I have a dataframe that is split and stored in more than 5000 files. I use ParquetDataset(fnames).read() to load all of the files. I updated pyarrow from 0.13.0 to the latest version, 1.0.1, and