pySpark using mapInPandas instead of rdd.mapPartitions - is it equivalent

Backend · Unresolved · 0 replies · 1840 views
轮回少年 2021-02-13 20:28

I have code that needs to run on each "id", where multiple of those ids can appear in a single stream batch, and where the stream is partitioned by the id; the stream contains
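A minimal sketch of the kind of per-partition function `mapInPandas` expects: it receives an iterator of pandas DataFrames (one per Arrow batch) and yields DataFrames back. Since several ids can land in the same batch even when the stream is partitioned by id, the function groups within each chunk. The column names `id`/`value` and the added `n` column are assumptions for illustration, not from the question:

```python
import pandas as pd
from typing import Iterator

def process_partition(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Per-partition function in the shape mapInPandas expects.

    Spark calls this once per partition, feeding the partition's rows as a
    stream of pandas DataFrames; we group by "id" inside each chunk so the
    per-id code sees one id at a time.
    """
    for pdf in batches:
        for _, group in pdf.groupby("id"):
            # Stand-in for the real per-id logic: tag each row with the
            # group's size.
            yield group.assign(n=len(group))

# With a Spark DataFrame df this would be wired up roughly as:
#   df.mapInPandas(process_partition, schema="id long, value double, n long")
# whereas the rdd.mapPartitions equivalent would iterate Row objects instead
# of pandas DataFrames.
```

Note the difference from `rdd.mapPartitions`: there the function receives an iterator of individual `Row` objects for the whole partition, while `mapInPandas` hands over the same partition as chunked, Arrow-backed pandas DataFrames, so a single id is not guaranteed to arrive in one chunk.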
