PySpark one-to-many join operation

Submitted by 为君一笑 on 2019-12-13 03:18:18

Question


In PySpark, let's say there are two DataFrames, dfA and dfB, with these columns:

dfA: name, class
dfB: class, time

Suppose dfA.select('class').distinct().count() returns n, and consider two cases: n < 100 and n > 100000.

When I join the two DataFrames on class, how should I optimize the join in each of these cases?
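
To make the setup concrete, here is a minimal plain-Python sketch (not Spark code) of the two tables, the distinct-class count n, and the join on class. The sample rows are made up for illustration, and the sketch assumes each class appears once in dfB, which the question does not actually state.

```python
from collections import defaultdict

# Hypothetical sample rows; the question gives only the column names.
dfA_rows = [("alice", "math"), ("bob", "math"), ("carol", "art")]  # (name, class)
dfB_rows = [("math", "09:00"), ("art", "11:00")]                   # (class, time)

# n = number of distinct class values in dfA
n = len({cls for _, cls in dfA_rows})

# Index dfB by class so every dfA row can look up its matching time(s).
times_by_class = defaultdict(list)
for cls, time in dfB_rows:
    times_by_class[cls].append(time)

# Inner join on class: one output row per (dfA row, matching dfB row) pair.
joined = [
    (name, cls, time)
    for name, cls in dfA_rows
    for time in times_by_class.get(cls, [])
]

print(n)
for row in joined:
    print(row)
```

In PySpark itself the same two steps would be the dfA.select('class').distinct().count() call from the question and something like dfA.join(dfB, on='class'); the sketch only shows what those operations compute, not how Spark executes them.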

Source: https://stackoverflow.com/questions/58026274/pyspark-one-to-many-join-operation
