Question
In PySpark, suppose there are two DataFrames, dfA and dfB:

dfA: name, class
dfB: class, time

Let n = dfA.select('class').distinct().count(). How should I optimize the join of these two DataFrames in the two cases n < 100 and n > 100000?
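For concreteness, here is a minimal, runnable sketch of the setup described above. The sample rows, the threshold of 100, and the if/else with the broadcast() hint are assumptions added only for illustration of what "optimize" might mean, not an established answer to the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-sketch").getOrCreate()

# dfA: name, class  (sample rows are made up)
dfA = spark.createDataFrame(
    [("alice", "math"), ("bob", "physics"), ("carol", "math")],
    ["name", "class"],
)

# dfB: class, time  (sample rows are made up)
dfB = spark.createDataFrame(
    [("math", "09:00"), ("physics", "10:00")],
    ["class", "time"],
)

# n = number of distinct join keys on the dfA side
n = dfA.select("class").distinct().count()

if n < 100:  # threshold is illustrative, not prescribed
    # Few distinct keys: dfB is small, so hint Spark to broadcast it and
    # perform a broadcast hash join, avoiding a shuffle of dfA.
    joined = dfA.join(broadcast(dfB), on="class")
else:
    # Many distinct keys: rely on the default shuffle-based
    # (sort-merge) join.
    joined = dfA.join(dfB, on="class")

joined.show()
```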
Source: https://stackoverflow.com/questions/58026274/pyspark-one-to-many-join-operation