Why do I get different outputs for the following two commands, and what is the difference between them?
..agg(countDistinct("member_id") as "count")
..distinct.count
1st command:
DF.agg(countDistinct("member_id") as "count")
returns the same result as the SQL query below, i.e. the number of distinct non-null member_id values:
select count(distinct member_id) from DF
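2nd command:
DF.distinct.count
first removes all duplicate rows from the DF (rows are compared across every column, not just member_id) and then counts the rows that remain. So the two results differ whenever several rows share a member_id but differ in some other column.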
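To make the difference concrete, here is a minimal sketch, assuming a local SparkSession and a made-up two-column DataFrame. The `plan` column, the sample rows, and the `DistinctCountDemo` object are hypothetical and only there for illustration; your DF will have its own columns.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.countDistinct

object DistinctCountDemo {
  // Returns (number of distinct member_id values, number of distinct rows).
  def counts(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Hypothetical data: member 1 has two rows that differ in `plan`,
    // member 2 has two fully identical rows.
    val df = Seq(
      (1, "basic"),
      (1, "premium"),
      (2, "basic"),
      (2, "basic")
    ).toDF("member_id", "plan")

    // 1st command: looks only at the member_id column.
    val distinctMembers =
      df.agg(countDistinct("member_id") as "count").first().getLong(0)

    // 2nd command: de-duplicates whole rows, then counts them.
    val distinctRows = df.distinct.count

    (distinctMembers, distinctRows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-count-demo")
      .getOrCreate()

    val (members, rows) = counts(spark)
    println(s"distinct member_id values: $members") // 2
    println(s"distinct rows: $rows")                // 3

    spark.stop()
  }
}
```

If what you want from the 2nd form is the number of distinct members, selecting the column first, as in DF.select("member_id").distinct.count, restricts the comparison to that column. Note that this form would still count a null member_id as one value, while countDistinct ignores nulls.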