edf.select("x").distinct.show()
shows the distinct values that are present in the x column of the edf DataFrame.
Is there an efficient
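If you are using Java, the statement import org.apache.spark.sql.functions.countDistinct; will give a compile error:
The import org.apache.spark.sql.functions.countDistinct cannot be resolved
This happens because countDistinct is a static method of the functions class, not a type, and a plain import can only name a type.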
To use countDistinct in Java, use the following format:
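import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;

// countDistinct is a static method of the functions class, so call it through the class name
Dataset<Row> result = df.agg(functions.countDistinct("some_column"));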
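An alternative is Java's static import, which brings a static method into scope by its simple name. With Spark on the classpath that would be import static org.apache.spark.sql.functions.countDistinct; after which df.agg(countDistinct("some_column")) compiles. To keep the sketch runnable without Spark, the example below uses the JDK's java.lang.Math.max as a stand-in for countDistinct; the class name StaticImportDemo and the helper largest are hypothetical, chosen only for illustration.

```java
// Static import: brings the static method Math.max into scope as plain "max".
// (A regular "import java.lang.Math.max;" would fail to compile, for the same
// reason the plain countDistinct import fails: max is a method, not a type.)
import static java.lang.Math.max;

public class StaticImportDemo {
    // Hypothetical helper: returns the larger of two ints using the statically imported max
    public static int largest(int a, int b) {
        return max(a, b);
    }

    public static void main(String[] args) {
        System.out.println(largest(3, 7)); // prints 7
    }
}
```

Which form to use is mostly a style choice: functions.countDistinct(...) makes the origin of each call explicit, while a static import reads closer to the Scala examples in the Spark documentation.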