scala - How to count the number of occurrences of each distinct element in a column of a Spark DataFrame


Suppose I have a DataFrame in the following format:

    -------------------------------
       col1    |  col2    | col3
    -------------------------------
    value11    | value21  | value31
    value12    | value22  | value32
    value11    | value22  | value33
    value12    | value21  | value33

Here, column col1 has two distinct values, value11 and value12. I want the total number of occurrences of each distinct value (value11, value12) in column col1.

You can group by col1 and count the rows in each group:

    import org.apache.spark.sql.functions.count

    // Group rows by col1, then count the non-null col1 values in each group
    df.groupBy("col1").agg(count("col1")).show()

    +-------+-----------+
    |   col1|count(col1)|
    +-------+-----------+
    |value12|          2|
    |value11|          2|
    +-------+-----------+
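For a shorter variant of the same idea, `groupBy(...).count()` counts the rows per group directly and names the result column `count`. The sketch below is a minimal, self-contained version: it assumes Spark SQL is on the classpath (for example in spark-shell), and the local session settings, the app name, and the `counts` value are illustrative names, not part of the original answer. One difference worth knowing: `count("col1")` skips nulls in `col1`, while `groupBy("col1").count()` counts every row in the group.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc}

// Assumption: a local session for illustration; in spark-shell,
// getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("count-per-value")
  .getOrCreate()

// Sample data mirroring the table in the question.
val df = spark.createDataFrame(Seq(
  ("value11", "value21", "value31"),
  ("value12", "value22", "value32"),
  ("value11", "value22", "value33"),
  ("value12", "value21", "value33")
)).toDF("col1", "col2", "col3")

// One row per distinct col1 value; the "count" column holds the rows per group.
// Ordering makes the output stable: most frequent first, ties broken by col1.
val counts = df.groupBy("col1").count().orderBy(desc("count"), col("col1"))

counts.show()
```

Without the `orderBy`, the row order of the result is not guaranteed, which is why the output in the answer above happens to list value12 first.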

In case you want to know how many distinct values there are in col1, you can use countDistinct:

    import org.apache.spark.sql.functions.countDistinct

    // Aggregate over the whole DataFrame: number of distinct non-null col1 values
    df.agg(countDistinct("col1").as("n_distinct")).show()

    +----------+
    |n_distinct|
    +----------+
    |         2|
    +----------+
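If you need the number as a plain Scala value, or the distinct values themselves, something like the following may help. It is a sketch under the same assumptions as the answer's snippets (Spark SQL on the classpath); the local session and the names `nDistinct` and `distinctValues` are illustrative. Note that `countDistinct` ignores nulls, whereas `select("col1").distinct()` keeps null as one of the distinct rows, so the two can differ by one on data containing nulls.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.countDistinct

// Assumption: a local session for illustration; in spark-shell,
// getOrCreate() returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("distinct-values")
  .getOrCreate()

// Sample data mirroring the table in the question.
val df = spark.createDataFrame(Seq(
  ("value11", "value21", "value31"),
  ("value12", "value22", "value32"),
  ("value11", "value22", "value33"),
  ("value12", "value21", "value33")
)).toDF("col1", "col2", "col3")

// Pull the single aggregated value out of the one-row result as a Long.
val nDistinct: Long =
  df.agg(countDistinct("col1").as("n_distinct")).first().getLong(0)

// Collect the distinct values themselves; sorted for a stable order.
val distinctValues: Array[String] =
  df.select("col1").distinct().collect().map(_.getString(0)).sorted

println(s"n_distinct = $nDistinct, values = ${distinctValues.mkString(", ")}")
```

Collecting the distinct values to the driver is only reasonable when the column has few distinct values; for high-cardinality columns, keeping the result as a DataFrame is the safer choice.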
