You can just groupBy title and then count:
import pyspark.sql.functions as f
df.groupBy('title').agg(f.count('*').alias('count')).show()
+-----+-----+
|title|count|
+-----+-----+
|    A|    2|
|    c|    3|
|    b|    1|
+-----+-----+
Or more concisely:
df.groupBy('title').count().show()
+-----+-----+
|title|count|
+-----+-----+
|    A|    2|
|    c|    3|
|    b|    1|
+-----+-----+