As of Spark 1.5.0 it is possible to write your own UDAFs for custom aggregations on DataFrames: Spark 1.5 DataFrame API Highlights: Date/Time/String Handling, Time Intervals, and UDAFs
You cannot define a Python UDAF in Spark 1.5.0-2.0.0. There is a JIRA tracking this feature request:
it was resolved with target "later", so it probably won't happen anytime soon.
You can, however, call a Scala UDAF from PySpark - this is described in Spark: How to map Python with Scala or Java User Defined Functions?
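For context on what writing that Scala UDAF involves, a `UserDefinedAggregateFunction` boils down to four operations: initialize a buffer, update it per row, merge buffers across partitions, and evaluate the final result. Here is a minimal pure-Python sketch of that contract (no Spark required; the class and method names mirror the Scala API but are purely illustrative):

```python
class MeanUDAF:
    """Illustrative sketch of the UDAF contract using a running mean.
    The buffer is a (sum, count) tuple."""

    def initialize(self):
        # Fresh aggregation buffer.
        return (0.0, 0)

    def update(self, buf, value):
        # Fold one input row into the buffer.
        s, c = buf
        return (s + value, c + 1)

    def merge(self, buf1, buf2):
        # Combine partial buffers from different partitions.
        return (buf1[0] + buf2[0], buf1[1] + buf2[1])

    def evaluate(self, buf):
        # Produce the final result from the merged buffer.
        s, c = buf
        return s / c if c else None


# Simulate aggregation over two partitions.
udaf = MeanUDAF()
part1 = udaf.initialize()
for v in [1.0, 2.0]:
    part1 = udaf.update(part1, v)
part2 = udaf.initialize()
for v in [3.0, 4.0]:
    part2 = udaf.update(part2, v)
result = udaf.evaluate(udaf.merge(part1, part2))
print(result)  # 2.5
```

The per-partition update followed by a merge step is exactly why a UDAF needs a merge function at all: Spark aggregates each partition independently before combining the partial results.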