Trying to figure out how to use window functions in PySpark. Here's an example of what I'd like to be able to do: simply count the number of times a user has an "event".
It throws an exception because you're passing a list of columns. The signature of DataFrame.select looks as follows:

df.select(self, *cols)

An expression using a window function is a column like any other, so what you need here is something like this:
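For context, here is a minimal, self-contained setup sketch. The sample data is inferred from the output shown below; creating a SparkSession this way (Spark 2.x+) is my assumption, not part of the original question:

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import count

spark = SparkSession.builder.getOrCreate()

# Sample data inferred from the output below
df = spark.createDataFrame(
    [(234, 0), (456, 0), (456, 1), (456, 2), (123, 0), (123, 1)],
    ["id", "dt"],
)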
w = Window.partitionBy("id").orderBy("dt")  # per-user window ordered by dt; count over it yields a running count
df.select("id", "dt", count("dt").over(w).alias("count")).show()
## +---+---+-----+
## | id| dt|count|
## +---+---+-----+
## |234| 0| 1|
## |456| 0| 1|
## |456| 1| 2|
## |456| 2| 3|
## |123| 0| 1|
## |123| 1| 2|
## +---+---+-----+
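If you'd rather keep all existing columns and simply append the running count, withColumn over the same window is an equivalent spelling of the select above:

# Equivalent: append the windowed count as a new column
df.withColumn("count", count("dt").over(w)).show()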
Generally speaking, Spark SQL window functions behave the same way as window functions in any modern RDBMS.
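For example, the same running count can be written in plain SQL with an OVER clause. The view name here is just an illustration, and spark refers to the SparkSession from the setup sketch above:

df.createOrReplaceTempView("events")
spark.sql("""
    SELECT id, dt,
           COUNT(dt) OVER (PARTITION BY id ORDER BY dt) AS count
    FROM events
""").show()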