If you simply want to sort in ascending or descending order, you need two pieces to make it work:
- the `RDD.sortBy` method, which "sorts (...) RDD by the given `keyfunc`",
- the knowledge that Python lists and tuples are compared lexicographically:
```python
>>> (1, 2) < (3, 4)
True
>>> (5, 6) < (3, 4)
False
>>> ("foo", 1) < ("foo", 2, 5)
True
>>> ("bar", 1, 2) > ("bar", 1)
True
```
Combine the two, for example `rdd.sortBy(lambda x: (x[0], x[3]))`, and you're good to go. For descending order, pass `ascending=False`.
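Because `sortBy` only needs the key function to return comparable values, the key can be tried out locally with Python's built-in `sorted` before running it on a cluster. A minimal sketch with made-up records; here `sorted(..., reverse=True)` stands in for `sortBy(..., ascending=False)`:

```python
# Hypothetical sample records: (name, city, age, score)
records = [
    ("bob", "Oslo", 31, 7),
    ("alice", "Rome", 25, 9),
    ("bob", "Lima", 40, 2),
    ("alice", "Kyiv", 33, 4),
]

def sort_key(x):
    # Same key as rdd.sortBy(lambda x: (x[0], x[3])): first field, then fourth
    return (x[0], x[3])

# Tuples compare lexicographically, so ties on x[0] are broken by x[3]
ascending = sorted(records, key=sort_key)

# Counterpart of ascending=False: the whole key tuple is reversed at once
descending = sorted(records, key=sort_key, reverse=True)

for row in ascending:
    print(row)
```

Note that reversing the order flips every component of the key together; it cannot mix directions.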
If you need mixed ordering (descending by some values, ascending by others) on non-numeric values (numeric values can simply be negated in the key), you can either embed this logic inside `keyfunc` or convert the RDD to a DataFrame and use the `orderBy` method with `desc`:
```python
from pyspark.sql.functions import desc
df.orderBy(desc("foo"), "bar")
```
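Embedding mixed ordering inside `keyfunc` for non-numeric values could look like the following minimal sketch. `Desc` is a hypothetical helper (not part of PySpark) that wraps a value and inverts its comparisons; the example is checked with Python's `sorted`, and the same key could presumably be passed as `rdd.sortBy(lambda x: (Desc(x[0]), x[1]))`, assuming PySpark can pickle the class for the executors.

```python
from functools import total_ordering

@total_ordering
class Desc:
    """Hypothetical wrapper: sorts the wrapped value in reverse order."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        # "Less than" for the wrapper means "greater than" for the wrapped value
        return self.value > other.value

    def __repr__(self):
        return f"Desc({self.value!r})"

# Hypothetical sample records: (name, city)
records = [
    ("bob", "Oslo"),
    ("alice", "Rome"),
    ("bob", "Lima"),
    ("alice", "Kyiv"),
]

def mixed_key(x):
    # Descending by the first field, ascending by the second
    return (Desc(x[0]), x[1])

result = sorted(records, key=mixed_key)
print(result)
```

The wrapper works for any type that supports `<`, `>` and `==`, which is why it covers strings where negation is not an option.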