Question
I have a Spark RDD (myData) that has been mapped to a list. The output of myData.collect() is:
['x', 'y', 'z']
What operation can I perform on myData to map it to (or create) a new RDD containing all permutations of xyz? For example, newData.collect() would output:
['xyz', 'xzy', 'zxy', 'zyx', 'yxz', 'yzx']
I've tried variations of cartesian(myData), but as far as I can tell, the best that gives me is different combinations of two-value pairs.
Answer 1:
Doing this all in PySpark: you can use rdd.cartesian (here rdd is myData), but you have to filter out repeats and do it twice (not saying this is good!!!):
>>> rdd1 = rdd.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
>>> rdd1.collect()
['xy', 'xz', 'yx', 'yz', 'zx', 'zy']
>>> rdd2 = rdd1.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
>>> rdd2.collect()
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
Answer 2:
>>> from itertools import permutations
>>> t = ['x', 'y', 'z']
>>> ["".join(item) for item in permutations(t)]
['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
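The itertools approach runs entirely on the driver. One way to hand part of the work back to Spark, sketched here as an assumption rather than as part of the original answer, is to let each element generate only the permutations that start with it. The helper perms_with_prefix is hypothetical and assumes the elements are distinct.

```python
from itertools import permutations

def perms_with_prefix(first, items):
    """Yield every permutation of `items` that starts with `first`, as a string.

    Hypothetical helper; assumes the elements of `items` are distinct.
    """
    rest = list(items)
    rest.remove(first)  # drop the chosen first element from the pool
    for tail in permutations(rest):
        yield first + "".join(tail)

items = ['x', 'y', 'z']
# Locally, chaining over every possible first element reproduces the full list.
result = [p for first in items for p in perms_with_prefix(first, items)]
print(result)
# ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
```

In PySpark the helper would plug into RDD.flatMap, e.g. `items = myData.collect()` followed by `myData.flatMap(lambda first: perms_with_prefix(first, items))`. The small input list is still collected once, but the n! outputs are generated across the executors.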
Note: an RDD object can be converted to a local iterator using toLocalIterator.
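Because itertools.permutations accepts any iterable, the iterator returned by toLocalIterator can be passed to it directly. A minimal sketch follows; a plain Python iterator stands in for myData.toLocalIterator() (an assumption so it runs without a SparkContext), and join_permutations is a hypothetical name.

```python
from itertools import permutations

def join_permutations(values):
    """Return every permutation of `values` as a joined string.

    `values` may be any iterable, e.g. the iterator from RDD.toLocalIterator().
    """
    # permutations() first copies the whole iterable into a tuple, so all
    # elements end up in driver memory at once.
    return ["".join(p) for p in permutations(values)]

# A plain iterator stands in for myData.toLocalIterator() so this runs without Spark.
local_iter = iter(['x', 'y', 'z'])
print(join_permutations(local_iter))
# ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
```

If an RDD is needed afterwards, the resulting list could be distributed again with sc.parallelize, assuming a SparkContext named sc.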
Source: https://stackoverflow.com/questions/43703046/find-all-permutations-of-values-in-spark-rdd-python