Find all permutations of values in Spark RDD; python

Question

I have a Spark RDD (myData) that has been mapped to a list of values. Calling myData.collect() yields the following:

['x', 'y', 'z']

What operation can I perform on myData to map to or create a new RDD containing a list of all permutations of xyz? For example newData.collect() would output:

['xyz', 'xzy', 'zxy', 'zyx', 'yxz', 'yzx']

I've tried using variations of cartesian(myData), but as far as I can tell, the best that gives is different combinations of two-value pairs.

Answer 1:

Staying entirely in PySpark, you can use rdd.cartesian (here rdd is the question's myData), but you have to filter out repeats and apply it twice (not saying this is a good approach!):

 >>> rdd1 = rdd.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
 >>> rdd1.collect()
 ['xy', 'xz', 'yx', 'yz', 'zx', 'zy']
 >>> rdd2 = rdd1.cartesian(rdd).filter(lambda x: x[1] not in x[0]).map(lambda x: ''.join(x))
 >>> rdd2.collect()
 ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']
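
The two-round pattern above appears to generalize: n distinct values need n - 1 rounds of cartesian-plus-filter. The plain-Python sketch below mirrors that logic with itertools.product standing in for rdd.cartesian, so it runs without a cluster. The name permutations_by_cartesian is hypothetical, and the sketch assumes the input values are distinct strings. It keeps partial results as tuples so the membership test is exact, whereas `x[1] not in x[0]` on joined strings is a substring check that is only safe for single-character values.

```python
from itertools import product


def permutations_by_cartesian(values):
    """Build all permutations by repeated cartesian product plus filter.

    Hypothetical helper: itertools.product plays the role of rdd.cartesian.
    Assumes the values are distinct strings.
    """
    values = list(values)
    # Start with one-element tuples, like the original RDD of single values.
    partial = [(v,) for v in values]
    # Each round appends one value that the partial result does not yet contain.
    for _ in range(len(values) - 1):
        partial = [p + (v,) for p, v in product(partial, values) if v not in p]
    return ["".join(p) for p in partial]


print(permutations_by_cartesian(['x', 'y', 'z']))
```

Each round multiplies the number of candidate pairs by n before filtering, which is part of why the answer hedges on whether this approach is a good idea for larger inputs.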

Answer 2:

>>> from itertools import permutations
>>> t = ['x', 'y', 'z']
>>> ["".join(item) for item in permutations(t)]

['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']

Note: an RDD can be converted to a local iterator with rdd.toLocalIterator(), and the result can be passed to permutations in place of t.
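
As a rough sketch of how that note could be applied, assuming the RDD is small enough to pull to the driver: the helper below (all_permutations is a hypothetical name, not from the answer) accepts any iterable, such as the result of myData.toLocalIterator(), and returns the joined permutations. The Spark calls appear only in comments because they assume a live SparkContext named sc.

```python
from itertools import permutations


def all_permutations(values):
    """Return every ordering of `values`, each joined into one string.

    Hypothetical helper; `values` can be any iterable of strings,
    including a local iterator obtained from an RDD.
    """
    return ["".join(p) for p in permutations(values)]


# With Spark available (assumed: SparkContext `sc`, RDD `myData`):
#   local_values = myData.toLocalIterator()
#   newData = sc.parallelize(all_permutations(local_values))
print(all_permutations(['x', 'y', 'z']))
```

Note that the permutations are computed on the driver here, not distributed; only the final list is turned back into an RDD.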

Source: https://stackoverflow.com/questions/43703046/find-all-permutations-of-values-in-spark-rdd-python

Tags: python, list, apache-spark, pyspark, permutation
