Convert Sparse Vector to Dense Vector in PySpark

Anonymous (unverified), posted 2019-12-03 01:22:02

Question:

I have a column of sparse vectors like this:

>>> countVectors.rdd.map(lambda vector: vector[1]).collect()
[SparseVector(13, {0: 1.0, 2: 1.0, 3: 1.0, 6: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 12: 1.0}),
 SparseVector(13, {0: 1.0, 1: 1.0, 2: 1.0, 4: 1.0}),
 SparseVector(13, {0: 1.0, 1: 1.0, 3: 1.0, 4: 1.0, 7: 1.0}),
 SparseVector(13, {1: 1.0, 2: 1.0, 5: 1.0, 11: 1.0})]

I am trying to convert these into dense vectors in PySpark 2.0.0 like this:

>>> frequencyVectors = countVectors.rdd.map(lambda vector: vector[1])
>>> frequencyVectors.map(lambda vector: Vectors.dense(vector)).collect()

I am getting an error like this:

16/12/26 14:03:35 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 1, in <lambda>
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/mllib/linalg/__init__.py", line 878, in dense
    return DenseVector(elements)
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/mllib/linalg/__init__.py", line 286, in __init__
    ar = np.array(ar, dtype=np.float64)
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/ml/linalg/__init__.py", line 701, in __getitem__
    raise ValueError("Index %d out of bounds." % index)
ValueError: Index 13 out of bounds.

How can I achieve this conversion? Is there anything wrong here?

Answer 1:

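This resolved my issue. As the traceback shows, Vectors.dense(vector) hands the SparseVector straight to np.array, which indexes it element by element and fails at index 13, one past the last valid index (12). Calling toArray() on each sparse vector first and wrapping the resulting array in a DenseVector avoids that: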

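from pyspark.ml.linalg import DenseVector  # assumption: use pyspark.mllib.linalg instead if you work with the RDD-based API
frequencyDenseVectors = frequencyVectors.map(lambda vector: DenseVector(vector.toArray()))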
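
To make the result of the conversion concrete, here is a rough sketch in plain NumPy of what densifying a sparse vector amounts to. It is only an illustration, not Spark's implementation: the helper name densify is hypothetical, and the input is the first vector from the question (size 13 with eight non-zero entries). Inside Spark, vector.toArray() does this work for you.

```python
import numpy as np

def densify(size, entries):
    """Hypothetical helper: build a length-`size` float64 array from {index: value}."""
    dense = np.zeros(size, dtype=np.float64)  # every unlisted position stays 0.0
    for index, value in entries.items():
        if not 0 <= index < size:
            # Valid indices run from 0 to size - 1, so 13 is out of range for size 13.
            raise IndexError("index %d out of bounds for size %d" % (index, size))
        dense[index] = value
    return dense

# First vector from the question: SparseVector(13, {0: 1.0, 2: 1.0, 3: 1.0, ...})
first = densify(13, {0: 1.0, 2: 1.0, 3: 1.0, 6: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 12: 1.0})
print(first.tolist())
# [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
```

The dense form stores all 13 positions explicitly, so it only pays off when the vectors are short or mostly non-zero.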

