Convert StringType to ArrayType in PySpark

Asked by 一生所求 on 2021-01-22 11:02

I am trying to run the FPGrowth algorithm in PySpark on my dataset.

from pyspark.ml.fpm import FPGrowth

fpGrowth = FPGrowth(itemsCol="name", minSupport=0.5, minConfidence=0.6)


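For context, FPGrowth's itemsCol must be an ArrayType column; a StringType column such as "name" will be rejected when fit is called. Below is a minimal sketch (with placeholder data, not the asker's dataset) of the input shape FPGrowth accepts:

from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.getOrCreate()

# Each "name" value is a Python list, so the column is array<string>.
df = spark.createDataFrame(
    [(0, ["a", "b"]), (1, ["a", "c"]), (2, ["a", "b", "c"])],
    ["id", "name"],
)

fpGrowth = FPGrowth(itemsCol="name", minSupport=0.5, minConfidence=0.6)
model = fpGrowth.fit(df)
model.freqItemsets.show()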
        
2 Answers
  •  心在旅途
    2021-01-22 11:59

    Based on your previous question, it seems as though you are building rdd2 incorrectly.

    Try this:

    from pyspark.sql import Row

    # Split the comma-separated string into a list so "name" becomes an array column.
    rd2 = rd.map(lambda x: (x[1], x[0][0], x[0][1].split(",")))
    rd3 = rd2.map(lambda p: Row(id=int(p[0]), name=p[2], actor=str(p[1])))
    

    The change is that we call str.split(",") on x[0][1] so that it will convert a string like 'a,b' to a list: ['a', 'b'].
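    Assuming rd holds pairs shaped like ((actor, "a,b"), id), as the snippet above implies, rd3 can then be turned into a DataFrame whose name column is array<string>, which is what FPGrowth accepts. A sketch under that assumption, with illustrative data only:

    from pyspark.sql import SparkSession, Row
    from pyspark.ml.fpm import FPGrowth

    spark = SparkSession.builder.getOrCreate()

    # Illustrative stand-in for rd: ((actor, comma-separated names), id) pairs.
    rd = spark.sparkContext.parallelize([(("actor1", "a,b"), 0), (("actor2", "a,c"), 1)])
    rd2 = rd.map(lambda x: (x[1], x[0][0], x[0][1].split(",")))
    rd3 = rd2.map(lambda p: Row(id=int(p[0]), name=p[2], actor=str(p[1])))

    df = spark.createDataFrame(rd3)
    df.printSchema()    # name: array<string>
    model = FPGrowth(itemsCol="name", minSupport=0.5, minConfidence=0.6).fit(df)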
