Dividing complex rows of dataframe to simple rows in Pyspark

Asked by 终归单人心 on 2020-11-27 21:01 · Backend · Unresolved · 3 answers

I have this code:

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext()
sqlContext = SQLContext(sc)
documents = sqlCo         


        
3 answers
  • 2020-11-27 21:03

    OK, here is what I've come up with. Unfortunately, I had to leave the world of Row objects and work with plain lists instead, because a Row cannot be appended to (Row is an immutable subclass of tuple).

    That makes this method a bit messy. If you know a cleaner way to add a new column to a Row object, prefer it over this one.

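    from pyspark.sql import Row

    def add_id(row):
        # row[0] is the document id; row[1] is its list of title Rows
        it_list = []
        for title in row[1]:
            # Row is an immutable tuple, so copy its fields into a list
            sm_list = list(title)
            # ...and append the parent id as the last element
            sm_list.append(row[0])
            it_list.append(sm_list)
        return it_list

    # PySpark 2.0 removed DataFrame.flatMap; go through the underlying RDD
    with_id = documents.rdd.flatMap(add_id)

    # x[0] and x[1] follow the field order of the original title struct
    # (check with documents.printSchema()): before Spark 3.0, Row(**kwargs)
    # sorted its fields by name, so confirm which index holds which field
    df = with_id.map(lambda x: Row(id=x[2], title=Row(value=x[0], max_dist=x[1]))).toDF()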
    

    When I run df.show(), I get:

    +---+----------------+
    | id|           title|
    +---+----------------+
    |  1|     [cars,1000]|
    |  2|  [horse bus,50]|
    |  2|[normal bus,100]|
    |  3| [Airplane,5000]|
    |  4|   [Bicycles,20]|
    |  4| [Motorbikes,80]|
    |  5|      [Trams,15]|
    +---+----------------+
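The flattening that add_id performs can be checked without a cluster. The sketch below is plain Python, not PySpark: a collections.namedtuple (hypothetical Title and Doc types) stands in for pyspark.sql.Row, since both are tuple subclasses; the sample records are assumed values reconstructed from the output table above, not the question's actual data; and itertools.chain.from_iterable plays the role of RDD.flatMap.

```python
from collections import namedtuple
from itertools import chain

# Hypothetical stand-ins: Title mimics a title struct Row, Doc a document Row.
Title = namedtuple("Title", ["value", "max_dist"])
Doc = namedtuple("Doc", ["id", "title"])

# Assumed sample input, reconstructed from the df.show() output above.
documents = [
    Doc(1, [Title("cars", 1000)]),
    Doc(2, [Title("horse bus", 50), Title("normal bus", 100)]),
    Doc(3, [Title("Airplane", 5000)]),
    Doc(4, [Title("Bicycles", 20), Title("Motorbikes", 80)]),
    Doc(5, [Title("Trams", 15)]),
]

def add_id(row):
    # One list per title: the title's fields (in tuple order) followed by the parent id.
    return [list(title) + [row[0]] for title in row[1]]

# chain.from_iterable flattens the per-document lists, as RDD.flatMap does.
with_id = list(chain.from_iterable(add_id(doc) for doc in documents))

for value, max_dist, doc_id in with_id:
    print(doc_id, value, max_dist)
```

Because add_id copies fields by position, the indices used later in Row(...) must match the struct's field order; reading the fields by name (title.value, title.max_dist) would remove that dependency.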
    
  • 2020-11-27 21:04

    I am using the Spark Dataset API (Java), and the following solved the 'explode' requirement for me:

    Dataset<Row> explodedDataset = initialDataset.selectExpr("ID","explode(finished_chunk) as chunks");
    

    Note: the Dataset.explode method has been deprecated since Spark 2.0.0 (and is still deprecated in 2.4.5); its documentation suggests using select() with functions.explode() (as above, via selectExpr) or flatMap() instead.

  • 2020-11-27 21:19

    Just explode it:

    from pyspark.sql.functions import explode
    
    documents.withColumn("title", explode("title")).show()
    ## +---+----------------+
    ## | id|           title|
    ## +---+----------------+
    ## |  1|     [1000,cars]|
    ## |  2|  [50,horse bus]|
    ## |  2|[100,normal bus]|
    ## |  3| [5000,Airplane]|
    ## |  4|   [20,Bicycles]|
    ## |  4| [80,Motorbikes]|
    ## |  5|      [15,Trams]|
    ## +---+----------------+
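explode(col) emits one output row per element of the array in col and copies the other columns (here id) onto each new row; a row whose array is null or empty produces no output, while explode_outer keeps such a row with a null in that column. The plain-Python sketch below models only those semantics: explode_rows is a hypothetical helper, not a PySpark API, and the dict records are made-up sample data.

```python
def explode_rows(rows, column, outer=False):
    """Hypothetical model of Spark's explode / explode_outer on a list of dicts."""
    out = []
    for row in rows:
        items = row.get(column) or []  # None and [] both have nothing to emit
        if not items and outer:
            # explode_outer keeps the row, with null (None) in the exploded column
            out.append({**row, column: None})
        for item in items:
            # every other column is copied unchanged onto each new row
            out.append({**row, column: item})
    return out

docs = [
    {"id": 2, "title": [("horse bus", 50), ("normal bus", 100)]},
    {"id": 6, "title": []},  # hypothetical row with an empty array
]

print(explode_rows(docs, "title"))
print(explode_rows(docs, "title", outer=True))
```

With plain explode the id 6 row disappears; with outer=True it survives as {"id": 6, "title": None}.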
    