Getting all descendants of a parent from a pandas dataframe parent child table

悲哀的现实 2021-01-06 05:28

I have a Pandas dataframe containing parent ids and child ids. I need help building an updated dataframe listing every descendant of each parent.

For clarificati

2 answers
  •  不思量自难忘°
    2021-01-06 06:10

    As long as the parent/child relationships never form a cycle, I think this should work:

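        def get_children(parent_id):
            # Collect every descendant of parent_id, depth-first.
            # Note: reads from a module-level dataframe named df.
            list_of_children = []

            def dfs(node_id):
                # Direct children of node_id
                child_ids = df[df["parent_id"] == node_id]["child_id"]
                if child_ids.empty:
                    return  # leaf node, nothing to add
                for child_id in child_ids:
                    list_of_children.append(child_id)
                    dfs(child_id)

            dfs(parent_id)
            return list_of_children

        df["list_of_children"] = df["parent_id"].apply(get_children)
        df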
    

    Returns:

            parent_id  child_id                                                                 list_of_children
        0        3111      4321                                                                           [4321]
        1        2010      3102                                                   [3102, 4001, 3011, 4200, 4010]
        2        3000      4023                                             [4023, 5321, 5010, 6525, 6100, 6016]
        3        1000      2010  [2010, 3102, 4001, 3011, 4200, 4010, 2110, 3000, 4023, 5321, 5010, 6525, 610...
        4        4023      5321                                                   [5321, 5010, 6525, 6100, 6016]
        5        3011      4200                                                                     [4200, 4010]
        6        3033      4113                                                                     [4113, 4311]
        7        5010      6525                                                               [6525, 6100, 6016]
        8        3011      4010                                                                     [4200, 4010]
        9        3102      4001                                                                           [4001]
        10       2010      3011                                                   [3102, 4001, 3011, 4200, 4010]
        11       4023      5010                                                   [5321, 5010, 6525, 6100, 6016]
        12       2110      3000                           [3000, 4023, 5321, 5010, 6525, 6100, 6016, 3111, 4321]
        13       2100      3033                                                               [3033, 4113, 4311]
        14       1000      2110  [2010, 3102, 4001, 3011, 4200, 4010, 2110, 3000, 4023, 5321, 5010, 6525, 610...
        15       5010      6100                                                               [6525, 6100, 6016]
        16       2110      3111                           [3000, 4023, 5321, 5010, 6525, 6100, 6016, 3111, 4321]
        17       1000      2100  [2010, 3102, 4001, 3011, 4200, 4010, 2110, 3000, 4023, 5321, 5010, 6525, 610...
        18       5010      6016                                                               [6525, 6100, 6016]
        19       3033      4311                                                                     [4113, 4311]

    One caveat: the dataframe is not passed to the function, so the code only works if a dataframe named df exists in the enclosing scope, and you need to be careful about what you name it. You could probably improve it by passing the dataframe in as an argument, so that the inner dfs function no longer relies on that name.
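
One possible way to act on that caveat, as a rough sketch rather than a tested drop-in replacement: build a parent-to-children lookup from the dataframe once, then walk it with an explicit stack. The names `build_children_map`, `get_descendants`, and the `sample` data are made up for illustration and are not from the question. The `seen` set is an added assumption that guards against cycles; as a side effect, a node reachable by two paths is listed only once, which differs from the recursive version above.

```python
import pandas as pd


def build_children_map(frame, parent_col="parent_id", child_col="child_id"):
    """Map each parent id to the list of its direct children, in row order."""
    children = {}
    for parent, child in zip(frame[parent_col], frame[child_col]):
        children.setdefault(parent, []).append(child)
    return children


def get_descendants(children, root):
    """Return every descendant of root in depth-first (pre-order) order."""
    descendants = []
    seen = {root}  # assumption: skip already-visited ids so cycles terminate
    stack = list(reversed(children.get(root, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        descendants.append(node)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return descendants


# Hypothetical sample data, not the asker's table.
sample = pd.DataFrame({
    "parent_id": [1, 1, 2, 3],
    "child_id": [2, 3, 4, 5],
})
children = build_children_map(sample)
sample["descendants"] = sample["parent_id"].map(
    lambda pid: get_descendants(children, pid)
)
```

Because the lookup is built once, each call no longer filters the whole dataframe per node, and nothing depends on a global named df.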
