Python: Combinations of parent-child hierarchy

不知归路 2021-01-06 19:25

For a child-parent relationship table (CSV), I am trying to gather every possible parent-to-child relationship chain using all the data in the table. I am trying against

2 Answers
  • 2021-01-06 19:37

    I'm not sure if this is the most efficient way to do it (but reading the file in again on every row would be worse).

    find = 'A'  # the child for which the code should find all possible parent relationships
    sequences = {find}  # set(find) would split a multi-character name into single letters
    
    # we'll build up a chain for every relationship, then strip out un-needed ones later
    # note: a row only extends chains already seen, so rows closer to `find` must come first
    with open('testing.csv', 'r') as f:  # testing.csv = child,parent table (above example)
        for row in f:
            child, parent = row.strip().split(',')
            sequences.add(parent + '|' + child)
            for c in sequences.copy():
                if c.split('|')[0] == child:  # chain starts at this child
                    sequences.add(parent + '|' + c)
    
    # remove any that don't end with our child:
    sequences = {s for s in sequences if s.split('|')[-1] == find}
    
    # collect every chain that is the tail of a longer chain
    extra = set()
    for g1 in sequences:
        for g2 in sequences:
            if g2.partition('|')[2] == g1:  # g2 minus its first node equals g1
                extra.add(g1)
    
    # remove the shorter chains
    sequences.difference_update(extra)
    
    for chain in sequences:
        print(chain)
    

    Results:

    D|C|A
    D|C|B|A
    D|B|A
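
    If the rows cannot be guaranteed to arrive in a convenient order, one alternative is to load the whole table into a child-to-parents mapping first and then walk upward from the child. The following is only a sketch under assumptions: the function name parent_chains is made up for illustration, and the rows list is a guess at the sample table, reconstructed from the results shown above.

```python
from collections import defaultdict

def parent_chains(rows, child):
    """Yield every chain from a top-level ancestor down to `child`.

    `rows` is an iterable of (child, parent) pairs; names may be any length.
    """
    parents = defaultdict(list)              # child -> list of direct parents
    for c, p in rows:
        parents[c].append(p)

    def walk(node, chain):
        # parents not already on this chain (protects against cycles)
        ups = [p for p in parents.get(node, []) if p not in chain]
        if not ups:                          # nothing left to climb: chain is complete
            yield chain
        for p in ups:
            yield from walk(p, [p] + chain)

    yield from walk(child, [child])

# Assumed sample data (child, parent), not taken verbatim from the question
rows = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
for chain in parent_chains(rows, "A"):
    print("|".join(chain))
```

    Because only complete chains are yielded, there is no need for the later step that removes shorter chains.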
    
  • 2021-01-06 19:58

    Have you looked at this fantastic essay? It is essential reading to really understand patterns in Python. Your problem can be thought of as a graph problem: finding the relationships is basically finding all paths from a child node to the parent node.

    Since the nesting can be arbitrarily deep (child->parent1->parent2...), you need a recursive solution (or an equivalent explicit stack) to find all paths. Your code has two for loops, which yield paths of at most three levels, as you found out.

    The code below was adapted from the link above to fix your issue. The function find_all_paths requires a graph as an input.

    Let's create the graph from your file:

    graph = {}  # maps each parent to the list of its direct children
    with open('testing.csv', 'r') as f:
        for row in f:
            child, parent = row.strip().split(',')  # strip() drops the trailing newline
            graph.setdefault(parent, []).append(child)
    
    print(graph)
    

    With your sample, this should print (the key order may differ):

    {'C': ['A', 'B'], 'B': ['A'], 'D': ['B', 'C']}
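
    As a variation on the graph-building step, here is a rough sketch that uses the standard csv module and skips blank or malformed lines. The function name build_graph and the in-memory sample text are assumptions for illustration; the sample rows are a guess at the table, chosen to be consistent with the dictionary printed above.

```python
import csv
import io
from collections import defaultdict

def build_graph(lines):
    """Map each parent to the list of its direct children."""
    graph = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 2:                    # skip blank or malformed lines
            continue
        child, parent = (cell.strip() for cell in row)
        graph[parent].append(child)
    return dict(graph)                       # plain dict, so missing keys are not auto-created

# Hypothetical stand-in for testing.csv (child,parent per line)
sample = io.StringIO("A,B\nA,C\n\nB,C\nB,D\nC,D\n")
graph = build_graph(sample)
print(graph)
```

    With a real file, the same function could be called as build_graph(f) inside a with open('testing.csv', newline='') as f: block.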
    

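    The following code is from the essay, with the Python 2 idioms (has_key, the print statement) updated for Python 3:

    def find_all_paths(graph, start, end, path=[]):
        # path=[] is safe here: the default is never mutated, a new list is built each call
        path = path + [start]
        if start == end:
            return [path]
    
        if start not in graph:  # start has no children, so this branch is a dead end
            return []
    
        paths = []
    
        for node in graph[start]:
            if node not in path:  # skip nodes already on this path (avoids cycles)
                newpaths = find_all_paths(graph, node, end, path)
                for newpath in newpaths:
                    paths.append(newpath)
        return paths
    
    for path in find_all_paths(graph, 'D', 'A'):
        print('|'.join(path))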
    

    Output:

    D|B|A
    D|C|A
    D|C|B|A
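
    The call above hard-codes 'D' as the starting parent. If the top-level parents are not known in advance, one possible approach is to treat every node that never appears as a child as a root and search from each of them. This is only a sketch: the function name all_root_paths is an assumption, and it uses an explicit stack instead of recursion, which is a swapped-in technique rather than the essay's code.

```python
def all_root_paths(graph, target):
    """Return every path from a root (a node that is nobody's child) to `target`."""
    children = {c for kids in graph.values() for c in kids}
    roots = [node for node in graph if node not in children]

    paths = []
    stack = [[root] for root in roots]       # each entry is a partial path
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for child in graph.get(node, []):
            if child not in path:            # guard against cycles
                stack.append(path + [child])
    return paths

# Same graph as printed earlier in this answer
graph = {'C': ['A', 'B'], 'B': ['A'], 'D': ['B', 'C']}
for path in sorted(all_root_paths(graph, 'A')):
    print('|'.join(path))
```

    The paths are sorted before printing only to make the order deterministic; the search itself returns them in stack order.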
    