For a child-parent relationship table (csv), I am trying to gather possible parent to child relationship combination chains using all data in the table. I am trying against
I'm not sure if this is the most efficient way to do it (but reading the file in again on every row would be worse).
find= 'A' #The child for which the code should find all possible parent relationships
sequences = set(find)
# we'll build up a chain for every relationship, then strip out un-needed ones later
with open('testing.csv','r') as f: #testing.csv = child,parent table (above example)
for row in f:
child, parent = row.strip().split(',')
sequences.add(parent + '|' + child)
for c in sequences.copy():
if c[0] == child:
sequences.add(parent + '|' + c)
# remove any that don't end with our child:
sequences = set(s for s in sequences if s.endswith(find))
# get all shorter chains when we have a longer one
extra = set()
for g1 in sequences:
for g2 in sequences:
if g2[2:] == g1:
extra.add(g1)
# remove the shorter chains
sequences.difference_update(extra)
for chain in sequences:
print(chain)
Results:
D|C|A
D|C|B|A
D|B|A
Have you looked at this fantastic essay? It is essential reading to really understand patterns in python. Your problem can be thought of as a graph problem - finding the relationships is basically finding all paths from a child node to the parent node.
Since there could be an arbitrary amount of nesting (child->parent1->parent2...), you need a recursive solution to find all paths. In your code, you have 2 for
loops - which will only result in 3level paths at most as you found out.
The code below was adapted from the link above to fix your issue. The function find_all_paths
requires a graph as an input.
Let's create the graph from your file:
graph = {} # Graph is a dictionary to hold our child-parent relationships.
with open('testing.csv','r') as f:
for row in f:
child, parent = row.split(',')
graph.setdefault(parent, []).append(child)
print graph
with your sample, this should print:
{'C': ['A', 'B'], 'B': ['A'], 'D': ['B', 'C']}
The following code is straight from the essay:
def find_all_paths(graph, start, end, path=[]):
path = path + [start]
if start == end:
return [path]
if not graph.has_key(start):
return []
paths = []
for node in graph[start]:
if node not in path:
newpaths = find_all_paths(graph, node, end, path)
for newpath in newpaths:
paths.append(newpath)
return paths
for path in find_all_paths(graph, 'D', 'A'):
print '|'.join(path)
D|B|A
D|C|A
D|C|B|A