Question
I have a pandas DataFrame in which each row represents an edge and its two columns hold the nodes that edge connects, as follows:
import pandas as pd
df = pd.DataFrame({'node1': ['2', '4', '17', '17', '205', '208'],
                   'node2': ['4', '13', '25', '38', '208', '300']})
All edges are undirected, i.e. each edge can be traversed in either direction (an undirected graph).
I would like to label each row with the connected group it belongs to (connectivity), as follows:
df = pd.DataFrame({'node1': ['2', '4', '17', '17', '205', '208'],
                   'node2': ['4', '13', '25', '38', '208', '300'],
                   'desired_group': ['1', '1', '2', '2', '3', '3']})
For example, the first two rows are grouped together because it is possible to get from node 2 to node 13 (through node 4).
The closest question I managed to find is this one: "pandas - reshape dataframe to edge list according to column values", but to my understanding it is a different question.
Any help on this would be great. Thanks in advance.
Answer 1:
Use connected_components from networkx:
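import networkx as nx
# Build an undirected graph from the edge list.
G = nx.from_pandas_edgelist(df, 'node1', 'node2')
# Each connected component is a set of nodes.
l = list(nx.connected_components(G))
# Map every node to the index of its component.
L = [dict.fromkeys(y, x) for x, y in enumerate(l)]
d = {k: v for d in L for k, v in d.items()}
# To store the labels as a column: df['New'] = df.node1.map(d)
df.node1.map(d)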
0 0
1 0
2 1
3 1
4 2
5 2
Name: node1, dtype: int64
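The two comprehensions that build `d` can be collapsed into one. The following is a minimal sketch of that step only: `components` is a hard-coded stand-in for what `list(nx.connected_components(G))` would return on the example data, and `node_to_group` is an illustrative name, not part of the original answer. It also starts numbering at 1 to match the desired_group column in the question.

```python
# Hard-coded stand-in for list(nx.connected_components(G)) on the example data.
components = [{'2', '4', '13'}, {'17', '25', '38'}, {'205', '208', '300'}]

# Map every node to the 1-based index of the component that contains it.
node_to_group = {node: label
                 for label, nodes in enumerate(components, start=1)
                 for node in nodes}

# Look up the label of each row through its node1 value.
node1 = ['2', '4', '17', '17', '205', '208']
groups = [node_to_group[n] for n in node1]
print(groups)  # [1, 1, 2, 2, 3, 3]
```

With the real DataFrame, the last step would be `df.node1.map(node_to_group)`, as in the answer above.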
Answer 2:
If for some reason you cannot use an external library, you can implement the algorithm yourself:
import pandas as pd
def bfs(graph, start):
    # Breadth-first search: return every node reachable from start.
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)
    return visited

def connected_components(G):
    # Yield each connected component once, as a set of nodes.
    seen = set()
    for v in G:
        if v not in seen:
            c = set(bfs(G, v))
            yield c
            seen.update(c)

def graph(edge_list):
    # Build an undirected adjacency map: node -> set of neighbours.
    result = {}
    for source, target in edge_list:
        result.setdefault(source, set()).add(target)
        result.setdefault(target, set()).add(source)
    return result
df = pd.DataFrame({'node1': ['2', '4', '17', '17', '205', '208'],
                   'node2': ['4', '13', '25', '38', '208', '300']})
G = graph(df[['node1', 'node2']].values)
components = connected_components(G)
lookup = {i: component for i, component in enumerate(components, 1)}
df['group'] = [label for node in df.node1 for label, component in lookup.items() if node in component]
print(df)
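Output (the group numbers are arbitrary labels; which component gets which number depends on the iteration order of the dict G):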
node1 node2 group
0 2 4 1
1 4 13 1
2 17 25 3
3 17 38 3
4 205 208 2
5 208 300 2
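As an alternative to BFS, the same grouping can be computed with a union-find (disjoint-set) structure, which is a different technique from the one used in this answer. The sketch below is pure Python; `find` and `group_edges` are hypothetical helper names introduced here for illustration. It numbers components in order of first appearance, so the labels do not depend on dict or set ordering.

```python
def find(parent, x):
    # Follow parent links to the root, shortening the path along the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group_edges(edges):
    """Return a 1-based component label for each edge, in input order."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # merge the two components
    labels = {}
    result = []
    for a, _ in edges:
        root = find(parent, a)
        # A root seen for the first time gets the next unused label.
        result.append(labels.setdefault(root, len(labels) + 1))
    return result

edges = [('2', '4'), ('4', '13'), ('17', '25'),
         ('17', '38'), ('205', '208'), ('208', '300')]
groups = group_edges(edges)
print(groups)  # [1, 1, 2, 2, 3, 3]
```

With the DataFrame from the question, this could be applied as `df['group'] = group_edges(list(zip(df.node1, df.node2)))`. The `list(...)` matters because the function iterates over the edges twice.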
Source: https://stackoverflow.com/questions/53573865/group-connected-graphs-in-pandas-df