Question
I have a pandas DataFrame containing record-similarity results. For example, rowid 123 is similar to rowid 512, and rowid 123 is also similar to rowid 681, so transitively all three rows are similar. How can I group similar rows?
Note that my data contains both orderings of some pairs, e.g. (123, 512) and (512, 123).
import pandas as pd
df = pd.DataFrame({'A': [123,123,512,412,412,536], 'B': [512,681,123,536,919,412]})
df
A B
123 512
123 681
512 123
412 536
412 919
536 412
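Because similarity is symmetric here, a pair and its reverse describe the same relationship. As a small illustration (pure Python, with the six rows copied by hand from the frame above), sorting each pair gives it a canonical form and shows that the six rows hold only four distinct relationships. A method that treats pairs as undirected, such as an undirected graph, therefore should not need the reversed duplicates removed first.

```python
# Rows of the example frame above, copied by hand as (A, B) tuples.
pairs = [(123, 512), (123, 681), (512, 123), (412, 536), (412, 919), (536, 412)]

# Sort each pair so (512, 123) and (123, 512) map to the same key,
# then keep the distinct keys in first-seen order (dicts preserve insertion order).
unique_pairs = list(dict.fromkeys(tuple(sorted(p)) for p in pairs))
print(unique_pairs)
# → [(123, 512), (123, 681), (412, 536), (412, 919)]
```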
Expected output
Group1 123
Group1 512
Group1 681
Group2 412
Group2 536
Group2 919
Answer 1:
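You could use networkx to determine the connected groups: treat each row as an edge between two ids, and each connected component of the resulting graph is one group.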
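import networkx as nx

# Build an undirected graph with one edge per row (A, B); a reversed pair
# such as (512, 123) maps to the same edge as (123, 512).
# networkx 2.0+ provides from_pandas_edgelist (1.x called it from_pandas_dataframe).
G = nx.from_pandas_edgelist(df, 'A', 'B')

# Each connected component is a set of transitively similar ids.
Gcc = nx.connected_components(G)
pd.DataFrame([{'id': i, 'group': 'group%s' % (g + 1)}
              for g, ids in enumerate(Gcc) for i in ids],
             columns=['group', 'id'])

Output:
    group   id
0  group1  512
1  group1  681
2  group1  123
3  group2  536
4  group2  412
5  group2  919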
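If adding networkx as a dependency is undesirable, the same grouping can be computed with a union-find (disjoint-set) structure. This is a swapped-in technique, not the answer's method. Below is a minimal pure-Python sketch; `group_similar` is a hypothetical helper name, and the labels follow the question's expected output (`Group1`, `Group2`, ...), with groups numbered in the order their first id appears and ids sorted within each group.

```python
def group_similar(pairs):
    """Group ids linked directly or transitively by any (a, b) pair.

    Hypothetical helper: union-find with path halving.
    Returns a list of (group_label, id) tuples.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root, then walk to the root,
        # halving the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    order = []  # ids in first-seen order, for stable group numbering
    for a, b in pairs:
        for x in (a, b):
            if x not in parent:
                order.append(x)
        union(a, b)

    # Collect members under their root, in first-seen order of the roots.
    members = {}
    for x in order:
        members.setdefault(find(x), []).append(x)

    result = []
    for n, ids in enumerate(members.values(), start=1):
        for x in sorted(ids):
            result.append(('Group%d' % n, x))
    return result


pairs = [(123, 512), (123, 681), (512, 123), (412, 536), (412, 919), (536, 412)]
for label, rowid in group_similar(pairs):
    print(label, rowid)
```

With the question's frame, `pd.DataFrame(group_similar(zip(df['A'], df['B'])), columns=['group', 'id'])` should give the same rows as a DataFrame. The design needs only a dict of parent pointers and reads each pair once, so no graph object is built.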
Source: https://stackoverflow.com/questions/45086731/how-to-group-a-pandas-dataframe-which-has-a-list-of-combinations