Grouping all connected nodes of a dataset

野趣味 2021-01-27 01:04

This is not a duplicate of:

Fastest way to perform complex search on pandas dataframe

Note: pandas ver 0.23.4

Assumptions: data can be laid out in any o

1 Answer
  • 2021-01-27 01:29

    You could define a graph in which each row's pair of values (one from each column) is an edge, and then look for its connected components with nx.connected_components. Here's a way using NetworkX:

    import networkx as nx
    
    G=nx.Graph()
    G.add_edges_from(df.values.tolist())
    cc = list(nx.connected_components(G))
    # [{'A', 'B', 'C', 'D'}, {'L', 'M', 'N', 'O'}]
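
If NetworkX is not available, the same grouping can be sketched with a plain union-find (disjoint set). This is a swapped-in technique, not what the answer uses. The function name `connected_components` and the `edges` list are assumptions made for illustration; the edges are copied from the rows printed further down rather than from the original dataframe.

```python
def connected_components(edges):
    """Group nodes into connected components using a union-find structure."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two components joined by each edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the nodes under their final root.
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical edge list, mirroring df.values.tolist() for the rows shown below.
edges = [("A", "B"), ("B", "C"), ("D", "C"), ("L", "M"), ("M", "N"), ("N", "O")]
cc = connected_components(edges)
```

The result has the same shape as `list(nx.connected_components(G))`, a list of sets, so the filtering steps that follow should work unchanged with it.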
    

    Now say for instance you want to filter by D, you could then do:

    component = next(i for i in cc if 'D' in i)
    # {'A', 'B', 'C', 'D'}
    

    And index the dataframe where the values from both columns are in component:

    df[df.isin(component).all(1)]
    
       Col1 Col2
    0    A    B
    1    B    C
    2    D    C
    

    The above can be extended to every item in a list L by generating one dataframe per item, in the same order as L. The dataframe for a given item is then found at that item's position in L:

    L = ['A', 'B', 'C', 'D', 'L', 'M', 'N', 'O']
    
    dfs = [df[df.isin(i).all(1)] for j in L for i in cc if j in i]
    print(dfs[L.index('D')])
    
       Col1 Col2
    0    A    B
    1    B    C
    2    D    C
    
    print(dfs[L.index('L')])
    
       Col1 Col2
    3    L    M
    4    M    N
    5    N    O
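
The list-based lookup above builds one dataframe per item and scans L on every `L.index` call. A possible alternative is a dict keyed by node, sketched here in plain Python. The names `node_to_comp`, `rows_by_comp` and `rows_for` are hypothetical, `cc` is hardcoded to the value shown earlier, and `rows` stands in for the dataframe, using the rows printed above.

```python
# Components as returned earlier, and the dataframe rows as (Col1, Col2) tuples.
cc = [{"A", "B", "C", "D"}, {"L", "M", "N", "O"}]
rows = [("A", "B"), ("B", "C"), ("D", "C"), ("L", "M"), ("M", "N"), ("N", "O")]

# Map every node to the index of the component it belongs to.
node_to_comp = {node: idx for idx, comp in enumerate(cc) for node in comp}

# Both ends of an edge share a component, so the first column is enough
# to decide which component a row belongs to.
rows_by_comp = {}
for row_idx, (a, _b) in enumerate(rows):
    rows_by_comp.setdefault(node_to_comp[a], []).append(row_idx)


def rows_for(node):
    """Return the row positions of the component that contains `node`."""
    return rows_by_comp[node_to_comp[node]]


print(rows_for("D"))  # [0, 1, 2]
print(rows_for("L"))  # [3, 4, 5]
```

Assuming the dataframe keeps its default integer index, `df.loc[rows_for('D')]` should select the same rows as `dfs[L.index('D')]`.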
    