How can I convert a two-column array to a matrix with counts of occurrences?

礼貌的吻别 2021-02-02 08:21

I have the following numpy array:

import numpy as np

pair_array = np.array([(205, 254), (205, 382), (254, 382), (18, 69), (205, 382), 
                       (31         


        
5 Answers
  • 2021-02-02 08:56

    One way could be to build a graph using NetworkX and obtain the adjacency matrix directly as a dataframe with nx.to_pandas_adjacency. To account for the co-occurrences of the edges in the graph, we can create a nx.MultiGraph, which allows for multiple edges connecting the same pair of nodes:

    import networkx as nx
    
    G = nx.from_edgelist(pair_array, create_using=nx.MultiGraph)
    nx.to_pandas_adjacency(G, nodelist=sorted(G.nodes()), dtype='int')
    
          18   31   69   183  205  254  267  382
    18     0    0    1    0    0    0    0    0
    31     0    0    0    1    0    0    1    1
    69     1    0    0    0    0    0    0    0
    183    0    1    0    0    0    0    1    1
    205    0    0    0    0    0    1    0    2
    254    0    0    0    0    1    0    0    1
    267    0    1    0    1    0    0    0    0
    382    0    1    0    1    2    1    0    0
    

    Building a NetworkX graph also lets us choose the kind of adjacency matrix we get, depending on the behaviour we expect. We can create the graph using:

    • nx.Graph: if an (x,y) (or (y,x)) edge should set both entries (x,y) and (y,x) to 1. This produces a symmetric adjacency matrix
    • nx.DiGraph: if (x,y) should only set the (x,y) entry to 1
    • nx.MultiGraph: for the same behaviour as nx.Graph, but also counting edge co-occurrences
    • nx.MultiDiGraph: for the same behaviour as nx.DiGraph, but also counting edge co-occurrences
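The four behaviours listed above can be illustrated without NetworkX. The following is only a minimal sketch using the standard library: the toy `pairs` list and all variable names are assumptions made for illustration, and it mimics the counting rules rather than calling the NetworkX API.

```python
from collections import Counter

# Hypothetical toy input: the pair (1, 2) occurs twice, (2, 3) once.
# The sketch assumes there are no self-pairs (x, x).
pairs = [(1, 2), (1, 2), (2, 3)]

# MultiDiGraph-like: count each (x, y) in the order given.
multi_directed = Counter(pairs)

# MultiGraph-like: each pair also counts for its mirror image (y, x).
multi_undirected = Counter()
for x, y in pairs:
    multi_undirected[(x, y)] += 1
    multi_undirected[(y, x)] += 1

# DiGraph-like and Graph-like: repeated edges collapse to a single 1.
directed = {edge: 1 for edge in multi_directed}
undirected = {edge: 1 for edge in multi_undirected}

print(multi_directed[(1, 2)], multi_directed[(2, 1)])      # 2 0
print(multi_undirected[(1, 2)], multi_undirected[(2, 1)])  # 2 2
print(directed.get((2, 1), 0), undirected.get((2, 1), 0))  # 0 1
```

The symmetric, counting variant (MultiGraph-like) is the one that matches the output shown in this answer.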
  • 2021-02-02 09:00

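    If you are okay with adding pandas as a dependency, you can use this implementation. Note that it counts each (x, y) only in the order given, so the result is neither square nor symmetric: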

    >>> import pandas as pd
    >>> df = pd.DataFrame(pair_array)
    >>> pd.crosstab(df[0], df[1])
    1    69   183  254  267  382
    0
    18     1    0    0    0    0
    31     0    1    0    1    1
    183    0    0    0    1    1
    205    0    0    1    0    2
    254    0    0    0    0    1
    
  • 2021-02-02 09:01

    One way of doing it is to append to pair_array a copy of itself with the columns reversed, which can be obtained using pair_array[:, ::-1]. To append, use np.vstack, np.r_ or np.concatenate.

    Then use pd.crosstab to perform the cross-tabulation.

    all_vals = np.r_[pair_array, pair_array[:, ::-1]]
    pd.crosstab(all_vals[:, 0], all_vals[:, 1])
    
    col_0  18   31   69   183  205  254  267  382
    row_0                                        
    18       0    0    1    0    0    0    0    0
    31       0    0    0    1    0    0    1    1
    69       1    0    0    0    0    0    0    0
    183      0    1    0    0    0    0    1    1
    205      0    0    0    0    0    1    0    2
    254      0    0    0    0    1    0    0    1
    267      0    1    0    1    0    0    0    0
    382      0    1    0    1    2    1    0    0
    

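    As @QuangHoang pointed out, when the array contains pairs whose two elements are identical, i.e. [(18, 18), (18, 18), ...], reversing them yields the same pairs again and they would be counted twice. In that case, drop them from the reversed copy: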

    rev = pair_array[:, ::-1]
    m = (pair_array == rev)
    rev = rev[~np.all(m, axis=1)]
    all_vals = np.r_[pair_array, rev]
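To see what the filtering step changes, here is a small sketch on a made-up input that contains one self-pair. The `pairs` array and the `count_rows` helper are hypothetical and exist only for this illustration.

```python
import numpy as np

# Hypothetical input containing one self-pair, (18, 18).
pairs = np.array([(18, 18), (18, 69), (69, 18)])

rev = pairs[:, ::-1]
# True for rows whose two elements are equal, e.g. (18, 18).
is_self_pair = np.all(pairs == rev, axis=1)

naive = np.r_[pairs, rev]                    # self-pair is duplicated
filtered = np.r_[pairs, rev[~is_self_pair]]  # self-pair is kept once


def count_rows(arr, row):
    """Count how many rows of arr equal the given pair."""
    return int(np.all(arr == row, axis=1).sum())


print(count_rows(naive, (18, 18)))     # 2
print(count_rows(filtered, (18, 18)))  # 1
print(count_rows(filtered, (18, 69)))  # 2
```

Ordinary pairs are still mirrored, so (18, 69) and (69, 18) each contribute to both entries, while the diagonal entry reflects the actual number of self-pairs.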
    
  • 2021-02-02 09:08

    You could create a data frame of the appropriate size with zeros beforehand and just increment the appropriate cells by looping over the pairs:

    import numpy as np
    import pandas as pd
    
    pair_array = np.array([(205, 254), (205, 382), (254, 382), (18, 69), (205, 382),
                           (31, 183), (31, 267), (31, 82), (183, 267), (183, 382)])
    
    vals = sorted(set(pair_array.flatten()))
    n = len(vals)
    
    # np.int was removed in NumPy 1.24; the built-in int works in all versions.
    df = pd.DataFrame(np.zeros((n, n), dtype=int), columns=vals, index=vals)
    
    for r, c in pair_array:
        df.at[r, c] += 1
        df.at[c, r] += 1
    
    print(df)
    

    Output:

         18   31   69   82   183  205  254  267  382
    18     0    0    1    0    0    0    0    0    0
    31     0    0    0    1    1    0    0    1    0
    69     1    0    0    0    0    0    0    0    0
    82     0    1    0    0    0    0    0    0    0
    183    0    1    0    0    0    0    0    1    1
    205    0    0    0    0    0    0    1    0    2
    254    0    0    0    0    0    1    0    0    1
    267    0    1    0    0    1    0    0    0    0
    382    0    0    0    0    1    2    1    0    0
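For larger inputs, the Python-level loop can be replaced by a vectorised accumulation. This is a sketch of one possible alternative, not part of the original answer: it relies on `np.unique(..., return_inverse=True)` to map values to positions and on `np.add.at` to accumulate repeated indices. The names `idx` and `counts` are chosen here for illustration, and the sketch assumes there are no self-pairs.

```python
import numpy as np

pair_array = np.array([(205, 254), (205, 382), (254, 382), (18, 69), (205, 382),
                       (31, 183), (31, 267), (31, 82), (183, 267), (183, 382)])

# Sorted unique labels, plus the position of every value in that list.
vals, inverse = np.unique(pair_array, return_inverse=True)
# Reshape explicitly: the shape of `inverse` differs between NumPy versions.
idx = inverse.reshape(pair_array.shape)

n = len(vals)
counts = np.zeros((n, n), dtype=int)

# np.add.at accumulates repeated index pairs, whereas
# counts[rows, cols] += 1 would count each distinct pair only once.
np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
np.add.at(counts, (idx[:, 1], idx[:, 0]), 1)
```

Wrapping the result as `pd.DataFrame(counts, index=vals, columns=vals)` should give the same labelled table as the loop version.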
    
  • 2021-02-02 09:19

    This is a cross-tabulation, which pd.crosstab computes directly:

    pd.crosstab(pair_array[:,0], pair_array[:,1])
    

    Output:

    col_0  69   82   183  254  267  382
    row_0                              
    18       1    0    0    0    0    0
    31       0    1    1    0    1    0
    183      0    0    0    0    1    1
    205      0    0    0    1    0    2
    254      0    0    0    0    0    1
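The table above only has rows for first elements and columns for second elements. If a square, symmetric matrix is wanted, one option is to reindex the crosstab over all labels and add its transpose. This is a sketch under the assumption that the input has no self-pairs; the names `labels`, `directed` and `symmetric` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

pair_array = np.array([(205, 254), (205, 382), (254, 382), (18, 69), (205, 382),
                       (31, 183), (31, 267), (31, 82), (183, 267), (183, 382)])

# Every value that appears in either column, sorted.
labels = np.unique(pair_array)

# Pad the crosstab with zero rows/columns so it covers all labels.
directed = (pd.crosstab(pair_array[:, 0], pair_array[:, 1])
              .reindex(index=labels, columns=labels, fill_value=0))

# Adding the transpose counts each pair in both directions.
symmetric = directed + directed.T

print(symmetric)
```

Here `directed` keeps the order of each pair, while `symmetric` treats (x, y) and (y, x) as the same co-occurrence.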
    