Pandas: How to Compare Columns of Lists Row-wise in a DataFrame (Without a for Loop)?

误落风尘 2021-02-05 21:25

DataFrame:

    df = pd.DataFrame({'A': [['gener'], ['gener'], ['system'], ['system'], ['gutter'], ['gutter'], ['gutter'], ['gutter' ...


        
2 Answers
  • 2021-02-05 21:51

    To check if every item in df.A is contained in df.B:

    >>> df.apply(lambda row: all(i in row.B for i in row.A), axis=1)
    # OR: ~(df['A'].apply(set) - df['B'].apply(set)).astype(bool)
    0     False
    1     False
    2      True
    3      True
    4      True
    5      True
    6      True
    7      True
    8      True
    9      True
    10     True
    11     True
    12     True
    13     True
    14     True
    15     True
    16     True
    17     True
    18     True
    19     True
    dtype: bool
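    The same boolean result can also be computed with a plain list comprehension and set.issubset, which avoids the per-row overhead of apply(axis=1) and tends to be faster on large frames. For hashable items it is equivalent to all(i in row.B for i in row.A). A minimal sketch, using a few rows copied from the output table in this answer as a stand-in for the question's DataFrame:

```python
import pandas as pd

# Sample rows copied from the answer's output table (stand-in for the question's df)
df = pd.DataFrame({
    "A": [["gener"], ["system"], ["aluminum", "toledo"]],
    "B": [["gutter"], ["gutter", "system"], ["aluminum", "gutter", "toledo", "ohio"]],
})

# True where every item of A also appears in B; issubset accepts any iterable, so B can stay a list
df["A_in_B"] = [set(a).issubset(b) for a, b in zip(df["A"], df["B"])]
print(df["A_in_B"].tolist())  # [False, True, True]
```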
    

    To get the intersection:

    df['intersection'] = [list(set(a).intersection(set(b))) for a, b in zip(df.A, df.B)]
    
    >>> df
                         A                                      B        intersection
    0              [gener]                               [gutter]                  []
    1              [gener]                               [gutter]                  []
    2             [system]                       [gutter, system]            [system]
    3             [system]                [gutter, guard, system]            [system]
    4             [gutter]                         [ohio, gutter]            [gutter]
    5             [gutter]                       [gutter, toledo]            [gutter]
    6             [gutter]                       [toledo, gutter]            [gutter]
    7             [gutter]                               [gutter]            [gutter]
    8             [gutter]                               [gutter]            [gutter]
    9             [gutter]                               [gutter]            [gutter]
    10          [aluminum]    [how, to, instal, aluminum, gutter]          [aluminum]
    11          [aluminum]                     [aluminum, gutter]          [aluminum]
    12          [aluminum]              [aluminum, gutter, color]          [aluminum]
    13          [aluminum]                     [aluminum, gutter]          [aluminum]
    14          [aluminum]       [aluminum, gutter, adrian, ohio]          [aluminum]
    15          [aluminum]  [aluminum, gutter, bowl, green, ohio]          [aluminum]
    16          [aluminum]        [aluminum, gutter, maume, ohio]          [aluminum]
    17          [aluminum]   [aluminum, gutter, perrysburg, ohio]          [aluminum]
    18          [aluminum]     [aluminum, gutter, tecumseh, ohio]          [aluminum]
    19  [aluminum, toledo]       [aluminum, gutter, toledo, ohio]  [aluminum, toledo]
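    Note that list(set(...)) returns the common items in arbitrary order: Python's string hashing is randomized per process, so a multi-item row such as row 19 may come out as [toledo, aluminum] in another run. If the order of A matters, one option is to filter A against a set built from B. A sketch, with a hypothetical helper named ordered_intersection and sample rows taken from the table above:

```python
import pandas as pd

# Sample rows taken from the table above
df = pd.DataFrame({
    "A": [["gener"], ["system"], ["aluminum", "toledo"]],
    "B": [["gutter"], ["gutter", "guard", "system"], ["aluminum", "gutter", "toledo", "ohio"]],
})

def ordered_intersection(a, b):
    """Hypothetical helper: items of `a` that also occur in `b`, in a's original order."""
    b_set = set(b)  # build once so each membership test is O(1)
    return [x for x in a if x in b_set]

df["intersection"] = [ordered_intersection(a, b) for a, b in zip(df["A"], df["B"])]
print(df["intersection"].tolist())  # [[], ['system'], ['aluminum', 'toledo']]
```

    Unlike the set-based version, this keeps any duplicates that appear in A.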
    
  • 2021-02-05 22:01

    Use the DataFrame.apply method provided by pandas.

    Since you may have more than two columns to intersect, you can write a helper function like the one below and pass it to DataFrame.apply (see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html). With axis=1 the function is applied to each row; with axis=0 (the default) it is applied to each column. Each row is passed to the function as a Series whose index holds the column labels, and iterating over that Series yields the cell values, which is what the helper relies on.
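    To see what DataFrame.apply actually passes to the function when axis=1, a small sketch can record each row it receives. The helper name record_row and the two sample rows are illustrative only:

```python
import pandas as pd

# Illustrative two-row sample
df = pd.DataFrame({
    "A": [["gener"], ["system"]],
    "B": [["gutter"], ["gutter", "system"]],
})

seen = []  # what apply hands to the function, one entry per call

def record_row(row):
    # With axis=1, `row` is a Series: its index holds the column labels
    # and its values are that row's cells; iterating it yields the values.
    # Store plain data, not `row` itself (pandas may reuse one Series object).
    seen.append((type(row).__name__, list(row.index), list(row)))
    return len(row)

lengths = df.apply(record_row, axis=1)
print(seen[0])           # ('Series', ['A', 'B'], [['gener'], ['gutter']])
print(lengths.tolist())  # [2, 2]
```

    The values are recorded as plain data rather than by keeping a reference to the row, because pandas may reuse a single Series object for every row internally, so a stored reference could end up showing the last row's values.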

    def intersect(ss):
        ss = iter(ss)
        s = set(next(ss))
        for t in ss:
        s.intersection_update(t)  # t need not be a set; a list or any other iterable works
        return s
    
    res = df.apply(intersect, axis=1)
    
    >>> res
    0                     {}
    1                     {}
    2               {system}
    3               {system}
    4               {gutter}
    5               {gutter}
    6               {gutter}
    7               {gutter}
    8               {gutter}
    9               {gutter}
    10            {aluminum}
    11            {aluminum}
    12            {aluminum}
    13            {aluminum}
    14            {aluminum}
    15            {aluminum}
    16            {aluminum}
    17            {aluminum}
    18            {aluminum}
    19    {aluminum, toledo}
    

    You can apply further operations to the result, or write variations of the helper (for example, a union instead of an intersection) in the same way.
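    For example, one variation generalizes the helper to any number of list columns by unpacking the row into set.intersection and set.union, both of which accept several iterables. The third column C and the function names intersect_all and union_all are hypothetical, for illustration:

```python
import pandas as pd

# Hypothetical sample data: column "C" is added only to show more than two columns
df = pd.DataFrame({
    "A": [["gutter"], ["aluminum", "toledo"]],
    "B": [["ohio", "gutter"], ["aluminum", "gutter", "toledo", "ohio"]],
    "C": [["gutter", "guard"], ["toledo"]],
})

def intersect_all(row):
    """Items common to every list in the row (row is a Series of lists)."""
    # set.intersection accepts any number of iterables, so the lists work directly
    return set(row.iloc[0]).intersection(*row.iloc[1:])

def union_all(row):
    """Items that appear in at least one list in the row."""
    return set().union(*row)

list_cols = ["A", "B", "C"]
df["common"] = df[list_cols].apply(intersect_all, axis=1)
df["any"] = df[list_cols].apply(union_all, axis=1)

# Sets have no fixed order; sorted() turns each one into a deterministic list
df["common_sorted"] = df["common"].apply(sorted)
print(df["common_sorted"].tolist())  # [['gutter'], ['toledo']]
```

    Selecting the list columns explicitly with df[list_cols] keeps the newly added result columns out of later row-wise computations.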

    Hope this helps.
