Pandas Dataframe Check if column value is in column list

旧巷少年郎 2021-01-02 13:40

I have a dataframe df:

import pandas as pd

data = {'id': [12, 112],
        'idlist': [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]]}
df = pd.DataFrame.from_dict(data)


        
4 Answers
  • 2021-01-02 14:09

    Try a simple for loop:

    flaglist = []
    for i in range(len(df)):
        if df.id[i] in df.idlist[i]:
            flaglist.append(1)
        else:
            flaglist.append(0)
    df["flag"] = flaglist 
    

    df:

        id                idlist  flag
    0   12    [1, 5, 7, 12, 112]     1
    1  112  [5, 7, 12, 111, 113]     0
    

    To drop rows:

    flaglist = []
    for i in range(len(df)):
        if df.id[i] not in df.idlist[i]:
            flaglist.append(i)
    df = df.drop(flaglist)
    

    df:

       id              idlist  flag
    0  12  [1, 5, 7, 12, 112]     1
    

    The above can be converted to a list comprehension to create the flag column:

    df["flag"] = [df.id[i] in df.idlist[i] for i in range(len(df))]
    print(df)
    #     id                idlist   flag
    # 0   12    [1, 5, 7, 12, 112]   True
    # 1  112  [5, 7, 12, 111, 113]  False
    

    or

    df["flag"] = [1 if df.id[i] in df.idlist[i] else 0 for i in range(len(df))]
    print(df)
    #     id                idlist  flag
    # 0   12    [1, 5, 7, 12, 112]     1
    # 1  112  [5, 7, 12, 111, 113]     0
    

    and for selecting out rows:

    flaglist = [i for i in range(len(df)) if df.id[i] in df.idlist[i]]
    print(df.iloc[flaglist])
    #    id              idlist
    # 0  12  [1, 5, 7, 12, 112]
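
One caveat on the loop versions above: with an integer index, df.id[i] is a label lookup, so range(len(df)) only lines up with the rows while the frame still has its default 0..n-1 index. A minimal sketch that pairs the two columns with zip instead, so no index labels are involved. The string index and the names mask and filtered are assumptions made for illustration, not part of the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [12, 112],
        "idlist": [[1, 5, 7, 12, 112], [5, 7, 12, 111, 113]],
    },
    index=["a", "b"],  # non-default index, chosen only for illustration
)

# Pair each id with the list from the same row; no label lookups needed.
mask = [value in values for value, values in zip(df["id"], df["idlist"])]

df["flag"] = [int(m) for m in mask]  # 1/0 flag column
filtered = df[mask]                  # keep only rows where id is in idlist

print(df)
print(filtered)
```

The same mask serves both purposes: converted to integers it becomes the flag column, and used as a boolean indexer it selects the matching rows.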
    
  • 2021-01-02 14:20

    Use apply:

    df['flag'] = df.apply(lambda x: int(x['id'] in x['idlist']), axis=1)
    print (df)
        id                idlist  flag
    0   12    [1, 5, 7, 12, 112]     1
    1  112  [5, 7, 12, 111, 113]     0
    

    Similar:

    df['flag'] = df.apply(lambda x: x['id'] in x['idlist'], axis=1).astype(int)
    print (df)
        id                idlist  flag
    0   12    [1, 5, 7, 12, 112]     1
    1  112  [5, 7, 12, 111, 113]     0
    

    With list comprehension:

    df['flag'] = [int(x[0] in x[1]) for x in df[['id', 'idlist']].values.tolist()]
    print (df)
        id                idlist  flag
    0   12    [1, 5, 7, 12, 112]     1
    1  112  [5, 7, 12, 111, 113]     0
    

    Solutions for filtering:

    df = df[df.apply(lambda x: x['id'] in x['idlist'], axis=1)]
    print (df)
       id              idlist
    0  12  [1, 5, 7, 12, 112]
    
    df = df[[x[0] in x[1] for x in df[['id', 'idlist']].values.tolist()]]
    print (df)
    
       id              idlist
    0  12  [1, 5, 7, 12, 112]
    
  • 2021-01-02 14:34

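    You can use df.apply to process each row and create a new flag column that holds the result of the check, which gives the second output requested.

    df['flag'] = df[['id', 'idlist']].apply(lambda x: 1 if x['id'] in x['idlist'] else 0, axis=1)
    
    print(df)
    

    where x['id'] is the row's id and x['idlist'] is its idlist. Label access is used here because positional x[0] / x[1] on a label-indexed row is deprecated in recent pandas versions.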

  • 2021-01-02 14:36

    By using issubset

    df.apply(lambda  x : set([x.id]).issubset(x.idlist),1).astype(int)
    Out[378]: 
    0    1
    1    0
    dtype: int32
    

    By using np.vectorize

    import numpy as np

    def myfun(x, y):
        # np.isin replaces np.in1d, which is deprecated as of NumPy 2.0
        return bool(np.isin(x, y))
    
    
    np.vectorize(myfun)(df.id, df.idlist).astype(int)
    

    Timing :

    %timeit np.vectorize(myfun)(df.id,df.idlist).astype(int)
    10000 loops, best of 3: 92.3 µs per loop
    %timeit df.apply(lambda  x : set([x.id]).issubset(x.idlist),1).astype(int)
    1000 loops, best of 3: 353 µs per loop
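
The %timeit lines above need IPython. As a rough sketch, a comparison of this kind can also be run with the standard timeit module on a larger frame. The frame size, the generated data, and the helper names with_apply and with_zip are all assumptions made for illustration, and no timings are claimed here:

```python
import timeit

import pandas as pd

# Hypothetical test frame: n rows, each holding a 5-element list.
# Even-numbered rows contain their own id, odd-numbered rows do not.
n = 1_000
df = pd.DataFrame(
    {
        "id": list(range(n)),
        "idlist": [
            [i, i + 1, i + 2, i + 3, i + 4] if i % 2 == 0 else [-1, -2, -3, -4, -5]
            for i in range(n)
        ],
    }
)


def with_apply(frame):
    # Row-wise apply, as in the answers above.
    return frame.apply(lambda row: row["id"] in row["idlist"], axis=1).astype(int)


def with_zip(frame):
    # Plain Python loop over the two columns.
    return pd.Series(
        [int(v in vs) for v, vs in zip(frame["id"], frame["idlist"])],
        index=frame.index,
    )


# Both approaches should agree before their timings are compared.
assert with_apply(df).tolist() == with_zip(df).tolist()

for func in (with_apply, with_zip):
    seconds = timeit.timeit(lambda: func(df), number=10)
    print(f"{func.__name__}: {seconds / 10:.6f} s per call")
```

Results depend on the machine, the library versions, and the number of rows, so it is worth measuring on a frame close in size to the real data.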
    