Using isin() to determine what should be printed

-上瘾入骨i 2021-01-14 06:04

Right now I have two dataframes (data1 and data2)

I would like to print a column of string values in the dataframe called data1, based on w

2 answers
  • 2021-01-14 06:24

    If all values of ids are unique:

    I think you need merge with an inner join, which is its default. Select only the id column from data2. The on parameter can be omitted, because merge then joins on all columns the two frames share, which here is only id:

    df = pd.merge(data1, data2[['id']])
    

    Sample:

    import pandas as pd

    data1 = pd.DataFrame({'id':list('abcdef'),
                          'B':[4,5,4,5,5,4],
                          'C':[7,8,9,4,2,3]})
    
    print (data1)
       B  C id
    0  4  7  a
    1  5  8  b
    2  4  9  c
    3  5  4  d
    4  5  2  e
    5  4  3  f
    
    data2 = pd.DataFrame({'id':list('frcdeg'),
                          'D':[1,3,5,7,1,0],
                          'E':[5,3,6,9,2,4],})
    
    print (data2)
       D  E id
    0  1  5  f
    1  3  3  r
    2  5  6  c
    3  7  9  d
    4  1  2  e
    5  0  4  g
    
    df = pd.merge(data1, data2[['id']])
    print (df)
       B  C id
    0  4  9  c
    1  5  4  d
    2  5  2  e
    3  4  3  f
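
A minimal sketch of the merge above with the join key and join type written out, followed by a check of why this approach is only safe when ids are unique. The frame `data2_dup` is a hypothetical variant of data2 introduced for illustration; it is not part of the original sample.

```python
import pandas as pd

data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
data2 = pd.DataFrame({'id': list('frcdeg'),
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})

# Default call: joins on the columns common to both frames (only 'id' here).
implicit = pd.merge(data1, data2[['id']])
# The same join with the key and the join type spelled out.
explicit = pd.merge(data1, data2[['id']], on='id', how='inner')
same_result = implicit.equals(explicit)

# Hypothetical variant of data2 in which 'e' and 'f' each appear twice.
data2_dup = pd.DataFrame({'id': list('fecdef')})
merged_dup = pd.merge(data1, data2_dup, on='id')
# Each matching pair of rows produces one output row, so rows of data1
# whose id occurs twice in data2_dup show up twice in the result.
matched_ids = merged_dup['id'].tolist()

print(same_result)
print(matched_ids)
```

With duplicated ids the merged frame has more rows than data1 has matches, which is why a filter-based approach is preferable in that case.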
    

    If id values are duplicated in either DataFrame, use the approach from the other answer instead. I have also added some similar solutions:

    df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]
    

    ids = set(data1['id']) & set(data2['id'])
    df = data1.query('id in @ids')
    

    df = data1[np.in1d(data1['id'], np.intersect1d(data1['id'], data2['id']))]
    

    Sample:

    data1 = pd.DataFrame({'id':list('abcdef'),
                          'B':[4,5,4,5,5,4],
                          'C':[7,8,9,4,2,3]})
    
    print (data1)
       B  C id
    0  4  7  a
    1  5  8  b
    2  4  9  c
    3  5  4  d
    4  5  2  e
    5  4  3  f
    
    data2 = pd.DataFrame({'id':list('fecdef'),
                          'D':[1,3,5,7,1,0],
                          'E':[5,3,6,9,2,4],})
    
    print (data2)
       D  E id
    0  1  5  f
    1  3  3  e
    2  5  6  c
    3  7  9  d
    4  1  2  e
    5  0  4  f
    
    df = data1[data1['id'].isin(set(data1['id']) & set(data2['id']))]
    print (df)
       B  C id
    2  4  9  c
    3  5  4  d
    4  5  2  e
    5  4  3  f
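
As a rough check, the three filtering variants can be run side by side on the duplicated sample. This sketch makes two assumptions that differ from the snippets above: all three filters are applied to data1, and `np.isin` is used in place of `np.in1d` (it does the same job for 1-D input and is the spelling newer NumPy versions recommend). The common ids are kept in a sorted list so the same object can be passed to both `isin` and `query`.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({'id': list('abcdef'),
                      'B': [4, 5, 4, 5, 5, 4],
                      'C': [7, 8, 9, 4, 2, 3]})
data2 = pd.DataFrame({'id': list('fecdef'),
                      'D': [1, 3, 5, 7, 1, 0],
                      'E': [5, 3, 6, 9, 2, 4]})

# ids present in both frames, as a sorted list
ids = sorted(set(data1['id']) & set(data2['id']))

# 1) boolean mask from Series.isin
by_isin = data1[data1['id'].isin(ids)]

# 2) DataFrame.query, referring to the local variable with @
by_query = data1.query('id in @ids')

# 3) pure NumPy: membership test against the intersection of both columns
left = data1['id'].to_numpy()
right = data2['id'].to_numpy()
by_numpy = data1[np.isin(left, np.intersect1d(left, right))]

all_equal = by_isin.equals(by_query) and by_isin.equals(by_numpy)
print(all_equal)
print(by_isin)
```

Because these are filters rather than joins, each row of data1 appears at most once no matter how often its id is repeated in data2.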
    

    EDIT:

    To select only the title column of data1, you can use:

    df = data1.loc[data1['id'].isin(set(data1['id']) & set(data2['id'])), ['title']]
    
    ids = set(data1['id']) & set(data2['id'])
    df = data1.query('id in @ids')[['title']]
    
    df = data1.loc[np.in1d(data1['id'], np.intersect1d(data1['id'], data2['id'])), ['title']]
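
A small sketch of printing the string column, assuming (as the question describes) that it lives in data1 and is called title. The title values below are invented for illustration only.

```python
import pandas as pd

# Hypothetical data: the 'title' strings are made up for this example.
data1 = pd.DataFrame({'id': list('abcdef'),
                      'title': ['alpha', 'bravo', 'charlie',
                                'delta', 'echo', 'foxtrot']})
data2 = pd.DataFrame({'id': list('fecdef')})

# True for rows of data1 whose id also occurs in data2
mask = data1['id'].isin(set(data1['id']) & set(data2['id']))

titles_frame = data1.loc[mask, ['title']]   # list of labels -> one-column DataFrame
titles_series = data1.loc[mask, 'title']    # single label -> Series

# Print the matching titles one per line
for title in titles_series:
    print(title)
```

Passing `['title']` keeps the result a DataFrame, while passing `'title'` returns a Series, which is usually more convenient for printing plain values.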
    
  • 2021-01-14 06:29

    You can compute the set intersection of the two columns:

    ids = set(data1['id']).intersection(data2['id'])
    

    Or,

    ids = np.intersect1d(data1['id'], data2['id'])
    

    Next, filter data1 down to the relevant rows.

    data1.loc[data1['id'].isin(ids), 'id']
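
The two ways of building ids can be compared directly. The sketch below uses id-only frames as a stand-in for the real data and also tries a shortcut that is not in the answer above: since every id of data1 that occurs in data2 is by definition in the intersection, passing data2['id'] straight to isin should select the same rows.

```python
import numpy as np
import pandas as pd

# Stand-in frames with only the id column
data1 = pd.DataFrame({'id': list('abcdef')})
data2 = pd.DataFrame({'id': list('fecdef')})

# Python set of common ids (unordered)
ids_set = set(data1['id']).intersection(data2['id'])
# NumPy array of common ids (sorted, duplicates removed)
ids_arr = np.intersect1d(data1['id'].to_numpy(), data2['id'].to_numpy())

via_set = data1.loc[data1['id'].isin(ids_set), 'id']
via_arr = data1.loc[data1['id'].isin(ids_arr), 'id']

# Shortcut (an addition, not part of the answer): skip the intersection
direct = data1.loc[data1['id'].isin(data2['id']), 'id']

print(via_set.tolist())
print(via_set.equals(via_arr), via_set.equals(direct))
```

The explicit intersection is still handy when the set of common ids is needed on its own, for example to report which ids matched.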
    