Python & Pandas: How to query if a list-type column contains something?

攒了一身酷 2020-11-27 17:59

I have a DataFrame containing information about movies. It has a column called genre, where each cell holds a list of the genres that movie belongs to. For example:

         


        
5 answers
  • 2020-11-27 18:27

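    According to the pandas source code, you can use .str.contains(..., regex=False), e.g. df.genre.str.contains('comedy', regex=False). With regex=False, the check applied to each cell is simply 'comedy' in cell, which for a list is an exact membership test rather than a substring search.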
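
    A minimal runnable sketch of that suggestion. The DataFrame below is hypothetical (made-up titles and genres), and the trick relies on how pandas implements str.contains for object columns rather than on documented list support, so treat it as version-dependent.

```python
import pandas as pd

# Hypothetical movie data; each 'genre' cell holds a Python list.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genre": [["comedy", "sci-fi"], ["action", "romance", "comedy"], ["drama"]],
})

# With regex=False the object-dtype implementation evaluates 'comedy' in cell,
# which for a list is an exact element-membership test.
mask = df["genre"].str.contains("comedy", regex=False)
comedies = df[mask]
print(comedies["title"].tolist())
```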

  • 2020-11-27 18:31

    A one-liner using boolean indexing and a list comprehension:

    searchTerm = 'something'
    df[[searchTerm in x for x in df['arrayColumn']]]
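
    A self-contained version of the same one-liner, applied to a genre column like the one in the question. The rows are hypothetical; only the column name genre comes from the question.

```python
import pandas as pd

# Hypothetical data shaped like the question's: one list of genres per movie.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": [["comedy", "sci-fi"], ["action", "romance", "comedy"],
              ["drama"], ["horror", "thriller"]],
})

search_term = "comedy"
# One boolean per row (exact list membership); the list is used as a row mask.
comedies = df[[search_term in genres for genres in df["genre"]]]
print(comedies)
```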
    
  • 2020-11-27 18:32

    A complete example:

    import pandas as pd
    
    data = pd.DataFrame([[['foo', 'bar']],
                        [['bar', 'baz']]], columns=['list_column'])
    print(data)
      list_column
    0  [foo, bar]
    1  [bar, baz]
    
    filtered_data = data.loc[
        lambda df: df.list_column.apply(
            lambda l: 'foo' in l
        )
    ]
    print(filtered_data)
      list_column
    0  [foo, bar]
    
  • 2020-11-27 18:35

    You can use apply to create a boolean mask and then select rows with boolean indexing:

    mask = df.genre.apply(lambda x: 'comedy' in x)
    df1 = df[mask]
    print(df1)
                           genre
    0           [comedy, sci-fi]
    1  [action, romance, comedy]
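
    The apply approach assumes every cell is a list. If some cells are missing (NaN), which is an assumption about your data and not something the question shows, 'comedy' in x raises a TypeError for those rows. A guarded sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which one movie has no genre information (NaN).
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genre": [["comedy", "sci-fi"], np.nan, ["action", "romance", "comedy"]],
})

# Anything that is not a list is treated as "does not contain the genre".
mask = df.genre.apply(lambda x: isinstance(x, list) and "comedy" in x)
df1 = df[mask]
print(df1)
```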
    
  • 2020-11-27 18:45

    Using sets:

    df.genre.map(set(['comedy']).issubset)
    
    0     True
    1     True
    2    False
    3    False
    dtype: bool
    

    df.genre[df.genre.map(set(['comedy']).issubset)]
    
    0             [comedy, sci-fi]
    1    [action, romance, comedy]
    dtype: object
    

    The same thing, presented in a way I like better:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[df.genre.map(iscomedy)]
    

    More efficient: build the mask with a list comprehension over a plain Python list instead of Series.map:

    comedy = set(['comedy'])
    iscomedy = comedy.issubset
    df[[iscomedy(l) for l in df.genre.values.tolist()]]
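
    Because issubset checks every element of the set, the same pattern works for several genres at once ("all of these"); for "any of these", negating set.isdisjoint is the natural counterpart. A sketch with hypothetical data and genre names:

```python
import pandas as pd

# Hypothetical data shaped like the question's.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": [["comedy", "sci-fi"], ["action", "romance", "comedy"],
              ["drama", "romance"], ["horror"]],
})

wanted = {"comedy", "romance"}

# All wanted genres must appear in the row's list.
has_all = df.genre.map(wanted.issubset)
# At least one wanted genre must appear (the two collections share an element).
has_any = df.genre.map(lambda genres: not wanted.isdisjoint(genres))

print(df[has_all]["title"].tolist())
print(df[has_any]["title"].tolist())
```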
    

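    Using str in two passes (join each list into one string, then search it).
    Slow, and not perfectly accurate: str.contains matches substrings, so a genre such as 'tragicomedy' would also count as 'comedy'.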

    df[df.genre.str.join(' ').str.contains('comedy')]
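
    To see the inaccuracy concretely, compare the join-based substring search with an exact membership test on hypothetical data where one genre merely contains the word:

```python
import pandas as pd

# Hypothetical data: 'tragicomedy' contains the substring 'comedy'.
df = pd.DataFrame({
    "title": ["A", "B"],
    "genre": [["comedy", "sci-fi"], ["tragicomedy", "drama", "romance"]],
})

# Join each list into one string, then search it: substring semantics.
by_substring = df.genre.str.join(" ").str.contains("comedy")
# Exact element membership in each list.
by_membership = df.genre.map(lambda genres: "comedy" in genres)

print(by_substring.tolist())
print(by_membership.tolist())
```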
    