Filtering a list based on a list of booleans

清歌不尽 2020-11-27 10:14

I have a list of values which I need to filter given the values in a list of booleans:

list_a = [1, 2, 4, 6]
filter = [True, False, True, False]
6 answers
  • 2020-11-27 10:56

    Like so:

    filtered_list = [i for (i, v) in zip(list_a, filter) if v]
    

    zip is the idiomatic way to iterate over several sequences in parallel without indexing. It assumes both sequences have the same length, because zip stops as soon as the shorter one is exhausted. Reaching for itertools is arguably overkill for a case this simple.

    One habit from your example worth dropping is comparing values to True, which is usually unnecessary. Instead of if filter[idx]==True: ..., simply write if filter[idx]: ....
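
The points above (parallel iteration with zip, the silent truncation on unequal lengths, and testing truthiness directly) can be sketched as follows. The helper name filter_by_mask and the variable name mask are illustrative assumptions, not part of the original answer, and the explicit length check is just one possible way to guard against truncation.

```python
def filter_by_mask(values, mask):
    """Return the items of `values` whose matching `mask` entry is truthy."""
    # zip() silently stops at the shorter input, so check lengths up front.
    if len(values) != len(mask):
        raise ValueError(
            f"length mismatch: {len(values)} values, {len(mask)} mask entries"
        )
    # `if keep` tests truthiness directly; no `== True` comparison is needed.
    return [value for value, keep in zip(values, mask) if keep]


list_a = [1, 2, 4, 6]
mask = [True, False, True, False]  # hypothetical name, avoids shadowing filter()

result = filter_by_mask(list_a, mask)
print(result)  # [1, 4]

# Without the length check, zip() drops the unmatched tail silently.
truncated = [value for value, keep in zip(list_a, [True, True]) if keep]
print(truncated)  # [1, 2]
```

Whether a length mismatch should raise or be tolerated depends on the caller; the check is cheap when both inputs are lists.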

  • 2020-11-27 10:57
    filtered_list = [list_a[i] for i in range(len(list_a)) if filter[i]]
    
  • 2020-11-27 11:05

    With numpy:

    In [127]: import numpy as np
    In [128]: list_a = np.array([1, 2, 4, 6])
    In [129]: filter = np.array([True, False, True, False])
    In [130]: list_a[filter]
    
    Out[130]: array([1, 4])
    

    Alternatively, see Alex Szatmary's answer if list_a can be a numpy array but filter cannot.

    NumPy is also usually considerably faster once the data is already in arrays:

    In [133]: list_a = [1, 2, 4, 6]*10000
    In [134]: fil = [True, False, True, False]*10000
    In [135]: list_a_np = np.array(list_a)
    In [136]: fil_np = np.array(fil)
    
    In [139]: %timeit list(itertools.compress(list_a, fil))
    1000 loops, best of 3: 625 us per loop
    
    In [140]: %timeit list_a_np[fil_np]
    10000 loops, best of 3: 173 us per loop
    
  • 2020-11-27 11:08

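    If list_a is a NumPy array and filter is a boolean NumPy array, list_a[filter] returns the items where filter is True, and list_a[~filter] returns the items where it is False. Neither expression works on plain Python lists in Python 3; both raise TypeError.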
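
Plain Python lists do not accept a list of booleans as an index, so for list inputs the same two selections could be sketched with itertools.compress instead. This is a minimal sketch assuming both inputs are ordinary lists of equal length; the names mask, kept and dropped are illustrative.

```python
from itertools import compress
from operator import not_

list_a = [1, 2, 4, 6]
mask = [True, False, True, False]

# Items whose mask entry is truthy.
kept = list(compress(list_a, mask))

# Items whose mask entry is falsy: invert each mask entry lazily first.
dropped = list(compress(list_a, map(not_, mask)))

print(kept)     # [1, 4]
print(dropped)  # [2, 6]
```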

  • 2020-11-27 11:09

    You're looking for itertools.compress:

    >>> from itertools import compress
    >>> list_a = [1, 2, 4, 6]
    >>> fil = [True, False, True, False]
    >>> list(compress(list_a, fil))
    [1, 4]
    

    Timing comparisons (Python 3.x):

    >>> list_a = [1, 2, 4, 6]
    >>> fil = [True, False, True, False]
    >>> %timeit list(compress(list_a, fil))
    100000 loops, best of 3: 2.58 us per loop
    >>> %timeit [i for (i, v) in zip(list_a, fil) if v]  #winner
    100000 loops, best of 3: 1.98 us per loop
    
    >>> list_a = [1, 2, 4, 6]*100
    >>> fil = [True, False, True, False]*100
    >>> %timeit list(compress(list_a, fil))              #winner
    10000 loops, best of 3: 24.3 us per loop
    >>> %timeit [i for (i, v) in zip(list_a, fil) if v]
    10000 loops, best of 3: 82 us per loop
    
    >>> list_a = [1, 2, 4, 6]*10000
    >>> fil = [True, False, True, False]*10000
    >>> %timeit list(compress(list_a, fil))              #winner
    1000 loops, best of 3: 1.66 ms per loop
    >>> %timeit [i for (i, v) in zip(list_a, fil) if v] 
    100 loops, best of 3: 7.65 ms per loop
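
The %timeit lines above need IPython. A rough equivalent with the standard-library timeit module might look like the sketch below; the function names, the repetition counts and the output format are assumptions, and the absolute numbers will differ by machine and Python version.

```python
import timeit
from itertools import compress


def with_compress(values, mask):
    return list(compress(values, mask))


def with_zip(values, mask):
    return [value for value, keep in zip(values, mask) if keep]


def benchmark(repeat_factor, number=100):
    """Return the best per-call time in seconds for each approach."""
    values = [1, 2, 4, 6] * repeat_factor
    mask = [True, False, True, False] * repeat_factor
    # Both approaches must agree before their timings are worth comparing.
    assert with_compress(values, mask) == with_zip(values, mask)
    return {
        func.__name__: min(
            timeit.repeat(lambda: func(values, mask), number=number, repeat=3)
        ) / number
        for func in (with_compress, with_zip)
    }


if __name__ == "__main__":
    # Same input sizes as in the session above: 4, 400 and 40000 items.
    for factor in (1, 100, 10000):
        for name, seconds in benchmark(factor).items():
            print(f"x{factor:<6} {name:<14} {seconds * 1e6:10.2f} us per call")
```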
    

    Don't use filter as a variable name; it shadows the built-in function of the same name.
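
To illustrate why the name matters, here is a small sketch of what happens once filter is rebound to a list. The function shadowed_call is a hypothetical helper that keeps the shadowing local, so the built-in stays usable elsewhere.

```python
list_a = [1, 2, 4, 6]

# The built-in filter() keeps the items for which the predicate is truthy.
evens = list(filter(lambda x: x % 2 == 0, list_a))
print(evens)  # [2, 4, 6]


def shadowed_call():
    filter = [True, False, True, False]  # local name hides the built-in
    try:
        # `filter` is now a list, so calling it fails.
        return list(filter(None, list_a))
    except TypeError as exc:
        return f"TypeError: {exc}"


print(shadowed_call())  # TypeError: 'list' object is not callable
```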

  • 2020-11-27 11:14

    To do this using numpy, i.e. if you have an array a instead of list_a:

    import numpy as np

    a = np.array([1, 2, 4, 6])
    my_filter = np.array([True, False, True, False], dtype=bool)
    a[my_filter]
    # array([1, 4])
    