Pandas How to filter a Series

自闭症患者 2020-11-30 20:51

I have a Series like this after doing groupby('name') and using the mean() function on another column:

name
383      3.000000
663      1.000000
726      1.000000
737      9.000000
833      8.166667

How can I filter out the rows where the value is 1.000000?
7 Answers
  • 2020-11-30 21:27

    In my case I had a pandas Series whose values are tuples of characters:

    Out[67]
    0    (H, H, H, H)
    1    (H, H, H, T)
    2    (H, H, T, H)
    3    (H, H, T, T)
    4    (H, T, H, H)
    

    Therefore I could use indexing to filter the series, but to create the index I needed apply. My condition is "find all tuples which have exactly one 'H'".

    series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]
    

    I admit it is not "chainable" (i.e. notice I repeat series_of_tuples twice; you must store any temporary Series in a variable so you can call apply(...) on it).

    There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
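
    For example, Series.map also applies a function elementwise and should produce the same Boolean index (a sketch):

    series_of_tuples[series_of_tuples.map(lambda x: x.count('H') == 1)]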

    Many other answers (including the accepted answer) use chainable functions like:

    • .compress()
    • .where()
    • .loc[]
    • []

    These accept callables (lambdas) which are applied to the Series, not to the individual values in those series!

    Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda, with any of the chainable functions, like .loc[]:

    series_of_tuples.loc[lambda x: x.count('H')==1]
    

    Produces the error:

    KeyError: 'Level H must be same as name (None)'

    I was very confused, but it seems the callable ends up calling the Series.count method (i.e. series_of_tuples.count(...)), which is not what I wanted.
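
    If chaining is really needed, one possible workaround (a sketch; the callable itself builds the elementwise Boolean mask) is:

    series_of_tuples.loc[lambda s: s.apply(lambda t: t.count('H') == 1)]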

    I admit that an alternative data structure may be better:

    • A Category datatype?
    • A DataFrame (each element of the tuple becomes a column)
    • A Series of strings (just concatenate the tuples together):

    The last option creates a Series of strings (i.e. by joining the characters of each tuple into a single string):

    series_of_tuples.apply(''.join)
    

    So I can then use the chainable Series.str.count

    series_of_tuples.apply(''.join).str.count('H')==1
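
    That Boolean mask can then be fed back into the original Series to select the matching rows (a sketch):

    series_of_tuples[series_of_tuples.apply(''.join).str.count('H') == 1]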
    
  • 2020-11-30 21:30

    A fast way of doing this is to reconstruct using numpy to slice the underlying arrays. See timings below.

    mask = s.values != 1
    pd.Series(s.values[mask], s.index[mask])
    
    383    3.000000
    737    9.000000
    833    8.166667
    dtype: float64
    

    [naive timing plot not reproduced]
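
    A sketch of how such a timing comparison could be reproduced with the standard-library timeit module (results not shown; numbers will vary by machine and data size):

    import timeit

    import pandas as pd

    s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

    def numpy_slice(s):
        # rebuild the Series from sliced numpy arrays
        mask = s.values != 1
        return pd.Series(s.values[mask], s.index[mask])

    def boolean_index(s):
        # plain pandas Boolean indexing, for comparison
        return s[s != 1]

    print(timeit.timeit(lambda: numpy_slice(s), number=10_000))
    print(timeit.timeit(lambda: boolean_index(s), number=10_000))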

  • 2020-11-30 21:33

    From pandas version 0.18+, filtering a Series can also be done as below:

    test = {
        383:    3.000000,
        663:    1.000000,
        726:    1.000000,
        737:    9.000000,
        833:    8.166667
    }
    
    pd.Series(test).where(lambda x: x != 1).dropna()
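
    Note that .where keeps the original index and fills the non-matching positions with NaN, which is why the trailing dropna() is needed; a sketch of the intermediate result:

    pd.Series(test).where(lambda x: x != 1)
    # 383    3.000000
    # 663         NaN
    # 726         NaN
    # 737    9.000000
    # 833    8.166667
    # dtype: float64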
    

    Check out: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements

  • 2020-11-30 21:35

    If you like a chained operation, you can also use the compress function:

    test = pd.Series({
        383:    3.000000,
        663:    1.000000,
        726:    1.000000,
        737:    9.000000,
        833:    8.166667
    })
    
    test.compress(lambda x: x != 1)
    
    # 383    3.000000
    # 737    9.000000
    # 833    8.166667
    # dtype: float64
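
    Note that Series.compress has since been deprecated and removed in newer pandas versions (around 0.24 / 1.0, if I recall correctly), so on current versions a chainable equivalent is .loc with a callable (a sketch):

    test.loc[lambda x: x != 1]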
    
  • 2020-11-30 21:39
    In [5]:
    
    import pandas as pd
    
    test = {
        383:    3.000000,
        663:    1.000000,
        726:    1.000000,
        737:    9.000000,
        833:    8.166667
    }
    
    s = pd.Series(test)
    s = s[s != 1]
    s
    Out[0]:
    383    3.000000
    737    9.000000
    833    8.166667
    dtype: float64
    
  • 2020-11-30 21:39

    Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):

    import pandas as pd
    
    test = {
        383:    3.000000,
        663:    1.000000,
        726:    1.000000,
        737:    9.000000,
        833:    8.166667
    }
    
    s = pd.Series(test)
    s.to_frame(name='x').query("x != 1")
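
    This returns a DataFrame; if a Series is wanted back, selecting the column afterwards should work (a sketch):

    s.to_frame(name='x').query("x != 1")['x']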
    