I have a Series like this, obtained by calling groupby('name') and then applying mean() to another column:
name
383 3.000000
663 1.000000
726 1.000000
7
In my case I had a pandas Series whose values are tuples of characters:
Out[67]
0 (H, H, H, H)
1 (H, H, H, T)
2 (H, H, T, H)
3 (H, H, T, T)
4 (H, T, H, H)
Therefore I could use indexing to filter the series, but to create the Boolean index I needed apply. My condition is "find all tuples which have exactly one 'H'".
series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]
I admit it is not "chainable" (notice that I repeat series_of_tuples twice; you must store any temporary series in a variable so you can call apply(...) on it).
There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
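As one illustration of such elementwise alternatives, the sketch below builds the same mask with Series.map and with a plain list comprehension. The sample data (all 16 outcomes of four coin flips, built with itertools.product) is an assumption for the example, since the original series is only partially shown above.

```python
from itertools import product

import pandas as pd

# Assumed sample data: every outcome of four coin flips, stored as tuples.
series_of_tuples = pd.Series(list(product("HT", repeat=4)))

# Series.map calls the function once per tuple, just like apply does here.
mask_map = series_of_tuples.map(lambda t: t.count("H") == 1)

# A list comprehension produces the same Boolean mask as a plain list.
mask_list = [t.count("H") == 1 for t in series_of_tuples]

# Either mask can be used for Boolean indexing.
one_head = series_of_tuples[mask_map]
print(one_head)
```

Both variants still need the series stored in a variable first, so they share the "not chainable" limitation described above.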
Many other answers (including the accepted answer) use chainable functions like:
.compress()
.where()
.loc[]
[]
These accept callables (lambdas), which are applied to the whole Series, not to the individual values in that Series!
Therefore my Series of tuples behaved strangely when I tried to use my condition / callable / lambda from above with any of the chainable functions, like .loc[]:
series_of_tuples.loc[lambda x: x.count('H')==1]
Produces the error:
KeyError: 'Level H must be same as name (None)'
I was very confused, but it seems the lambda receives the whole Series, so x.count('H') ends up calling Series.count, i.e. series_of_tuples.count('H'), with 'H' treated as the level argument. That is not what I wanted.
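One possible way to keep the chain while still testing each tuple is to call apply inside the callable, so the outer lambda receives the Series and the inner one receives each value. This is a minimal sketch; the sample data built with itertools.product is an assumption, not the original series.

```python
from itertools import product

import pandas as pd

# Assumed sample data: every outcome of four coin flips, stored as tuples.
series_of_tuples = pd.Series(list(product("HT", repeat=4)))

# The outer lambda gets the whole Series (that is what .loc passes in);
# the inner lambda is applied to each tuple and calls tuple.count.
one_head = series_of_tuples.loc[
    lambda s: s.apply(lambda t: t.count("H") == 1)
]
print(one_head)
```

Here the series name appears only once, at the start of the chain.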
I admit that an alternative data structure may be better:
This creates a series of strings (i.e. by joining the characters of each tuple into a single string):
series_of_tuples.apply(''.join)
So I can then use the chainable Series.str.count:
series_of_tuples.apply(''.join).str.count('H')==1
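The expression above only yields the Boolean mask. A sketch of how that mask could be used to filter within the same chain, via .loc with a callable, follows. The sample data is again an assumption, and note that the result holds the joined strings rather than the original tuples.

```python
from itertools import product

import pandas as pd

# Assumed sample data: every outcome of four coin flips, stored as tuples.
series_of_tuples = pd.Series(list(product("HT", repeat=4)))

# Join each tuple into a string, then filter on the vectorised str.count.
# The callable passed to .loc receives the Series of strings.
one_head = (
    series_of_tuples
    .apply("".join)
    .loc[lambda s: s.str.count("H") == 1]
)
print(one_head)
```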
A fast way of doing this is to reconstruct the Series using numpy to slice the underlying arrays. See timings below.
mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])
0
383 3.000000
737 9.000000
833 8.166667
dtype: float64
naive timing
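The timing results themselves are not reproduced here. As a rough way to measure the two approaches yourself, the sketch below times plain Boolean indexing against the numpy reconstruction with timeit. The benchmark data, the helper names boolean_indexing and numpy_rebuild, and the repeat count are all assumptions; results will vary with series size and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark data: 100,000 floats drawn from {1, ..., 5}.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(1, 6, size=100_000).astype(float))


def boolean_indexing(series):
    # Plain pandas Boolean indexing.
    return series[series != 1]


def numpy_rebuild(series):
    # Slice the underlying arrays and build a new Series from them.
    mask = series.values != 1
    return pd.Series(series.values[mask], series.index[mask])


repeats = 100
for fn in (boolean_indexing, numpy_rebuild):
    seconds = timeit.timeit(lambda: fn(s), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1e6:.1f} us per call")
```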
From pandas version 0.18 onwards, a series can also be filtered as below:
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
pd.Series(test).where(lambda x : x!=1).dropna()
Check out: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
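One caveat with where(...).dropna() is that dropna also removes values that were already missing before the filter, whereas .loc with the same callable keeps them (NaN != 1 evaluates to True). The sketch below shows the difference; the NaN entry at index 726 is an assumption added for illustration and is not in the original data.

```python
import numpy as np
import pandas as pd

# Sample data with one pre-existing NaN (assumed, for illustration only).
s = pd.Series({383: 3.0, 663: 1.0, 726: np.nan, 737: 9.0})

# where() masks the 1.0 with NaN, then dropna() drops every NaN,
# including the one that was there from the start.
via_where = s.where(lambda x: x != 1).dropna()

# .loc with a callable only drops the rows where the condition is False.
via_loc = s.loc[lambda x: x != 1]

print(via_where)
print(via_loc)
```

If the series has no missing values, the two forms give the same result.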
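If you like a chained operation, you can also use the compress function (note that Series.compress was deprecated in pandas 0.24 and removed in 1.0, so this only works on older versions):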
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.compress(lambda x: x != 1)
# 383 3.000000
# 737 9.000000
# 833 8.166667
# dtype: float64
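On pandas versions where Series.compress is no longer available (it was removed in 1.0), a similar chained style can be sketched with .loc and a callable, directly after the groupby and mean from the question. The DataFrame below and its column name 'value' are hypothetical; only 'name' appears in the question.

```python
import pandas as pd

# Hypothetical input; the 'value' column and these rows are assumptions.
df = pd.DataFrame({
    "name": [383, 383, 663, 726, 737],
    "value": [2.0, 4.0, 1.0, 1.0, 9.0],
})

# Group, average, and filter in one chain; the callable passed to .loc
# receives the Series of group means.
result = (
    df.groupby("name")["value"]
    .mean()
    .loc[lambda s: s != 1]
)
print(result)
```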
In [5]:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s = s[s != 1]
s
Out[0]:
383 3.000000
737 9.000000
833 8.166667
dtype: float64
Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
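Note that query returns a DataFrame, not a Series. If a Series is wanted back, one option is to select the column again after filtering, as in this sketch. Passing engine="python" is an assumption made here so the example does not depend on numexpr; omitting it lets pandas choose the engine.

```python
import pandas as pd

s = pd.Series({383: 3.0, 663: 1.0, 726: 1.0, 737: 9.0, 833: 8.166667})

# Filter via DataFrame.query, then pull the single column back out
# so the result is a Series again.
filtered = s.to_frame(name="x").query("x != 1", engine="python")["x"]
print(filtered)
```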