Is there a query method or similar for pandas Series (pandas.Series.query())?

The pandas.DataFrame.query() method is very useful for filtering data before or after loading or plotting. It is particularly handy for method chaining.

3 Answers

野的像风

2021-02-18 20:27
Instead of query you can use pipe:
```
s.pipe(lambda x: x[x>0]).pipe(lambda x: x[x<10])
```
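For reference, a minimal runnable sketch of the pipe approach. The sample values in s are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the filter.
s = pd.Series([-5, 3, 7, 12, 0, 9])

# Each pipe step receives the Series and returns a filtered Series.
filtered = s.pipe(lambda x: x[x > 0]).pipe(lambda x: x[x < 10])

# The two conditions can also be combined into a single boolean mask.
combined = s.pipe(lambda x: x[(x > 0) & (x < 10)])

print(filtered.tolist())
```

Both forms keep the original index labels, so the result can still be aligned with the source Series.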

鱼传尺愫

2021-02-18 20:29

IIUC you can add query("Points > 100"):

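```
import numpy as np
import pandas as pd

# np.inf is the portable spelling; the np.Inf alias was removed in NumPy 2.0
df = pd.DataFrame({'Points': [50, 20, 38, 90, 0, np.inf],
                   'Player': ['a', 'a', 'a', 's', 's', 's']})

print(df)
# Output as originally posted; column order and float formatting
# can vary by pandas version
#   Player     Points
# 0      a  50.000000
# 1      a  20.000000
# 2      a  38.000000
# 3      s  90.000000
# 4      s   0.000000
# 5      s        inf

# Filter rows first, then aggregate to a Series of per-player totals
points_series = df.query("Points < inf").groupby("Player").agg({"Points": "sum"})['Points']
print(points_series)

# A Series has no query method, so fall back to boolean indexing
a = points_series[points_series > 100]
print(a)
# Player
# a    108.0
# Name: Points, dtype: float64
```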


```
# Wrap the chain in parentheses so it can span several lines
points_series = (
    df.query("Points < inf")
      .groupby("Player")
      .agg({"Points": "sum"})
      .query("Points > 100")
)

print(points_series)
#         Points
# Player
# a        108.0
```

Another solution is Selection By Callable:

```
# Wrap the chain in parentheses so it can span several lines
points_series = (
    df.query("Points < inf")
      .groupby("Player")
      .agg({"Points": "sum"})['Points']
      .loc[lambda x: x > 100]
)

print(points_series)
# Player
# a    108.0
# Name: Points, dtype: float64
```
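As a small side-by-side sketch of selection by callable on a Series: the totals below are made up for illustration, and the point is only that plain [] indexing accepts a callable in the same way .loc does.

```python
import pandas as pd

# Made-up totals, indexed by player, for illustration only.
totals = pd.Series({"a": 108.0, "s": 90.0}, name="Points")

# .loc with a callable: the callable receives the Series being indexed.
via_loc = totals.loc[lambda x: x > 100]

# Plain [] indexing accepts a callable in the same way.
via_getitem = totals[lambda x: x > 100]

print(via_loc)
```

Because the callable receives whatever the chain has produced so far, no intermediate variable is needed in the middle of a method chain.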

Edit, following the updated question:

```
np.random.seed(1234)
df = pd.DataFrame({
    'Points': [np.random.choice([1, 3]) for x in range(100)],
    'Player': [np.random.choice(["A", "B", "C"]) for x in range(100)]})

print(df.query("Points == 3").Player.value_counts().loc[lambda x: x > 15])
# C    19
# B    16
# Name: Player, dtype: int64

print(df.query("Points == 3").groupby("Player").size().loc[lambda x: x > 15])
# Player
# B    16
# C    19
# dtype: int64
```

青春惊慌失措

2021-02-18 20:39
Why not convert from Series to DataFrame, do the querying, and then convert back?
```
df["Points"] = df["Points"].to_frame().query('Points > 100')["Points"]
```
Here, .to_frame() converts the Series to a one-column DataFrame, while the trailing ["Points"] selects that column back out as a Series.

The method .query() can then be used consistently, whether the pandas object has one column or several.
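The round trip can be wrapped in a small function. This is only a sketch: query_series, its fallback_name parameter, and the sample data are hypothetical names introduced here, not part of pandas. An unnamed Series gets a temporary column name so that the expression has something to refer to.

```python
import pandas as pd


def query_series(s: pd.Series, expr: str, fallback_name: str = "value") -> pd.Series:
    """Hypothetical helper: filter a Series using a DataFrame.query expression."""
    # query refers to columns by name, so an unnamed Series gets a temporary one.
    col = s.name if s.name is not None else fallback_name
    result = s.to_frame(name=col).query(expr)[col]
    # Restore the original name (possibly None) on the returned Series.
    result.name = s.name
    return result


# Made-up data for illustration only.
points = pd.Series([50, 120, 38, 300], name="Points")
big = query_series(points, "Points > 100")

unnamed = pd.Series([1, 2, 3])
kept = query_series(unnamed, "value >= 2")

print(big)
print(kept)
```

Since it takes and returns a Series, it can also be used inside a chain as s.pipe(query_series, "Points > 100").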