Expressing pandas subset using pipe

滥情空心 2021-02-15 06:24

I have a dataframe that I subset like this:

   a  b   x  y
0  1  2   3 -1
1  2  4   6 -2
2  3  6   6 -3
3  4  8   3 -4

df = df[(df.a >= 2) & (df.b <=          


        
3 answers
  •  无人共我
    2021-02-15 06:26

    Any step that takes a DataFrame (possibly with more arguments) and returns a DataFrame can be expressed with pipe. Whether there is any advantage to doing so is another question.
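
A minimal sketch of that contract, assuming a hypothetical helper named keep_range (not part of pandas or of the question) and the frame rebuilt from the table in the question. The thresholds 2 and 8 are the ones used in this answer. pipe forwards extra positional or keyword arguments to the function after the DataFrame itself:

```python
import pandas as pd

# The question's frame, rebuilt from the printed table.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8],
                   "x": [3, 6, 6, 3], "y": [-1, -2, -3, -4]})


def keep_range(frame, min_a, max_b):
    """Hypothetical step: takes a DataFrame plus two thresholds, returns a DataFrame."""
    return frame[(frame["a"] >= min_a) & (frame["b"] <= max_b)]


# pipe passes df as the first argument and forwards the keyword arguments.
piped = df.pipe(keep_range, min_a=2, max_b=8)

# The same step called directly.
direct = keep_range(df, min_a=2, max_b=8)
```

Both calls should give the same rows; pipe only changes how the call is written, not what it computes.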

    Here, e.g., you can use

    df\
        .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)\
        .pipe(lambda df_: df_.groupby(df_.x))\
        .mean()
    

    Notice that the first stage is a lambda taking three arguments: pipe supplies the DataFrame as the first, and the 2 and 8 are passed along as x and y. That's not the only way to write it; it is equivalent to

        .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])\
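
To check that equivalence, here is a self-contained sketch that rebuilds the frame from the table in the question (an assumption, since the question only shows the printed output) and runs the chain both ways. Parentheses are used instead of backslashes for line continuation; that is only a style choice.

```python
import pandas as pd

# The question's frame, rebuilt from the printed table.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8],
                   "x": [3, 6, 6, 3], "y": [-1, -2, -3, -4]})

# Thresholds passed through pipe's extra positional arguments.
with_args = (
    df
    .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)
    .pipe(lambda df_: df_.groupby(df_.x))
    .mean()
)

# Thresholds hard-coded inside the lambda.
hard_coded = (
    df
    .pipe(lambda df_: df_[(df_.a >= 2) & (df_.b <= 8)])
    .pipe(lambda df_: df_.groupby(df_.x))
    .mean()
)

# Both chains should produce identical group means.
assert with_args.equals(hard_coded)
```

Passing the thresholds as arguments mainly pays off when the filter is a named, reusable function; for a one-off lambda the hard-coded form is shorter.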
    

    Also note that you can use

    df\
        .pipe(lambda df_, x, y: df[(df.a >= x) & (df.b <= y)], 2, 8)\
        .groupby('x')\
        .mean()
    

    Here the lambda takes df_, but operates on df, and the second pipe has been replaced with a groupby.

    • The first change works here, but is fragile. It happens to work only because this is the first pipe stage, where df_ and df are the same DataFrame. If it were a later stage, the lambda would no longer be working on the piped result; it might, for example, take a DataFrame with one shape and attempt to filter it with a mask of another.

    • The second change is fine. In fact, I think it is more readable. Basically, anything that takes a DataFrame and returns one can either be called directly or through pipe.
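
Both points can be sketched on the question's frame. The extra first stage (keeping rows with y <= -3) is a hypothetical step added only to move the filter out of first position; it shows one way the fragility can surface, namely that the earlier stage's result is silently discarded.

```python
import pandas as pd

# The question's frame, rebuilt from the printed table.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8],
                   "x": [3, 6, 6, 3], "y": [-1, -2, -3, -4]})

# Robust: the second stage filters the frame it receives (df_).
robust = (
    df
    .pipe(lambda df_: df_[df_.y <= -3])  # hypothetical earlier stage
    .pipe(lambda df_, x, y: df_[(df_.a >= x) & (df_.b <= y)], 2, 8)
)

# Fragile: the second stage ignores df_ and filters the outer df,
# so the row dropped by the earlier stage comes back.
fragile = (
    df
    .pipe(lambda df_: df_[df_.y <= -3])  # hypothetical earlier stage
    .pipe(lambda df_, x, y: df[(df.a >= x) & (df.b <= y)], 2, 8)
)

# Second point: groupby through pipe versus called directly.
via_pipe = df.pipe(lambda df_: df_.groupby("x")).mean()
direct = df.groupby("x").mean()
```

Here robust keeps only the rows that pass both stages, while fragile also contains the row with y == -2 that the earlier stage had removed. The two groupby forms should give the same result.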
