Alternatives to awkward Pandas/Python DataFrame indexing: df_REPEATED[df_REPEATED['var']>0]?

感动是毒 2021-01-24 01:25

In Pandas/Python, I have to write the dataframe name twice when conditioning on its own variable:

df_REPEATED[df_REPEATED['var']>0]

This h

2 answers
  •  北恋 (original poster)
    2021-01-24 02:12

    df_REPEATED['var'] > 0 is a boolean Series. Apart from its index and length, it has no connection to the DataFrame. It could just as well be the result of another expression, say another_df['another_var'] > some_other_value, as long as it lines up with the DataFrame (a matching index for a Series, a matching length for a plain array). That is where the flexibility comes from: if the syntax were like the one you suggested, we couldn't do this. However, there are alternatives to what you are asking. For example,

    df_REPEATED.query('var > 0')
    

    query can be very fast if the DataFrame is large, and it is less verbose. However, it lacks the flexibility of boolean indexing, and you start running into trouble once the expression gets complicated.
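
    To make the comparison concrete, here is a minimal sketch that puts the two approaches side by side. The sample data and the 'other' column are made up for illustration, and another_df is a hypothetical second DataFrame. The last variant, .loc with a callable, is not mentioned above; it is added here as one more way to avoid repeating the name.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'var' comes from the question.
df_REPEATED = pd.DataFrame({"var": [-1, 2, 0, 5], "other": list("abcd")})

# 1. Boolean indexing: the mask is an ordinary boolean Series.
mask = df_REPEATED["var"] > 0
by_mask = df_REPEATED[mask]

# The mask can come from a different object, provided its index
# aligns with df_REPEATED (here both use the default 0..3 index).
another_df = pd.DataFrame({"another_var": [10, 20, 30, 40]})
by_other = df_REPEATED[another_df["another_var"] > 15]

# 2. query: the DataFrame name appears only once.
by_query = df_REPEATED.query("var > 0")

# 3. .loc with a callable: the lambda receives the DataFrame itself,
#    so the name is not repeated either (an addition, not from the answer).
by_callable = df_REPEATED.loc[lambda d: d["var"] > 0]

print(by_mask)
```

    All three filters on 'var' select the same rows. Only the boolean-mask form can also take a condition computed from another_df.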
