Nested np.where

清酒与你 2021-01-22 03:56

I have the following dataframe:

S A
1 1
1 0
2 1
2 0

I wanted to create a new 'Result' column that is calculated based on the val

3 Answers
  • 2021-01-22 04:36

    As far as I know, a single np.where call chooses between exactly two results: one where the condition is True and one where it is False. So either you combine your conditions into one expression that returns 1 for True and 0 for False, or you use masks.

    If you go with a single np.where, you are limited to two results, and the second one is assigned wherever the condition is not True. That includes rows that match none of your rules, such as a row with S == 5 or a row where A is NaN.

    df['Result'] = np.where(((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)), 1, 0)
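
    To see that fallthrough in practice, here is a minimal sketch. The two extra rows (S == 5 and a missing A) are hypothetical and not part of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the four rows from the question plus two rows
# that match neither rule (S == 5, and a missing A).
df = pd.DataFrame({
    "S": [1, 1, 2, 2, 5, 1],
    "A": [1, 0, 1, 0, 1, np.nan],
})

# One combined condition: True gets 1, everything else gets 0.
cond = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))
df["Result"] = np.where(cond, 1, 0)

# The last two rows receive 0 as well, although no rule mentions them.
print(df)
```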
    

    When using masks, you can apply an arbitrary number of conditions and results. For your example, the solution looks like:

    mask_0 = ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1))
    mask_1 = ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0))
    df.loc[mask_0, 'Result'] = 0
    df.loc[mask_1, 'Result'] = 1
    

    Rows where no condition is met are left as np.nan. In my opinion this is the safer behaviour, because unmatched rows stay visible instead of silently receiving a value. If you would rather have zeros there, initialize the 'Result' column with zeros first.
    This can of course be simplified for special cases, such as having only 1 and 0 as results, and extended to any number of results by keeping the masks in a dict or another container.
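
    The dict idea could look roughly like the sketch below. The name `rules` and the extra S == 5 row are assumptions made for illustration only:

```python
import numpy as np
import pandas as pd

# The question's rows plus one hypothetical row that no mask covers.
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, 1]})

# Hypothetical container: one boolean mask per result value.
rules = {
    0: ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1)),
    1: ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)),
}

df["Result"] = np.nan  # rows matched by no mask keep this value
for value, mask in rules.items():
    df.loc[mask, "Result"] = value

print(df)
```

    Adding another result is then a matter of adding one more key and mask. If masks overlap, the one applied last wins, so they should be kept mutually exclusive.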

  • 2021-01-22 04:39

    I would recommend numpy.select if you have deeply nested conditions.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "S": [1, 1, 2, 2],
        "A": [1, 0, 1, 0]
    })

    # Conditions are checked in order; the first match picks the value at the
    # same position in the choice list. Conditions 1 and 4 (and 2 and 3) could
    # be merged with the `|` (or) operator.
    df['Result'] = np.select([
        (df.S == 1) & (df.A == 1),
        (df.S == 1) & (df.A == 0),
        (df.S == 2) & (df.A == 1),
        (df.S == 2) & (df.A == 0)
    ], [1, 0, 0, 1])
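
    As a sketch of the merged form mentioned in the comment, the version below combines the clauses with `|` and also passes `default=np.nan`, so that rows matching no condition are marked as missing instead of receiving np.select's default of 0. The S == 5 row is hypothetical:

```python
import numpy as np
import pandas as pd

# The question's rows plus one hypothetical row that matches nothing.
df = pd.DataFrame({"S": [1, 1, 2, 2, 5], "A": [1, 0, 1, 0, 1]})

conditions = [
    ((df.S == 1) & (df.A == 1)) | ((df.S == 2) & (df.A == 0)),  # -> 1
    ((df.S == 1) & (df.A == 0)) | ((df.S == 2) & (df.A == 1)),  # -> 0
]
choices = [1, 0]

# default applies wherever every condition is False.
df["Result"] = np.select(conditions, choices, default=np.nan)

print(df)
```

    Because of the NaN default, the column comes out as float rather than int.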
    
  • 2021-01-22 04:50

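    You can use nested np.where; it works like a SQL CASE expression. Be careful when the data contains NaN: every comparison with NaN is False, so such rows fall through to the final else branch (1 in this example).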

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'S': [1, 1, 2, 2], 'A': [1, 0, 1, 0]})
    df['Result'] = np.where((df.S == 1) & (df.A == 1), 1,   # when ... then
                   np.where((df.S == 1) & (df.A == 0), 0,   # when ... then
                   np.where((df.S == 2) & (df.A == 1), 0,   # when ... then
                            1)))                            # else
    df
    

    output:

    |   | S | A | Result |
    |---|---|---|--------|
    | 0 | 1 | 1 | 1      |
    | 1 | 1 | 0 | 0      |
    | 2 | 2 | 1 | 0      |
    | 3 | 2 | 0 | 1      |
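
    The NaN caveat can be illustrated with a short sketch. The fifth row (S == 2 with a missing A) is hypothetical, and spelling out the last case with NaN as the else value is just one possible guard:

```python
import numpy as np
import pandas as pd

# The question's rows plus one hypothetical row with a missing A.
df = pd.DataFrame({"S": [1, 1, 2, 2, 2], "A": [1, 0, 1, 0, np.nan]})

# Unguarded: the NaN row matches no "when" and lands in the else branch.
unguarded = np.where((df.S == 1) & (df.A == 1), 1,
            np.where((df.S == 1) & (df.A == 0), 0,
            np.where((df.S == 2) & (df.A == 1), 0,
                     1)))

# Guarded: the last case is written out, and the else branch is NaN.
guarded = np.where((df.S == 1) & (df.A == 1), 1,
          np.where((df.S == 1) & (df.A == 0), 0,
          np.where((df.S == 2) & (df.A == 1), 0,
          np.where((df.S == 2) & (df.A == 0), 1,
                   np.nan))))

print(unguarded)
print(guarded)
```

    In the unguarded version the incomplete row silently receives 1; in the guarded version it stays NaN, at the cost of a float result.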
    