How to write an if statement for relative value rebalancing / Error: “The truth value of a Series is ambiguous”

有刺的猬 2021-01-19 07:09

Below is the code I wrote to calculate the relative change in value of df.a and df.b, where df is a DataFrame. What has to be calculated is basically df["c"] = df

2 Answers
  • 2021-01-19 07:44

    OK, I get the result you want, but this is still far too complicated and inefficient. I would be interested to see a better solution:

    import pandas as pd
    import numpy as np

    rng = pd.date_range('1/1/2011', periods=10, freq='D')

    df = pd.DataFrame({'a': [1.1, 1.2, 2.3, 1.4, 1.5, 1.8, 0.7, 1.8, 1.9, 2.0],
                       'b': [1.1, 1.5, 1.3, 1.6, 1.5, 1.1, 1.5, 1.7, 2.1, 2.1],
                       'c': [None] * 10}, index=rng)

    # d holds the positional index of each row's base row; start with row 0 everywhere
    df["d"] = 0
    df["t"] = np.arange(len(df))
    tolerance = 0.3

    # d1: a outgrew b by more than the tolerance; d2: b outgrew a by more than the tolerance
    df['d1'] = df.a/df.a.iloc[df.d].values > df.b/df.b.iloc[df.d].values * (1+tolerance)
    df['d2'] = df.a/df.a.iloc[df.d].values * (1+tolerance) < df.b/df.b.iloc[df.d].values

    # e, f, g: row position where a signal fired, 0 otherwise
    df['e'] = df.d1*df.t
    df['f'] = df.d2*df.t
    df['g'] = df.e + df.f

    # h: keep g only where it rose versus the previous row (replaces the removed df.ix),
    # then start the new base one row after each signal
    df['h'] = df.g.where(df.g > df.g.shift(1))
    df['h'] = (df.h + 1).shift(1)
    df.loc[df.index[0], 'h'] = 0
    df['h'] = df.h.ffill().astype(int)  # iloc needs integer positions

    df["d"] = df.h
    df["c"] = df.a/df.a.iloc[df.d].values
    

    And this is the result:

                  a    b         c  d  t     d1     d2  e  f  g  h
    2011-01-01  1.1  1.1  1.000000  0  0  False  False  0  0  0  0
    2011-01-02  1.2  1.5  1.090909  0  1  False  False  0  0  0  0
    2011-01-03  2.3  1.3  2.090909  0  2   True  False  2  0  2  0
    2011-01-04  1.4  1.6  1.000000  3  3  False  False  0  0  0  3
    2011-01-05  1.5  1.5  1.071429  3  4  False  False  0  0  0  3
    2011-01-06  1.8  1.1  1.285714  3  5   True  False  5  0  5  3
    2011-01-07  0.7  1.5  1.000000  6  6  False   True  0  6  6  6
    2011-01-08  1.8  1.7  1.000000  7  7  False  False  0  0  0  7
    2011-01-09  1.9  2.1  1.055556  7  8  False  False  0  0  0  7
    2011-01-10  2.0  2.1  1.111111  7  9  False  False  0  0  0  7
    

    From here you can easily delete the helper columns you no longer need, e.g. with del df['g'].
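    Several intermediate columns can also be removed in one call. The following is only a minimal sketch on a two-row stand-in frame; the name `result` and the choice of which columns count as helpers are assumptions for illustration.

```python
import pandas as pd

# Small stand-in frame with a few of the intermediate columns (illustrative data)
df = pd.DataFrame({
    "a": [1.1, 1.2],
    "c": [1.0, 1.090909],
    "e": [0, 0],
    "f": [0, 0],
    "g": [0, 0],
})

# drop returns a new frame without the listed columns; df itself is unchanged
result = df.drop(columns=["e", "f", "g"])
```

    Unlike del, which removes one column in place, drop leaves the original frame untouched unless the result is assigned back.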

  • 2021-01-19 07:53

    This is not a 100% solution, but it should at least put you on a better path and fix the primary problem. The core problem I see on the syntax side is that you are trying to mix vectorized and non-vectorized code. You could instead do something more like this:

    >>> df['d1'] = df.a/df.a.iloc[df.d].values > df.b/df.b.iloc[df.d].values * (1+tolerance)
    
    >>> df['d2'] = df.a/df.a.iloc[df.d].values * (1+tolerance) < df.b/df.b.iloc[df.d].values
    
    >>> df['d'] = df['d1'] | df['d2']
    
    >>> df
    
                  a    b     c      d  t     d1     d2
    2011-01-01  1.1  1.1  None  False  0  False  False
    2011-01-02  1.2  1.5  None  False  1  False  False
    2011-01-03  2.3  1.3  None   True  2   True  False
    2011-01-04  1.4  1.6  None  False  3  False  False
    2011-01-05  1.5  1.5  None  False  4  False  False
    2011-01-06  1.8  1.1  None   True  5   True  False
    2011-01-07  0.7  1.5  None   True  6  False   True
    2011-01-08  1.8  1.7  None  False  7  False  False
    2011-01-09  1.9  2.1  None  False  8  False  False
    2011-01-10  2.0  2.1  None  False  9  False  False
    

    That's not quite the answer you want, but hopefully shows you what is going on with the code and how you can fix it to get what you want (i.e. you don't need or want to be using a function and applying it here, just use standard pandas vectorized code).
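    For context, the error in the title typically appears when a whole Series comparison is used as the condition of a plain if statement. The sketch below uses a made-up three-element Series (not the asker's data) to show the failure and two vectorized alternatives; all variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1.1, 2.3, 0.7])

# A comparison on a Series yields one boolean per row, so `if` cannot
# reduce it to a single True/False and pandas raises ValueError.
error_message = None
try:
    if s > 1.0:
        pass
except ValueError as exc:
    error_message = str(exc)

# Vectorized alternatives: keep the per-row mask, or reduce it explicitly.
mask = s > 1.0
any_above = bool(mask.any())
all_above = bool(mask.all())
```

    The mask can then be stored as a column or combined with other masks using | and &, as in the d1 and d2 columns above.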

    If you can get that to work, the cleaner way to do that would be with np.where (either two of them sequentially, or nested).
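    A nested np.where could look roughly like the sketch below. It assumes every row is compared against row 0 as the base (as in the code above, where d is all zeros), and the column name `signal` with its 1 / -1 / 0 encoding is a hypothetical choice, not something from the question.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {
        "a": [1.1, 1.2, 2.3, 1.4, 1.5, 1.8, 0.7, 1.8, 1.9, 2.0],
        "b": [1.1, 1.5, 1.3, 1.6, 1.5, 1.1, 1.5, 1.7, 2.1, 2.1],
    },
    index=rng,
)
tolerance = 0.3

# Relative value of each series versus the first row (assumed base row)
rel_a = df["a"] / df["a"].iloc[0]
rel_b = df["b"] / df["b"].iloc[0]

# Nested np.where: 1 where a outgrew b beyond the tolerance,
# -1 where b outgrew a beyond the tolerance, 0 otherwise
df["signal"] = np.where(
    rel_a > rel_b * (1 + tolerance),
    1,
    np.where(rel_a * (1 + tolerance) < rel_b, -1, 0),
)
```

    This only replaces the d1 / d2 step with a single column; moving the base row forward after each signal would still need the additional logic discussed in the other answer.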
