Incorrect results when applying solution to real data

一向 2021-01-17 17:37

I've tried to apply the solution provided in this question to my real data: Selecting rows in a MultiIndexed dataframe. Somehow I cannot get the results it should give. I'

2 Answers
  • 2021-01-17 17:41

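    I couldn't work out how to adapt the answer from the question you linked, even though your solution looked like it should work. Here is an iterative approach that gives what you're looking for: it walks the rows in order, keeps a running total per column, and resets that total whenever a row fails the threshold test.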

    import pandas as pd
    
    
    df_test = pd.read_csv('df_test.csv')
    print(df_test.head())
    """
    
       bins_DO    L  T2011   2011  T2012  2012  T2013   2013  T2014   2014  T2015   2015  Ttotal  total
    0        0  IR1      6  -6.06     13 -3.22     12  -1.60      7  14.64     12 -18.20      50 -14.44
    1        1  IR1     14 -16.32     12 -6.74     14  -1.22      5   1.58      8  -0.42      53 -23.12
    2        2  IR1     10  -9.14     10 -0.42     10  11.84     13  -5.74      7  -3.10      50  -6.56
    3        3  IR1      9 -13.78     14 -3.14     10  -2.48      6  -0.02      5  -4.78      44 -24.20
    4        4  IR1      6   0.54      9 -9.40     15 -11.20      7   0.68      9  12.04      46  -7.34
    
    """
    cols = ['2012', '2013', '2014', '2015']
    
    
    def process_df(df: pd.DataFrame, cols: list, threshold: float):
        # initialize the benchmark
        # this gets reset any time the newest row fails the threshold test
        base_vals = [0 for _ in cols]
        keep_col = []
    
        for row in df[cols].values:
            # by default, keep the row
            keep_row = True
            for x in range(len(cols)):
                # if it fails on the row, then make keep row false
                if row[x] + base_vals[x] > threshold:
                    keep_row = False
    
            keep_col.append(keep_row)
    
            if keep_row:
                # if we were happy with those results, then keep adding the column values to the base_vals
                for x in range(len(cols)):
                    base_vals[x] += row[x]
            else:
                # otherwise, reset the base vals
                base_vals = [0 for _ in cols]
    
        # only keep rows that we want
        df = df.loc[keep_col, :]
    
        return df
    
    
    new_df = process_df(df=df_test, cols=cols, threshold=0)
    print(new_df)
    
    """
    
        bins_DO    L  T2011   2011  T2012   2012  T2013   2013  T2014   2014  T2015  2015  Ttotal  total
    3         3  IR1      9 -13.78     14  -3.14     10  -2.48      6  -0.02      5 -4.78      44 -24.20
    11       11  IR1      7   7.10     10 -10.04      7 -10.60     17  -5.56     11 -8.44      52 -27.54
    12       12  IR1     10  -0.28      7  -7.30      8   5.96      8 -12.58     10 -6.86      43 -21.06
    
    """
    
  • 2021-01-17 17:58

    According to the logic in your comment, you are looking for rows where either every value in columns 2012, 2013, 2014 and 2015 is less than 0, or the cumulative sum across those columns stays below 0. Whenever the first condition holds, the second holds too (a running sum of negative values is always negative), so you only need to test the second condition.

    cols = ['2012', '2013', '2014', '2015']
    df.loc[(df[cols].cumsum(axis=1) < 0).all(axis=1), cols]
    
         2012   2013   2014   2015
    1   -6.74  -1.22   1.58  -0.42
    3   -3.14  -2.48  -0.02  -4.78
    4   -9.40 -11.20   0.68  12.04
    7   -3.12  -5.74   0.84   1.94
    8  -10.14 -12.24 -11.10  15.20
    11 -10.04 -10.60  -5.56  -8.44
    12  -7.30   5.96 -12.58  -6.86
    15 -10.24  -4.16   5.46 -14.00
    

    Let me know in the comments if this is not what you want.
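As a quick sanity check of the reasoning in this answer, the sketch below rebuilds the five rows shown by `df_test.head()` in the other answer and compares the two conditions side by side. The variable names `all_negative` and `running_negative` are illustrative only, and the frame is limited to those five rows, so it shows only the first few matches.

```python
import pandas as pd

cols = ["2012", "2013", "2014", "2015"]

# First five rows of the data, copied from the head() output in the other answer.
df = pd.DataFrame(
    {
        "2012": [-3.22, -6.74, -0.42, -3.14, -9.40],
        "2013": [-1.60, -1.22, 11.84, -2.48, -11.20],
        "2014": [14.64, 1.58, -5.74, -0.02, 0.68],
        "2015": [-18.20, -0.42, -3.10, -4.78, 12.04],
    }
)

# Condition 1: every yearly value is negative.
all_negative = (df[cols] < 0).all(axis=1)

# Condition 2: the left-to-right running sum never reaches 0.
running_negative = (df[cols].cumsum(axis=1) < 0).all(axis=1)

# Every row satisfying condition 1 also satisfies condition 2.
print((all_negative & ~running_negative).any())

print(df.loc[running_negative, cols])
```

Rows 1, 3 and 4 pass the cumulative test, matching the first three rows of the output above, while only row 3 has all four values below 0.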
