How to deal with this complex logic in Python pandas?

没有蜡笔的小新 2021-01-14 10:04

I have some data with the following structure. It is stored in a pandas DataFrame named df.

Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2


        
2 Answers
  • 2021-01-14 10:28

    I built a generator to produce the rows, then used pd.concat to assemble them.

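    import pandas as pd

    def get_row(df):
        # Assumes df['Data2'] has already been converted with pd.to_timedelta.
        ref = None  # last row that was kept
        for i, row in df.iterrows():
            if ref is not None:
                # more than 18 seconds after the last kept row
                cond1 = (row.Data2.total_seconds() -
                         ref.Data2.total_seconds() > 18)
                # Flag differs from the last kept row
                cond2 = row.Flag != ref.Flag
            # The first row is always kept; `or` short-circuits, so cond1 and
            # cond2 are never read before they are assigned.
            if ref is None or cond1 or cond2:
                yield row
                ref = row

    pd.concat(list(get_row(df)), axis=1).T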
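
A self-contained sketch of the same rule, run against the sample rows that appear in the test data elsewhere in this thread. It is a variant, not the answer's own code: instead of concatenating row Series and transposing (which tends to leave every column as object dtype), it collects index labels and selects them with `df.loc`. The function name `select_labels` and the `gap_seconds` parameter are hypothetical names introduced for illustration.

```python
import pandas as pd
from io import StringIO

# Sample rows copied from the test data in this thread.
CSV = """Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1"""


def select_labels(df, gap_seconds=18):
    """Return the index labels of the rows the generator above would yield.

    Hypothetical helper: a row is kept if it is the first row, if it is more
    than `gap_seconds` after the last kept row, or if its Flag differs from
    the last kept row.
    """
    kept = []
    ref_time = ref_flag = None
    for label, t, flag in zip(df.index, df["Data2"], df["Flag"]):
        if (ref_time is None
                or (t - ref_time).total_seconds() > gap_seconds
                or flag != ref_flag):
            kept.append(label)
            ref_time, ref_flag = t, flag
    return kept


df = pd.read_csv(StringIO(CSV))
df["Data2"] = pd.to_timedelta(df["Data2"])  # "HH:MM:SS" strings -> Timedelta

kept_labels = select_labels(df)
result = df.loc[kept_labels]  # column dtypes of df are preserved
print(result)
```

On this sample the kept labels are 0, 1, 4, 6, 8, 9, 10 and 13, the same rows shown in the output of the other answer.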
    


    Timing

    Because @Kartik insisted :-)
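
One rough way to compare the two approaches in this thread is the standard-library `timeit` module. The sketch below is an assumption about how such a comparison could be set up, not the measurement this answer refers to. The wrapper names `generator_approach` and `filtering_approach` are hypothetical, and the numbers will vary with data size, hardware, and pandas version.

```python
import timeit
from io import StringIO

import pandas as pd

# Sample rows copied from the test data in this thread.
CSV = """Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1"""

df = pd.read_csv(StringIO(CSV))
df["Data2"] = pd.to_timedelta(df["Data2"])


def generator_approach(df):
    """Row-by-row generator followed by pd.concat (hypothetical wrapper)."""
    def get_row(df):
        ref = None
        for _, row in df.iterrows():
            if ref is not None:
                cond1 = (row.Data2.total_seconds()
                         - ref.Data2.total_seconds() > 18)
                cond2 = row.Flag != ref.Flag
            if ref is None or cond1 or cond2:
                yield row
                ref = row
    return pd.concat(list(get_row(df)), axis=1).T


def filtering_approach(df):
    """Repeated boolean filtering with a while loop (hypothetical wrapper)."""
    tdf = df.copy()
    sel_idx = []
    while len(tdf) > 0:
        sel_idx.append(tdf.index[0])
        ref_time = tdf.loc[sel_idx[-1], "Data2"]
        ref_flag = tdf.loc[sel_idx[-1], "Flag"]
        cond1 = tdf["Data2"] > ref_time + pd.to_timedelta(18, unit="s")
        cond2 = (tdf["Flag"] != ref_flag) & (tdf["Data2"] > ref_time)
        tdf = tdf[cond1 | cond2]
    return df.loc[sel_idx, :]


n = 20  # number of repetitions per approach; arbitrary choice
t_gen = timeit.timeit(lambda: generator_approach(df), number=n)
t_filt = timeit.timeit(lambda: filtering_approach(df), number=n)
print(f"generator: {t_gen / n:.6f} s per call")
print(f"filtering: {t_filt / n:.6f} s per call")
```

A 14-row sample is too small to say much about scaling, so a meaningful comparison would repeat this on a frame closer to the real data size.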

  • 2021-01-14 10:41

    Here, try this:

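    df['Data2'] = pd.to_timedelta(df['Data2'])

    tdf = df.copy()  # rows still to be examined
    sel_idx = []     # index labels of the rows to keep
    while len(tdf) > 0:
        # the first remaining row is always kept and becomes the reference
        sel_idx.append(tdf.index[0])
        # keep rows more than 18 seconds after the reference ...
        cond1 = tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'] + pd.to_timedelta(18, unit='s')
        # ... or later rows whose Flag differs from the reference
        cond2 = (tdf['Flag'] != tdf.loc[sel_idx[-1], 'Flag']) & (tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'])
        tdf = tdf[cond1 | cond2]
    df.loc[sel_idx, :]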
    

    Test

    Code:

    import pandas as pd
    from io import StringIO
    
    data = StringIO("""Data1,Data2,Flag
    2016-04-29,00:40:15,1
    2016-04-29,00:40:24,2
    2016-04-29,00:40:35,2
    2015-04-29,00:40:36,2
    2015-04-29,00:40:43,2
    2015-04-29,00:40:45,2
    2015-04-29,00:40:55,1
    2015-04-29,00:41:05,1
    2015-04-29,00:41:16,1
    2015-04-29,00:41:17,2
    2016-11-29,11:52:36,2
    2016-11-29,11:52:43,2
    2016-11-29,11:52:45,2
    2016-11-29,11:52:55,1""")
    
    df = pd.read_csv(data)
    df['Data2'] = pd.to_timedelta(df['Data2'])
    print("Input\n", df)
    
    tdf = df.copy()
    sel_idx = []
    while len(tdf) > 0:
        sel_idx.extend([tdf.index[0]])
        cond1 = tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'] + pd.to_timedelta(18, 's')
        cond2 = (tdf['Flag'] != tdf.loc[sel_idx[-1], 'Flag']) & (tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'])
        tdf = tdf[cond1 | cond2]
    print("Output\n", df.loc[sel_idx, :])
    

    Output:

    Input
        Data1       Data2       Flag
    0   2016-04-29  00:40:15    1
    1   2016-04-29  00:40:24    2
    2   2016-04-29  00:40:35    2
    3   2015-04-29  00:40:36    2
    4   2015-04-29  00:40:43    2
    5   2015-04-29  00:40:45    2
    6   2015-04-29  00:40:55    1
    7   2015-04-29  00:41:05    1
    8   2015-04-29  00:41:16    1
    9   2015-04-29  00:41:17    2
    10  2016-11-29  11:52:36    2
    11  2016-11-29  11:52:43    2
    12  2016-11-29  11:52:45    2
    13  2016-11-29  11:52:55    1
    
    Output
        Data1       Data2       Flag
    0   2016-04-29  00:40:15    1
    1   2016-04-29  00:40:24    2
    4   2015-04-29  00:40:43    2
    6   2015-04-29  00:40:55    1
    8   2015-04-29  00:41:16    1
    9   2015-04-29  00:41:17    2
    10  2016-11-29  11:52:36    2
    13  2016-11-29  11:52:55    1
    