Resampling error: cannot reindex a non-unique index with a method or limit

礼貌的吻别 2020-12-17 22:39

I am using pandas to structure and process data.

I have a DataFrame with dates as the index and the columns Id and bitrate. I want to group my data by Id and resample, at the same

1 answer
  • 2020-12-17 22:53

    It seems there is a problem with duplicates in the columns beginning_time and end_time. I tried to simulate it:

    import pandas as pd

    df = pd.DataFrame(
        {'Id': ['CODI126640013.ts', 'CODI126622312.ts', 'a'],
         'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45'],
         'end_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'],
         'bitrate': ['3750000', '3750000', '444'],
         'type': ['vod', 'catchup', 's'],
         'unique_id': ['f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30', 'f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22', 'w']})
    
    print (df)  
                     Id       beginning_time  bitrate             end_time  \
    0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
    1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
    2                 a  2016-07-08 02:17:45      444  2016-07-08 02:17:42   
    
          type                             unique_id  
    0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30  
    1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22  
    2        s                                     w  
    
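    # keep only the columns needed for resampling
    df = df.drop(['type', 'unique_id'], axis=1)
    df['beginning_time'] = pd.to_datetime(df['beginning_time'])
    df['end_time'] = pd.to_datetime(df['end_time'])
    # stack beginning_time and end_time into one 'dates' column (one row per timestamp)
    df = pd.melt(df, id_vars=['Id', 'bitrate'], value_name='dates').drop('variable', axis=1)
    df.set_index('dates', inplace=True)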
    
    
    print (df)  
                                       Id  bitrate
    dates                                         
    2016-07-08 02:17:42  CODI126640013.ts  3750000
    2016-07-08 02:17:42  CODI126622312.ts  3750000
    2016-07-08 02:17:45                 a      444
    2016-07-08 02:17:42  CODI126640013.ts  3750000
    2016-07-08 02:17:42  CODI126622312.ts  3750000
    2016-07-08 02:17:42                 a      444
    
    print (df.groupby('Id').resample('1S').ffill())
    

    ValueError: cannot reindex a non-unique index with a method or limit
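Roughly speaking, upsampling with `resample('1S').ffill()` reindexes each group onto a regular one-second grid using a forward-fill method, and pandas refuses a method-based reindex when the index contains duplicate labels. After the melt, the first two Ids each carry the same timestamp twice (once from beginning_time, once from end_time). A minimal sketch that reproduces the same kind of ValueError without groupby, using a hypothetical two-row Series `s` with a duplicated timestamp (the exact error text depends on the pandas version):

```python
import pandas as pd

# Two rows sharing one timestamp, like CODI126640013.ts after the melt
idx = pd.to_datetime(['2016-07-08 02:17:42', '2016-07-08 02:17:42'])
s = pd.Series(['3750000', '3750000'], index=idx)

# Target grid: one entry per second
grid = pd.date_range('2016-07-08 02:17:42', periods=3, freq='s')

try:
    s.reindex(grid, method='ffill')
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # message text depends on the pandas version

# Dropping the repeated timestamp makes the forward-fill reindex work
filled = s[~s.index.duplicated()].reindex(grid, method='ffill')
print(filled)
```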

    One possible solution is to add drop_duplicates and use the older way of resampling with groupby (apply per group). Starting again from the original df:

    df = df.drop(['type', 'unique_id'], axis=1)
    df.beginning_time = pd.to_datetime(df.beginning_time)
    df.end_time = pd.to_datetime(df.end_time)
    df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
    
    print (df.groupby('Id').apply(lambda x : x.drop_duplicates('dates')
                                              .set_index('dates')
                                              .resample('1S')
                                              .ffill()))
    
                                                        Id  bitrate
    Id               dates                                         
    CODI126622312.ts 2016-07-08 02:17:42  CODI126622312.ts  3750000
    CODI126640013.ts 2016-07-08 02:17:42  CODI126640013.ts  3750000
    a                2016-07-08 02:17:41                 a      444
                     2016-07-08 02:17:42                 a      444
                     2016-07-08 02:17:43                 a      444
                     2016-07-08 02:17:44                 a      444
                     2016-07-08 02:17:45                 a      444
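A variant of the same idea, sketched under the assumption of a recent pandas version (where the lowercase '1s' alias is preferred over '1S'): drop the duplicated (Id, dates) pairs once on the long frame, then resample only the bitrate column through the groupby, so the result carries no extra Id column. The name `long_df` is just for illustration. Like the per-group drop_duplicates above, this keeps the first row for each (Id, dates) pair, so if two such rows had different bitrates only the first would survive.

```python
import pandas as pd

# Same sample data as above, without the unused columns
df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts', 'a'],
     'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45'],
     'end_time': ['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'],
     'bitrate': ['3750000', '3750000', '444']})

# Stack both time columns into a single 'dates' column
long_df = df.melt(id_vars=['Id', 'bitrate'],
                  value_vars=['beginning_time', 'end_time'],
                  value_name='dates').drop(columns='variable')
long_df['dates'] = pd.to_datetime(long_df['dates'])

# Remove repeated timestamps inside each Id, then index by time
long_df = long_df.drop_duplicates(['Id', 'dates']).set_index('dates')

# Upsample each Id to 1-second steps and forward-fill bitrate
result = long_df.groupby('Id')['bitrate'].resample('1s').ffill()
print(result)
```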
    

    You can also find the rows that cause the duplicates (beginning_time equal to end_time) with boolean indexing on the original df:

    print (df[df.beginning_time == df.end_time])
                     Id       beginning_time  bitrate             end_time  \
    0  CODI126640013.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
    1  CODI126622312.ts  2016-07-08 02:17:42  3750000  2016-07-08 02:17:42   
    
          type                             unique_id  
    0      vod  f2514f6b-ce7e-4e1a-8f6a-3ac5d524be30  
    1  catchup  f2514f6b-ce7e-4e1a-8f6a-3ac5d524bb22  
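The same problem can also be spotted after the melt, by flagging every row whose (Id, dates) pair occurs more than once. A small sketch, assuming the melted frame still has dates as a regular column (here a hypothetical `long_df` rebuilt from the melted values printed above):

```python
import pandas as pd

long_df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts', 'a'] * 2,
     'bitrate': ['3750000', '3750000', '444'] * 2,
     'dates': pd.to_datetime(['2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:45',
                              '2016-07-08 02:17:42', '2016-07-08 02:17:42', '2016-07-08 02:17:42'])})

# keep=False marks every member of a duplicated (Id, dates) pair, not only the repeats
dups = long_df[long_df.duplicated(['Id', 'dates'], keep=False)]
print(dups)
```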
    