PANDAS - Loop over two datetime indexes with different sizes to compare days and values

Submitted by 筅森魡賤 on 2019-12-24 03:23:47

Question


I'm looking for a more efficient way to loop over and compare DatetimeIndex values in two Series objects with different frequencies.

Setup

Imagine two Pandas series, each with a datetime index covering the same year span yet with different frequencies for each index. One has a frequency of days, the other a frequency of hours.

range1 = pd.date_range('2016-01-01','2016-12-31', freq='D')
range2 = pd.date_range('2016-01-01','2016-12-31', freq='H')

I'm trying to loop over these series using their indexes as a lookup to match days so I can compare data for each day.

What I'm doing now...slow.

Right now I'm using multi-level for loops and if statements (see below); the time to complete these loops seems excessive (5.45 s per loop) compared with what I'm used to in Pandas operations.

for date, val in zip(frame1.index, frame1['data']): # freq = 'D'
    for date2, val2 in zip(frame2.index, frame2['data']): # freq = 'H'
        if date.day == date2.day: # check to see if dates are a match
            if val2 > val: # compare the values
                pass # append values, etc
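
One detail in the loop above is worth checking: `Timestamp.day` is the day of the month (1 to 31), so `date.day == date2.day` also matches the same day number in other months. A minimal sketch of the difference, assuming the intent is to match full calendar dates (the three timestamps are made up for illustration):

```python
import pandas as pd

jan5 = pd.Timestamp('2016-01-05 00:00')
jan5_noon = pd.Timestamp('2016-01-05 12:00')
feb5 = pd.Timestamp('2016-02-05 13:00')

# .day only carries the day of the month, so different months can match.
same_day_number = jan5.day == feb5.day

# normalize() resets the time to midnight but keeps the year and month,
# so it only matches timestamps that fall on the same calendar date.
same_date_other_month = jan5.normalize() == feb5.normalize()
same_date_same_day = jan5.normalize() == jan5_noon.normalize()

print(same_day_number, same_date_other_month, same_date_same_day)
```

Comparing normalized timestamps (or `.date()` values) avoids the cross-month matches.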

Question

Is there a more efficient way of using the index in frame1 to loop over the index in frame2 and compare the values in each frame for a given day? Ultimately I want to create a series of values wherever frame2 vals are greater than frame1 vals.

Reproducible (Tested) Example

Create two separate series with random data and assign each a datetime index.

import pandas as pd
import numpy as np

range1 = pd.date_range('2016-01-01','2016-12-31', freq='D')
range2 = pd.date_range('2016-01-01','2016-12-31', freq='H')

frame1 = pd.Series(np.random.rand(366), index=range1)
frame2 = pd.Series(np.random.rand(8761), index=range2)

Answer 1:


Yes, use resample, asfreq and pd.concat.

Use resample to get the proper frequency out of your series.

asfreq (which sounds sort of dirty) is used to convert the resampler back to a series at the frequency defined in resample.

Concatenate with frame1 to get values side-by-side.

df = pd.concat([frame1,frame2.resample('1D').asfreq()],axis=1)
df.head()

Output:

                   0         1
2016-01-01  0.147067  0.235858
2016-01-02  0.820398  0.353275
2016-01-03  0.840499  0.186273
2016-01-04  0.505740  0.340201
2016-01-05  0.547840  0.695041
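
To see which values `resample('1D').asfreq()` keeps, here is a minimal sketch on a small hypothetical series (the name `hourly` and its values 0 to 71 are made up for illustration). `asfreq` here behaves like a reindex onto the daily timestamps, so only the value stamped at midnight of each day survives; an aggregation such as `max` summarises all 24 hourly values of the day instead.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series covering three full days, values 0..71.
hourly = pd.Series(
    np.arange(72, dtype=float),
    index=pd.date_range('2016-01-01', periods=72, freq='h'),
)

# Keeps only the observation at 00:00 of each day.
midnight_only = hourly.resample('1D').asfreq()

# Uses every hourly observation of the day.
daily_max = hourly.resample('1D').max()

print(midnight_only.tolist())  # [0.0, 24.0, 48.0]
print(daily_max.tolist())      # [23.0, 47.0, 71.0]
```

Which of the two is appropriate depends on what "compare data for each day" is meant to cover; the question does not say, so treat the `max` variant as one possible reading.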

Then, you can use the following to get the series of frame2 values that exceed frame1.

df.columns = ['frame1','frame2']
df.query('frame1 < frame2')['frame2']
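
The comparison above uses one frame2 value per day. If the aim is to test every hourly value against the value for its calendar day, as the nested loop in the question appears to intend, a vectorized sketch could look like the following. This is not part of the original answer; the fixed seed and the helper names `day_of_each_hour`, `daily_value_per_hour` and `exceeds` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

range1 = pd.date_range('2016-01-01', '2016-12-31', freq='D')
range2 = pd.date_range('2016-01-01', '2016-12-31', freq='h')
frame1 = pd.Series(rng.random(len(range1)), index=range1)
frame2 = pd.Series(rng.random(len(range2)), index=range2)

# Map each hourly timestamp to midnight of its own day ...
day_of_each_hour = frame2.index.normalize()

# ... and look up that day's value in frame1, one entry per hourly row.
daily_value_per_hour = frame1.reindex(day_of_each_hour).to_numpy()

# Element-wise comparison replaces both loops and the if statements.
exceeds = frame2.to_numpy() > daily_value_per_hour
result = frame2[exceeds]

print(result.head())
```

`result` keeps the hourly index, so each retained value can still be traced back to its timestamp.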



Answer 2:


Still not sure what you want to do with the information. But I'd do:

  • make a copy of frame2
  • split its index into a date and time component
  • compare specifying a level

frame3 = frame2.copy()
frame3.index = [pd.to_datetime(frame3.index.date), frame3.index.time]
results = frame3.lt(frame1, level=0)

results.head()

2016-01-01  00:00:00    True
            01:00:00    True
            02:00:00    True
            03:00:00    True
            04:00:00    True
dtype: bool
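
Note that `lt` marks the rows where the hourly value is below the daily one. Since the question asks for the frame2 values that are greater, a sketch of the same level-based approach with `gt`, followed by using the boolean result to pull out values and count hours per day, might look like this. The seed, the level names `date` and `time`, and the variable names are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

range1 = pd.date_range('2016-01-01', '2016-12-31', freq='D')
range2 = pd.date_range('2016-01-01', '2016-12-31', freq='h')
frame1 = pd.Series(rng.random(len(range1)), index=range1)
frame2 = pd.Series(rng.random(len(range2)), index=range2)

# Split the hourly index into a (date, time) MultiIndex.
frame3 = frame2.copy()
frame3.index = pd.MultiIndex.from_arrays(
    [frame2.index.normalize(), frame2.index.time],
    names=['date', 'time'],
)

# Broadcast frame1 across the 'date' level and compare.
exceeds = frame3.gt(frame1, level='date')

# Hourly values that are greater than the daily value for the same date.
values_above = frame3[exceeds]

# Number of hours per day on which the hourly value was greater.
hours_above_per_day = exceeds.groupby(level='date').sum()

print(values_above.head())
print(hours_above_per_day.head())
```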


Source: https://stackoverflow.com/questions/43265590/pandas-loop-over-two-datetime-indexes-with-different-sizes-to-compare-days-and
