How to align indexes of many dataframes and fill in respective missing values in Pandas?

非 Y 不嫁゛ 提交于 2019-12-07 12:49:59

问题


I have 4 dataframes with data of similar datetime indexes, however in each of them there are few missing lines and I know that the gaps can be filled using previous known data.

I would like to 'align' these dataframes so that they have union of indexes of all dataframes and to fill in missing values. I know how to do it for 2 dataframes: df1, df2 = df1.align(df2, axis=0, method='pad'), but what is the good way to do it for more than 2?

I have tried this and it does work:

df1 = pd.DataFrame({'values': 1}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-03']))
df2 = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-06-04', '2016-06-07']))
df3 = pd.DataFrame({'values': 3}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-05']))

commonIndex = df1.index.join(df2.index, how='outer').join(df3.index, how='outer')

for d in [df1, df2, df3]:
    d = d.reindex(index=commonIndex, method='pad')

However when I try same approach on my real data, it gives error: "ValueError: index must be monotonic increasing or decreasing". It is price data, and index looks like this:

DatetimeIndex(['2014-03-24 00:00:00', '2014-03-24 00:01:00',
           '2014-03-24 00:02:00', '2014-03-24 00:03:00',
           '2014-03-24 00:04:00', '2014-03-24 00:05:00',
           '2014-03-24 00:06:00', '2014-03-24 00:07:00',
           '2014-03-24 00:08:00', '2014-03-24 00:09:00',
           ...
           '2014-10-10 17:51:00', '2014-10-10 17:52:00',
           '2014-10-10 17:53:00', '2014-10-10 17:54:00',
           '2014-10-10 17:55:00', '2014-10-10 17:56:00',
           '2014-10-10 17:57:00', '2014-10-10 17:58:00',
           '2014-10-10 17:59:00', '2014-10-10 18:00:00'],
          dtype='datetime64[ns]', name=u'datetime', length=139671, freq=None)

As far as I understand it should be increasing. Not sure about 'monotonic', but I assumed that it wasn't a constraint as demonstrated in above example (there were still missing dates).

Any help is greatly appreciated, and forgive me if I'm not using the correct terminology.

P.S. When I iterate through list of dataframes it looks like they are not actually saved in df1, df2, df3 after applying reindexing. How do I fix that?


回答1:


Is this the behavior you are trying to achieve? Note that this method works regardless of whether or not the indexes on the dataframes are monotonic.

df1 = pd.DataFrame({'values': 1}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-03']))
df2 = pd.DataFrame({'values': 2}, index=pd.DatetimeIndex(['2016-06-02', '2016-06-04', '2016-06-07']))
df3 = pd.DataFrame({'values': 3}, index=pd.DatetimeIndex(['2016-06-01', '2016-06-05']))

df = pd.concat([df1,df2,df3], axis=1).ffill().bfill()
df.columns = ['values1', 'values2', 'values3']
df

Which gives:

          values1  values2  values3
2016-05-04  1.0     2.0     3.0
2016-06-01  1.0     2.0     3.0
2016-06-02  1.0     2.0     3.0
2016-06-03  1.0     2.0     3.0
2016-06-05  1.0     2.0     3.0

Or if you just want the data-frames left separate, this will also work regardless of whether the data-frame has a monotonic index.

commonIndex = df1.index | df2.index | df3.index
df2.reindex(commonIndex).ffill()

EDIT:

I had a snippet here that reproduced your error, but I think it works better as its own question- so take a look here.



来源:https://stackoverflow.com/questions/37981678/how-to-align-indexes-of-many-dataframes-and-fill-in-respective-missing-values-in

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!