Question
I am having some issues with interpolation of a Pandas dataframe.
Basically, I have a dataframe of 295339 rows and have artificially generated NaNs to study different sampling rates and completion methods.
The issue is that some combinations of sampling rate and completion method work, while others fail with the following error message:
ValueError: The number of derivatives at boundaries does not match: expected 1, got 0+0.
The type of ValueError depends on the combination of sampling rate and completion method I'm using.
So, for example, if I make one NaN per hour per customer and then interpolate using either the linear or the cubic method, it works. But if I sample once every four hours per customer, it works for the linear method but not for the cubic method (code for the interpolation below):
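# Keep only the customer id and latitude columns
latitude = my_frame.filter(['Customer_id', 'Lat'], axis=1)
# Interpolate the missing values separately for each customer
latitude = latitude.groupby('Customer_id').apply(lambda group: group.interpolate(method='cubic'))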
The weird thing is that during my tests I limited my approach to 3 customers (representing 8500 rows) for speed purposes, and no errors were raised.
So, my question is: why does this happen, and is there any workaround?
Answer 1:
I found that the issue was with customers that have few records: I could not interpolate them using the cubic method because they did not have at least 4 known (non-NaN) points.
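One possible workaround, sketched below, is to count the known points per customer and fall back to linear interpolation when there are too few for the cubic method. The helper names (choose_method, interpolate_group), the MIN_KNOWN_POINTS constant and the sample data are made up for illustration; the threshold of 4 is taken from the finding above. Note that method='cubic' needs SciPy to be installed.

```python
import numpy as np
import pandas as pd

# Threshold taken from the answer: cubic needs at least 4 known points.
MIN_KNOWN_POINTS = 4


def choose_method(series, preferred="cubic", fallback="linear"):
    """Pick the preferred method only if enough non-NaN values exist."""
    known = series.notna().sum()
    return preferred if known >= MIN_KNOWN_POINTS else fallback


def interpolate_group(series):
    """Interpolate one customer's values with the method chosen above."""
    return series.interpolate(method=choose_method(series))


# Made-up sample: customer "a" has 5 known points, customer "b" only 2.
my_frame = pd.DataFrame({
    "Customer_id": ["a"] * 7 + ["b"] * 4,
    "Lat": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0, 7.0,
            10.0, np.nan, np.nan, 13.0],
})

# transform returns a Series aligned with the original rows.
latitude = my_frame.groupby("Customer_id")["Lat"].transform(interpolate_group)
print(latitude)
```

Here customer "a" is interpolated with the cubic method and customer "b" with the linear one, so no group raises. Whether a linear fallback is acceptable depends on the study; dropping or flagging the sparse customers is another option.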
Source: https://stackoverflow.com/questions/57412489/pandas-interpolate-returning-valueerrors-for-some-methods-and-some-sizes-of-data