Is it possible in pandas to interpolate missing values in a MultiIndex DataFrame? The example below does not work as expected:
arr1=np.array(np.arange(1.,
There are different ways, depending on how many rows you have.
I used to work with a dataset of 70 million rows on my Mac Pro (16 GB RAM). I had to group rows by product_id, client_id and week number to calculate customer demand. As in your example, this dataset did not have every product for every week, so I tried these approaches:
Find the missing week numbers of every product, fill them in and reindex. This took too much time and memory, even when I split the dataset into several pieces.
Find the missing week numbers of every product, build a new DataFrame from them and concat it with the original DataFrame. This was more efficient, but still took too much time (several hours) and memory.
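A minimal sketch of the first two approaches, assuming a toy Series indexed by (product_id, week). The data and column names are hypothetical, and client_id is left out to keep it short. Both versions materialize the full product × week index, which is what made them expensive on the big dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: product 2 has no row for week 2.
df = pd.DataFrame({
    "product_id": [1, 1, 1, 2, 2],
    "week": [1, 2, 3, 1, 3],
    "demand": [10.0, 12.0, 11.0, 5.0, 7.0],
})
s = df.set_index(["product_id", "week"])["demand"]

# Every (product_id, week) pair that should exist.
full_index = pd.MultiIndex.from_product(
    [s.index.get_level_values("product_id").unique(),
     s.index.get_level_values("week").unique()],
    names=["product_id", "week"],
)

# Way 1: reindex onto the complete index; absent pairs become NaN.
way1 = s.reindex(full_index)

# Way 2: build only the missing pairs as a new Series and concat them onto the original.
missing = full_index.difference(s.index)
filler = pd.Series(np.nan, index=missing, name="demand")
way2 = pd.concat([s, filler]).sort_index()
```

Both produce the same completed Series; the second avoids realigning the existing rows but still has to compute the full index to find the gaps.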
Eventually I found this post on Stack Overflow. I unstacked the week number, used fillna to fill the empty weeks with "-9999" (a value that never occurs in the data) and stacked it again. After that I replaced "-9999" with np.nan and got what I wanted. It took only a few minutes, so I think it's the right way to do it.
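A minimal sketch of the unstack/fillna/stack trick, on the same kind of hypothetical toy data. The sentinel matters because, at least with the legacy `stack()` behavior (dropna=True by default), NaN cells are dropped when stacking, which would discard exactly the missing weeks that `unstack()` just created. The last line, interpolating within each product, is my own addition to tie this back to the question, not part of the approach described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: product 2 has no row for week 2.
df = pd.DataFrame({
    "product_id": [1, 1, 1, 2, 2],
    "week": [1, 2, 3, 1, 3],
    "demand": [10.0, 12.0, 11.0, 5.0, 7.0],
})
s = df.set_index(["product_id", "week"])["demand"]

SENTINEL = -9999  # assumption: this value never occurs in the real data

filled = (
    s.unstack("week")            # one column per week; absent weeks appear as NaN
    .fillna(SENTINEL)            # mark the holes so stack() does not drop them
    .stack()                     # back to a (product_id, week) MultiIndex, now complete
    .replace(SENTINEL, np.nan)   # turn the sentinel back into a real missing value
)

# Optional: linear interpolation within each product only (never across products).
interpolated = filled.groupby(level="product_id").transform(lambda g: g.interpolate())
```

Here `filled` has all six (product_id, week) pairs with NaN at (2, 2), and `interpolated` fills that gap with 6.0, halfway between weeks 1 and 3 of product 2.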
In conclusion, if you have limited resources, "reindex" is only practical on a small dataset (I used the first way to process a piece with 5 million rows, and it returned in minutes), while "unstack/stack" works on bigger DataFrames.