Question
I've pulled some price data from Quandl for both crude oil (WTI) and Caterpillar (CAT). When I concatenate the two dataframes, I'm left with some NaNs. My ultimate goal is to run pearsonr() to assess the correlation (along with p-values), but I can't get pearsonr() to work because of all the NaNs, so I'm trying to clean them up. When I use the .fillna() function it doesn't seem to work. I've also tried .interpolate() and .dropna(). None of them appear to work. Here is the code I'm working with.
import Quandl
import pandas as pd
import numpy as np
#WTI Data#
WTI_daily = Quandl.get("DOE/RWTC", collapse="daily",trim_start="1986-10-10", trim_end="1986-10-15")
WTI_daily.columns = ['WTI']
#CAT Data
CAT_daily = Quandl.get("YAHOO/CAT.6", collapse = "daily",trim_start="1986-10-10", trim_end="1986-10-15")
CAT_daily.columns = ['CAT']
#Combine Data Frames
daily_price_df = pd.concat([CAT_daily, WTI_daily], axis=1)
print daily_price_df
#Verify they are dataFrames:
def really_a_df(var):
    if isinstance(var, pd.DataFrame):
        print "DATAFRAME SUCCESS"
    else:
        print "Wahh Wahh"
    return 'done'
print really_a_df(daily_price_df)
#Fill NAs
#CAN'T GET THIS TO WORK!!
daily_price_df.fillna(method='pad', limit=8)
print daily_price_df
# Try to interpolate
#CAN'T GET THIS TO WORK!!
daily_price_df.interpolate()
print daily_price_df
#Drop NAs
#CAN'T GET THIS TO WORK!!
daily_price_df.dropna(axis=1)
print daily_price_df
For what it's worth I've managed to get the function working when I create a dataframe from scratch using this code:
import pandas as pd
import numpy as np
d = {'a' : 0., 'b' : 1., 'c' : 2.,'d':None,'e':6}
d_series = pd.Series(d, index=['a', 'b', 'c', 'd','e'])
d_df = pd.DataFrame(d_series)
d_df = d_df.fillna(method='pad')
print d_df
Initially I thought that perhaps my data wasn't in dataframe form, but I used a simple test to confirm that it is in fact a dataframe. The only conclusion that remains (in my opinion) is that it is something about the structure of the Quandl dataframe, or possibly its TimeSeries nature. Please know I'm somewhat new to Python, so structure answers for a beginner/novice. Any help is much appreciated!
Answer 1:
Pot shot: have you just forgotten to assign the result or use the inplace flag? By default, fillna() returns a new DataFrame and leaves the original unchanged (the same goes for interpolate() and dropna()).
daily_price_df = daily_price_df.fillna(method='pad', limit=8)
OR
daily_price_df.fillna(method='pad', limit=8, inplace=True)
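To make the difference concrete, here is a minimal sketch that uses a small hand-made stand-in for the Quandl data (the prices and dates are invented for illustration, not real quotes). It assumes Python 3 and a reasonably recent pandas, and it uses .ffill(limit=8), which should behave like fillna(method='pad', limit=8) from the question. It also hints at why dropna(axis=1) in the question may not be what was intended: axis=1 drops columns that contain a NaN, while the default drops rows.

```python
import pandas as pd

# Hypothetical stand-in for the Quandl frames: two price series whose dates
# only partly overlap, so concatenating them side by side leaves NaNs.
cat = pd.DataFrame(
    {"CAT": [10.0, 10.5, 11.0]},
    index=pd.to_datetime(["1986-10-10", "1986-10-13", "1986-10-15"]),
)
wti = pd.DataFrame(
    {"WTI": [15.0, 15.2, 15.4]},
    index=pd.to_datetime(["1986-10-10", "1986-10-14", "1986-10-15"]),
)
daily_price_df = pd.concat([cat, wti], axis=1)

# Calling the method and discarding the result leaves the original untouched.
daily_price_df.ffill(limit=8)
nans_before = int(daily_price_df.isna().sum().sum())

# Keeping the returned frame is what makes the fill visible.
filled_df = daily_price_df.ffill(limit=8)
nans_after = int(filled_df.isna().sum().sum())

# Default dropna() removes rows with a NaN; axis=1 removes columns instead.
dropped_rows_df = daily_price_df.dropna()
dropped_cols_df = daily_price_df.dropna(axis=1)

print(nans_before, nans_after)
print(dropped_rows_df.shape, dropped_cols_df.shape)
```

The first count stays above zero because the bare call changed nothing, while the second drops to zero once the returned frame is kept. With this toy data, dropna(axis=1) removes both columns, since each one contains a NaN.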
Source: https://stackoverflow.com/questions/35461548/filling-data-using-fillna-data-pulled-from-quandl