Question
I am using Python (pandas) to manipulate high-frequency data. Basically, I need to fill the blank cells.
If a cell is blank, it should be filled with the most recent previous observation in the same column.
My original data example:
Time bid ask
15:00 . .
15:00 . .
15:02 76 .
15:02 . 77
15:03 . .
15:03 78 .
15:04 . .
15:05 . 80
15:05 . .
15:05 . .
needs to be converted to:
Time bid ask
15:00 . .
15:00 . .
15:02 76 .
15:02 76 77
15:03 76 77
15:03 78 77
15:04 78 77
15:05 78 80
15:05 78 80
15:05 78 80
This is my code:
# Import
tan = pd.read_csv('sample.csv')
# From here fill the blank cells
first_line = True
mydata = []
with open(tan, 'rb') as f:
    reader = csv.reader(f)
    # loop through each row...
    for row in reader:
        this_row = row
        # now do the blank-cell checking...
        if first_line:
            for colnos in range(len(this_row)):
                if this_row[colnos] == '':
                    this_row[colnos] = 0
            first_line = False
        else:
            for colnos in range(len(this_row)):
                if this_row[colnos] == '':
                    this_row[colnos] = prev_row[colnos]
        mydata.append([this_row])
        prev_row = this_row
However, the code does not work.
The interpreter reports:
TypeError: coercing to Unicode: need string or buffer, DataFrame found
I would really appreciate it if you could help me solve this issue. Thanks.
Answer 1:
Use the fillna() method. You can specify forward fill as the fill method, as follows:
import pandas as pd
data = pd.read_csv('sample.csv')
# Forward-fill all the columns. Note: the method= keyword is deprecated in
# recent pandas releases (2.1+), where data.ffill() is the equivalent call.
data = data.fillna(method='ffill')
# You can also apply it to specific columns only:
# data[['bid','ask']] = data[['bid','ask']].fillna(method='ffill')
print(data)
Time bid ask
0 15:00 NaN NaN
1 15:00 NaN NaN
2 15:02 76 NaN
3 15:02 76 77
4 15:03 76 77
5 15:03 78 77
6 15:04 78 77
7 15:05 78 80
8 15:05 78 80
9 15:05 78 80
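To try this without a file on disk, here is a minimal self-contained sketch. It assumes the data is comma-separated and that blanks are written as "." as in the question's sample, so `na_values="."` is used to turn them into NaN; both are assumptions about the real `sample.csv`. It calls `ffill()` directly, which gives the same forward fill without relying on the `method=` keyword.

```python
import io

import pandas as pd

# Sample data from the question, rewritten as CSV (assumed layout);
# "." marks a blank cell.
raw = """Time,bid,ask
15:00,.,.
15:00,.,.
15:02,76,.
15:02,.,77
15:03,.,.
15:03,78,.
15:04,.,.
15:05,.,80
15:05,.,.
15:05,.,.
"""

# na_values="." tells read_csv to treat "." as missing (NaN).
df = pd.read_csv(io.StringIO(raw), na_values=".")

# Forward-fill only the price columns. Leading NaNs stay NaN
# because there is no earlier observation to copy from.
df[["bid", "ask"]] = df[["bid", "ask"]].ffill()

print(df)
```

The leading rows remain NaN, matching the output above. If they should become 0 instead (as the question's loop attempts for the first row), chaining `.fillna(0)` after the forward fill is one option.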
Answer 2:
There is also the lesser-known ffill method:
In [102]:
df.ffill()
Out[102]:
Time bid ask
0 15:00 NaN NaN
1 15:00 NaN NaN
2 15:02 76 NaN
3 15:02 76 77
4 15:03 76 77
5 15:03 78 77
6 15:04 78 77
7 15:05 78 80
8 15:05 78 80
9 15:05 78 80
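As a side note, the TypeError in the question comes from passing the DataFrame returned by pd.read_csv to open(), which expects a file path. If a plain csv loop is preferred over pandas, a rough sketch might look like the following. It assumes a comma-separated file with a header row and empty strings for blank cells; `forward_fill_rows` is a hypothetical helper name, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def forward_fill_rows(lines):
    """Forward-fill empty cells row by row (hypothetical helper)."""
    reader = csv.reader(lines)
    header = next(reader)  # assumes the first row is a header
    filled = []
    prev_row = None
    for row in reader:
        if prev_row is not None:
            # Copy the previous row's value into every empty cell.
            row = [cell if cell != "" else prev
                   for cell, prev in zip(row, prev_row)]
        filled.append(row)
        prev_row = row
    return header, filled


# In-memory stand-in for: open('sample.csv', newline='')
sample = io.StringIO("Time,bid,ask\n15:00,,\n15:02,76,\n15:02,,77\n15:03,,\n")
header, rows = forward_fill_rows(sample)
print(rows)
```

Unlike the question's loop, this sketch leaves blanks in the first data row empty rather than replacing them with 0, and all values stay strings. The pandas one-liner is still the simpler choice.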
Source: https://stackoverflow.com/questions/31470551/pythonpandas-fills-blanks-cells