I am preparing a pandas df for output, and would like to remove the NaN and NaT in the table, and leave those table locations blank. An example would be
myd
This won't win any speed awards, but if the DataFrame is not too long, reassignment using a list comprehension will do the job:
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]
import numpy as np
import pandas as pd
Timestamp = pd.Timestamp
nan = np.nan
NaT = pd.NaT
df1 = pd.DataFrame({
'col1': list('ac'),
'col2': ['b', nan],
'date': (Timestamp('2014-08-14'), NaT)
})
df1['col2'] = df1['col2'].fillna('')
df1['date'] = [d.strftime('%Y-%m-%d') if not pd.isnull(d) else '' for d in df1['date']]
print(df1)
yields
col1 col2 date
0 a b 2014-08-14
1 c
@unutbu's answer will work fine, but if you don't want to modify the DataFrame, you could do something like this. to_html
takes a parameter for how NaN
is represented, to handle the NaT
you need to pass a custom formatting function.
date_format = lambda d : pd.to_datetime(d).strftime('%Y-%m-%d') if not pd.isnull(d) else ''
df1.to_html(na_rep='', formatters={'date': date_format})
If all you want to do is convert to a string:
In [37]: df1.to_csv(None,sep=' ')
Out[37]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c \n'
To replace missing values with a string
In [36]: df1.to_csv(None,sep=' ',na_rep='missing_value')
Out[36]: ' col1 col2 date\n0 a b "2014-08-14 00:00:00"\n1 c missing_value missing_value\n'
I had the same issue: This does it all in place using pandas apply function. Should be the fastest method.
import pandas as pd
df['timestamp'] = df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d')if not pd.isnull(x) else '')
if your timestamp field is not yet in datetime
format then:
import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp']).apply(lambda x: x.strftime('%Y-%m-%d')if not pd.isnull(x) else '')