Pandas: Changing the format of NaN values when saving to CSV

南笙 2021-02-19 06:55

I am working with a df and using NumPy to transform data, including setting blanks (or '') to NaN. But when I write the df to CSV, the output contains the string 'nan' as …
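
For reference, to_csv writes genuine missing values as empty fields by default, and its na_rep argument changes that token. The sketch below uses a small made-up DataFrame to show that na_rep only applies to real missing values, not to a cell that already holds the string 'nan':

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" holds a real NaN, column "b" holds the text "nan".
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", "nan"]})

# Default: the real NaN in "a" becomes an empty field; the string "nan" is written as-is.
default_csv = df.to_csv(index=False)

# na_rep changes the token for real missing values only.
custom_csv = df.to_csv(index=False, na_rep="NULL")

print(default_csv)
print(custom_csv)
```

So if the file shows nan, the column most likely contains strings rather than missing values.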

一生所求 (OP)

2021-02-19 07:31

In my situation, the culprit was np.where. When one of the two values it chooses between is a string and the other is numeric, NumPy casts the result to a common string dtype, so your np.nan is converted to the string 'nan'.

It's hard (for me) to see exactly what's going on under the hood, but I suspect the same happens with other NumPy functions that combine inputs of mixed types.
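
To see the promotion directly, one can inspect the dtype np.where returns. A small sketch using the same seq as the example below (a literal False stands in for the always-false condition):

```python
import numpy as np

seq = [1, 2, 3, 4, np.nan]

# Both choices numeric: the result stays float64, so the last element is a real NaN.
same = np.where(False, 0, seq)

# One choice is a string: NumPy promotes to a common Unicode string dtype,
# so every float, including NaN, is converted to its text form.
mixed = np.where(False, "spam", seq)

print(same.dtype, repr(same[-1]))
print(mixed.dtype, repr(mixed[-1]))
```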

A minimal example:

import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
same_type_seq = np.where("parrot"=="dead", 0, seq)
diff_type_seq = np.where("parrot"=="dead", "spam", seq)

pd.Series(seq).to_csv("vanilla_nan.csv", header=False) # as expected, last row is blank
pd.Series(same_type_seq).to_csv("samey_nan.csv", header=False) # also, blank
pd.Series(diff_type_seq).to_csv("nany_nan.csv", header=False) # nan instead of blank
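
The difference is also visible without writing files: pandas only counts real missing values as NA, and to_csv() called without a path returns the CSV text as a string. A minimal sketch reusing the names from the example above:

```python
import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]
diff_type_seq = np.where("parrot" == "dead", "spam", seq)

vanilla = pd.Series(seq)
mixed = pd.Series(diff_type_seq)

# The float Series has one real NaN; the string Series has none, just the text "nan".
print(vanilla.isna().sum(), mixed.isna().sum())

# With no path, to_csv returns the CSV content instead of writing a file.
vanilla_csv = vanilla.to_csv(header=False)
mixed_csv = mixed.to_csv(header=False)
print(vanilla_csv)
print(mixed_csv)
```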

So how do you get round this? I'm not too sure, but as a hacky workaround for small datasets, you can replace NaN in your original sequence with a token string and then replace the token back with np.nan:

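repl = "missing"  # placeholder token; assumed not to occur in the real data
# Swap NaN for the token before np.where casts everything to strings
hacky_seq = np.where("parrot"=="dead", "spam", [repl if np.isnan(x) else x for x in seq])
# Turn the token back into a real NaN so to_csv writes an empty field
pd.Series(hacky_seq).replace({repl: np.nan}).to_csv("hacky_nan.csv", header=False)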
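
Two alternatives that skip the token round-trip, offered as sketches rather than as the approach above: give np.where an object array so nothing is cast to strings, or convert the literal string 'nan' back to a real NaN before saving. Both assume the same seq and condition as the example, and the second assumes no real value in the data is the text 'nan':

```python
import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]
cond = "parrot" == "dead"

# Option 1: with an object array as one choice, the result dtype is object,
# so "spam" and the original numbers (including NaN) are kept as Python objects.
obj_seq = np.where(cond, "spam", np.array(seq, dtype=object))
obj_csv = pd.Series(obj_seq).to_csv(header=False)

# Option 2: keep the string result, then turn the text "nan" back into a real NaN.
str_seq = np.where(cond, "spam", seq)
fixed = pd.Series(str_seq).replace("nan", np.nan)
fixed_csv = fixed.to_csv(header=False)

print(obj_csv)
print(fixed_csv)
```

With either option the last row is written as an empty field, and to_csv's na_rep can then pick a different token if needed.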
