Pandas: changing the format of NaN values when saving to CSV

南笙 2021-02-19 06:55

I am working with a df and using numpy to transform data, including setting blanks (or '') to NaN. But when I write the df to CSV, the output contains the string 'nan' as

4 answers
  •  一生所求
    2021-02-19 07:31

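    In my situation, the culprit was np.where. When the two return arguments have different data types (for example a string and a sequence of floats), NumPy has to pick one common dtype for the result. Here that is a string dtype, so np.nan gets converted to the literal string 'nan', which pandas then writes out as text instead of treating it as a missing value.

    It's hard (for me) to see exactly what's going on under the hood, but I suspect the same applies to other NumPy array functions that combine mixed types.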

    A minimal example:

    import numpy as np
    import pandas as pd
    
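    seq = [1, 2, 3, 4, np.nan]  # np.nan works on every NumPy version; the np.NaN alias was removed in NumPy 2.0
    # "parrot" == "dead" is always False, so np.where always picks from seq
    same_type_seq = np.where("parrot" == "dead", 0, seq)       # number + numbers -> float array
    diff_type_seq = np.where("parrot" == "dead", "spam", seq)  # string + numbers -> string array
    
    pd.Series(seq).to_csv("vanilla_nan.csv", header=False)  # as expected, last row is blank
    pd.Series(same_type_seq).to_csv("samey_nan.csv", header=False)  # also blank
    pd.Series(diff_type_seq).to_csv("nany_nan.csv", header=False)  # 'nan' instead of blank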
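
    To see what happened, you can inspect the dtypes that np.where returns. This is only a small sketch reusing the sequence from the example above; the variable names same_type and diff_type are mine, and I have not printed the exact string width because it may vary.

```python
import numpy as np

seq = [1, 2, 3, 4, np.nan]

# Same inputs as the example above, with the always-False condition written out.
same_type = np.where(False, 0, seq)
diff_type = np.where(False, "spam", seq)

print(same_type.dtype)       # a float dtype: NaN is still a real missing value
print(diff_type.dtype)       # a unicode string dtype: every element is now text
print(repr(diff_type[-1]))   # the last element is the string 'nan', not a float NaN
```

    Once the value is the string 'nan', pandas no longer recognises it as missing, so to_csv writes it out verbatim.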
    

    So how do you get around this? I'm not too sure, but as a hacky workaround for small datasets, you can replace NaN in your original sequence with a token string before calling np.where, and then replace that token with np.nan again afterwards:

    repl = "missing"  # placeholder token; must not occur in the real data
    hacky_seq = np.where("parrot" == "dead", "spam", [repl if np.isnan(x) else x for x in seq])
    pd.Series(hacky_seq).replace({repl: np.nan}).to_csv("hacky_nan.csv", header=False)
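
    Another option that may avoid the token round trip is to hand np.where an object-dtype array, so NumPy keeps each element as it is instead of casting everything to strings. This is a sketch under that assumption, not something I have tried on large data; obj_seq, mixed_seq and csv_text are names I made up for illustration.

```python
import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]

# With dtype=object, each element keeps its own Python type, including the float NaN.
obj_seq = np.array(seq, dtype=object)
mixed_seq = np.where(False, "spam", obj_seq)

# Calling to_csv without a path returns the CSV text instead of writing a file.
csv_text = pd.Series(mixed_seq).to_csv(header=False)
print(mixed_seq.dtype)  # object
print(csv_text)         # the last row has an empty value rather than 'nan'
```

    The trade-off is that object arrays are slower and use more memory than numeric ones. If the array has already been turned into strings, pd.Series(...).replace("nan", np.nan) before to_csv should also work, as long as "nan" is not a legitimate value in your data.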
    
