Pandas Changing the format of NaN values when saving to CSV

南笙 2021-02-19 06:55

I am working with a DataFrame and using NumPy to transform data, including setting blanks (or '') to NaN. But when I write the DataFrame to CSV, the output contains the string 'nan' as

4 Answers
  • 2021-02-19 07:31

    In my situation, the culprit was np.where. When the two value arguments have different data types (for example, a string and a float), the result is coerced to a common type, so np.nan is converted to the string 'nan'.

    It's hard (for me) to see exactly what's going on under the hood, but I suspect the same is true for other NumPy array functions that combine mixed types.

    A minimal example:

    import numpy as np
    import pandas as pd
    
    seq = [1, 2, 3, 4, np.nan]
    # the condition is always False, so np.where returns the values of seq
    same_type_seq = np.where("parrot" == "dead", 0, seq)
    diff_type_seq = np.where("parrot" == "dead", "spam", seq)
    
    pd.Series(seq).to_csv("vanilla_nan.csv", header=False) # as expected, last row is blank
    pd.Series(same_type_seq).to_csv("samey_nan.csv", header=False) # also blank
    pd.Series(diff_type_seq).to_csv("nany_nan.csv", header=False) # nan instead of blank
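
    To see the coercion directly, it can help to inspect the dtype of each result. The sketch below reuses the sequence from the example above; the names same_type and diff_type are only illustrative.

```python
import numpy as np

seq = [1, 2, 3, 4, np.nan]

# Both value arguments are numeric, so the result stays a float array.
same_type = np.where("parrot" == "dead", 0, seq)

# A string mixed with floats forces a common string dtype,
# which turns the float nan into the text 'nan'.
diff_type = np.where("parrot" == "dead", "spam", seq)

print(same_type.dtype)       # a float dtype
print(diff_type.dtype.kind)  # 'U' means a unicode string dtype
print(diff_type[-1] == "nan")
```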
    

    So how do you get around this? I'm not too sure, but as a hacky workaround for small datasets, you can replace NaN in your original sequence with a token string and then replace it back with np.nan:

    repl = "missing"
    hacky_seq = np.where("parrot" == "dead", "spam", [repl if np.isnan(x) else x for x in seq])
    pd.Series(hacky_seq).replace({repl: np.nan}).to_csv("hacky_nan.csv", header=False)
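
    Another option, offered here only as a sketch and not as part of the workaround above, is to pass the numeric values as an object array. That is an assumption about what fits your data: with an object dtype NumPy has no reason to cast the elements to strings, so the float nan should survive and be written as a blank.

```python
import io

import numpy as np
import pandas as pd

seq = [1, 2, 3, 4, np.nan]

# An object array keeps each element as its original Python object,
# so mixing it with the string "spam" does not stringify the nan.
kept = np.where("parrot" == "dead", "spam", np.array(seq, dtype=object))

# Write to an in-memory buffer instead of a file to inspect the output.
buf = io.StringIO()
pd.Series(kept).to_csv(buf, header=False)
lines = buf.getvalue().splitlines()
print(lines[-1])
```

    The trade-off is that object arrays are slower and use more memory than typed arrays, so this is probably best kept for small data.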
    
  • 2021-02-19 07:32

    Using df.replace may help:

    # regex=True is not needed here because np.nan is not a string pattern
    df = df.replace(np.nan, '')
    df.to_csv("df.csv", index=False)
    

    (This sets all the null values to '', i.e., an empty string.)
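
    If the values have already been turned into the literal string 'nan' (as in the question), replacing np.nan will not match them. A minimal sketch, assuming a hypothetical frame in which column a holds strings and column b holds floats, replaces the text 'nan' with a real missing value before writing:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: column "a" contains the text 'nan', column "b" a real NaN.
df = pd.DataFrame({"a": ["1.0", "nan", "3.0"], "b": [1.0, np.nan, 3.0]})

# Turn the literal string 'nan' back into a missing value.
cleaned = df.replace("nan", np.nan)

# Missing values are written as empty fields by default.
buf = io.StringIO()
cleaned.to_csv(buf, index=False)
lines = buf.getvalue().splitlines()
print(lines)
```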

  • 2021-02-19 07:46

    User @coldspeed illustrates how to replace NaN values with NULL when saving a pd.DataFrame. If, for data analysis, you instead want to replace the null values in a pd.DataFrame (those flagged by isnull()) with np.nan, the following code will do:

    import numpy as np
    import pandas as pd
    
    # replace null values with np.nan
    colNames = mydf.columns.tolist()
    dfVals = mydf.values
    matSyb = mydf.isnull().values  # boolean mask of the null entries
    dfVals[matSyb] = np.nan
    
    mydf = pd.DataFrame(dfVals, columns=colNames)
    #np.nansum(mydf.values, axis=0 )
    #np.nansum(dfVals, axis=0 )
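
    When the data comes from a CSV file that was written with NULL as the placeholder, pd.read_csv may already do this conversion, since NULL is one of the strings it treats as missing by default. A small sketch, with made-up CSV text standing in for a real file:

```python
import io

import pandas as pd

# Made-up CSV content in which NULL marks the missing cells.
csv_text = "x,y,z\n1.0,NULL,2\nNULL,3.0,4\n"

# With default settings, NULL is parsed as a missing value (NaN).
mydf = pd.read_csv(io.StringIO(csv_text))

null_count = int(mydf.isna().sum().sum())
print(null_count)
print(mydf.dtypes)
```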
    
  • 2021-02-19 07:47

    Pandas to the rescue: use na_rep to set your own representation for NaNs.

    df.to_csv('file.csv', na_rep='NULL')
    

    file.csv

    ,index,x,y,z
    0,0,1.0,NULL,2
    1,1,NULL,3.0,4
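
    For a self-contained check, the sketch below builds a small frame and writes it to an in-memory buffer. The frame is an assumption, chosen to resemble the output shown above.

```python
import io

import numpy as np
import pandas as pd

# Assumed example data with one missing value in x and one in y.
df = pd.DataFrame(
    {"index": [0, 1], "x": [1.0, np.nan], "y": [np.nan, 3.0], "z": [2, 4]}
)

# na_rep controls the text written for missing values.
buf = io.StringIO()
df.to_csv(buf, na_rep="NULL")
lines = buf.getvalue().splitlines()
print("\n".join(lines))
```

    Passing na_rep='' (the default) writes blanks instead, which is what the question asks for, provided the values are real NaNs and not the string 'nan'.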
    