How to replace NaN values with zeros in a column of a pandas DataFrame?

无人共我 2020-11-22 01:37

I have a Pandas Dataframe as below:

      itm Date                  Amount 
67    420 2012-09-30 00:00:00   65211
68    421 2012-09-09 00:00:00   29424
69             


        
13 Answers
  • 2020-11-22 01:46

    To replace NaN in different columns with different values:

    replacement = {'column_A': 0, 'column_B': -999, 'column_C': -99999}
    df = df.fillna(value=replacement)  # fillna returns a new DataFrame
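
    As a minimal sketch of the per-column dict, the frame below is made up for illustration; only the column names come from the snippet above, and column_D is a hypothetical extra column added to show that columns missing from the dict are left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names follow the snippet above.
df = pd.DataFrame({
    "column_A": [1.0, np.nan],
    "column_B": [np.nan, 2.0],
    "column_C": [np.nan, np.nan],
    "column_D": [np.nan, 4.0],  # not listed in the dict, so its NaN stays
})

replacement = {"column_A": 0, "column_B": -999, "column_C": -99999}

# fillna returns a new DataFrame; df itself is not modified here.
filled = df.fillna(value=replacement)
print(filled)
```

    Assigning the result back (df = df.fillna(...)) is what makes the change stick.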
    
  • 2020-11-22 01:49

    You could use replace to change NaN to 0:

    import pandas as pd
    import numpy as np
    
    # for column
    df['column'] = df['column'].replace(np.nan, 0)
    
    # for whole dataframe
    df = df.replace(np.nan, 0)
    
    # inplace
    df.replace(np.nan, 0, inplace=True)
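
    A small sketch of the single-column case, using made-up rows loosely modelled on the question's itm and Amount columns (the values are assumptions, not the asker's data). It also checks that replace(np.nan, 0) and fillna(0) give the same result for this column.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names echo the question.
df = pd.DataFrame({
    "itm": [420.0, 421.0, np.nan],
    "Amount": [65211.0, np.nan, 29424.0],
})

# Both calls return a new Series with NaN swapped for 0.
via_replace = df["Amount"].replace(np.nan, 0)
via_fillna = df["Amount"].fillna(0)

# Write the result back to change only the Amount column.
df["Amount"] = via_replace
print(df)
```

    The itm column keeps its NaN because only Amount was reassigned.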
    
  • 2020-11-22 01:50

    The code below worked for me.

    import pandas
    
    df = pandas.read_csv('somefile.txt')
    
    df = df.fillna(0)
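
    To try this without a file on disk, here is a sketch that reads CSV text from memory instead of 'somefile.txt'. The CSV content is invented for illustration; empty fields are parsed as NaN by read_csv.

```python
import io

import pandas as pd

# Invented CSV text standing in for a real file; two fields are empty.
csv_text = "itm,Amount\n420,65211\n421,\n,29424\n"

df = pd.read_csv(io.StringIO(csv_text))
n_missing = int(df.isna().sum().sum())  # count NaN cells before filling

df = df.fillna(0)
print(n_missing)
print(df)
```

    Columns that contained NaN are read as floats and stay float after filling; use astype(int) afterwards if integers are wanted.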
    
  • 2020-11-22 01:54

    I just wanted to provide a bit of an update/special case, since it looks like people still come here. If you're using a multi-index or otherwise using an index-slicer, the inplace=True option may not be enough to update the slice you've chosen. For example, in a 2x2 level multi-index this will not change any values (as of pandas 0.15):

    idx = pd.IndexSlice
    df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0,inplace=True)
    

    The "problem" is that the chaining breaks fillna's ability to update the original dataframe. I put "problem" in quotes because there are good reasons for the design decisions that led to not interpreting through these chains in certain situations. Also, this is a complex example (though I really ran into it), but the same may apply to fewer levels of indexes depending on how you slice.

    The solution is DataFrame.update:

    df.update(df.loc[idx[:,mask_1],idx[[mask_2],:]].fillna(value=0))
    

    It's one line, reads reasonably well (sort of) and eliminates any unnecessary messing with intermediate variables or loops while allowing you to apply fillna to any multi-level slice you like!

    If anybody can find places this doesn't work please post in the comments, I've been messing with it and looking at the source and it seems to solve at least my multi-index slice problems.
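
    A rough sketch of the update pattern on a small frame. The labels (r1, r2, x, y, a, b, p, q) are invented, and plain labels are used in place of mask_1 and mask_2, so treat this as an illustration of the idea rather than a reproduction of the original case.

```python
import numpy as np
import pandas as pd

# Invented 2x2-level MultiIndex on both axes, all cells NaN.
rows = pd.MultiIndex.from_product([["r1", "r2"], ["x", "y"]])
cols = pd.MultiIndex.from_product([["a", "b"], ["p", "q"]])
df = pd.DataFrame(np.nan, index=rows, columns=cols)

idx = pd.IndexSlice

# Fill only the selected slice; this produces a separate 2x2 frame.
filled = df.loc[idx[:, "x"], idx["a", :]].fillna(value=0)

# update aligns on row and column labels and writes the non-NaN
# values of `filled` back into df in place.
df.update(filled)
print(df)
```

    Only the four selected cells become 0; everything outside the slice stays NaN.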

  • 2020-11-22 01:54

    There are primarily two options for imputing or filling missing values (NaN / np.nan). When only a numerical replacement is needed in one or more columns, fillna is sufficient:

    df['Amount'] = df['Amount'].fillna(value=0)

    From the Documentation:

    value : scalar, dict, Series, or DataFrame Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled). This value cannot be a list.

    In other words, the fill value must be a scalar or a dict/Series/DataFrame mapping; a list is not accepted.
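
    The value types named in the documentation excerpt can be sketched as follows. The two-column frame and the fill values are made up for illustration; the last step only checks that passing a list raises a TypeError.

```python
import numpy as np
import pandas as pd

# Made-up frame with one NaN per column.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

by_scalar = df.fillna(0)                              # same value everywhere
by_dict = df.fillna({"a": -1})                        # only column "a" is filled
by_series = df.fillna(pd.Series({"a": -1, "b": -2}))  # one value per column

# A list is rejected as a fill value.
try:
    df.fillna([0, 0])
    list_rejected = False
except TypeError:
    list_rejected = True

print(by_dict)
print(list_rejected)
```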

    For more specialized imputations use SimpleImputer():

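    import numpy as np
    from sklearn.impute import SimpleImputer

    # fill_value must suit the column dtype (a number for numeric columns, a string for object columns)
    si = SimpleImputer(strategy='constant', missing_values=np.nan, fill_value=0)
    df[['Col-1', 'Col-2']] = si.fit_transform(df[['Col-1', 'Col-2']])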
    
    
  • 2020-11-22 01:55

    If you want to fill NaN for specific rows of a column, you can use loc:

    import numpy as np
    import pandas as pd

    d1 = {"Col1": ['A', 'B', 'C'],
          "fruits": ['Avocado', 'Banana', np.nan]}  # np.nan, not the string 'NaN'
    d1 = pd.DataFrame(d1)

    output:

      Col1   fruits
    0    A  Avocado
    1    B   Banana
    2    C      NaN

    # assign only where the row condition holds
    d1.loc[d1.Col1 == 'C', 'fruits'] = 'Carrot'

    output:

      Col1   fruits
    0    A  Avocado
    1    B   Banana
    2    C   Carrot
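
    The same loc pattern can also select rows by missingness instead of by a known key. This is a sketch with the same small frame, assuming the missing entry is a real np.nan; the variable name missing is made up.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "fruits": ["Avocado", "Banana", np.nan],
})

# Boolean mask that is True where fruits is missing.
missing = d1["fruits"].isna()

# Assign only to the missing cells of that one column.
d1.loc[missing, "fruits"] = "Carrot"
print(d1)
```

    This avoids hard-coding which row holds the NaN.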
    