How to replace negative numbers in a pandas DataFrame with zero

孤独总比滥情好 2020-11-27 15:09

I would like to know if there is some way to replace all negative numbers in a DataFrame with zeros.

5 Answers
  • 2020-11-27 15:34

    Another succinct way of doing this is pandas.DataFrame.clip.

    For example:

    import pandas as pd
    
    In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})
    
    In [21]: df
    Out[21]: 
         a
    0   -1
    1  100
    2   -2
    
    In [22]: df.clip(lower=0)
    Out[22]: 
         a
    0    0
    1  100
    2    0
    

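    Older pandas versions also offered df.clip_lower(0), but it was deprecated in 0.24 and removed in 1.0, so df.clip(lower=0) is the form to use.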
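
As a quick check of the clip approach, here is a minimal sketch with a made-up two-column frame (the column names and values are only illustrative). It shows that clip returns a new DataFrame and leaves the original untouched, so the result has to be assigned back if you want to keep it.

```python
import pandas as pd

# Illustrative data, not from the question
df = pd.DataFrame({'a': [-1, 100, -2], 'b': [0.5, -0.5, 3.0]})

# clip returns a new DataFrame; df itself is not modified
clipped = df.clip(lower=0)

print(df['a'].tolist())       # original still contains the negatives
print(clipped['a'].tolist())  # negatives raised to the lower bound
```

To overwrite the original, assign the result back with df = df.clip(lower=0).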

  • 2020-11-27 15:38

    If all your columns are numeric, you can use boolean indexing:

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
    
    In [3]: df
    Out[3]: 
       a  b
    0  0 -3
    1 -1  2
    2  2  1
    
    In [4]: df[df < 0] = 0
    
    In [5]: df
    Out[5]: 
       a  b
    0  0  0
    1  0  2
    2  2  1
    

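    For the more general case, where some columns are not numeric, this answer shows the private method _get_numeric_data. Being private, it may change between versions, and the update below relies on num sharing its data with df, which newer pandas versions with copy-on-write do not guarantee: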

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                               'c': ['foo', 'goo', 'bar']})
    
    In [3]: df
    Out[3]: 
       a  b    c
    0  0 -3  foo
    1 -1  2  goo
    2  2  1  bar
    
    In [4]: num = df._get_numeric_data()
    
    In [5]: num[num < 0] = 0
    
    In [6]: df
    Out[6]: 
       a  b    c
    0  0  0  foo
    1  0  2  goo
    2  2  1  bar
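
A possible alternative that stays on the public API is to pick the numeric columns with select_dtypes and assign the clipped values back. This is only a sketch under the assumption that "numeric" means the dtypes matched by include='number'; the frame is the same one used above.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Names of the numeric columns only ('a' and 'b' here)
num_cols = df.select_dtypes(include='number').columns

# Clip those columns and write the result back; 'c' is left alone
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```

Because the result is assigned back explicitly, this does not depend on whether the selected columns are a view or a copy of df.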
    

    With timedelta type, boolean indexing seems to work on separate columns, but not on the whole dataframe. So you can do:

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
       ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})
    
    In [3]: df
    Out[3]: 
            a       b
    0  0 days -3 days
    1 -1 days  2 days
    2  2 days  1 days
    
    In [4]: for k in df.columns:
       ...:     df.loc[df[k] < pd.Timedelta(0), k] = pd.Timedelta(0)
       ...:     
    
    In [5]: df
    Out[5]: 
           a      b
    0 0 days 0 days
    1 0 days 2 days
    2 2 days 1 days
    

    Update: comparison with a pd.Timedelta works on the whole DataFrame:

    In [1]: import pandas as pd
    
    In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
       ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})
    
    In [3]: df[df < pd.Timedelta(0)] = pd.Timedelta(0)
    
    In [4]: df
    Out[4]: 
           a      b
    0 0 days 0 days
    1 0 days 2 days
    2 2 days 1 days
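
If you would rather not modify the frame in place, the same comparison can be combined with mask. The sketch below assumes that both the threshold and the replacement should be pd.Timedelta(0), and uses unit='D' when building the example data.

```python
import pandas as pd

df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], unit='D'),
                   'b': pd.to_timedelta([-3, 2, 1], unit='D')})

zero = pd.Timedelta(0)

# Replace every negative duration with a zero duration; df is unchanged
result = df.mask(df < zero, zero)

print(result)
```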
    
  • 2020-11-27 15:43

    Another clean option that I have found useful is pandas.DataFrame.mask which will "replace values where the condition is true."

    Create the DataFrame:

    In [2]: import pandas as pd
    
    In [3]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
    
    In [4]: df
    Out[4]: 
       a  b
    0  0 -3
    1 -1  2
    2  2  1
    

    Replace negative numbers with 0:

    In [5]: df.mask(df < 0, 0)
    Out[5]: 
       a  b
    0  0  0
    1  0  2
    2  2  1
    
    

    Or, replace negative numbers with NaN, which I frequently need:

    In [7]: df.mask(df < 0)
    Out[7]: 
         a    b
    0  0.0  NaN
    1  NaN  2.0
    2  2.0  1.0
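
The same method works on a single column, which can be handy when only some columns should be cleaned. A small sketch, assuming only column 'a' needs the replacement:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# Series.mask replaces values where the condition is True;
# assigning back updates only column 'a'
df['a'] = df['a'].mask(df['a'] < 0, 0)

print(df)
```

Column 'b' keeps its negative value because it was never touched.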
    
  • 2020-11-27 15:53

    If you are dealing with a large df (40m x 700 in my case), iterating over the columns was much faster and used less memory for me, with something like:

    for col in df.columns:
        # .loc assigns in place and avoids chained indexing (df[col][...] = 0)
        df.loc[df[col] < 0, col] = 0
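
A variant of the column-by-column idea, written as a small helper. The function name zero_negatives_by_column is hypothetical, and restricting the loop to numeric columns via select_dtypes is an assumption added here so that string columns are skipped. It has not been benchmarked against the other approaches.

```python
import pandas as pd

def zero_negatives_by_column(df):
    """Replace negatives with 0, one numeric column at a time (hypothetical helper)."""
    for col in df.select_dtypes(include='number').columns:
        # Only one column's worth of temporary data exists at a time
        df[col] = df[col].clip(lower=0)
    return df

df = pd.DataFrame({'a': [0, -1, 2],
                   'b': [-3.5, 2.0, 1.0],
                   'c': ['x', 'y', 'z']})
zero_negatives_by_column(df)

print(df)
```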
    
  • 2020-11-27 15:54

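    Perhaps you could use pandas.DataFrame.where. It keeps the values where the condition is True and replaces the rest, so the condition has to select the non-negative values:

    data_frame = data_frame.where(data_frame >= 0, 0)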
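
Since where keeps values where the condition is True and replaces the others, it is the mirror image of mask. The sketch below, using the small integer frame from the other answers, checks that where with a "non-negative" condition gives the same result as mask with a "negative" condition.

```python
import pandas as pd

data_frame = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# where: keep entries that satisfy the condition, replace the rest with 0
via_where = data_frame.where(data_frame >= 0, 0)

# mask: replace entries that satisfy the condition with 0
via_mask = data_frame.mask(data_frame < 0, 0)

print(via_where)
print(via_where.equals(via_mask))
```

One difference to keep in mind: with missing values, NaN >= 0 is False, so the where form would also turn NaN into 0, while the mask form would leave NaN as it is.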
    