How to set a cell to NaN in a pandas dataframe

时光取名叫无心 2020-12-04 09:51

I'd like to replace bad values in a column of a dataframe with NaNs.

mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', \


        
8 Answers
  • 2020-12-04 09:55

    You can use replace:

    df['y'] = df['y'].replace({'N/A': np.nan})
    

    Also be aware of the inplace parameter for replace. You can do something like:

    df.replace({'N/A': np.nan}, inplace=True)
    

    This replaces every instance in the df and modifies it in place instead of returning a new DataFrame.

    Similarly, if you run into other kinds of unknown values, such as an empty string or None:

    df['y'] = df['y'].replace({'': np.nan})
    
    df['y'] = df['y'].replace({None: np.nan})
    

    Reference: Pandas Latest - Replace
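
As a small illustration of the replace calls above, several placeholder values can go into one mapping. This is only a sketch: the sample frame reuses the values from this thread, and the empty string is an assumed extra "bad" value added for the example.

```python
import numpy as np
import pandas as pd

# Sample data; the empty string is an assumed extra placeholder for illustration.
df = pd.DataFrame({
    "x": [10, 50, 18, 32, 47, 20],
    "y": ["12", "11", "N/A", "13", "", "N/A"],
})

# One mapping handles several placeholder strings in a single call.
df["y"] = df["y"].replace({"N/A": np.nan, "": np.nan})

# Count the cells that are now missing.
print(df["y"].isna().sum())
```

Assigning the result back to the column, as here, avoids relying on inplace=True.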

  • 2020-12-04 09:57

    While replace seems to solve the problem, I would like to propose an alternative. When a column mixes numeric values with a few strings, the goal should be not merely to replace the strings with np.nan but to make the whole column properly numeric. I would bet that the original column is most likely of object type:

    Name: y, dtype: object
    

    What you really need is to make it a numeric column (it will have a proper type and be considerably faster to work with), with all non-numeric values replaced by NaN.

    Thus, good conversion code would be

    pd.to_numeric(df['y'], errors='coerce')
    

    Specify errors='coerce' to force strings that can't be parsed as a number to become NaN. The column type would then be:

    Name: y, dtype: float64
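
A minimal sketch of that conversion, assuming the column starts out as object dtype as the answer guesses (the dtype is set explicitly here so the example does not depend on pandas defaults):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [10, 50, 18, 32, 47, 20],
    # Force object dtype to mirror the "dtype: object" column in the answer.
    "y": pd.Series(["12", "11", "N/A", "13", "15", "N/A"], dtype="object"),
})

# pd.to_numeric returns a new Series, so assign it back to keep the result.
df["y"] = pd.to_numeric(df["y"], errors="coerce")

print(df["y"].dtype)         # numeric dtype after coercion
print(df["y"].isna().sum())  # unparseable strings are now NaN
```

Note that errors='coerce' turns every unparseable string into NaN, not just 'N/A', so it is worth checking the count of missing values afterwards.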
    
  • 2020-12-04 10:04

    df.replace('columnvalue', np.nan, inplace=True)

  • 2020-12-04 10:08

    Most replies here require import numpy as np.

    There is a solution built into pandas itself, pd.NA, which you can use like this:

    df.replace('N/A', pd.NA)
    
  • 2020-12-04 10:14

    As of pandas 1.0.0, you no longer need numpy to create null values in your dataframe. Instead you can use pandas.NA (which is of type pandas._libs.missing.NAType). It is treated as null within the dataframe, but it is not null (for example, it is not None) outside a dataframe context.
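
A short sketch of how pd.NA behaves, assuming pandas 1.0 or later and the sample values from this thread; the variable name cleaned is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"y": ["12", "11", "N/A", "13", "15", "N/A"]})

# replace returns a new DataFrame unless inplace=True is passed.
cleaned = df.replace("N/A", pd.NA)

# Inside the dataframe, pandas counts the replaced cells as missing.
print(cleaned["y"].isna().sum())

# Outside it, pd.NA is its own singleton object, distinct from None.
print(pd.NA is None)
print(type(pd.NA).__name__)
```

The column still holds strings for the other rows, so this marks the bad cells as missing without converting the column to a numeric type.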

  • 2020-12-04 10:15

    You can try these snippets.

    In [16]: mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
    In [17]: df = pd.DataFrame(mydata)

    In [18]: df.loc[df.y == "N/A", "y"] = np.nan

    In [19]: df
    Out[19]:
        x    y
    0  10   12
    1  50   11
    2  18  NaN
    3  32   13
    4  47   15
    5  20  NaN
    