Pandas: ValueError: cannot convert float NaN to integer

攒了一身酷 2020-12-03 00:49

I get ValueError: cannot convert float NaN to integer for the following:

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)


        
5 Answers
  • 2020-12-03 01:15

    Also, even in the latest versions of pandas, if the column has object dtype you have to convert it to float first, something like:

    df['column_name'].astype("Float32").astype("Int32")
    

    Whether you pick the 32-bit or 64-bit float and int depends on your data. Be aware that you may lose some precision if your numbers are too big for the format.
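A minimal sketch of that two-step route on an object column. The sample values are made up, and pd.to_numeric is swapped in for the float step (instead of astype("Float32")) because it also parses strings; treat it as an illustration rather than the only way to do it.

```python
import pandas as pd

# Hypothetical object column, e.g. numbers read in as strings with a gap.
s = pd.Series(["1", "2", None, "4"], dtype=object)

# Step 1: object -> float64 (the missing entry becomes NaN).
as_float = pd.to_numeric(s)

# Step 2: float64 -> nullable integer (note the capital "I").
as_int = as_float.astype("Int64")

print(as_float.dtype, as_int.dtype)
```

Int64 is used here to sidestep the precision concern mentioned above; a smaller width such as Int32 works the same way when the values fit.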

  • 2020-12-03 01:22

    For identifying NaN values use boolean indexing:

    print(df[df['x'].isnull()])
    

    Then, to get rid of non-numeric values, use to_numeric with errors='coerce', which replaces anything that cannot be parsed as a number with NaN:

    df['x'] = pd.to_numeric(df['x'], errors='coerce')
    

    To remove all rows with NaN in column x, use dropna:

    df = df.dropna(subset=['x'])
    

    Finally, convert the values to ints:

    df['x'] = df['x'].astype(int)
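
The steps above can be put together on a small frame. This is only a sketch: the data below is a made-up stand-in for zoom11.csv, and bad_rows is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: 'x' mixes numbers, a bad string and a gap.
df = pd.DataFrame({"x": ["1", "2", "oops", None, "4"]})

# Anything that cannot be parsed as a number becomes NaN instead of raising.
df["x"] = pd.to_numeric(df["x"], errors="coerce")

# Look at the rows that would have broken the cast.
bad_rows = df[df["x"].isnull()]
print(bad_rows)

# Drop those rows; after that the plain int cast is safe.
df = df.dropna(subset=["x"])
df["x"] = df["x"].astype(int)
print(df["x"].tolist())
```

Note that this drops rows, so it only fits when losing the unparseable records is acceptable.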
    
  • 2020-12-03 01:26

    I know this has been answered, but I wanted to provide an alternate solution for anyone in the future:

    You can use .loc to subset the dataframe by only values that are notnull(), and then subset out the 'x' column only. Take that same vector, and apply(int) to it.

    If column x is float:

    df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
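
One thing worth checking with this approach: as far as I can tell, the non-null values get truncated to whole numbers, but a float column that still holds NaN keeps its float64 dtype. A small sketch with made-up data to see it:

```python
import numpy as np
import pandas as pd

# Hypothetical float column with a missing value.
df = pd.DataFrame({"x": [1.7, 2.0, np.nan, 4.2]})

mask = df["x"].notnull()
df.loc[mask, "x"] = df.loc[mask, "x"].apply(int)

# Values are now whole numbers, but they are stored back as floats
# because the column still has to hold the NaN.
print(df["x"].tolist())
print(df["x"].dtype)
```

So this avoids the error, but it does not give you an integer dtype while the NaN rows remain.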
    
  • 2020-12-03 01:34

    ValueError: cannot convert float NaN to integer

    From v0.24, you actually can. pandas introduced nullable integer data types, which allow integers to coexist with NaNs.

    Given a series of whole float numbers with missing data,

    s = pd.Series([1.0, 2.0, np.nan, 4.0])
    s
    
    0    1.0
    1    2.0
    2    NaN
    3    4.0
    dtype: float64
    
    s.dtype
    # dtype('float64')
    

    You can convert it to a nullable int type (choose one of Int16, Int32, or Int64) with:

    s2 = s.astype('Int32') # note the 'I' is uppercase
    s2
    
    0      1
    1      2
    2    NaN
    3      4
    dtype: Int32
    
    s2.dtype
    # Int32Dtype()
    

    Your column needs to have whole numbers for the cast to happen. Anything else will raise a TypeError:

    s = pd.Series([1.1, 2.0, np.nan, 4.0])
    
    s.astype('Int32')
    # TypeError: cannot safely cast non-equivalent float64 to int32
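
If the column does hold fractional values, one option is to round first and then cast. This is a sketch that assumes rounding to the nearest whole number is what you want; it is not the only choice (you could floor or truncate instead).

```python
import numpy as np
import pandas as pd

s = pd.Series([1.1, 2.0, np.nan, 4.0])

# Round to whole floats first; NaN stays NaN.
# The cast to a nullable integer type is then allowed.
s_int = s.round().astype("Int64")

print(s_int.dtype)
print(s_int.isna().tolist())
```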
    
  • 2020-12-03 01:34

    If the column has null values, operations such as this cast raise the error. To resolve it without changing your original dataset, filter out the nulls before casting: df[~df['x'].isnull()][['x']].astype(int)
