Python pandas: how to remove nan and -inf values

自闭症患者 2020-12-02 12:10

I have the following dataframe

           time       X    Y  X_t0     X_tp0  X_t1     X_tp1  X_t2     X_tp2
0         0.002876    0   10     0       NaN   Na         


        
6 answers
  • 2020-12-02 12:56

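    df.replace handles every occurrence of the listed values, but it returns a new DataFrame rather than modifying df in place, so the result has to be assigned back. A call whose result is discarded leaves the inf values in df, and thus the error.

    df = df.replace([np.inf, -np.inf], np.nan) converts all occurrences of inf and -inf to NaN, and then the dropna function can be used. Filtering with list(filter(lambda x: x != inf, df)) does not help here, because iterating over a DataFrame yields its column labels, not its values.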
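    As a quick check of how replace behaves when inf appears more than once, here is a minimal sketch. The toy DataFrame and the variable names are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: inf/-inf appear several times, in more than one column.
df = pd.DataFrame({
    "a": [1.0, np.inf, -np.inf, np.inf, 5.0],
    "b": [np.inf, 2.0, 3.0, np.nan, 6.0],
})

# replace() returns a new DataFrame; every inf and -inf in it becomes NaN.
replaced = df.replace([np.inf, -np.inf], np.nan)

# Count the infinite values left after the replacement.
remaining_inf = int(np.isinf(replaced.to_numpy()).sum())

# Drop every row that still contains a NaN.
cleaned = replaced.dropna()
```

    Only the last row survives here, since it is the only one without NaN or an infinity. The original df is left untouched unless the result is assigned back to it.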

  • 2020-12-02 13:02

    I prefer to set the option so that inf values are treated as NaN. Note that mode.use_inf_as_na is deprecated as of pandas 2.1, so this applies to older versions:

    s1 = pd.Series([0, 1, 2])
    s2 = pd.Series([2, 1, 0])
    s1/s2
    # Outputs:
    # 0.0
    # 1.0
    # inf
    # dtype: float64
    
    pd.set_option('mode.use_inf_as_na', True)
    s1/s2
    # Outputs:
    # 0.0
    # 1.0
    # NaN
    # dtype: float64
    

    Note that you can also limit the option to a block of code with a context manager:

    with pd.option_context('mode.use_inf_as_na', True):
        print(s1/s2)
    # Outputs:
    # 0.0
    # 1.0
    # NaN
    # dtype: float64
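    For pandas versions where the option is deprecated or unavailable, one possible alternative is to convert the infinities explicitly. This is a minimal sketch reusing s1 and s2 from above; ratio and ratio_clean are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0, 1, 2])
s2 = pd.Series([2, 1, 0])

ratio = s1 / s2  # the last element is inf (2 / 0)

# Convert infinities to NaN explicitly instead of relying on a global option.
ratio_clean = ratio.replace([np.inf, -np.inf], np.nan)

# isna() now flags the former inf entry.
print(ratio_clean.isna().tolist())  # [False, False, True]
```

    Because nothing global is changed, the conversion affects only this Series and the rest of the program keeps the default behaviour.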
    
  • 2020-12-02 13:03

    You can replace inf and -inf with NaN, and then select non-null rows.

    df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)]  # .astype(np.float64) ?
    

    or, to drop the affected rows directly (axis=0 drops rows; axis=1 would drop whole columns instead):

    df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
    

    Check the dtypes of your columns via df.info() to make sure they are all as expected (e.g. np.float32/64).
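    To illustrate why the dtype check matters, here is a minimal sketch under the assumption that one column was read in as strings. The frame, the "n/a" placeholder, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where column "y" holds strings instead of floats.
df = pd.DataFrame({
    "x": [1.0, np.inf, 3.0, 4.0],
    "y": ["0.5", "1.5", "n/a", "2.0"],
})

print(df.dtypes)  # "y" is not a float column

# Coerce every column to numeric; entries that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# With float columns throughout, the replace-then-dropna pattern applies.
cleaned = numeric.replace([np.inf, -np.inf], np.nan).dropna()
```

    Rows 1 and 2 are dropped in this example: one holds an infinity, the other a value that could not be parsed as a number.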

  • 2020-12-02 13:06

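    Use pd.DataFrame.isin to flag the unwanted values, then pd.DataFrame.any with axis=1 to find the rows that contain at least one of them. Finally, use the negated boolean array to slice the dataframe. (The axis must be passed by keyword; recent pandas versions no longer accept .any(1).)

    df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]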
    
                 time    X    Y  X_t0     X_tp0   X_t1     X_tp1   X_t2     X_tp2
    4        0.037389    3   10     3  0.333333    2.0  0.500000    1.0  1.000000
    5        0.037393    4   10     4  0.250000    3.0  0.333333    2.0  0.500000
    1030308  9.962213  256  268   256  0.000000  256.0  0.003906  255.0  0.003922
    
  • 2020-12-02 13:07
    # replace returns a new DataFrame, so assign the result back
    df = df.replace([np.inf, -np.inf], np.nan)
    
    df.dropna(inplace=True)
    
  • 2020-12-02 13:15

    Instead of dropping rows that contain any nulls or infinite numbers, it is more succinct to reverse the logic and keep the rows where all cells are finite numbers. The numpy isfinite function does this, and '.all(axis=1)' returns True only if every cell in the row is finite.

    df = df[np.isfinite(df).all(axis=1)]
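    np.isfinite only works on numeric data, so a frame that also has string columns needs one extra step. The following is a minimal sketch, assuming a hypothetical "label" column that is not part of the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-numeric "label" column.
df = pd.DataFrame({
    "label": ["a", "b", "c", "d"],
    "x": [1.0, np.nan, 3.0, 4.0],
    "y": [0.5, 1.5, -np.inf, 2.0],
})

# np.isfinite raises TypeError on strings, so test only the numeric columns.
numeric_cols = df.select_dtypes(include="number")
mask = np.isfinite(numeric_cols).all(axis=1)

# Keep all columns, but only the rows whose numeric cells are all finite.
finite_rows = df[mask]
```

    The mask is built from the numeric columns alone, while the final selection keeps every column, so the labels of the surviving rows are preserved.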
    