Find mixed types in Pandas columns

闹比i 2020-12-25 15:06

Every so often I get this warning when parsing data files:

WARNING:py.warnings:/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.0_12_gdcc


        
3 Answers
  • 2020-12-25 15:37

    I'm not entirely sure what you're after, but it's easy enough to find the rows that contain elements whose type differs from that of the first row. For example:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame({"A": np.arange(500), "B": np.arange(500.0)})
    >>> df.loc[321, "A"] = "Fred"
    >>> df.loc[325, "B"] = True
    >>> weird = (df.applymap(type) != df.iloc[0].apply(type)).any(axis=1)
    >>> df[weird]
            A     B
    321  Fred   321
    325   325  True
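
A version-independent variant of the idea above, as a minimal sketch: `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`, so this version relies on `Series.map(type)` instead. The helper name `rows_with_unexpected_types` is hypothetical, and the sample frame uses object columns on the assumption that the mixed values are already present in the data.

```python
import pandas as pd


def rows_with_unexpected_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows where any cell's type differs from the type of that column's first cell."""
    mismatch = pd.Series(False, index=df.index)
    for col in df.columns:
        # Series.map(type) yields the exact Python type of every cell.
        cell_types = df[col].map(type)
        # Flag rows whose type is not the type seen in the first row.
        mismatch |= cell_types != type(df[col].iloc[0])
    return df[mismatch]


# Object columns, so values of different types can coexist (illustrative data).
df = pd.DataFrame(
    {"A": list(range(5)), "B": [float(i) for i in range(5)]}, dtype=object
)
df.loc[2, "A"] = "Fred"
df.loc[4, "B"] = True
print(rows_with_unexpected_types(df))
```

As with the original one-liner, the first row is treated as the reference, so a stray value in row 0 would flag every other row instead.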
    
  • 2020-12-25 16:01

    This approach uses pandas.api.types.infer_dtype to find the columns which have mixed dtypes. It was tested with Pandas 1 under Python 3.8.

    Note that this answer uses assignment expressions, which work only with Python 3.8 or newer. It can, however, be trivially modified to avoid them.

    if mixed_dtypes := {c: dtype for c in df.columns if (dtype := pd.api.types.infer_dtype(df[c])).startswith("mixed")}:
        raise TypeError(f"DataFrame has one or more mixed dtypes: {mixed_dtypes}")
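
For Python versions before 3.8, the same check can be written without assignment expressions. This is a sketch only: the function name `find_mixed_columns` and the sample data are made up for illustration, and the example prints instead of raising so that it runs to completion.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def find_mixed_columns(df: pd.DataFrame) -> dict:
    """Map each column name to its inferred dtype when that dtype starts with 'mixed'."""
    mixed = {}
    for col in df.columns:
        inferred = infer_dtype(df[col])
        if inferred.startswith("mixed"):
            mixed[col] = inferred
    return mixed


# Illustrative data: column A mixes integers and a string.
df = pd.DataFrame({"A": [1, "Fred", 3], "B": [1.0, 2.0, 3.0]})
mixed_dtypes = find_mixed_columns(df)
if mixed_dtypes:
    print(f"DataFrame has one or more mixed dtypes: {mixed_dtypes}")
```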
    

    However, this approach does not identify the rows in which the dtype changes.
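
One possible way to recover the rows as well is to combine `infer_dtype` with a per-cell type check. The sketch below assumes the most common type in a column is the intended one, which may not hold for every data set; `locate_minority_types` is a hypothetical helper name.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def locate_minority_types(df: pd.DataFrame) -> dict:
    """For each mixed column, list the index labels whose type is not the column's most common type."""
    located = {}
    for col in df.columns:
        if not infer_dtype(df[col]).startswith("mixed"):
            continue
        cell_types = df[col].map(type)
        # Assumption: the most frequent type is the one the column should have.
        most_common = cell_types.value_counts().idxmax()
        located[col] = cell_types[cell_types != most_common].index.tolist()
    return located


# Illustrative data: one string hidden among integers.
df = pd.DataFrame({"A": [1, "Fred", 3, 4], "B": [1.0, 2.0, 3.0, 4.0]})
print(locate_minority_types(df))
```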

  • 2020-12-25 16:02

    In addition to DSM's answer, with a many-column dataframe it can be helpful to find the columns that change type like so:

    for col in df.columns:
        weird = (df[[col]].applymap(type) != df[[col]].iloc[0].apply(type)).any(axis=1)
        if weird.any():
            print(col)
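
When there are many columns, it may also help to see which types each offending column contains, not just its name. A rough sketch, with `columns_with_multiple_types` as a hypothetical helper and made-up sample data:

```python
import pandas as pd


def columns_with_multiple_types(df: pd.DataFrame) -> dict:
    """Map each column holding more than one Python type to the sorted names of those types."""
    report = {}
    for col in df.columns:
        # Collect the distinct type names found in this column.
        type_names = sorted({type(value).__name__ for value in df[col]})
        if len(type_names) > 1:
            report[col] = type_names
    return report


df = pd.DataFrame({"A": [1, "Fred", 3], "B": [1.0, 2.0, 3.0]})
print(columns_with_multiple_types(df))
```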
    