Asserting column(s) data type in Pandas

执念已碎 2021-02-05 15:27

I'm trying to find a better way to assert the data type of one or more columns of a given DataFrame in Python/Pandas.

For example:

import pandas as pd
t = pd.DataFrame(         


        
2 Answers
  • 2021-02-05 15:36

    You can do this, where numeric_cols is the list of column names you want to check:

    import numpy as np
    numeric_dtypes = [np.dtype('int64'), np.dtype('float64')]
    # or whatever types you want
    
    assert t[numeric_cols].apply(lambda c: c.dtype).isin(numeric_dtypes).all()
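
A minimal sketch of the same check written against DataFrame.dtypes, which is already a Series of dtypes indexed by column name, so the apply call can be skipped. The sample frame and the contents of numeric_cols are assumptions made for illustration; they are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only to exercise the check.
t = pd.DataFrame({"a": [1, 2, 3], "b": [2, 6, 0.75], "c": ["foo", "bar", "beer"]})
numeric_cols = ["a", "b"]  # assumed list of columns to check
numeric_dtypes = [np.dtype("int64"), np.dtype("float64")]

# One boolean per selected column: is its dtype in the allowed list?
is_allowed = t[numeric_cols].dtypes.isin(numeric_dtypes)
assert is_allowed.all()

# Run over every column, the same mask shows which ones would fail.
mask = t.dtypes.isin(numeric_dtypes)
offenders = mask.index[~mask].tolist()
print(offenders)
```

Listing the offending column names tends to be more useful than a bare failed assert, because the failure then says which column has the unexpected dtype.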
    
  • 2021-02-05 15:48

    You could use ptypes.is_numeric_dtype to identify numeric columns, ptypes.is_string_dtype to identify string-like columns, and ptypes.is_datetime64_any_dtype to identify datetime64 columns:

    import pandas as pd
    import pandas.api.types as ptypes
    
    t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer'],
                      'd': pd.date_range('2000-1-1', periods=3)})
    cols_to_check = ['a', 'b']
    
    assert all(ptypes.is_numeric_dtype(t[col]) for col in cols_to_check)
    # passes: 'a' is an integer column and 'b' is a float column
    assert ptypes.is_string_dtype(t['c'])
    # passes: 'c' holds strings
    assert ptypes.is_datetime64_any_dtype(t['d'])
    # passes: 'd' is a datetime64 column
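
If several columns need different checks, the asserts above can be folded into one helper. This is only a sketch: assert_column_types and its checks argument are hypothetical names introduced here, not part of pandas. The predicates themselves are the pandas.api.types functions used above.

```python
import pandas as pd
import pandas.api.types as ptypes


def assert_column_types(df, checks):
    """Hypothetical helper: checks maps a column name to a dtype predicate."""
    failures = [
        f"{col}: {df[col].dtype} failed {check.__name__}"
        for col, check in checks.items()
        if not check(df[col])
    ]
    if failures:
        raise AssertionError("unexpected column dtypes: " + "; ".join(failures))


t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer'],
                  'd': pd.date_range('2000-1-1', periods=3)})

# Passes silently: every column satisfies its predicate.
assert_column_types(t, {
    'a': ptypes.is_numeric_dtype,
    'b': ptypes.is_numeric_dtype,
    'c': ptypes.is_string_dtype,
    'd': ptypes.is_datetime64_any_dtype,
})

# A mismatch raises an AssertionError that names the offending column.
try:
    assert_column_types(t, {'c': ptypes.is_numeric_dtype})
except AssertionError as err:
    print(err)
```

Collecting all failures before raising reports every bad column in one pass instead of stopping at the first one.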
    

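    The pandas.api.types module (which I aliased to ptypes) has both an is_datetime64_any_dtype and an is_datetime64_dtype function. The difference is in how they treat timezone-aware array-likes: is_datetime64_any_dtype accepts them, while is_datetime64_dtype only accepts timezone-naive datetime64 data: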

    In [239]: ptypes.is_datetime64_any_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
    Out[239]: True
    
    In [240]: ptypes.is_datetime64_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
    Out[240]: False
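
The same distinction applies to DataFrame columns, not only to a DatetimeIndex. A small sketch, assuming a column localized to UTC (the timezone is picked only for illustration; any timezone should behave the same way):

```python
import pandas as pd
import pandas.api.types as ptypes

naive = pd.Series(pd.date_range("2000-01-01", periods=3))
aware = naive.dt.tz_localize("UTC")  # assumed timezone, for illustration only

# The "any" variant accepts both naive and timezone-aware columns.
assert ptypes.is_datetime64_any_dtype(naive)
assert ptypes.is_datetime64_any_dtype(aware)

# The strict variant accepts only the timezone-naive column.
assert ptypes.is_datetime64_dtype(naive)
assert not ptypes.is_datetime64_dtype(aware)

# To require a timezone-aware column, one option is to inspect the dtype directly.
assert isinstance(aware.dtype, pd.DatetimeTZDtype)
assert not isinstance(naive.dtype, pd.DatetimeTZDtype)
```

Which variant to assert on depends on whether timezone-aware data should count as valid input for the column in question.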
    