Check if dataframe column is Categorical

灰色年华 2020-12-29 00:53

I can't seem to get a simple dtype check working with pandas' improved Categoricals in v0.15+. Basically I just want something like is_categorical(column) -> True

5 Answers
  • 2020-12-29 01:29

    Just putting this here because pandas.DataFrame.select_dtypes() is what I was actually looking for:

    df['column'].name in df.select_dtypes(include='category').columns
    

    Thanks to @Jeff.
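A minimal sketch of the select_dtypes approach, assuming a small made-up frame (the column names noncat and categ are only for illustration). It collects every categorical column name once, which is handy when more than one column has to be checked:

```python
import pandas as pd

# Hypothetical example frame; the column names are made up for illustration.
df = pd.DataFrame({
    "noncat": [1, 2, 3],
    "categ": pd.Categorical(["A", "B", "C"]),
})

# select_dtypes keeps only the columns whose dtype matches `include`.
categorical_columns = list(df.select_dtypes(include="category").columns)
print(categorical_columns)

# Membership test for a single column name.
print("categ" in categorical_columns)
print("noncat" in categorical_columns)
```

For a single column this does more work than a direct dtype check, so it is probably best suited to the case where you want the full list of categorical columns anyway.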

  • 2020-12-29 01:36

    In my pandas version (v1.0.3), a shorter version of joris' answer is available.

    import pandas as pd

    df = pd.DataFrame({'noncat': [1, 2, 3], 'categ': pd.Categorical(['A', 'B', 'C'])})
    
    print(isinstance(df.noncat.dtype, pd.CategoricalDtype))  # False
    print(isinstance(df.categ.dtype, pd.CategoricalDtype))   # True
    
    print(pd.CategoricalDtype.is_dtype(df.noncat)) # False
    print(pd.CategoricalDtype.is_dtype(df.categ))  # True
    
  • 2020-12-29 01:39

    Use the dtype's name property for the comparison instead. It should always work because it is just a string:

    >>> import numpy as np
    >>> arr = np.array([1, 2, 3, 4])
    >>> arr.dtype.name
    'int64'
    
    >>> import pandas as pd
    >>> cat = pd.Categorical(['a', 'b', 'c'])
    >>> cat.dtype.name
    'category'
    

    So, to sum up, you can end up with a simple, straightforward function:

    def is_categorical(array_like):
        return array_like.dtype.name == 'category'
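As a quick usage check, the function can be tried on a few Series. This is only a sketch: the variable names are made up, and the function is repeated so the snippet runs on its own.

```python
import pandas as pd


def is_categorical(array_like):
    # Compare the dtype's name, which is a plain string.
    return array_like.dtype.name == "category"


# Hypothetical sample data for illustration.
cat_series = pd.Series(["a", "b", "c"], dtype="category")
int_series = pd.Series([1, 2, 3])
str_series = pd.Series(["a", "b", "c"])

print(is_categorical(cat_series))  # True
print(is_categorical(int_series))  # False
print(is_categorical(str_series))  # False
```

Because only two strings are compared, non-categorical dtypes simply give False rather than an error.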
    
  • 2020-12-29 01:40

    First, the string representation of the dtype is 'category' and not 'categorical', so this works:

    In [41]: df.cat_column.dtype == 'category'
    Out[41]: True
    

    But, as you noticed, this comparison raises a TypeError for other dtypes (at least on the versions current when this was written), so you would have to wrap it in a try/except block.


    Other ways to check using pandas internals:

    In [42]: isinstance(df.cat_column.dtype, pd.api.types.CategoricalDtype)
    Out[42]: True
    
    In [43]: pd.api.types.is_categorical_dtype(df.cat_column)
    Out[43]: True
    

    For non-categorical columns, those statements will return False instead of raising an error. For example:

    In [44]: pd.api.types.is_categorical_dtype(df.x)
    Out[44]: False
    

    For much older versions of pandas, replace pd.api.types in the above snippets with pd.core.common.
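The isinstance check above can be wrapped in a small helper. This is a sketch, not an official pandas function: column_is_categorical and the sample frame are hypothetical names introduced here. It relies on isinstance rather than is_categorical_dtype because, as far as I know, recent pandas releases (2.1 and later) emit a deprecation warning for is_categorical_dtype and recommend the isinstance form.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def column_is_categorical(df, column):
    """Hypothetical helper: True if df[column] has a categorical dtype."""
    return isinstance(df[column].dtype, CategoricalDtype)


# Hypothetical sample frame mirroring the column names used above.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "cat_column": pd.Categorical(["a", "b", "a"]),
})

print(column_is_categorical(df, "cat_column"))  # True
print(column_is_categorical(df, "x"))           # False
```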

  • 2020-12-29 01:40

    I ran into this thread looking for the exact same functionality, and also found another option, right from the pandas documentation here.

    It looks like the canonical way to check if a pandas dataframe column is a categorical Series should be the following:

    hasattr(column_to_check, 'cat')
    

    So, as per the example given in the initial question, this would be:

    hasattr(df.x, 'cat')  # True
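A small sketch of how the hasattr check behaves on categorical and non-categorical Series (the sample data is made up). It works because the .cat accessor raises AttributeError on a Series that is not categorical, and hasattr turns that into False:

```python
import pandas as pd

# Hypothetical sample data for illustration.
cat_series = pd.Series(["a", "b", "a"], dtype="category")
int_series = pd.Series([1, 2, 3])

# .cat is only usable on categorical Series; elsewhere it raises
# AttributeError, which hasattr reports as False.
print(hasattr(cat_series, "cat"))  # True
print(hasattr(int_series, "cat"))  # False
```

Note that this tests for an attribute named cat rather than for the dtype itself, so it should only be applied to objects already known to be Series.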
    