How to generate pandas DataFrame column of Categorical from string column?

臣服心动 2021-01-05 00:41

I can convert a pandas string column to a Categorical, but when I try to insert it as a new DataFrame column, it seems to get converted right back to a Series of str.

2 Answers
  • 2021-01-05 01:22

    The labels <-> levels mapping is stored in the Categorical's Index object (called levels before pandas 0.15, categories since then).

    • To convert an integer array to a string array: index[integer_array]
    • To convert a string array to an integer array: index.get_indexer(string_array)

    Here is an example:

    import pandas as pd

    # pandas < 0.15 spelled this pd.Categorical.from_array([...]) and c.levels
    c = pd.Categorical(['a', 'b', 'c', 'd', 'e'])
    idx = c.categories

    idx[[1, 2, 1, 2, 3]]
    # -> Index containing ['b', 'c', 'b', 'c', 'd']

    idx.get_indexer(["a", "c", "d", "e", "a"])
    # -> integer array [0, 2, 3, 4, 0]
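The same mapping also works in the other direction on the Categorical itself. The sketch below assumes a modern pandas (0.15 or later) and uses made-up sample values: it reads the integer codes with .codes, then rebuilds an identical Categorical from those codes plus the categories Index via pd.Categorical.from_codes.

```python
import pandas as pd

# Made-up sample data; categories default to the sorted unique values.
c = pd.Categorical(['b', 'a', 'c', 'a'])

categories = c.categories      # Index of ['a', 'b', 'c']
codes = c.codes                # integer position of each value in `categories`

# string -> integer, as in the answer: look the labels up in the Index
looked_up = categories.get_indexer(['c', 'a'])

# integer -> Categorical: rebuild from codes + categories
rebuilt = pd.Categorical.from_codes(codes, categories=categories)

print(codes.tolist())       # [1, 0, 2, 0]
print(looked_up.tolist())   # [2, 0]
print(list(rebuilt))        # ['b', 'a', 'c', 'a']
```

from_codes is handy when only the integer codes were stored (for example in a model's input or output) and the original labels need to be restored.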
    
  • 2021-01-05 01:25

    The only workaround for pandas pre-0.15 I found is as follows:

    • The column must be converted to a Categorical for the classifier, but on insertion into the DataFrame numpy immediately coerces it back to a plain array, losing the factor information.
    • So store the factor in a separate variable outside the DataFrame, and insert only its integer labels as the column.

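    import pandas as pd

    # Pre-0.15 API: Categorical.from_array and .labels (later pd.Categorical(...) and .codes)
    train_LocationNFactor = pd.Categorical.from_array(train['LocationNormalized'])  # default order: alphabetical
    train['LocationNFactor'] = train_LocationNFactor.labels  # insert the integer labels in the dataframe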
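For reference, here is a sketch of the same "keep the factor outside the DataFrame" pattern written against the current API (pd.Categorical(...) and .codes instead of from_array and .labels). The sample location names and the predicted_codes list are hypothetical, standing in for whatever integer output a classifier produces; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical training data; only the column names come from the answer.
train = pd.DataFrame({'LocationNormalized': ['London', 'Leeds', 'London', 'York']})

# Keep the factor (categories + codes) in its own variable...
train_LocationNFactor = pd.Categorical(train['LocationNormalized'])  # categories sorted alphabetically
# ...and store only the integer codes in the DataFrame.
train['LocationNFactor'] = train_LocationNFactor.codes

# Hypothetical integer codes (e.g. model output), mapped back to location names.
predicted_codes = [2, 0]
predicted_labels = train_LocationNFactor.categories[predicted_codes]

print(train['LocationNFactor'].tolist())  # [1, 0, 1, 2]
print(list(predicted_labels))             # ['York', 'Leeds']
```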
    

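    [UPDATE: pandas 0.15+ added proper support for Categorical: a DataFrame column can now hold categorical data directly (dtype category), so the separate variable is no longer needed.]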
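With pandas 0.15 or later, the original question's goal can be met directly: the new column keeps the category dtype, and both the integer codes and the categories are reachable from the column through the .cat accessor. This is a minimal sketch with hypothetical sample data; the column names are borrowed from the answer above.

```python
import pandas as pd

# Hypothetical sample data.
train = pd.DataFrame({'LocationNormalized': ['London', 'Leeds', 'London', 'York']})

# Either form stores a categorical column (dtype 'category').
train['LocationNFactor'] = train['LocationNormalized'].astype('category')
train['LocationNFactor2'] = pd.Categorical(train['LocationNormalized'])

col = train['LocationNFactor']
print(col.dtype)                     # category
print(col.cat.codes.tolist())        # [1, 0, 1, 2]  (integer input for a classifier)
print(list(col.cat.categories))      # ['Leeds', 'London', 'York']
```

Because the factor information now lives in the column's dtype, there is no need to keep a separate Categorical around to decode the codes later.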
