Convert text to int64 categorical in Pandas

末鹿安然 submitted on 2019-12-22 18:47:35

Question


I have some artist names in data['artist'] that I would like to convert to a categorical column via:

x = data['artist'].astype('category').cat.codes
x.dtype 

Returns:

dtype('int32')

I am getting negative numbers, which suggests some sort of overflow. So I'd like to use np.int64 instead, but I can't find documentation on how to do this.

x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.dtype

Gives

dtype('int64')

but the int32 codes are simply converted to int64, so the negative value is still present:

x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.min()

-1
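A minimal reproduction, assuming a small made-up Series in place of the real data['artist'] column. It shows that pandas picks the smallest signed integer dtype that can hold the codes, and that widening the dtype with astype(np.int64) leaves every value, including -1, unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three distinct names plus one missing value.
artists = pd.Series(['x', 'y', None, 'z'])

codes = artists.astype('category').cat.codes
# pandas chooses the smallest signed integer dtype that fits the number of
# categories, so three categories give int8. An int32 result most likely just
# means there are many distinct artists, not that anything overflowed.
print(codes.dtype)       # int8
print(codes.tolist())    # [0, 1, -1, 2]

# Widening the dtype does not change any value: -1 stays -1.
wide = codes.astype(np.int64)
print(wide.dtype)        # int64
print(wide.tolist())     # [0, 1, -1, 2]
```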

Answer 1:


I think you have NaN in column artist. Missing values get the code -1:

import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})

x = data['artist'].astype('category').cat.codes
print(x)
0   -1
1    1
2    2
3    0
4    1
5    2
dtype: int8

To check for NaN you can use isnull:

print(data[data.artist.isnull()])
  artist
0    NaN
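A short sketch, reusing the answer's sample DataFrame, that confirms the -1 codes line up exactly with the missing values. It then shows one possible way to get only non-negative codes: fill the missing names with a placeholder before converting. The label 'unknown' is a hypothetical choice, not something from the original answer; dropping the NaN rows would be another option.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})

codes = data['artist'].astype('category').cat.codes
# Every -1 code corresponds to a missing artist, and vice versa.
print(((codes == -1) == data['artist'].isnull()).all())  # True

# Fill missing names with a placeholder label (hypothetical; choose one that
# cannot clash with a real artist). Categories are sorted, so 'unknown' < 'x'.
filled = data['artist'].fillna('unknown').astype('category').cat.codes
print(filled.tolist())  # [0, 2, 3, 1, 2, 3]
print(filled.min())     # 0
```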


Source: https://stackoverflow.com/questions/37148032/convert-text-to-int64-categorical-in-pandas
