Question
I have some artist names in data['artist'] that I would like to convert to a categorical column via:
x = data['artist'].astype('category').cat.codes
x.dtype
Returns:
dtype('int32')
I am getting negative numbers, which suggests some sort of overflow. So I'd like to use np.int64 instead, but I can't find documentation on how to accomplish this.
x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.dtype
Gives
dtype('int64')
but clearly the int32 codes are just converted to int64 afterwards, so the negative value is still present:
x = data['artist'].astype('category').cat.codes.astype(np.int64)
x.min()
-1
Answer 1:
I think you have NaN in column artist, so the code is -1 (pandas uses -1 as the category code for missing values):
import numpy as np
import pandas as pd
data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})
x = data['artist'].astype('category').cat.codes
print(x)
0   -1
1    1
2    2
3    0
4    1
5    2
dtype: int8
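The different dtypes (int32 in the question, int8 here) are consistent with pandas choosing the smallest signed integer type that can hold the number of categories, with -1 reserved for missing values. A minimal sketch, assuming a reasonably recent pandas; the names `small` and `large` and the `artist_N` labels are made up for illustration:

```python
import numpy as np
import pandas as pd

# Three distinct categories plus one missing value.
small = pd.Series([np.nan, 'y', 'z', 'x']).astype('category').cat.codes

# 1000 distinct categories, too many to index with int8.
large = pd.Series(['artist_%d' % i for i in range(1000)]).astype('category').cat.codes

print(small.dtype)  # narrow integer type, since there are only 3 categories
print(large.dtype)  # a wider integer type
print(small.min())  # -1 marks the missing value; it is not an overflow
print(large.min())  # 0, because nothing is missing here
```

So a negative minimum points at missing data rather than at the width of the integer type.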
For checking NaN you can use isnull:
print(data[data.artist.isnull()])
  artist
0    NaN
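If the goal is non-negative int64 codes, one option is to give the missing values their own category before taking the codes. This is only a sketch: the placeholder label 'unknown' is hypothetical and is assumed not to occur among the real artist names.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})

# Replace NaN with a placeholder label so it becomes a regular category.
filled = data['artist'].fillna('unknown')

# All codes are now >= 0; the cast to int64 only widens the storage type.
x = filled.astype('category').cat.codes.astype(np.int64)

print(x.dtype)
print(x.min())
```

Without the fillna step, the cast alone would leave the -1 in place, which matches what the question observed.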
Source: https://stackoverflow.com/questions/37148032/convert-text-to-int64-categorical-in-pandas