Why does pandas convert unsigned int greater than 2**63-1 to objects?

馋奶兔 submitted on 2019-12-06 01:51:43

Question


When I convert a NumPy structured array to a pandas DataFrame, pandas changes a uint64 column to object dtype if the value is greater than 2**63 - 1.

import pandas as pd
import numpy as np

x = np.array([('foo', 2 ** 63)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))
y = np.array([('foo', 2 ** 63 - 1)], dtype = np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

>>> pd.DataFrame(x).dtypes.unsigned
dtype('O')
>>> pd.DataFrame(y).dtypes.unsigned
dtype('uint64')
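The cut-off in that output is exactly the upper limit of a signed 64-bit integer. The sketch below only checks that boundary with np.iinfo; the idea that pandas' type inference passed the values through signed int64 and fell back to object on overflow is an assumption that fits the symptom, not something this thread confirms. The helper name fits_int64 is hypothetical.

```python
import numpy as np

INT64_MAX = int(np.iinfo(np.int64).max)    # 2**63 - 1
UINT64_MAX = int(np.iinfo(np.uint64).max)  # 2**64 - 1

def fits_int64(value):
    """Hypothetical helper: True if a Python int fits in a signed 64-bit integer."""
    return -INT64_MAX - 1 <= value <= INT64_MAX

for v in (2**63 - 1, 2**63):
    # value, fits in int64?, fits in uint64?
    print(v, fits_int64(v), 0 <= v <= UINT64_MAX)
# 9223372036854775807 True True
# 9223372036854775808 False True
```

Both values fit in uint64, but only the first fits in int64, which matches where the dtype switches to object.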

This is a problem because I can't write the DataFrame to an HDF5 file in table format:

pd.DataFrame(x).to_hdf('x.hdf', 'key', format = 'table')

Output:

TypeError: Cannot serialize the column [unsigned] because its data contents are [integer] object dtype
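The error reports the inferred contents of the column ([integer]) rather than just its dtype. A minimal sketch that uses pandas.api.types.infer_dtype to list object columns whose contents are not strings, so they can be spotted before calling to_hdf(..., format='table'). The helper name non_string_object_columns is hypothetical, and treating every non-string object column as unwritable in table format is an assumption drawn from this error, not a documented rule.

```python
import pandas as pd
from pandas.api.types import infer_dtype

def non_string_object_columns(df):
    """Hypothetical helper: (column, inferred kind) for object columns not holding strings."""
    result = []
    for col in df.columns:
        if df[col].dtype == object:
            kind = infer_dtype(df[col], skipna=True)  # e.g. 'string', 'integer', 'mixed'
            if kind != "string":
                result.append((col, kind))
    return result

# Reproduce the symptom: an object column holding a Python int above 2**63 - 1
df = pd.DataFrame({"string": ["foo"], "unsigned": pd.Series([2**63], dtype=object)})
print(non_string_object_columns(df))  # [('unsigned', 'integer')]
```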

Can someone explain the type conversion?


Answer 1:


It's an open bug in pandas, but you can force the column back to uint64 with Series.astype():

import numpy as np
import pandas as pd

x = np.array([('foo', 2 ** 63)], dtype=np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

a = pd.DataFrame(x)
# Cast the object column back to uint64
a['unsigned'] = a['unsigned'].astype(np.uint64)
>>> a.dtypes
string      object
unsigned    uint64
dtype: object
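To make the astype() workaround reusable, a hypothetical helper restore_uint64 could cast a list of columns on a copy of the frame and leave the original untouched. This is a sketch assuming the affected columns hold only non-negative integers below 2**64; if your pandas version already keeps the column as uint64, the cast is simply a no-op.

```python
import numpy as np
import pandas as pd

x = np.array([('foo', 2 ** 63)],
             dtype=np.dtype([('string', np.str_, 3), ('unsigned', np.uint64)]))

def restore_uint64(df, columns):
    """Hypothetical helper: return a copy of df with the given columns cast to uint64."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(np.uint64)
    return out

a = restore_uint64(pd.DataFrame(x), ['unsigned'])
print(a.dtypes['unsigned'])        # uint64
print(int(a['unsigned'].iloc[0]))  # 9223372036854775808
```

Converting through int() before comparing keeps the check exact and avoids mixing NumPy and Python integer types.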

Other ways of converting the original object column to a numeric dtype either raised errors or had no effect:

>>> pd.to_numeric(a['unsigned'], errors='coerce')
OverflowError: Python int too large to convert to C long

>>> a.convert_objects(convert_numeric=True).dtypes
string      object
unsigned    object
dtype: object



Answer 2:


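Avoid uint64 entirely: declare the field as a float ('f4') for a value beyond the int64 range, or as a signed int64 ('i8') for a value that fits:

x = np.array([('foo', 2 ** 63)],
             dtype=np.dtype([('string', np.str_, 3),
                             ('unsigned', 'f4')]))

y = np.array([('foo', 2 ** 63 - 1)],
             dtype=np.dtype([('string', np.str_, 3),
                             ('unsigned', 'i8')]))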
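Switching to a float dtype trades exactness for compatibility: float32 ('f4') represents every integer exactly only up to 2**24, and float64 ('f8') only up to 2**53. A minimal sketch comparing how each dtype stores an integer just above the int64 range; the value 2**63 + 1 is chosen purely for illustration.

```python
import numpy as np

big = 2**63 + 1  # just above the signed int64 maximum of 2**63 - 1

# float32 ('f4') has a 24-bit significand and float64 ('f8') a 53-bit one,
# so both round this value to the nearest representable float (2**63)
float32_exact = int(np.float32(big)) == big
float64_exact = int(np.float64(big)) == big

# uint64 holds every integer in [0, 2**64 - 1] exactly
uint64_exact = int(np.uint64(big)) == big

# int64 ('i8') cannot hold the value at all
i8_overflow = False
try:
    np.array([big], dtype='i8')
except OverflowError:
    i8_overflow = True

print(float32_exact, float64_exact, uint64_exact, i8_overflow)
# False False True True
```

So the 'f4' variant writes to HDF5 without complaint, but values that need all 64 bits come back rounded.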


Source: https://stackoverflow.com/questions/34283319/why-does-pandas-convert-unsigned-int-greater-than-263-1-to-objects
