Prevent pandas from automatically inferring type in read_csv

梦谈多话 2020-12-29 08:36

I have a #-separated file with three columns: the first is integer, the second looks like a float, but isn't, and the third is a string. I attempt to load this directly in
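To make the problem concrete, here is a minimal reproduction. The real file isn't shown, so the data below is hypothetical: the middle column holds codes such as `2.10` that only look like floats. With default settings, `read_csv` infers `float64` for that column, and the original text (here the trailing zero) is lost.

```python
import io

import pandas as pd

# Hypothetical '#'-separated data: integer, float-like code, string.
raw = "1#2.10#one\n2#3.12#two\n3#1.32#three\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="#",
    header=None,
    names=["int_field", "floatlike_field", "str_field"],
)

# Default type inference turns the float-like codes into float64,
# so "2.10" comes back as the number 2.1.
print(df.dtypes)
print(df["floatlike_field"].tolist())
```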

2 Answers
  • 2020-12-29 08:49

    I think your best bet is to read the data in as a NumPy record array first, giving the float-like column an explicit string dtype, and then build the DataFrame from that array.

    # what you described:
    In [15]: import numpy as np
    In [16]: import pandas
    In [17]: x = pandas.read_csv('weird.csv')
    
    In [19]: x.dtypes
    Out[19]: 
    int_field            int64
    floatlike_field    float64  # what you don't want?
    str_field           object
    
    In [20]: datatypes = [('int_field','i4'),('floatlike','S10'),('strfield','S10')]
    
    In [21]: y_np = np.loadtxt('weird.csv', dtype=datatypes, delimiter=',', skiprows=1)
    
    In [22]: y_np
    Out[22]: 
    array([(1, '2.31', 'one'), (2, '3.12', 'two'), (3, '1.32', 'three ')], 
          dtype=[('int_field', '<i4'), ('floatlike', '|S10'), ('strfield', '|S10')])
    
    In [23]: y_pandas = pandas.DataFrame.from_records(y_np)
    
    In [25]: y_pandas.dtypes
    Out[25]: 
    int_field     int64
    floatlike    object  # better?
    strfield     object
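A self-contained version of the same record-array approach, adapted to the '#'-separated layout from the question. It is a sketch with hypothetical inline data, and it makes two assumptions beyond the session above: it uses `'U10'` (Unicode) instead of `'S10'` so that Python 3 yields `str` rather than `bytes`, and it passes `comments=None`, because `np.loadtxt` treats `#` as a comment marker by default, which would clash with a `#` delimiter.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical '#'-separated data with no header row.
raw = "1#2.10#one\n2#3.12#two\n3#1.32#three\n"

# Structured dtype: the middle column is declared as a string ('U10'),
# so NumPy never converts it to a float.
datatypes = [("int_field", "i4"), ("floatlike", "U10"), ("strfield", "U10")]

# comments=None disables loadtxt's default '#' comment handling,
# which would otherwise swallow everything after the first delimiter.
y_np = np.loadtxt(io.StringIO(raw), dtype=datatypes, delimiter="#", comments=None)

# Each field of the record array becomes one DataFrame column.
y_pandas = pd.DataFrame.from_records(y_np)
print(y_pandas.dtypes)
print(y_pandas["floatlike"].tolist())  # original text kept, including "2.10"
```

The exact dtype reported for the integer column depends on the pandas version and platform (the session above shows `int64`; a structured `i4` field may come through as `int32`).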
    
  • 2020-12-29 08:52

    I'm planning to add explicit column dtypes in the upcoming file parser engine overhaul in pandas 0.10. Can't commit myself 100% to it but it should be pretty simple with the new infrastructure coming together (http://wesmckinney.com/blog/?p=543).
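Current pandas releases do expose explicit column dtypes through the `dtype` argument of `read_csv`, which accepts either a single type or a dict mapping column names to types. A minimal sketch, again with hypothetical '#'-separated data, that forces only the float-like column to `str` and lets pandas infer the rest:

```python
import io

import pandas as pd

# Hypothetical '#'-separated data with no header row.
raw = "1#2.10#one\n2#3.12#two\n3#1.32#three\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="#",
    header=None,
    names=["int_field", "floatlike_field", "str_field"],
    # Per-column dtype: only the float-like column is read as text;
    # the other columns are still inferred.
    dtype={"floatlike_field": str},
)
print(df.dtypes)
print(df["floatlike_field"].tolist())
```

Passing `dtype=str` instead of a dict turns off numeric inference for every column, which can be simpler when all fields should stay as text.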
