Defining Data Type during csv file import based on column index in pandas

夕颜 2020-12-21 09:36

I need to import a CSV file that has 300+ columns. Among these columns, only the first one needs to be specified as a category, while the rest should be float.

  •  礼貌的吻别
    2020-12-21 10:10

    There are two scenarios:

    1. You know and can therefore specify the optimal type for each column in advance; or
    2. You don't know optimal types in advance and have to convert to optimal types after reading the file.

    Specify in advance

    This is the straightforward case: pass a dictionary that maps each column name to its type as the dtype argument:

    import pandas as pd
    
    type_dict = {'Col_A': 'category', 'Col_B': 'int16',
                 'Col_C': 'float16', 'Col_D': 'float32'}
    
    # sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2
    df = pd.read_csv(myfile, sep=r'\s+', dtype=type_dict)
    

    If you don't know your column names in advance, read just the header (nrows=0) as an initial step and build the dictionary from the resulting column list:

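    import pandas as pd
    
    # Read only the header row to get the column names.
    cols = pd.read_csv(myfile, sep=r'\s+', nrows=0).columns
    # Index(['Col_A', 'Col_B', 'Col_C', 'Col_D'], dtype='object')
    
    # First column (by position) -> category, all remaining columns -> float32.
    type_dict = {cols[0]: 'category', **{col: 'float32' for col in cols[1:]}}
    
    # For a file-like object such as StringIO, call myfile.seek(0) before re-reading.
    df = pd.read_csv(myfile, sep=r'\s+', dtype=type_dict)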
    
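    Since the question is about column positions rather than names, the same two-pass idea can be wrapped in a small helper. The sketch below assumes a hypothetical function type_dict_by_position (not part of pandas) that maps 0-based positions to dtypes and gives every other column a default. It uses sep=r'\s+' for the whitespace-separated test data and builds a fresh StringIO for each read, so nothing has to be rewound.

```python
from io import StringIO

import pandas as pd


def type_dict_by_position(columns, overrides, default):
    """Hypothetical helper: build a read_csv dtype dict from column positions.

    `overrides` maps a 0-based column position to a dtype; every other
    column gets `default`.
    """
    return {col: overrides.get(i, default) for i, col in enumerate(columns)}


data = """Col_A   Col_B   Col_C   Col_D
001       1       2      1.2
002       2       3      3.5
003       3       4.5      7
004       4       6.5     10"""

# Pass 1: read only the header row to get the column names.
cols = pd.read_csv(StringIO(data), sep=r"\s+", nrows=0).columns

# Pass 2: column 0 as category, every other column as float32.
type_dict = type_dict_by_position(cols, {0: "category"}, "float32")
df = pd.read_csv(StringIO(data), sep=r"\s+", dtype=type_dict)
print(df.dtypes)
```

    With a real file path instead of StringIO, both reads can simply take the same path; only the header is parsed in the first pass, so it stays cheap even for 300+ columns.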

    Specify after reading

    Often you won't know the optimal type beforehand. In that case, read the data as normal (still forcing Col_A to category) and then downcast the integer and float columns with pd.to_numeric in a subsequent step:

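    import pandas as pd
    
    # Read with default types, forcing only Col_A to category.
    df = pd.read_csv(myfile, sep=r'\s+', dtype={'Col_A': 'category'})
    
    # Group the numeric columns by kind: {'integer': [...], 'float': [...]}
    cols = {k: df.select_dtypes([k]).columns for k in ('integer', 'float')}
    
    # Downcast each group to the smallest dtype that can hold its values.
    for col_type, col_names in cols.items():
        df[col_names] = df[col_names].apply(pd.to_numeric, downcast=col_type)
    
    print(df.dtypes)
    # Col_A    category
    # Col_B        int8
    # Col_C     float32
    # Col_D     float32
    # dtype: object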
    
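    If you would rather read the file once and fix the types by position afterwards, one possible sketch (an alternative to the downcasting above, not the method used in this answer) relies on the converters argument of read_csv, whose keys may be column labels or column indices. Keeping column 0 as text preserves leading zeros such as 001, which a default read would turn into the integer 1; the other columns are then cast to float32 by position with DataFrame.astype.

```python
from io import StringIO

import pandas as pd

data = """Col_A   Col_B   Col_C   Col_D
001       1       2      1.2
002       2       3      3.5
003       3       4.5      7
004       4       6.5     10"""

# converters accepts column positions as keys, so column 0 is kept as text
# without needing to know its name.
df = pd.read_csv(StringIO(data), sep=r"\s+", converters={0: str})

# Convert by position: first column -> category, all others -> float32.
first, rest = df.columns[0], df.columns[1:]
df = df.astype({first: "category", **{col: "float32" for col in rest}})
print(df.dtypes)
```

    Note that the numeric columns are first materialised as int64/float64 before the cast, so peak memory is higher than when the dtype dictionary is passed to read_csv directly.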

    Setup used for testing

    from io import StringIO
    
    myfile = StringIO("""Col_A   Col_B   Col_C   Col_D
    001       1       2      1.2
    002       2       3      3.5
    003       3       4.5      7
    004       4       6.5     10""")
    
