Pandas read_csv alters column values that start with 0

别那么骄傲 2021-01-22 09:24

I have a script that reads some zip codes from a CSV file. The zip codes look like this:

zipcode
75180
90672
01037
20253
09117
31029
07745
90453
12         


        
1 Answer
  • 2021-01-22 09:46

    You need to pass dtype=str:

    reader = pd.read_csv(file, sep=';', encoding='utf-8-sig', dtype=str)
    

    so that those values are read as strings. For example:

    In [152]:
    import pandas as pd
    import io
    t="""zipcode
    75180
    90672
    01037
    20253
    09117
    31029
    07745
    90453
    12105
    18140
    36108
    10403
    76470
    06628
    93105
    88069
    31094
    84095
    63069"""
    df = pd.read_csv(io.StringIO(t), dtype=str)
    df
    
    Out[152]:
       zipcode
    0    75180
    1    90672
    2    01037
    3    20253
    4    09117
    5    31029
    6    07745
    7    90453
    8    12105
    9    18140
    10   36108
    11   10403
    12   76470
    13   06628
    14   93105
    15   88069
    16   31094
    17   84095
    18   63069
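Note that dtype=str applies to every column. If the real file also has columns that should stay numeric, dtype also accepts a dict keyed by column name, so only the zip code column is read as text. A minimal sketch, assuming a semicolon-separated file with a hypothetical extra column `count` that is not part of the question:

```python
import io

import pandas as pd

# Hypothetical file contents: "count" is an illustrative extra column.
csv_text = """zipcode;count
01037;12
75180;7
09117;3
"""

# Only "zipcode" is forced to text; "count" still goes through normal inference.
df = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"zipcode": str})

print(df["zipcode"].tolist())  # ['01037', '75180', '09117']
print(df["count"].dtype)       # integer dtype, inferred as usual
```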
    

    By default, pandas infers the dtypes from the data. Here every value looks like an integer, so the column is parsed as int64 and the leading zeros are lost.
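To see the inference described above, here is a small comparison on a few of the question's zip codes, read once with the default behaviour and once with dtype=str (a sketch using an in-memory string instead of a real file):

```python
import io

import pandas as pd

csv_text = "zipcode\n01037\n75180\n09117\n"

# No dtype: pandas infers an integer column, so "01037" becomes 1037.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["zipcode"].dtype)     # int64
print(inferred["zipcode"].tolist())  # [1037, 75180, 9117]

# dtype=str: the raw text is kept, leading zeros included.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(as_text["zipcode"].tolist())   # ['01037', '75180', '09117']
```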

    You can also fix this as a post-processing step by casting to str and then left-padding with the vectorised str.zfill:

    In [154]:
    df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
    df
    
    Out[154]:
       zipcode
    0    75180
    1    90672
    2    01037
    3    20253
    4    09117
    5    31029
    6    07745
    7    90453
    8    12105
    9    18140
    10   36108
    11   10403
    12   76470
    13   06628
    14   93105
    15   88069
    16   31094
    17   84095
    18   63069
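The session above applies zfill to a frame that was already read with dtype=str. A sketch of the post-processing route starting from the default integer parse, assuming (as in the question's data) that every zip code should be exactly 5 digits:

```python
import io

import pandas as pd

csv_text = "zipcode\n01037\n75180\n09117\n"

# Default inference: the column is parsed as integers and loses leading zeros.
df = pd.read_csv(io.StringIO(csv_text))

# Cast int -> str, then left-pad with zeros to a width of 5 characters.
df["zipcode"] = df["zipcode"].astype(str).str.zfill(5)
print(df["zipcode"].tolist())  # ['01037', '75180', '09117']
```

This relies on the column still being integer. If it has any missing values, pandas parses it as float64, astype(str) then yields strings such as '1037.0', and zfill(5) leaves those unchanged, so reading with dtype=str is the more robust option.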
    