Python - Loading Zip Codes into a DataFrame as Strings?

Submitted by 久未见 on 2019-12-09 19:14:07

Question


I'm using Pandas to load an Excel spreadsheet that contains zip codes (e.g. 32771). The zip codes are stored as 5-digit strings in the spreadsheet. When they are pulled into a DataFrame using the commands...

xls = pd.ExcelFile("5-Digit-Zip-Codes.xlsx")
dfz = xls.parse('Zip Codes')

they are converted into numbers. So '00501' becomes 501.

So my questions are, how do I:

a. Load the DataFrame and keep the string type of the zip codes stored in the Excel file?

b. Convert the numbers in the DataFrame into five-digit strings, e.g. so that 501 becomes "00501"?


Answer 1:


As a workaround, you could convert the ints to 0-padded strings of length 5 using Series.str.zfill:

df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)

Demo:

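import pandas as pd
# Write a one-row sheet; index=False keeps the row index out of the file
df = pd.DataFrame({'zipcode':['00501']})
df.to_excel('/tmp/out.xlsx', index=False)
# Read the sheet back in
xl = pd.ExcelFile('/tmp/out.xlsx')
df = xl.parse('Sheet1')
# Cast to str and left-pad with zeros to a width of 5
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
print(df)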

yields

  zipcode
0   00501



Answer 2:


str(my_zip).zfill(5)

or

print("{0:>05s}".format(str(my_zip)))

are two of the many ways to do this.
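As a minimal sketch of how those two one-liners behave on plain Python values: `pad_zip` is a hypothetical helper name used only for illustration, and the sample values are made up.

```python
def pad_zip(value, width=5):
    """Return value as a zero-padded string (hypothetical helper)."""
    # str() makes this work for ints and strings alike.
    return str(value).zfill(width)

# zfill only adds zeros on the left; values already 5 characters long are unchanged.
padded = [pad_zip(z) for z in (501, "501", 32771)]

# The format-spec variant: right-align in a field of width 5, filled with zeros.
formatted = "{0:>05s}".format(str(501))

print(padded, formatted)
```

Both forms need the value converted to a string first, which is why `str(...)` appears in each.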




Answer 3:


You can avoid pandas' type inference with a custom converter, e.g. if 'zipcode' is the header of the column containing the zip codes:

dfz = xls.parse('Zip Codes', converters={'zipcode': lambda x:x})

This is arguably a bug, since the column was originally stored as strings; an issue has been filed for it.
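A related option, sketched here on the assumption that your pandas version's `read_excel` accepts a `dtype` argument and that an Excel engine such as openpyxl is installed, is to request a string dtype for the column at load time. The in-memory workbook below stands in for the real file; the sheet name and column header are taken from the question. Note that this only preserves leading zeros when the cells are stored as text in the workbook, as the question says they are.

```python
import io

import pandas as pd

# Build a small workbook in memory as a stand-in for 5-Digit-Zip-Codes.xlsx.
buffer = io.BytesIO()
pd.DataFrame({"zipcode": ["00501", "32771"]}).to_excel(
    buffer, sheet_name="Zip Codes", index=False
)
buffer.seek(0)

# dtype={"zipcode": str} asks pandas to keep that column as text
# rather than inferring a numeric type.
dfz = pd.read_excel(buffer, sheet_name="Zip Codes", dtype={"zipcode": str})
zip_codes = dfz["zipcode"].tolist()
print(zip_codes)
```

If the cells were stored as numbers instead, the leading zeros are already gone by the time pandas reads them, and padding with `zfill` is still required.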



Source: https://stackoverflow.com/questions/33137686/python-loading-zip-codes-into-a-dataframe-as-strings
