Question
I'm trying to extract Zip Codes from an Excel Spreadsheet and load them into a list as Strings.
import xlrd

BIL = xlrd.open_workbook(r"C:\Temp\Stores.xls")
Worksheet = BIL.sheet_by_name("Open_Locations")
ZIPs = []
for record in Worksheet.col(17):
    # Skip the header cell
    if record.value == "Zip":
        pass
    else:
        ZIPs.append(record.value)
Unfortunately, this Excel workbook is managed by someone else, so I cannot simply convert the ZIP code field to text in the spreadsheet to solve my problem. In addition, believe it or not, this spreadsheet is also used by some business intelligence systems, so changing that field from number to string could cause problems for other workflows that rely on this workbook and that I am not privy to.
What I'm finding is that when I print the numbers as they are without casting to integer or string first, I of course get a bunch of floats. I expected that, since Excel stores numbers as floats.
>>> ZIPs
[u'06405',
04650.0,
10017.0,
71055.0,
70801.0]
What I didn't expect is that when I cast these floats to int to get rid of the decimal part, and then cast the result to string, any leading or trailing zeros that are part of the ZIP code are truncated.
import xlrd

BIL = xlrd.open_workbook(r"C:\Temp\Stores.xls")
Worksheet = BIL.sheet_by_name("Open_Locations")
ZIPs = []
for record in Worksheet.col(17):
    # Skip the header cell
    if record.value == "Zip":
        pass
    else:
        ZIPs.append(str(int(record.value)))
>>> ZIPs
['6405',
'465',
'10017',
'71055',
'70801']
How can I convert these ZIP codes to strings without dropping the leading or trailing zeros? Alternatively, how can I determine the number of leading and trailing zeros on the value prior to truncation and append them back as appropriate?
Answer 1:
All ZIP codes (not including the Zip+4) are 5 characters so you could just pad out to 5:
C#:
- Use the String.PadLeft method: https://msdn.microsoft.com/en-us/library/system.string.padleft%28v=vs.110%29.aspx
ZIPs.Add(str.PadLeft(5, '0'));
Python:
- Use rjust: http://www.tutorialspoint.com/python/string_rjust.htm
ZIPs.append(str(int(record.value)).rjust(5, '0'))
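As a minimal sketch of the padding idea, assuming Python 3 and a hypothetical list `raw_values` standing in for numeric cell values read from the sheet (the helper name `pad_zip` is also made up for illustration), `rjust`, `zfill`, and a format spec all pad to five characters:

```python
# Hypothetical sample data standing in for numeric cell values from the sheet.
raw_values = [6405.0, 4650.0, 10017.0]


def pad_zip(value, width=5):
    """Convert a numeric ZIP value to a zero-padded string of `width` digits."""
    digits = str(int(value))         # e.g. 6405.0 -> '6405'
    return digits.rjust(width, '0')  # e.g. '6405' -> '06405'


padded = [pad_zip(v) for v in raw_values]

# Built-in alternatives to rjust(width, '0') for non-negative integers:
with_zfill = [str(int(v)).zfill(5) for v in raw_values]
with_format = ['{:05d}'.format(int(v)) for v in raw_values]
```

All three lists should hold the same strings; which spelling to use is mostly a matter of taste. Note that this only works because a five-digit width is assumed, so it would not suit ZIP+4 values.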
Answer 2:
So after some tinkering, it turned out the answer was to:
- Not cast the zip code as an int since this will also truncate any leading zeros
- Explicitly encode the string as utf-8
The presence of the unicode string indicator (the u prefix), which appeared on some values but not all when I printed the list, tipped me off that this might be the answer.
for record in Worksheet.col(17):
    if record.value == "Zip":
        pass
    else:
        # In this case, the value is still being returned as float, because
        # it has 1 significant digit of 0 appended to the end. So we'll cast
        # as string and explicitly encode it as utf-8 which will retain the
        # leading and trailing zeros of the value and also truncate the
        # significant digits via index.
        if len(str(record.value).encode('utf-8')) > 5:
            ZIPs.append(str(record.value).encode('utf-8')[:5])
        else:
            # In this case, the value is already being returned as a unicode
            # string for some reason, probably because of poor excel worksheet
            # management, but in any case cast as string and explicitly encode
            # as utf-8 just for peace of mind.
            ZIPs.append(str(record.value).encode('utf-8'))
>>> ZIPs
['06405',
'04650',
'10017',
'71055',
'70801']
If anyone has a more elegant way of doing this, I would love to see it.
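One possibly tidier variant, sketched under assumptions rather than taken from the post above, is to branch on the Python type of the value instead of on its string length. The helper name `zip_to_text` and the sample list `cells` are hypothetical; the sketch assumes Python 3, that text cells arrive as `str` and numeric cells as `float`, and that every ZIP code is five digits.

```python
def zip_to_text(value, width=5):
    """Return a ZIP code as text, whatever type the cell value arrived as."""
    if isinstance(value, float):
        # Numeric cell: drop the fractional part, then restore the leading
        # zeros that a number cannot carry.
        return str(int(value)).zfill(width)
    # Text cell: it already holds its leading zeros; just trim whitespace.
    return str(value).strip()


# Hypothetical values mimicking a column that mixes text and numeric cells.
cells = ['06405', 4650.0, 10017.0, 71055.0, 70801.0]
zips = [zip_to_text(c) for c in cells]
```

With xlrd, the same branch could probably be keyed on the cell's `ctype` attribute (compared against `xlrd.XL_CELL_NUMBER`) rather than on `isinstance`, but that is left out here to keep the sketch free of the workbook.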
Answer 3:
You can try to do this through string manipulation.
Our assumption here will be that the column will be ZIP codes, so '.0' at the end will never be necessary.
The below would go in your else statement:
record_str = str(record.value)
formatted_record = record_str[:-2] if record_str.endswith('.0') else record_str
ZIPs.append(formatted_record)
Or, if you want to be risqué, assume that every value read from this column ends in '.0'; if one does not, this will cause unexpected behavior.
ZIPs.append(str(record.value)[:-2])
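To illustrate the unexpected behavior mentioned above, here is a small sketch (Python 3; the helper names `strip_point_zero` and `blind_strip` are made up for this example) comparing the guarded version with the unconditional slice on a value that is already text:

```python
def strip_point_zero(value):
    """Remove a trailing '.0' only when it is actually present."""
    text = str(value)
    return text[:-2] if text.endswith('.0') else text


def blind_strip(value):
    """Always drop the last two characters, as in the shorter variant."""
    return str(value)[:-2]


guarded_float = strip_point_zero(10017.0)
guarded_text = strip_point_zero('06405')
blind_float = blind_strip(10017.0)
blind_text = blind_strip('06405')  # a text cell loses two real digits here
```

Neither variant restores leading zeros that were lost when a value was stored as a number, so padding to five characters may still be needed afterwards.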
Source: https://stackoverflow.com/questions/28399533/converting-float-to-string-without-truncating-leading-or-trailing-zeros