Recursion seems like the way to go here, but if you're on Python 2.x you want to be checking for `unicode`, not `str`. The `str` type represents a string of bytes and the `unicode` type a string of Unicode characters; neither inherits from the other, and it is `unicode` strings that the interpreter displays with a `u` prefix (for example `u'abc'`).
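You can confirm that neither type inherits from the other by asking the interpreter directly. This small check is my own sketch, written so it runs on Python 2 (where the two types are `unicode` and `str`) and also on Python 3 (where they are `str` and `bytes`). On Python 2 both types do share the abstract base `basestring`, which is why `isinstance(x, basestring)` matches either one, but a check against `str` alone never matches a `unicode` string.

```python
from __future__ import print_function

text_type = type(u"")                  # unicode on Python 2, str on Python 3
byte_type = type(u"".encode("ascii"))  # str on Python 2, bytes on Python 3

# Neither type is a subclass of the other, so an isinstance check against
# the byte-string type never matches a decoded text string.
text_is_bytes = issubclass(text_type, byte_type)
bytes_is_text = issubclass(byte_type, text_type)
u_matches_bytes = isinstance(u"abc", byte_type)

print(text_is_bytes, bytes_is_text, u_matches_bytes)
```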
There's also a small syntax error in your posted code (the trailing `elif:` should be an `else:`), and you're not returning the same structure when the input is a dictionary or a list. For a dictionary, you're returning the converted version of the final key; for a list, the converted version of the final element. Neither is right: each case should return a whole new dictionary or list with every item converted.
You can also make your code pretty and Pythonic by using comprehensions.
Here, then, is what I'd recommend:
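```python
def convert(data):
    """Recursively convert unicode strings to UTF-8 byte strings (Python 2)."""
    # The parameter is named data rather than input to avoid shadowing the built-in.
    if isinstance(data, dict):
        # Dict comprehensions need Python 2.7+; on 2.6 use
        # dict((convert(k), convert(v)) for k, v in data.iteritems()).
        return {convert(key): convert(value) for key, value in data.iteritems()}
    elif isinstance(data, list):
        return [convert(element) for element in data]
    elif isinstance(data, unicode):
        return data.encode('utf-8')
    else:
        # Numbers, booleans, None and existing byte strings pass through unchanged.
        return data
```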
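A quick way to sanity-check the function is to feed it the output of `json.loads`, which on Python 2 returns `unicode` for every string. The sketch below repeats the function so it is self-contained. The runtime `text_type` lookup and the use of `items()` instead of `iteritems()` are my additions so the snippet also runs under Python 3 (where it produces `bytes`); they are not part of the recommendation above, and the sample JSON document is made up.

```python
from __future__ import print_function
import json

# Assumption: pick the text type at runtime so the snippet runs on both
# Python 2 (unicode) and Python 3 (str).
try:
    text_type = unicode
except NameError:
    text_type = str

def convert(data):
    if isinstance(data, dict):
        # items() works on both versions; iteritems() is Python 2 only.
        return {convert(key): convert(value) for key, value in data.items()}
    elif isinstance(data, list):
        return [convert(element) for element in data]
    elif isinstance(data, text_type):
        return data.encode('utf-8')
    else:
        return data

# Hypothetical sample document: a non-ASCII string, a nested list and a number.
decoded = json.loads('{"name": "caf\\u00e9", "tags": ["a", "b"], "count": 3}')
result = convert(decoded)
print(result)
```

Note that the nested list comes back as a list of converted strings and the number passes through untouched: the whole structure is rebuilt, not just its last item.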
One final thing: I changed `encode('ascii')` to `encode('utf-8')`. My reasoning is as follows. Any Unicode string that contains only ASCII characters is encoded to exactly the same bytes in UTF-8 as in ASCII, so switching to UTF-8 cannot break anything, and the change is invisible as long as your strings use only ASCII characters. However, it extends the function to handle characters from the entire Unicode character set rather than just ASCII ones, should that ever be necessary (with `'ascii'`, such strings would raise a `UnicodeEncodeError`).
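Both halves of that argument are easy to check. This sketch (my own, runnable on Python 2 and 3) encodes an ASCII-only string both ways and then tries a string containing a character outside ASCII.

```python
from __future__ import print_function

ascii_text = u"plain ASCII text"
non_ascii_text = u"caf\u00e9"  # "cafe" with an acute e (U+00E9), outside ASCII

# For ASCII-only text the two encodings produce identical bytes...
same_bytes = ascii_text.encode('ascii') == ascii_text.encode('utf-8')

# ...but only UTF-8 can encode characters outside the ASCII range.
utf8_bytes = non_ascii_text.encode('utf-8')
try:
    non_ascii_text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(same_bytes, utf8_bytes, ascii_ok)
```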