Since you are using Python, you might try UnicodeDammit. It is part of Beautiful Soup, which you may also find useful.
As the name suggests, UnicodeDammit will do whatever it takes to get proper Unicode out of the crap you may find in the world: it guesses the encoding of a byte string and decodes it for you.
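
Here is a minimal sketch of how that might look, assuming Beautiful Soup 4 is installed (`pip install beautifulsoup4`). The helper name `to_unicode` and its `suggested_encodings` parameter are made up for illustration; only `UnicodeDammit`, `unicode_markup` and `original_encoding` come from the library.

```python
from bs4 import UnicodeDammit


def to_unicode(raw, suggested_encodings=()):
    """Decode a byte string of unknown encoding to str.

    `to_unicode` is a hypothetical wrapper, not part of Beautiful Soup.
    `suggested_encodings` is an optional list of guesses that
    UnicodeDammit tries before falling back to its own detection.
    Returns (text, encoding_that_worked).
    """
    dammit = UnicodeDammit(raw, list(suggested_encodings))
    if dammit.unicode_markup is None:
        # UnicodeDammit could not find any encoding that fits.
        raise ValueError("could not decode input")
    return dammit.unicode_markup, dammit.original_encoding


if __name__ == "__main__":
    # The same text, once as UTF-8 bytes and once as Latin-1 bytes.
    print(to_unicode(b"Sacr\xc3\xa9 bleu!", ["utf-8"]))
    print(to_unicode(b"Sacr\xe9 bleu!", ["latin-1", "iso-8859-1"]))
```

If you have no idea what the encoding might be, you can leave out the suggestions and let UnicodeDammit guess. The guess is a heuristic, so check `original_encoding` when the result looks wrong.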