I am writing a script that will try encoding bytes into many different encodings in Python 2.6. Is there some way to get a list of available encodings that I can iterate over?
You can use pkgutil to list all the modules in the encodings package:
import pkgutil
import encodings

# `aliases` is a helper module inside the package, not an actual codec
false_positives = set(["aliases"])
found = set(name for imp, name, ispkg
            in pkgutil.iter_modules(encodings.__path__)
            if not ispkg)
found.difference_update(false_positives)
print found
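Since the question is about iterating over the encodings, here is a sketch of how the resulting set could be used: try encoding a sample string with each name and keep the ones that succeed. The `sample` string and the set of caught exceptions are my own choices; the broad exception tuple is there because some entries are platform specific (`mbcs`) or are byte-to-byte transforms that reject text input.

```python
import pkgutil
import encodings

# `aliases` is a helper module inside the package, not an actual codec
false_positives = set(["aliases"])
found = set(name for imp, name, ispkg
            in pkgutil.iter_modules(encodings.__path__)
            if not ispkg)
found.difference_update(false_positives)

sample = u"caf\xe9"  # arbitrary test string with a non-ASCII character
working = []
for enc in sorted(found):
    try:
        sample.encode(enc)
    except (UnicodeError, LookupError, TypeError, ValueError):
        # platform-specific codec, byte transform, or can't represent sample
        continue
    working.append(enc)
print(working)
```

Note that an encoding failing here only means it could not represent this particular sample, not that it is unavailable.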
Here's a programmatic way to list all the encodings defined in the stdlib encodings package. Note that this won't list user-defined encodings. It combines some of the tricks from the other answers, but actually produces a working list using each codec's canonical name.
import encodings
import pkgutil
import pprint

all_encodings = set()

for _, modname, _ in pkgutil.iter_modules(
        encodings.__path__, encodings.__name__ + '.',
):
    try:
        mod = __import__(modname, fromlist=[str('__trash')])
    except (ImportError, LookupError):
        # A few encodings are platform specific: mbcs, cp65001
        # print('skip {}'.format(modname))
        continue
    try:
        all_encodings.add(mod.getregentry().name)
    except AttributeError as e:
        # the `aliases` module doesn't actually provide a codec
        # print('skip {}'.format(modname))
        if 'regentry' not in str(e):
            raise

pprint.pprint(sorted(all_encodings))
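As a quick sanity check that these are the canonical names (my own addition, not part of the script above): codecs.lookup normalizes any alias to the same canonical spelling that getregentry() reports.

```python
import codecs

# Looking up an alias returns the codec's canonical name,
# which is what getregentry().name reports.
print(codecs.lookup("UTF8").name)     # utf-8
print(codecs.lookup("latin-1").name)  # iso8859-1
```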
The Python source code has a script at Tools/unicode/listcodecs.py which lists all codecs.
Among the listed codecs, however, there are some that are not Unicode-to-byte converters, like base64_codec, quopri_codec and bz2_codec, as @John Machin pointed out.