Adding an encoding alias to Python

不知归路 2021-01-06 11:59

Is there a way to add an encoding alias to Python? There are sites on the web that use the encoding 'windows-1251' but have their charset set to win-1251, s

3 Answers
  • 2021-01-06 12:50

    Encoding aliases can be added by editing the aliases.py file of the encodings package.

    # euc_jp codec
    'eucjp'              : 'euc_jp',
    'ujis'               : 'euc_jp',
    'u_jis'              : 'euc_jp',
    'euc_jp_linux'       : 'euc_jp',
    'euc-jp-linux'       : 'euc_jp',
    

    Above, I have added two aliases, euc_jp_linux and euc-jp-linux, for the encoding euc_jp.

    On a 64-bit Linux system, the aliases.py file is generally located under /usr/lib64/python2.6/encodings/
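
    Rather than guessing the path, the interpreter can report where its own copy of aliases.py lives. The following is a minimal sketch, assuming Python 3 and a standard CPython install. It also prints how the encodings package normalizes a hyphenated name before consulting the alias table, which suggests that the underscore spelling of an alias is the one that is actually matched.

```python
import encodings
import encodings.aliases

# Path of the aliases module used by this interpreter.
aliases_path = encodings.aliases.__file__

# The encodings package normalizes names (hyphens become underscores)
# before it looks them up in the alias table.
normalized = encodings.normalize_encoding("euc-jp-linux")

print(aliases_path)
print(normalized)
```

    Editing a file inside the standard library affects every program using that interpreter and may be overwritten on upgrade, so this is best treated as a last resort.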

  • 2021-01-06 12:54

    The encodings module is not well documented, so I'd use the codecs module instead:

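    import codecs

    def encalias(oldname, newname):
      # Look up the existing codec and clone it under the new name.
      old = codecs.lookup(oldname)
      new = codecs.CodecInfo(old.encode, old.decode,
                             streamreader=old.streamreader,
                             streamwriter=old.streamwriter,
                             incrementalencoder=old.incrementalencoder,
                             incrementaldecoder=old.incrementaldecoder,
                             name=newname)
      # Search function: codecs calls it with the requested encoding name
      # (already normalized, e.g. lowercased) and expects a CodecInfo back,
      # or None if this function does not handle that name.
      def searcher(aname):
        if aname == newname:
          return new
        else:
          return None
      codecs.register(searcher)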
    

    This is Python 2.6 -- the interface is different in earlier versions.

    If you don't mind relying on a specific version's undocumented internals, @Lennart's aliasing approach is OK, too, of course - and indeed simpler than this;-). But I suspect (as he appears to) that this one is more maintainable.
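
    To show the search-function approach applied to the question's win-1251 case, here is a hedged sketch for Python 3. The helper name register_alias is hypothetical. It differs from the function above in two ways: it compares names after normalizing hyphens, spaces, and case (recent Python 3 versions convert hyphens to underscores before calling search functions), and it returns the existing CodecInfo unchanged instead of building a renamed copy.

```python
import codecs


def _normalize(name):
    # Mirror the lookup machinery: lowercase, hyphens/spaces -> underscores.
    return name.lower().replace("-", "_").replace(" ", "_")


def register_alias(existing, alias):
    """Make `alias` resolve to the codec already known as `existing`."""
    info = codecs.lookup(existing)
    wanted = _normalize(alias)

    def searcher(name):
        # Return the codec for our alias, None for everything else.
        if _normalize(name) == wanted:
            return info
        return None

    codecs.register(searcher)


register_alias("cp1251", "win-1251")
text = b"\xcc\xce\xd1K\xc2\xc0".decode("win-1251")
print(text)
```

    Search functions are only consulted when the built-in lookup fails, so this does not shadow any encoding name Python already knows.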

  • 2021-01-06 12:58
    >>> import encodings
    >>> encodings.aliases.aliases['win_1251'] = 'cp1251'
    >>> print '\xcc\xce\xd1K\xc2\xc0'.decode('win-1251')
    MOCKBA
    

    Personally, though, I would consider this monkey-patching and use my own conversion table instead. But I can't give any good arguments for that position. :)
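
    The "own conversion table" alternative could look roughly like the sketch below, written for Python 3. The names CHARSET_FIXES and decode_with_fixes are hypothetical and the table holds only the one entry from the question. Non-standard charset labels are translated to a codec name Python already knows, and no global codec state is touched.

```python
# Hypothetical table: charset labels seen in the wild -> Python codec names.
CHARSET_FIXES = {
    "win-1251": "cp1251",
}


def decode_with_fixes(data, charset):
    """Decode bytes using the declared charset, fixing known bad labels."""
    key = charset.strip().lower()
    # Fall back to the declared label when no fix is listed for it.
    return data.decode(CHARSET_FIXES.get(key, key))


page_text = decode_with_fixes(b"\xcc\xce\xd1K\xc2\xc0", "Win-1251")
print(page_text)
```

    Labels missing from the table are passed to decode() as they are, so an unknown charset still raises LookupError rather than being silently guessed.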
