ConfigParser with Unicode items

ぐ巨炮叔叔 提交于 2019-12-18 12:49:19

问题


my troubles with ConfigParser continue. It seems it doesn't support Unicode very well. The config file is indeed saved as UTF-8, but when ConfigParser reads it it seems to be encoded into something else. I assumed it was latin-1 and I thougt overriding optionxform could help:

-- configfile.cfg -- 
[rules]
Häjsan = 3
☃ = my snowman

-- myapp.py --
# -*- coding: utf-8 -*-  
import ConfigParser

def _optionxform(s):
    try:
        newstr = s.decode('latin-1')
        newstr = newstr.encode('utf-8')
        return newstr
    except Exception, e:
        print e

cfg = ConfigParser.ConfigParser()
cfg.optionxform = _optionxform    
cfg.read("myconfig") 

Of course, when I read the config I get:

'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I've tried a couple of different variations of decoding 's' but the point seems moot, since it really should be a unicode object from the beginning. After all, the config file is UTF-8? I have confirmed that's something is wrong in the way ConfigParser reads the file by stubbing it out with this DummyConfig class. If I use that then everything is nice unicode, fine and dandy.

-- config.py --
# -*- coding: utf-8 -*-                
apa = {'rules': [(u'Häjsan', 3), (u'☃', u'my snowman')]}

class DummyConfig(object):
    def sections(self):
        return apa.keys()
    def items(self, section):
       return apa[section]
    def add_section(self, apa):
        pass  
    def set(self, *args):
        pass  

Any ideas what could be causing this or suggestions of other config modules that supports Unicode better are most welcome. I don't want to use sys.setdefaultencoding()!


回答1:


The ConfigParser.readfp() method can take a file object, have you tried opening the file object with the correct encoding using the codecs module before sending it to ConfigParser like below:

cfg.readfp(codecs.open("myconfig", "r", "utf8"))

For Python 3.2 or above, readfp() is deprecated. Use read_file() instead.




回答2:


In python 3.2 encoding parameter was introduced to read(), so it can now be used as:

cfg.read("myconfig", encoding='utf-8')



回答3:


Try to overwrite the write function in RawConfigParser() like this:

class ConfigWithCoder(RawConfigParser):
def write(self, fp):
    """Write an .ini-format representation of the configuration state."""
    if self._defaults:
        fp.write("[%s]\n" % "DEFAULT")
        for (key, value) in self._defaults.items():
            fp.write("%s = %s\n" % (key, str(value).replace('\n', '\n\t')))
        fp.write("\n")
    for section in self._sections:
        fp.write("[%s]\n" % section)
        for (key, value) in self._sections[section].items():
            if key == "__name__":
                continue
            if (value is not None) or (self._optcre == self.OPTCRE):
                if type(value) == unicode:
                    value = ''.join(value).encode('utf-8')
                else:
                    value = str(value)
                value = value.replace('\n', '\n\t')
                key = " = ".join((key, value))
            fp.write("%s\n" % (key))
        fp.write("\n")



回答4:


The config module is broken when reading and writing unicode strings as values. I tried to fix it, but got caught up in the odd way the parser works.




回答5:


Seems to be a problem with the ConfigParser version for python 2x, and version for 3x is free of this problem. In this issue of the Python Bug Tracker, the status is Closed + WONTFIX.

I've fixed it editing the ConfigParser.py file. In the write method (about the line 412), change:

key = " = ".join((key, str(value).replace('\n', '\n\t')))

by

key = " = ".join((key, str(value).decode('utf-8').replace('\n', '\n\t')))

I don't know if it's a real solution, but tested in Windows 7 and Ubuntu 15.04, works like a charm, and I can share and work with the same .ini file in both systems.




回答6:


what I did is just:

file_name = file_name.decode("utf-8")
cfg.read(file_name)


来源:https://stackoverflow.com/questions/1648517/configparser-with-unicode-items

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!