csv reader behavior with None and empty string

夕颜 2020-12-01 13:59

I'd like to distinguish between None and empty strings when going back and forth between a Python data structure and its CSV representation using Python's csv module.
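A minimal reproduction of the problem, assuming Python 3 and an in-memory io.StringIO buffer in place of a real file. csv.writer writes None as an empty field, so after a round trip both values come back as '':

```python
import csv
import io

# Sample row containing both a None and an empty string.
row = ["a", None, ""]

buf = io.StringIO()
csv.writer(buf).writerow(row)
print(repr(buf.getvalue()))  # 'a,,\r\n': None and '' are both written as empty fields

buf.seek(0)
round_tripped = next(csv.reader(buf))
print(round_tripped)  # ['a', '', '']: the None has become ''
```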

  • 2020-12-01 14:50

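    The csv module writes None as an empty string, which makes it indistinguishable from a genuine empty string once the data has been written. You could at least partially sidestep this by creating your own singleton None-like class/value: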

    from __future__ import print_function
    import csv
    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:
        from io import StringIO  # Python 3.
    
    class NONE(object):
        def __repr__(self): # csv.writer calls str(), which falls back to __repr__.
            return 'NONE'   # Unique string value to represent None.
        def __len__(self):  # Method called to determine length and truthiness.
            return 0
    
    NONE = NONE()  # Singleton instance of the class.
    
    data = [['None value', None], ['NONE value', NONE], ['empty string', '']]
    f = StringIO()
    csv.writer(f).writerows(data)
    
    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csv.reader(f)])
    

    Results:

     input: [['None value', None], ['NONE value', NONE],   ['empty string', '']]
    output: [['None value', ''],   ['NONE value', 'NONE'], ['empty string', '']]
    

    Using NONE instead of None preserves enough information to tell it apart from any genuine empty-string values.
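    As the output shows, csv.reader hands back the plain string 'NONE' rather than the singleton, so completing the round trip takes one more mapping step on the read side. A minimal sketch, assuming Python 3 and assuming the string 'NONE' never occurs as genuine data (the helper name restore_none is hypothetical):

```python
import csv
import io

SENTINEL = "NONE"  # Assumed marker string; must never occur as genuine data.


def restore_none(rows, sentinel=SENTINEL):
    """Yield each row with fields equal to the sentinel replaced by None."""
    for row in rows:
        yield [None if field == sentinel else field for field in row]


# The text the writer in the example above produces for its three rows.
buf = io.StringIO("None value,\r\nNONE value,NONE\r\nempty string,\r\n")
restored = list(restore_none(csv.reader(buf)))
print(restored)  # [['None value', ''], ['NONE value', None], ['empty string', '']]
```

    Note that the first row still comes back as '' because it was written with a plain None; only values written as NONE can be recovered.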

    Even better alternative…

    You could use the same approach to implement a pair of relatively lightweight csv.reader and csv.writer “proxy” classes. Proxies are needed because the built-in csv reader and writer are implemented in C and can't be subclassed. They add little overhead, since most of the processing is still performed by the underlying built-ins, and they make the None handling completely transparent because it is all encapsulated within the proxies.

    from __future__ import print_function
    import csv
    
    
    class csvProxyBase(object):
        _NONE = '<None>'  # Unique value representing None.
    
    class csvWriter(csvProxyBase):
        def __init__(self, csvfile, *args, **kwargs):
            self.writer = csv.writer(csvfile, *args, **kwargs)
        def writerow(self, row):
            # Replace None with the marker before handing the row to csv.writer.
            self.writer.writerow([self._NONE if val is None else val for val in row])
        def writerows(self, rows):
            for row in rows:
                self.writerow(row)
    
    class csvReader(csvProxyBase):
        def __init__(self, csvfile, *args, **kwargs):
            self.reader = csv.reader(csvfile, *args, **kwargs)
        def __iter__(self):
            return self
        def __next__(self):
            # Turn the marker back into None; all other fields pass through unchanged.
            return [None if val == self._NONE else val for val in next(self.reader)]
        next = __next__  # Python2.x compatibility.
    
    
    if __name__ == '__main__':
    
        try:
            from cStringIO import StringIO  # Python 2.
        except ImportError:
            from io import StringIO  # Python 3.
    
        data = [['None value', None], ['empty string', '']]
        f = StringIO()
        csvWriter(f).writerows(data)
    
        f = StringIO(f.getvalue())
        print(" input:", data)
        print("output:", list(csvReader(f)))
    
    

    Results:

     input: [['None value', None], ['empty string', '']]
    output: [['None value', None], ['empty string', '']]
    