csv reader behavior with None and empty string

I'd like to distinguish between None and empty strings when going back and forth between a Python data structure and its CSV representation using Python's csv module.
7 answers

失恋的感觉, 2020-12-01 14:36

I don't think it would be possible to do what you want with a mere dialect, but you could write your own wrapper around csv.reader/csv.writer. On the other hand, I still think that is overkill for this use case. Even if you want to catch more than just None, you probably just want str():

>>> import csv, io
>>> data = [['NULL/None value', None], ['empty string', '']]
>>> buf = io.StringIO()
>>> csv.writer(buf).writerows([str(item) for item in row] for row in data)
>>> print(buf.getvalue())
NULL/None value,None
empty string,
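
Note that this only covers writing: csv.reader gives back the literal string 'None', not None, so the reading side needs the reverse mapping. A minimal sketch, assuming the text 'None' never appears as a genuine value in your data (`write_rows` and `read_rows` are hypothetical helper names). It also turns every other value into text, so numbers come back as strings:

```python
import csv
import io


def write_rows(f, rows):
    # str() turns None into the text 'None'; '' stays an empty field
    csv.writer(f).writerows([str(item) for item in row] for row in rows)


def read_rows(f):
    # Reverse mapping: the text 'None' becomes None again, '' stays ''
    return [[None if item == 'None' else item for item in row]
            for row in csv.reader(f)]


data = [['NULL/None value', None], ['empty string', '']]
buf = io.StringIO()
write_rows(buf, data)
buf.seek(0)
print(read_rows(buf))
# [['NULL/None value', None], ['empty string', '']]
```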

迷失自我, 2020-12-01 14:36

I ran into this problem too and found this issue: https://bugs.python.org/issue23041.

Solutions from the issue:

- Subclass csv.DictWriter, use dictionaries as your element type, and have its writerow method do the application-specific work.
- Define a writerow() function that does something similar (essentially wrapping a csv.writer object's writerow() method).
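
A minimal sketch of the first suggestion, assuming a hypothetical sentinel string '<NULL>' that never occurs as real data. `NullAwareDictWriter` and `read_with_nulls` are illustrative names, not part of the csv module; the second suggestion is the same idea expressed as a plain function instead of a class:

```python
import csv
import io

NULL = '<NULL>'  # hypothetical sentinel; must never occur as a real value


class NullAwareDictWriter(csv.DictWriter):
    """csv.DictWriter that writes None values as the NULL sentinel."""

    def writerow(self, rowdict):
        # Swap None for the sentinel, then defer to the normal DictWriter logic
        return super().writerow(
            {k: NULL if v is None else v for k, v in rowdict.items()})

    def writerows(self, rowdicts):
        # CPython's DictWriter.writerows bypasses writerow, so route rows explicitly
        for rowdict in rowdicts:
            self.writerow(rowdict)


def read_with_nulls(f):
    """Mirror image on the reading side: turn the sentinel back into None."""
    for row in csv.DictReader(f):
        yield {k: None if v == NULL else v for k, v in row.items()}


buf = io.StringIO()
writer = NullAwareDictWriter(buf, fieldnames=['label', 'value'])
writer.writeheader()
writer.writerows([{'label': 'NULL/None value', 'value': None},
                  {'label': 'empty string', 'value': ''}])
buf.seek(0)
print(list(read_with_nulls(buf)))
# [{'label': 'NULL/None value', 'value': None}, {'label': 'empty string', 'value': ''}]
```

Overriding writerows as well as writerow matters here: without it, rows passed to writerows would skip the None replacement.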
  

粉色の甜心, 2020-12-01 14:39

The documentation suggests that what you want is not possible:


  To make it as easy as possible to interface with modules which implement the DB API, the value None is written as the empty string.


This is in the documentation for csv.writer, suggesting it holds for all dialects and is an intrinsic limitation of the csv module.
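
A quick standard-library check of that behaviour: None and '' are both written as an empty field, and the reader returns '' for both, so the distinction is lost on the round trip:

```python
import csv
import io

buf = io.StringIO()
csv.writer(buf).writerow(['a', None, ''])  # None and '' are written identically
print(repr(buf.getvalue()))  # 'a,,\r\n'

buf.seek(0)
print(next(csv.reader(buf)))  # ['a', '', '']
```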

I for one would support changing this (along with various other limitations of the csv module), but it may be that people would want to offload this sort of work into a different library, and keep the CSV module simple (or at least as simple as it is).

If you need more powerful file-reading capabilities, you might want to look at the CSV reading functions in numpy, scipy, and pandas, which as I recall have more options.
-上瘾入骨i, 2020-12-01 14:41

As others have pointed out, you can't really do this via csv.Dialect or parameters to csv.writer and/or csv.reader. However, as I said in one comment, you can implement it by effectively subclassing the latter two (you can't literally subclass them because they're built-in functions, so the classes below wrap them instead). What the "subclasses" do on writing is simply intercept None values and change them into a unique string, then reverse the process when reading them back in. Here's a fully worked-out example:

import csv
import io

NULL = '<NULL>'  # something unlikely to ever appear as a regular value in your csv files


class MyCsvWriter(object):
    """Wraps csv.writer and writes None as the NULL marker."""

    def __init__(self, *args, **kwrds):
        self.csv_writer = csv.writer(*args, **kwrds)

    def __getattr__(self, name):
        # Delegate everything else (e.g. .dialect) to the wrapped writer
        return getattr(self.csv_writer, name)

    def writerow(self, row):
        return self.csv_writer.writerow([item if item is not None else NULL
                                         for item in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


class MyCsvReader(object):
    """Wraps csv.reader and turns the NULL marker back into None."""

    def __init__(self, *args, **kwrds):
        self.csv_reader = csv.reader(*args, **kwrds)

    def __getattr__(self, name):
        # Delegate everything else (e.g. .line_num) to the wrapped reader
        return getattr(self.csv_reader, name)

    def __iter__(self):
        for row in self.csv_reader:
            yield [item if item != NULL else None for item in row]


data = [['NULL/None value', None],
        ['empty string', '']]

f = io.StringIO()
MyCsvWriter(f).writerows(data)  # instead of csv.writer(f).writerows(data)

f = io.StringIO(f.getvalue())
data2 = [e for e in MyCsvReader(f)]  # instead of [e for e in csv.reader(f)]

print("input : ", data)
print("output: ", data2)


Output:

input :  [['NULL/None value', None], ['empty string', '']]
output:  [['NULL/None value', None], ['empty string', '']]


It's a tad verbose, and it probably slows reading and writing CSV files a bit (the built-in reader and writer are implemented in C, while these wrappers add a Python-level step per row), but that may make little difference since the process is likely I/O bound anyway.
轻奢々, 2020-12-01 14:44

As you have control over both the consumer and the creator of the serialised data, consider using a format that does support that distinction.

Example:

>>> import json
>>> json.dumps(['foo', '', None, 666])
'["foo", "", null, 666]'
>>>
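
For row-oriented data the same idea can keep a CSV-like one-row-per-line layout by writing one JSON array per line (the convention usually called JSON Lines). A minimal sketch; `write_json_lines` and `read_json_lines` are hypothetical helpers, not part of the json module:

```python
import io
import json


def write_json_lines(f, rows):
    # One JSON array per line: None -> null, '' -> ""
    for row in rows:
        f.write(json.dumps(row) + '\n')


def read_json_lines(f):
    # json.loads maps null back to None and "" back to ''
    return [json.loads(line) for line in f if line.strip()]


data = [['NULL/None value', None], ['empty string', '']]
buf = io.StringIO()
write_json_lines(buf, data)
print(buf.getvalue(), end='')
# ["NULL/None value", null]
# ["empty string", ""]
buf.seek(0)
print(read_json_lines(buf) == data)  # True
```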

无人共我, 2020-12-01 14:47

As mentioned above, this is a limitation of the csv module. A solution is just to rewrite the rows inside a loop with a simple dictionary comprehension, as follows:

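import csv

with open('data.csv', newline='') as csvfile:  # 'data.csv' is a placeholder path
    reader = csv.DictReader(csvfile)
    for row in reader:
        # Interpret empty values as None (instead of '')
        row = {k: v if v else None for k, v in row.items()}
        ...  # process row here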
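
A self-contained version of the same loop, reading from an in-memory buffer instead of a file (`read_rows_with_none` is a hypothetical helper name). Keep in mind that it maps every empty field to None, so a value that was genuinely '' when written also comes back as None; it suits data where empty strings never need to survive the round trip:

```python
import csv
import io


def read_rows_with_none(f):
    # Every empty field ('') becomes None; non-empty values are left alone
    return [{k: v if v else None for k, v in row.items()}
            for row in csv.DictReader(f)]


buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['label', 'value'])
writer.writeheader()
writer.writerows([{'label': 'NULL/None value', 'value': None},
                  {'label': 'empty string', 'value': ''}])
buf.seek(0)
print(read_rows_with_none(buf))
# [{'label': 'NULL/None value', 'value': None}, {'label': 'empty string', 'value': None}]
```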
