Difference between open and codecs.open in Python


There are two ways to open a text file in Python:

f = open(filename)

And

import codecs
f = codecs.open(filename, encoding=\


                      
8 answers

        
                         				            
            
           
            
                              
                
              
              
                
感动是毒, 2020-12-04 09:57
              
            
            
                                                                       
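When you need to read or write a file in a specific encoding under Python 2, you would use the codecs module, because the built-in open() there takes no encoding argument. In Python 3 the built-in open() accepts an encoding argument directly.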
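To make that concrete, here is a minimal sketch for Python 3 that reads the same UTF-8 file through codecs.open and through the built-in open. The sample text and the temporary file are assumptions made up for illustration, not part of the original answer.

```python
import codecs
import os
import tempfile

text = "naïve café"  # illustrative sample containing non-ASCII characters

# Write the sample as UTF-8 bytes to a temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(text.encode("utf-8"))

# codecs.open decodes the bytes using the encoding you name.
with codecs.open(path, "r", encoding="utf-8") as f:
    via_codecs = f.read()

# In Python 3 the built-in open() accepts the same encoding argument.
with open(path, "r", encoding="utf-8") as f:
    via_builtin = f.read()

os.remove(path)
print(via_codecs == via_builtin == text)
```

Both calls hand back decoded text rather than raw bytes, which is the whole point of naming the encoding.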
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
半阙折子戏, 2020-12-04 10:02
              
            
            
                                                                       
In Python 2 there are unicode strings and bytestrings. If you just use bytestrings, you can read/write to a file opened with open() just fine. After all, the strings are just bytes. 

The problem comes when, say, you have a unicode string and you do the following:

>>> example = u'Μου αρέσει Ελληνικά'
>>> open('sample.txt', 'w').write(example)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)


So here obviously you either explicitly encode your unicode string in utf-8 or you use codecs.open to do it for you transparently. 
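The two options can be sketched side by side. This is written for Python 3 (where every str is already unicode), so it illustrates the idea rather than reproducing the Python 2 session above; the file names and temporary directory are placeholders I picked.

```python
import codecs
import os
import tempfile

example = "Μου αρέσει Ελληνικά"
tmpdir = tempfile.mkdtemp()
explicit_path = os.path.join(tmpdir, "explicit.txt")  # hypothetical file name
codecs_path = os.path.join(tmpdir, "codecs.txt")      # hypothetical file name

# Option 1: encode explicitly and write the resulting bytes.
with open(explicit_path, "wb") as f:
    f.write(example.encode("utf-8"))

# Option 2: let codecs.open do the encoding on every write.
with codecs.open(codecs_path, "w", encoding="utf-8") as f:
    f.write(example)

# Both files should now hold identical bytes.
with open(explicit_path, "rb") as f:
    explicit_bytes = f.read()
with open(codecs_path, "rb") as f:
    codecs_bytes = f.read()
print(explicit_bytes == codecs_bytes)
```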

If you're only ever using bytestrings then no problems:

>>> example = 'Μου αρέσει Ελληνικά'
>>> open('sample.txt', 'w').write(example)
>>>


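It gets more involved than this, because when you concatenate a unicode string and a bytestring with the + operator you get a unicode string: Python 2 implicitly decodes the bytestring as ASCII, which fails if it contains non-ASCII bytes. Easy to get bitten by that one.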
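For comparison, Python 3 removed that implicit conversion: mixing str and bytes with + raises a TypeError, so you are forced to decode explicitly. A small sketch, with variable names of my own choosing:

```python
text = "Μου"                      # str (unicode text) in Python 3
data = " αρέσει".encode("utf-8")  # bytes

# Python 3 refuses to guess an encoding for the bytes operand.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding explicitly makes the concatenation well defined.
combined = text + data.decode("utf-8")
print(mixed_ok, combined)
```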

Also codecs.open doesn't like bytestrings with non-ASCII chars being passed in:

>>> codecs.open('test', 'w', encoding='utf-8').write('Μου αρέσει')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)


The advice about strings for input/output is normally "convert to unicode as early as possible and back to bytestrings as late as possible". Using codecs.open allows you to do the latter very easily.

Just be careful that you are giving it unicode strings and not bytestrings that may have non-ASCII characters.
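The "decode early, encode late" advice can be sketched as a small function. This is Python 3 and the function name `shout` is a made-up example, not anything from the standard library:

```python
def shout(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: upper-case encoded text."""
    text = raw.decode(encoding)      # decode as early as possible
    result = text.upper()            # work only on unicode text in the middle
    return result.encode(encoding)   # encode as late as possible


greek = "αρέσει".encode("utf-8")
shouted = shout(greek)
print(shouted.decode("utf-8"))
```

Upper-casing the raw UTF-8 bytes directly would leave the Greek letters untouched, which is why the text has to be decoded before any processing happens.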
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
轻奢々, 2020-12-04 10:06
              
            
            
                                                                       
Personally, I always use codecs.open unless there's a clearly identified need to use open**.  The reason is that there have been so many times when I've been bitten by UTF-8 input sneaking into my programs.  "Oh, I just know it'll always be ASCII" tends to be an assumption that gets broken often.

Assuming 'utf-8' tends to be a safer default in my experience, since ASCII can be treated as UTF-8, but the converse is not true.  And in those cases when I truly do know that the input is ASCII, I still use codecs.open, as I'm a firm believer in "explicit is better than implicit".
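The asymmetry between ASCII and UTF-8 is easy to check. A minimal sketch in Python 3, with sample strings I picked for illustration:

```python
ascii_bytes = "plain text".encode("ascii")
utf8_bytes = "naïve".encode("utf-8")

# Every ASCII byte sequence is also valid UTF-8.
decoded_as_utf8 = ascii_bytes.decode("utf-8")

# The converse fails as soon as a non-ASCII character appears.
try:
    utf8_bytes.decode("ascii")
    converse_works = True
except UnicodeDecodeError:
    converse_works = False

print(decoded_as_utf8, converse_works)
```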

** In Python 2.x. As the comment on the question states, in Python 3 open replaces codecs.open.
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
囚心锁ツ, 2020-12-04 10:07
              
            
            
                                                                       
I needed to open a .asm file and process it.

import codecs

# https://docs.python.org/3/library/codecs.html#codecs.ignore_errors
# https://docs.python.org/3/library/codecs.html#codecs.Codec.encode
# `file` holds the path; undecodable bytes become U+FFFD instead of raising
with codecs.open(file, encoding='cp1252', errors='replace') as f:
    text = f.read()


With this I can read the entire file without much trouble. Any suggestions?
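One thing worth knowing about errors='replace' is that it hides bad bytes rather than fixing them. Here is a small Python 3 sketch; the sample bytes are invented, and it relies on byte 0x81 having no mapping in Python's cp1252 codec:

```python
import codecs
import os
import tempfile

# Invented sample: 0xE9 is "é" in cp1252, 0x81 is undefined there.
raw = b"mov eax, 1 ; caf\xe9 \x81\n"

fd, path = tempfile.mkstemp(suffix=".asm")
with os.fdopen(fd, "wb") as out:
    out.write(raw)

# Strict decoding (the default) refuses the undefined byte.
try:
    raw.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors='replace' substitutes U+FFFD and keeps going.
with codecs.open(path, encoding="cp1252", errors="replace") as f:
    text = f.read()

os.remove(path)
print(strict_ok, "\ufffd" in text)
```

So the read succeeds, but any replaced character is lost for good. If the file has to round-trip unchanged, that may not be what you want.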
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
感动是毒, 2020-12-04 10:09
              
            
            
                                                                       
Since Python 2.6, a good practice is to use io.open(), which also takes an encoding argument, like the now obsolete codecs.open(). In Python 3, io.open is an alias for the open() built-in. So io.open() works in Python 2.6 and all later versions, including Python 3.4. See docs: http://docs.python.org/3.4/library/io.html

Now, for the original question: when reading text (including "plain text", HTML, XML and JSON) in Python 2 you should always use io.open() with an explicit encoding, or open() with an explicit encoding in Python 3. Doing so means you get correctly decoded Unicode, or get an error right off the bat, making it much easier to debug.
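The "error right off the bat" behaviour can be seen with a file whose bytes do not match the encoding you name. A minimal Python 3 sketch, where the temporary file and its Latin-1 content are assumptions for illustration:

```python
import os
import tempfile

# Write "café" as Latin-1: the byte 0xE9 is not valid UTF-8 on its own.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write("café".encode("latin-1"))

# Naming the wrong encoding fails loudly instead of yielding garbage.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    failed_fast = False
except UnicodeDecodeError:
    failed_fast = True

# Naming the right encoding gives correctly decoded text.
with open(path, encoding="latin-1") as f:
    text = f.read()

os.remove(path)
print(failed_fast, text)
```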

Pure ASCII "plain text" is a myth from the distant past. Proper English text uses curly quotes, em-dashes, bullets, € (euro signs) and even diaeresis (¨). Don't be naïve! (And let's not forget the Façade design pattern!) 

Because pure ASCII is not a real option, open() without an explicit encoding is only useful for reading binary files (in Python 3, that means opening with mode 'rb').
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
执念已碎, 2020-12-04 10:09
              
            
            
                                                                       
codecs.open, I suppose, is just a remnant from the Python 2 days, when the built-in open had a much simpler interface and fewer capabilities. In Python 2 the built-in open doesn't take an encoding argument, so if you wanted anything other than raw bytes or the default encoding, you were supposed to use codecs.open.

In Python 2.6, the io module came along to make things a bit simpler.
According to the official documentation:

New in version 2.6.

The io module provides the Python interfaces to stream handling.
Under Python 2.x, this is proposed as an alternative to the
built-in file object, but in Python 3.x it is the default
interface to access files and streams.


Having said that, the only use I can think of for codecs.open today is backward compatibility. In all other scenarios (unless you are using Python < 2.6) it is preferable to use io.open. Also, in Python 3.x io.open is the same as the built-in open.

Note:

There is a difference in signature between codecs.open and io.open as well.

codecs.open:

open(filename, mode='rb', encoding=None, errors='strict', buffering=1)


io.open:

open(file, mode='r', buffering=-1, encoding=None,
     errors=None, newline=None, closefd=True, opener=None)
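Beyond the signatures, one behavioural difference follows from the default modes shown above: codecs.open opens the underlying file in binary mode, so it does no newline translation, while io.open in text mode does. A rough Python 3 sketch (the temporary file and its content are made up for illustration):

```python
import codecs
import io
import os
import tempfile

# In Python 3, io.open and the built-in open are the same object.
same_function = io.open is open

# A file with a Windows-style line ending.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(b"a\r\nb")

# io.open in text mode translates "\r\n" to "\n" (newline=None default).
with io.open(path, "r", encoding="utf-8") as f:
    via_io = f.read()

# codecs.open reads the bytes as-is and only decodes them.
with codecs.open(path, "r", encoding="utf-8") as f:
    via_codecs = f.read()

os.remove(path)
print(same_function, repr(via_io), repr(via_codecs))
```

If your code splits on "\n", this is a detail that can silently change results when switching from one function to the other.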

                                                                        
                                                        
            
            
              
                