Python 3: gzip.open() and modes

情书的邮戳 2021-01-18 03:20

https://docs.python.org/3/library/gzip.html

I am considering using gzip.open(), and I am a little confused about the mode argument:

2 Answers
  • 2021-01-18 04:02

    It means that r defaults to rb, and if you want text you have to specify it using rt.

    (as opposed to the built-in open(), where r means rt, not rb)
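
A minimal sketch of that difference, assuming a throwaway file in a temporary directory (the file name and contents are made up for illustration):

```python
import gzip
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "example.txt.gz")
with gzip.open(path, "wb") as f:
    f.write(b"hello\n")

# With gzip.open(), "r" means binary, so read() returns bytes.
with gzip.open(path, "r") as f:
    raw = f.read()

# Text has to be requested explicitly with "rt".
with gzip.open(path, "rt", encoding="utf-8") as f:
    text = f.read()

print(type(raw).__name__, type(text).__name__)
```

The first read gives a bytes object and the second a str, which is the reverse of what a bare "r" does with the built-in open().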

  • 2021-01-18 04:11

    Exactly as you say, and as already covered in @Jean-François Fabre's answer.
    I just wanted to show some code, as it was fun.
    Let's look at the gzip.py source code in the Python standard library to see that this is effectively what happens.
    The source of gzip.open() can be found at https://github.com/python/cpython/blob/master/Lib/gzip.py and is reproduced below:

    def open(filename, mode="rb", compresslevel=9,
             encoding=None, errors=None, newline=None):
        """Open a gzip-compressed file in binary or text mode.
        The filename argument can be an actual filename (a str or bytes object), or
        an existing file object to read from or write to.
        The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for
        binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is
        "rb", and the default compresslevel is 9.
        For binary mode, this function is equivalent to the GzipFile constructor:
        GzipFile(filename, mode, compresslevel). In this case, the encoding, errors
        and newline arguments must not be provided.
        For text mode, a GzipFile object is created, and wrapped in an
        io.TextIOWrapper instance with the specified encoding, error handling
        behavior, and line ending(s).
        """
        if "t" in mode:
            if "b" in mode:
                raise ValueError("Invalid mode: %r" % (mode,))
        else:
            if encoding is not None:
                raise ValueError("Argument 'encoding' not supported in binary mode")
            if errors is not None:
                raise ValueError("Argument 'errors' not supported in binary mode")
            if newline is not None:
                raise ValueError("Argument 'newline' not supported in binary mode")
    
        gz_mode = mode.replace("t", "")
        if isinstance(filename, (str, bytes, os.PathLike)):
            binary_file = GzipFile(filename, gz_mode, compresslevel)
        elif hasattr(filename, "read") or hasattr(filename, "write"):
            binary_file = GzipFile(None, gz_mode, compresslevel, filename)
        else:
            raise TypeError("filename must be a str or bytes object, or a file")
    
        if "t" in mode:
            return io.TextIOWrapper(binary_file, encoding, errors, newline)
        else:
            return binary_file  
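
The validation at the top of that function can be exercised directly. This is only a sketch: try_open is a hypothetical helper written for this example, and the scratch file is an assumption, not part of the gzip module.

```python
import gzip
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.gz")
with gzip.open(path, "wb") as f:
    f.write(b"data")

def try_open(mode, **kwargs):
    """Return the ValueError message raised by gzip.open(), or None on success."""
    try:
        with gzip.open(path, mode, **kwargs):
            return None
    except ValueError as exc:
        return str(exc)

# "t" and "b" together are rejected.
both_flags = try_open("rbt")
# encoding is only accepted in text mode, and "r" is binary here.
binary_with_encoding = try_open("r", encoding="utf-8")
# The same argument is fine once "t" is present.
text_with_encoding = try_open("rt", encoding="utf-8")

print(both_flags)
print(binary_with_encoding)
print(text_with_encoding)
```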
    

    A few things to notice:

    • the default mode is rb, as the documentation you quote says
    • in binary mode, it does not matter whether the "b" is given: "r" behaves like "rb", and "w" like "wb", for example.
      We can see this in the following lines:

      gz_mode = mode.replace("t", "")
      if isinstance(filename, (str, bytes, os.PathLike)):
          binary_file = GzipFile(filename, gz_mode, compresslevel)
      elif hasattr(filename, "read") or hasattr(filename, "write"):
          binary_file = GzipFile(None, gz_mode, compresslevel, filename)
      else:
          raise TypeError("filename must be a str or bytes object, or a file")
      
      if "t" in mode:
          return io.TextIOWrapper(binary_file, encoding, errors, newline)
      else:
          return binary_file
      

      Basically, binary_file gets built whether or not there is an additional b, since gz_mode may or may not contain the b at this point.
      Next, the class GzipFile(_compression.BaseStream) is instantiated to build binary_file.

    In the constructor the following lines are important:

        if mode and ('t' in mode or 'U' in mode):
            raise ValueError("Invalid mode: {!r}".format(mode))
        if mode and 'b' not in mode:
            mode += 'b'
        if fileobj is None:
            fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
        if filename is None:
            filename = getattr(fileobj, 'name', '')
            if not isinstance(filename, (str, bytes)):
                filename = ''
        else:
            filename = os.fspath(filename)
        if mode is None:
            mode = getattr(fileobj, 'mode', 'rb')
    

    where it can be clearly seen that if 'b' is not present in the mode, it will be added:

    if mode and 'b' not in mode:
        mode += 'b'
    

    so there is no distinction between the two modes ("r" and "rb"), as already discussed.
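
As a quick check of that conclusion, the sketch below opens the same file with "r" and "rb" and compares what comes back. The scratch file and the results dictionary are assumptions made for the example.

```python
import gzip
import io
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "same.gz")
with gzip.open(path, "wb") as f:
    f.write(b"same bytes\n")

# Open with and without the explicit "b" and record the object type and data.
results = {}
for mode in ("r", "rb"):
    with gzip.open(path, mode) as f:
        results[mode] = (type(f), f.read())

# Only "rt" changes the kind of object that is returned.
with gzip.open(path, "rt", encoding="utf-8") as f:
    is_text_wrapper = isinstance(f, io.TextIOWrapper)

print(results["r"] == results["rb"], is_text_wrapper)
```

Both binary spellings return a gzip.GzipFile with identical contents, while "rt" returns the io.TextIOWrapper created at the end of gzip.open().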
