Paramiko fails to download large files (> 1 GB)

一向 2020-12-13 10:44
def download():
    if not os.path.exists(dst_dir_path):
        logger.error("Cannot access destination folder %s. Please check path and permissions." % dst_dir_path)


        
7 Answers
  • 2020-12-13 11:13

    I've run into problems downloading large files (> 1 GB) via SFTP using pysftp. The underlying library is Paramiko. Googling about the problem led me here, and there are great solutions. Nevertheless, many posts are relatively old, and I suppose a majority of these problems have been solved over time. It did not help with my problem, though.

    Which is: Paramiko runs into a memory error while loading chunks during prefetch in sftp_file.py. The list grows beyond limits, and the memory error somehow does not block execution; it is probably silently swallowed somewhere up the stack, since the prefetch and the download run in separate threads. The download only fails when this error actually happens.

    Anyway, the way to control the size of the list is to set the MAX_REQUEST_SIZE:

    paramiko.sftp_file.SFTPFile.MAX_REQUEST_SIZE = pow(2, 22) # 4MB per chunk
    

    If you go over 16 MB, though, you'll run into a new problem: paramiko.sftp.SFTPError: Garbage packet received. It turns out there is a check in sftp.py, in the _read_packet method:

    # most sftp servers won't accept packets larger than about 32k, so
    # anything with the high byte set (> 16MB) is just garbage.
    if byte_ord(x[0]):
        raise SFTPError("Garbage packet received")
    

    So if a chunk is larger than 16 MB, this error is raised. I did not want to fiddle with the Paramiko library itself, so I kept my chunk size at an 'acceptable maximum' of 4 MB.

    This way I was able to download files of size > 30GB. Hope this helps people.
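
    As a minimal sketch of how this can be wired up with pysftp (hostname, credentials and file paths below are placeholders, not from the original setup):

        import paramiko
        import pysftp

        # Cap Paramiko's prefetch request size at 4 MB before opening any SFTP connection.
        paramiko.sftp_file.SFTPFile.MAX_REQUEST_SIZE = pow(2, 22)

        # Placeholder host, credentials and paths -- replace with your own.
        with pysftp.Connection('sftp.example.com', username='user', password='secret') as sftp:
            sftp.get('/remote/big_file.bin', 'big_file.bin')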

  • 2020-12-13 11:16

    In addition to Screwtape's answer, it's also worth mentioning that you should probably limit the block size with .read([block size in bytes]).

    See lazy method for reading big file

    I had real issues with a plain file.read() without a block size in Python 2.4; it is possible that 2.7 determines the correct block size, though.
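
    A minimal sketch of that approach, assuming sftp is an already-connected paramiko SFTPClient and the file paths are placeholders:

        import shutil

        BLOCK_SIZE = 32768  # 32 KB per read request

        remote_file = sftp.open('/remote/big_file.bin', 'rb')
        try:
            with open('big_file.bin', 'wb') as local_file:
                # copyfileobj reads BLOCK_SIZE bytes at a time instead of the whole file
                shutil.copyfileobj(remote_file, local_file, BLOCK_SIZE)
        finally:
            remote_file.close()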

  • 2020-12-13 11:16

    I have tried tracing the code in Paramiko, and now I'm sure it is a server problem.

    1. What prefetch does

    In order to increase the download speed, Paramiko tries to prefetch the file with fetch requests. When the SFTPFile.prefetch() method is called, a new thread is created and tons of fetch requests are sent to the server until the whole file is covered. We can find this in paramiko/sftp_file.py, around line 464.

    2. How to be sure it is a server problem

    The requests mentioned above run in async mode. SFTPFile._async_response() is used to receive the responses from the server asynchronously. Tracing down the code, we can find that this exception is created in the SFTPFile._async_response() method, converted from the message sent by the server. Now we can be sure that the exception comes from the server.

    3. How to solve the problem

    Because I have no access to the server, using sftp on the command line was my best choice. But on the other hand, now that we know too many requests make the server crash, we can sleep between requests when sending them to the server.
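
    A rough sketch of that idea, skipping prefetch entirely and pausing between read requests (sftp, the paths and the pause value are placeholders to be tuned for your server):

        import time

        CHUNK = 32768   # one SFTP read request per chunk
        PAUSE = 0.05    # pause between requests so the server is not flooded

        remote_file = sftp.open('/remote/big_file.bin', 'rb')
        try:
            with open('big_file.bin', 'wb') as local_file:
                while True:
                    data = remote_file.read(CHUNK)
                    if not data:
                        break
                    local_file.write(data)
                    time.sleep(PAUSE)
        finally:
            remote_file.close()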

  • 2020-12-13 11:21

    I was running into a similar issue as well.

        Traceback (most recent call last):
          File "---", line 948, in <module>
            main()
          File "---", line 937, in main
            args.sshProxyKeyfile)
          File "---", line 904, in Bootstrap
            CopyFiles(client, builds, k8sVer)
          File "---", line 393, in CopyWcpFilesToVC
            ScpWithClient(client, __file__, __file__)
          File "---", line 621, in ScpWithClient
            with client.open_sftp() as sftp:
          File "---_env.Linux/lib64/python3.5/site-packages/paramiko/client.py", line 556, in open_sftp
            return self._transport.open_sftp_client()
          File "---_env.Linux/lib64/python3.5/site-packages/paramiko/transport.py", line 1097, in open_sftp_client
            return SFTPClient.from_transport(self)
          File "---_env.Linux/lib64/python3.5/site-packages/paramiko/sftp_client.py", line 170, in from_transport
            return cls(chan)
          File "---_env.Linux/lib64/python3.5/site-packages/paramiko/sftp_client.py", line 130, in __init__
            server_version = self._send_version()
          File "---_env.Linux/lib64/python3.5/site-packages/paramiko/sftp.py", line 134, in _send_version
            t, data = self._read_packet()
          File "---_env.Linux/lib64/python3.5/site-packages/paramiko/sftp.py", line 205, in _read_packet
            raise SFTPError("Garbage packet received")
        paramiko.sftp.SFTPError: Garbage packet received
    

    The reason was that bash was not the default shell associated with the user login. Changing the user's default shell permanently with chsh -s /bin/bash <user> fixed the issue.
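
    If you want to check this before changing anything, a hypothetical diagnostic (client is an already-connected paramiko SSHClient):

        # Hypothetical check of the remote user's login shell before opening SFTP.
        stdin, stdout, stderr = client.exec_command('echo $SHELL')
        print(stdout.read().decode().strip())   # expect something like /bin/bash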

  • 2020-12-13 11:23

    I had a very similar problem; in my case the file is only ~400 MB, but it would consistently fail after downloading about 35 MB. It didn't always fail at the exact same number of bytes, but somewhere around 35-40 MB the file would stop transferring, and a minute or so later I would get the "There are insufficient resources to complete the request" error.

    Downloading the file via WinSCP or PSFTP worked fine.

    I tried Screwtape's method, and it did work but was painfully slow. My 400 MB file was on pace to take something like 4 hours to download, which was an unacceptable timeframe for this particular application.

    Also, at one time, when we first set this up, everything worked fine. But the server administrator made some changes to the SFTP server and that's when things broke. I'm not sure what the changes were, but since it still worked OK using WinSCP/other SFTP methods I didn't think it was going to be fruitful to try attacking this from the server side.

    I'm not going to pretend to understand why, but here's what ended up working for me:

    1. I downloaded and installed the current version of Paramiko (1.11.1 at this time). Initially this didn't make any difference at all but I figured I'd mention it just in case it was part of the solution.

    2. The stack trace for the exception was:

      File "C:\Python26\lib\site-packages\paramiko\sftp_client.py", line 676, in get
          size = self.getfo(remotepath, fl, callback)
      File "C:\Python26\lib\site-packages\paramiko\sftp_client.py", line 645, in getfo
          data = fr.read(32768)
      File "C:\Python26\lib\site-packages\paramiko\file.py", line 153, in read
          new_data = self._read(read_size)
      File "C:\Python26\lib\site-packages\paramiko\sftp_file.py", line 157, in _read
          data = self._read_prefetch(size)
      File "C:\Python26\lib\site-packages\paramiko\sftp_file.py", line 138, in _read_prefetch
          self._check_exception()
      File "C:\Python26\lib\site-packages\paramiko\sftp_file.py", line 483, in _check_exception
          raise x
      
    3. Poking around a bit in sftp_file.py, I noticed this (lines 43-45 in the current version):

      # Some sftp servers will choke if you send read/write requests larger than
      # this size.
      MAX_REQUEST_SIZE = 32768
      
    4. On a whim, I tried changing MAX_REQUEST_SIZE to 1024 and, lo and behold, I was able to download the whole file!

    5. After I got it to work by changing MAX_REQUEST_SIZE to 1024, I tried a bunch of other values between 1024 and 32768 to see if it affected performance or anything. But I always got the error sooner or later when the value was significantly bigger than 1024 (1025 was OK, but 1048 eventually failed). A runtime version of this change is sketched below.
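
    If you would rather not edit sftp_file.py inside site-packages, the same change can be made at runtime by overriding the class attribute; a sketch with placeholder connection details:

        import paramiko

        # Override the request size instead of editing the installed library.
        paramiko.sftp_file.SFTPFile.MAX_REQUEST_SIZE = 1024

        transport = paramiko.Transport(('sftp.example.com', 22))   # placeholder host
        transport.connect(username='user', password='secret')      # placeholder credentials
        sftp = paramiko.SFTPClient.from_transport(transport)
        sftp.get('/remote/big_file.bin', 'big_file.bin')
        sftp.close()
        transport.close()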

  • 2020-12-13 11:25

    The SFTP protocol doesn't have a way to stream file data; instead, it has a way to request a block of data from a particular offset in an open file. The naive method of downloading a file would be to request the first block, write it to disk, then request the second block, and so forth. This is reliable, but very slow.

    Instead, Paramiko has a performance trick it uses: when you call .get() it immediately sends a request for every block in the file, and it remembers what offset they're supposed to be written to. Then as each response arrives, it makes sure it gets written to the correct offset on-disk. For more information, see the SFTPFile.prefetch() and SFTPFile.readv() methods in the Paramiko documentation. I suspect the book-keeping information it stores when downloading your 1GB file might be causing... something to run out of resources, generating your "insufficient resources" message.

    Rather than using .get(), just call .open() to get an SFTPFile instance, then call .read() on that object, or hand it to the Python standard library function shutil.copyfileobj() to download the contents. That should avoid the Paramiko prefetch cache, and allow you to download the file even if it's not quite as fast.

    For example:

        import shutil

        def lazy_loading_ftp_file(sftp_host_conn, filename):
            """
            Lazy-loading download for when a plain sftp.get() call fails.
            :param sftp_host_conn: SFTP connection factory (a callable returning a connection)
            :param filename: name of the file to download
            :return: status dict; the file is downloaded to the current directory
            """
            try:
                with sftp_host_conn() as host:
                    sftp_file_instance = host.open(filename, 'rb')
                    try:
                        with open(filename, 'wb') as out_file:
                            # Copy in fixed-size chunks instead of prefetching the whole file.
                            shutil.copyfileobj(sftp_file_instance, out_file)
                    finally:
                        sftp_file_instance.close()
                    return {"status": "success",
                            "msg": "successfully downloaded file: {}".format(filename)}
            except Exception as ex:
                return {"status": "failed",
                        "msg": "Exception in lazy reading too: {}".format(ex)}
    