How to make Python check if ftp directory exists?

一生所求 2021-02-12 20:17

I'm using this script to connect to a sample FTP server and list available directories:

from ftplib import FTP
ftp = FTP('ftp.cwi.nl')   # connect to host, defa


        
8 Answers
  •  南方客
    南方客 (original poster)
    2021-02-12 20:39

    => I found this web page while googling for a way to check if a file exists using ftplib in Python. The following is what I figured out (hope it helps someone):

    => When trying to list non-existent files/directories, ftplib raises an exception. Even though adding a try/except block is standard practice and a good idea, I would prefer my FTP scripts to download file(s) only after making sure they exist. This keeps my scripts simpler, at least when listing a directory on the FTP server is possible.

    For example, the Edgar FTP server has multiple files stored under the directory /edgar/daily-index/. Each file is named like "master.YYYYMMDD.idx". There is no guarantee that a file will exist for every date (YYYYMMDD): there is no file dated 24th Nov 2013, but there is a file dated 22nd Nov 2013. How does listing work in these two cases?

    # Code
    from __future__ import print_function  
    import ftplib  
    
    ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL@gmail.com")  
    resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131122.idx")  
    print(resp)   
    resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")  
    print(resp)  
    
    # Output
    250-Start of list for /edgar/daily-index/master.20131122.idx  
    modify=20131123030124;perm=adfr;size=301580;type=file;unique=11UAEAA398;  
    UNIX.group=1;UNIX.mode=0644;UNIX.owner=1019;  
    /edgar/daily-index/master.20131122.idx
    250 End of list  
    
    Traceback (most recent call last):
    File "", line 10, in 
    resp = ftp_client.sendcmd("MLST /edgar/daily-index/master.20131124.idx")
    File "lib/python2.7/ftplib.py", line 244, in sendcmd
    return self.getresp()
    File "lib/python2.7/ftplib.py", line 219, in getresp
    raise error_perm, resp
    ftplib.error_perm: 550 '/edgar/daily-index/master.20131124.idx' cannot be listed
    

    As expected, listing a non-existent file generates an exception.
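If a boolean answer is more convenient than a traceback, the MLST call above can be wrapped in a small helper. This is only a minimal sketch: `remote_path_exists` is a hypothetical name, it assumes the server supports MLST, and it treats any reply starting with 550 as "does not exist", although a server may also send 550 for a permission problem.

```python
import ftplib


def remote_path_exists(ftp, path):
    """Return True if the server answers MLST for `path`, False on a 550 reply.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance.
    """
    try:
        ftp.sendcmd("MLST " + path)
        return True
    except ftplib.error_perm as exc:
        # 550 is the reply seen above for a missing file. Anything else
        # (for example a server that does not implement MLST) is re-raised
        # so that it is not mistaken for "file not found".
        if str(exc).startswith("550"):
            return False
        raise
```

With the connection from the example above, `remote_path_exists(ftp_client, "/edgar/daily-index/master.20131124.idx")` would be expected to return False instead of raising.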

    => Since I know that the Edgar FTP server will surely have the directory /edgar/daily-index/, my script can do the following to avoid raising exceptions due to non-existent files:
    a) list this directory.
    b) download the required file(s) if they are present in this listing. To check the listing, I typically run a regexp search over the list of strings that the listing operation returns.

    For example, this script tries to download files for the past three days. If a file is found for a certain date, it is downloaded; otherwise nothing happens.

    import ftplib
    import os
    import re
    from datetime import date, timedelta
    
    ftp_client = ftplib.FTP("ftp.sec.gov", "anonymous", "MY.EMAIL@gmail.com")
    listing = []
    # List the directory and store each directory entry as a string in a list
    ftp_client.retrlines("LIST /edgar/daily-index", listing.append)
    # go back 1, 2 and 3 days
    for diff in [1, 2, 3]:
      day = date.today() - timedelta(days=diff)
      day_str = day.strftime("%Y%m%d")
      month = day.strftime("%Y_%m")
      file_name = "master.%s.idx" % day_str
      # the absolute path of the file we want to download - if it indeed exists
      file_path = "/edgar/daily-index/" + file_name
      # create a regex to match the file's name (escaped so "." is literal)
      pattern = re.compile(re.escape(file_name))
      # keep the listing entries that match the pattern
      found = [entry for entry in listing if pattern.search(entry)]
      if found:
        # make sure the local target directory exists before writing
        local_dir = "./edgar/daily-index/%s" % month
        if not os.path.isdir(local_dir):
          os.makedirs(local_dir)
        with open(os.path.join(local_dir, file_name), "wb") as local_file:
          ftp_client.retrbinary("RETR " + file_path, local_file.write)
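As an alternative to the regexp search over LIST output, the check can be sketched with `FTP.nlst`, which returns entry names only. The helper name `names_in_directory` is hypothetical, and the sketch assumes the server answers NLST for the directory; some servers return full paths and others bare names, so the entries are reduced to their base names before comparing.

```python
import posixpath


def names_in_directory(ftp, directory):
    """Return the set of bare entry names that NLST reports for `directory`.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance.
    """
    # FTP paths use "/" as separator, so posixpath is used regardless of
    # the local operating system.
    return {posixpath.basename(entry) for entry in ftp.nlst(directory)}
```

A script could then test `"master.20131122.idx" in names_in_directory(ftp_client, "/edgar/daily-index")` before issuing the RETR command. Fetching the listing once and reusing the set avoids one round trip per file.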
    

    => Interestingly, there are situations where we cannot list a directory on the FTP server. The Edgar FTP server, for example, disallows listing on /edgar/data because it contains far too many sub-directories. In such cases, I cannot use the "list and check for existence" approach described here; I would have to use exception handling in my downloader script to recover from attempts to access a non-existent file or directory.
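For the original question (checking whether a directory exists), the exception-handling route might look like the sketch below. `directory_exists` is a hypothetical helper, not part of ftplib. It assumes that a failed `cwd` means the directory is missing, although the server may also refuse for permission reasons, and it assumes the current working directory can be read with `pwd` so it can be restored afterwards.

```python
import ftplib


def directory_exists(ftp, path):
    """Return True if the server lets us change into `path`, else False.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance.
    """
    original = ftp.pwd()  # remember where we were
    try:
        ftp.cwd(path)
    except ftplib.error_perm:
        # Permanent (5xx) error reply: the directory is missing or not accessible.
        return False
    ftp.cwd(original)  # restore the previous working directory
    return True
```

This does not need a directory listing at all, so it should also work for locations such as /edgar/data where listing is disallowed, as long as changing into the directory is permitted.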
