Extending Python's os.walk function on FTP server

[愿得一人] 2021-02-15 15:01

How can I make os.walk traverse the directory tree of an FTP database (located on a remote server)? The way the code is structured now is (comments provided):

4 Answers
  •  隐瞒了意图╮
    2021-02-15 15:23

    All you need is Python's ftplib module. Since os.walk() performs a top-down, depth-first traversal, you need to find the directory and file names at each step, then continue the traversal recursively, starting from the first directory. I implemented this algorithm about two years ago as the core of FTPwalker, a package designed for traversing extremely large directory trees over FTP.

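    # FTP paths always use '/', so use posixpath rather than os.path
    # (os.path would join with '\' when the client runs on Windows).
    import posixpath as ospath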
    
    
    class FTPWalk:
        """
        Helper for traversing the directory tree of an FTP server with a
        recursive, top-down (depth-first) walk, in the spirit of os.walk().
        """
        def __init__(self, connection):
            self.connection = connection
    
        def listdir(self, _path):
            """
            Return (dirs, nondirs): the directory and non-directory names
            found in _path.
            """

            file_list, dirs, nondirs = [], [], []
            try:
                self.connection.cwd(_path)
            except Exception as exp:
                print("the current path is:", self.connection.pwd(), exp, _path)
                return [], []
            else:
                # Collect each line of the server's LIST output as a list of fields.
                self.connection.retrlines('LIST', lambda x: file_list.append(x.split()))
                for info in file_list:
                    # Unix-style listing: the first field is the mode string and
                    # the last field is taken as the name (so a name containing
                    # spaces is reduced to its last word).
                    ls_type, name = info[0], info[-1]
                    if ls_type.startswith('d'):
                        dirs.append(name)
                    else:
                        nondirs.append(name)
                return dirs, nondirs
    
        def walk(self, path='/'):
            """
            Walk the FTP server's directory tree top-down (depth-first), like os.walk().
            """
            dirs, nondirs = self.listdir(path)
            yield path, dirs, nondirs
            for name in dirs:
                path = ospath.join(path, name)
                yield from self.walk(path)
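                # In Python 2 (no `yield from`), use distinct loop names so that
                # `path` is not overwritten:
                # for sub_path, sub_dirs, sub_nondirs in self.walk(path):
                #     yield sub_path, sub_dirs, sub_nondirs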
                self.connection.cwd('..')
                path = ospath.dirname(path)
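
    The listdir() method above parses LIST output, whose format is not standardized across servers. As a hedged alternative, here is a minimal sketch that uses ftplib's mlsd() method instead. It assumes the server supports the MLSD command (many older servers do not); the names listdir_mlsd and walk_mlsd are made up for this illustration and are not part of FTPwalker.

```python
import posixpath


def listdir_mlsd(connection, path):
    """Return (dirs, nondirs) for `path` using the MLSD command.

    Assumption: `connection` is an ftplib.FTP-like object and the server
    supports MLSD. Anything that is not reported as a directory
    (including links) ends up in nondirs.
    """
    dirs, nondirs = [], []
    for name, facts in connection.mlsd(path, facts=["type"]):
        entry_type = facts.get("type", "").lower()
        if entry_type in ("cdir", "pdir"):
            continue  # skip the entries for the directory itself and its parent
        if entry_type == "dir":
            dirs.append(name)
        else:
            nondirs.append(name)
    return dirs, nondirs


def walk_mlsd(connection, top="/"):
    """Yield (path, dirs, nondirs) top-down, in the same shape as os.walk()."""
    dirs, nondirs = listdir_mlsd(connection, top)
    yield top, dirs, nondirs
    for name in dirs:
        # Absolute paths are passed down, so no cwd() bookkeeping is needed.
        yield from walk_mlsd(connection, posixpath.join(top, name))
```

    Because MLSD returns the name and the entry type as separate fields, names containing spaces should come through intact, and there is no need to change the working directory while walking.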
    

    Now, to use this class, create a connection object with the ftplib module, pass it to FTPWalk, and loop over the walk() method:

    In [2]: from test import FTPWalk
    
    In [3]: import ftplib
    
    In [4]: connection = ftplib.FTP("ftp.uniprot.org")
    
    In [5]: connection.login()
    Out[5]: '230 Login successful.'
    
    In [6]: ftpwalk = FTPWalk(connection)
    
    In [7]: for i in ftpwalk.walk():
       ...:     print(i)
       ...:     
    ('/', ['pub'], [])
    ('/pub', ['databases'], ['robots.txt'])
    ('/pub/databases', ['uniprot'], [])
    ('/pub/databases/uniprot', ['current_release', 'previous_releases'], ['LICENSE', 'current_release/README', 'current_release/knowledgebase/complete', 'previous_releases/', 'current_release/relnotes.txt', 'current_release/uniref'])
    ('/pub/databases/uniprot/current_release', ['decoy', 'knowledgebase', 'rdf', 'uniparc', 'uniref'], ['README', 'RELEASE.metalink', 'changes.html', 'news.html', 'relnotes.txt'])
    ...
    ...
    ...
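
    In the output above, file entries such as 'current_release/README' most likely come from symbolic links: a Unix-style LIST line for a link ends in "name -> target", and taking the last whitespace-separated field picks up the target instead of the name. The following is a minimal sketch of a more careful line parser. It assumes the common nine-column "ls -l" layout, which servers are not required to use; parse_list_line is a hypothetical helper, not part of the class above.

```python
def parse_list_line(line):
    """Split one Unix-style LIST line into (is_dir, name).

    Assumed layout (not guaranteed by the FTP protocol):
    mode, links, owner, group, size, month, day, time-or-year, name.
    Returns None for lines that do not fit, such as a 'total N' header.
    """
    fields = line.split(None, 8)  # at most 9 fields, so spaces in the name survive
    if len(fields) < 9:
        return None
    mode, name = fields[0], fields[8]
    if mode.startswith("l") and " -> " in name:
        # Symbolic link: keep the link's own name and drop the target.
        name = name.split(" -> ", 1)[0]
    # Links are reported as non-directories here, even if they point at one.
    return mode.startswith("d"), name
```

    Inside listdir(), each raw line could be passed through such a function instead of using x.split() and info[-1], skipping any line for which it returns None.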
    
