How do I parse a listing of files to get just the filenames in Python?

一向 2021-02-10 23:50

So let's say I'm using Python's ftplib to retrieve a list of log files from an FTP server. How would I parse that list of files to get just the file names (the last column) ins

7 answers
  •  醉话见心
    2021-02-11 00:26

    Best answer

    You may want to use ftp.nlst() instead of ftp.retrlines(). It will give you exactly what you want.
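As a minimal sketch of that suggestion, assuming `ftp` is an already connected and logged-in `ftplib.FTP` object: the helper name `list_log_names` and the `.log` suffix are hypothetical, chosen only for illustration. Note that some servers include the directory in each name that NLST returns.

```python
def list_log_names(ftp, directory="", suffix=".log"):
    """Return the names reported by NLST that end with `suffix`.

    `ftp` is assumed to be a connected, logged-in ftplib.FTP instance.
    """
    # nlst() sends the NLST command and returns a list of names,
    # so there are no permission, size or date columns to strip
    names = ftp.nlst(directory) if directory else ftp.nlst()
    return [name for name in names if name.endswith(suffix)]
```

With a real connection this would be called as `list_log_names(ftp)` after `ftp.login()`.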

    If you can't, read on:

    Generators for sysadmin processes

    In his now-famous tutorial, Generator Tricks for Systems Programmers: An Introduction, David M. Beazley gives a lot of recipes for handling this kind of data problem with quick, reusable code.

    E.g.:

    # assumes `ftp` is a connected, logged-in ftplib.FTP instance
    # empty list that will receive every line of the listing
    log = []
    # pass a callback so retrlines stores each line instead of printing it;
    # we only do this because we cannot use anything better than retrlines
    ftp.retrlines('LIST', callback=log.append)
    # rsplit with maxsplit=1 stops after splitting off the last field,
    # which is cheaper than splitting the whole line
    files = (line.rsplit(None, 1)[1] for line in log)
    # materialize the generator to get your file list
    files_list = list(files)
    

    Why don't we build the list immediately?

    Well, doing it this way gives you a lot of flexibility: you can apply any intermediate generator to filter the files before turning them into files_list. It's just like a shell pipe: add a line and you add a processing step without overhead (since these are generators). And if you get rid of retrlines, it still works, and it's even better because you never store the whole list even once.
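To make the pipe analogy concrete, here is a small sketch that adds one filtering stage between the parsing step and the final list. The sample LIST lines and the `.log` filter are made up for illustration, and real servers may format their listings differently.

```python
# Hypothetical LIST output; real servers may format lines differently.
sample_lines = [
    "-rw-r--r--   1 ftp  ftp   1024 Feb 10 23:50 access.log",
    "-rw-r--r--   1 ftp  ftp   2048 Feb 10 23:51 notes.txt",
    "-rw-r--r--   1 ftp  ftp    512 Feb 11 00:26 error.log",
]

# Stage 1: keep only the last whitespace-separated field of each line
names = (line.rsplit(None, 1)[1] for line in sample_lines)
# Stage 2: an extra filter added as a single line, still lazy
log_names = (name for name in names if name.endswith(".log"))
# Nothing has been computed yet; list() pulls items through both stages
files_list = list(log_names)
print(files_list)  # ['access.log', 'error.log']
```

Each stage only wraps the previous one, so inserting or removing a step does not touch the rest of the script.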

    EDIT: well, I read the comment on the other answer, and it says that this won't work if there is any space in the name.

    Cool, this illustrates why this method is handy. If you want to change something in the process, you just change one line. Swap:

    files = (line.rsplit(None, 1)[1] for line in log)
    

    for:

    # split the line, take every field from index 8 onward, then join them with single spaces
    files = (' '.join(line.split()[8:]) for line in log)
    

    OK, this may not be obvious here, but for huge batch-processing scripts, it's nice :-)
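The difference between the two variants can be checked on a single line. The sketch below assumes a Unix-style listing in which the name starts at the ninth field, and the sample line is hypothetical. The third variant, `split(None, 8)`, is not part of the answer above; it is shown only as one possible way to keep runs of spaces inside a name, which the join-based version collapses to single spaces.

```python
# Hypothetical LIST line for a file named "my  report 2021.log"
# (note the double space inside the name).
line = "-rw-r--r--   1 ftp  ftp   1024 Feb 10 23:50 my  report 2021.log"

# Original variant: only the text after the last run of whitespace
last_field = line.rsplit(None, 1)[1]
# Variant from the edit: every field from index 8 onward, re-joined
joined = " ".join(line.split()[8:])
# Assumed alternative: split at most 8 times, so the rest of the
# line (the name) is returned exactly as the server sent it
kept_whole = line.split(None, 8)[8]

print(repr(last_field))  # '2021.log'
print(repr(joined))      # 'my report 2021.log'
print(repr(kept_whole))  # 'my  report 2021.log'
```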
