How to use glob to read limited set of files with numeric names?

谎友^ 2020-12-14 23:43

How to use glob to only read limited set of files?

I have JSON files named with numbers from 50 to 20000 (e.g. 50.json, 51.json, 52.json ... 19999.json, 20000.json) within the

2 Answers
  • 2020-12-14 23:50

    Although it hardly counts as beautiful code, you could implement your own filtering as follows:

    import os
    import re

    directory = "/Users/Chris/Dropbox"
    all_files = os.listdir(directory)

    # Keep names whose last run of digits, read as an integer, exceeds 18000.
    read_files = [this_file for this_file in all_files
                  if int(re.findall(r'\d+', this_file)[-1]) > 18000]

    print(read_files)
    

    The crucial line is the list comprehension. It iterates over each file name in the directory (for this_file in all_files), pulls out the list of digit runs in that name (re.findall(r'\d+', this_file)), and includes the name in read_files if the last of those runs, converted to an integer, is greater than 18000.

    This breaks on file names that contain no digits: re.findall returns an empty list, so indexing it with [-1] raises IndexError. User beware.
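
One way to avoid that failure is to skip names without digits before indexing. The following is only a sketch of that idea; the helper name files_above and the sample file names are made up for illustration and are not part of the original answer.

```python
import re

def files_above(names, threshold):
    """Return the names whose last run of digits, as an int, exceeds threshold."""
    selected = []
    for name in names:
        digits = re.findall(r'\d+', name)
        if not digits:
            continue  # no number in the name: skip it instead of raising IndexError
        if int(digits[-1]) > threshold:
            selected.append(name)
    return selected

# Hypothetical sample names; in practice this list would come from os.listdir().
sample = ["50.json", "18001.json", "20000.json", "notes.txt"]
print(files_above(sample, 18000))  # ['18001.json', '20000.json']
```

Note that this still looks at the last digit run anywhere in the name, so digits in an extension would be picked up as well.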


    Edit: I see the other answer has been edited to include what looks like a much better thought-out way to do this.

  • 2020-12-15 00:11

    You are using the glob syntax incorrectly; the [..] sequence works per character. The following glob would match your files correctly instead:

    '1[5-8][0-9][0-9][0-9].*'
    

    Under the covers, glob uses fnmatch which translates the pattern to a regular expression. Your pattern translates to:

    >>> import fnmatch
    >>> fnmatch.translate('[15000-18000].*')
    '[15000-18000]\\..*\\Z(?ms)'
    

    which matches exactly one character before the ., and that character must be a 0, 1, 5 or 8. Nothing else.
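
The difference between the two patterns can be checked without touching the file system, using fnmatch directly. This is a small sketch; the list of file names is invented for the demonstration.

```python
import fnmatch

broken = '[15000-18000].*'            # a single character class
per_digit = '1[5-8][0-9][0-9][0-9].*'  # one class per digit position

# Hypothetical file names, chosen to show what each pattern accepts.
names = ['0.json', '5.json', '7.json', '8.json',
         '15000.json', '18999.json', '19000.json']

# fnmatchcase compares case-sensitively on every platform.
broken_hits = [n for n in names if fnmatch.fnmatchcase(n, broken)]
per_digit_hits = [n for n in names if fnmatch.fnmatchcase(n, per_digit)]

print(broken_hits)     # ['0.json', '5.json', '8.json']
print(per_digit_hits)  # ['15000.json', '18999.json']
```

The per-digit pattern covers 15000 through 18999, so an upper bound that does not fall on a digit boundary still needs extra patterns or a numeric check.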

    glob patterns are quite limited, and matching numeric ranges with them is not easy; you would have to create a separate glob for each sub-range, for example glob('1[8-9][0-9][0-9][0-9]') + glob('2[0-9][0-9][0-9][0-9]'), and so on.
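
To illustrate what combining several globs looks like in practice, here is a sketch that concatenates the results of one pattern per sub-range. The helper glob_many, the target range 18000 to 20000, and the temporary files are all assumptions made for the example.

```python
import glob
import os
import tempfile

def glob_many(directory, patterns):
    """Concatenate the results of several glob patterns inside directory."""
    matches = []
    for pattern in patterns:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    return matches

# Assumed target range 18000-20000, written as one pattern per sub-range.
patterns = ['1[8-9][0-9][0-9][0-9].json', '20000.json']

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty, hypothetical files to match against.
    for n in (50, 17999, 18000, 19999, 20000):
        open(os.path.join(tmp, '%d.json' % n), 'w').close()
    found = sorted(os.path.basename(p) for p in glob_many(tmp, patterns))

print(found)  # ['18000.json', '19999.json', '20000.json']
```

Every change to the range means rewriting the pattern list by hand, which is why filtering on the parsed number is usually simpler.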

    Do your own filtering instead:

    import os

    directory = "/Users/Chris/Dropbox"

    for filename in os.listdir(directory):
        basename, ext = os.path.splitext(filename)
        if ext != '.json':
            continue  # not a JSON file
        try:
            number = int(basename)
        except ValueError:
            continue  # not numeric
        if 18000 <= number <= 19000:
            # process file
            path = os.path.join(directory, filename)
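
The same filtering can be wrapped in a function so that the range is a parameter and the results come back in numeric order. This is a sketch under assumptions: the function name numbered_json_files and the temporary test files are hypothetical, and the bounds are inclusive as in the loop above.

```python
import os
import tempfile

def numbered_json_files(directory, low, high):
    """Return paths of <number>.json files with low <= number <= high, in numeric order."""
    selected = []
    for filename in os.listdir(directory):
        basename, ext = os.path.splitext(filename)
        if ext != '.json':
            continue  # not a JSON file
        try:
            number = int(basename)
        except ValueError:
            continue  # base name is not an integer
        if low <= number <= high:
            selected.append((number, os.path.join(directory, filename)))
    # Sorting the (number, path) pairs orders the paths numerically, not lexically.
    return [path for number, path in sorted(selected)]

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files: some in range, some out of range, some non-matching.
    for name in ('50.json', '18000.json', '18500.json', '19000.json',
                 '19001.json', 'notes.json', '18500.txt'):
        open(os.path.join(tmp, name), 'w').close()
    result = [os.path.basename(p) for p in numbered_json_files(tmp, 18000, 19000)]

print(result)  # ['18000.json', '18500.json', '19000.json']
```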
    