What's the best way of determining if an image is part of a sequence?

Question

I have an image file and I'd like to check whether it's part of an image sequence using Python.

For example, I start with this file:

/projects/image_0001.jpg

and I want to check whether the file is part of a sequence, i.e.

/projects/image_0001.jpg
/projects/image_0002.jpg
/projects/image_0003.jpg
...

Checking whether a sequence of images exists seems simple if I can determine whether the file name could be part of a sequence, i.e. whether there is a run of digits in the file name.

My first thought was to ask the user to add #### to the file path where the numbers should be, and to input a start and end frame number to replace the hashes with, but this is obviously not very user friendly. Is there a way to check for a run of digits in a string with regular expressions or something similar?

Answer 1:

It's relatively easy to use Python's re module to see whether a string contains a sequence of digits. You could do something like this:

mo = re.findall(r'\d+', filename)

This will return a list of all digit sequences in filename. If:

There is a single result (that is, the filename contains only a single sequence of digits), AND
A subsequent filename has a single digit sequence of the same length, AND
The second digit sequence is 1 greater than the previous

...then maybe they're part of a sequence.
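The checks above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the helper names next_in_sequence and looks_like_sequence are hypothetical, and the sketch assumes the frame number is zero-padded, so that 0001 is followed by 0002. It also only looks for the next frame on disk, not the previous one.

```python
import os
import re


def next_in_sequence(filename):
    """Return the name the next frame would have, or None if the base name
    does not contain exactly one run of digits. (Hypothetical helper.)"""
    dirname, basename = os.path.split(filename)
    runs = re.findall(r'\d+', basename)
    if len(runs) != 1:
        return None
    digits = runs[0]
    # Add one to the number and pad back to the original width.
    following = str(int(digits) + 1).zfill(len(digits))
    return os.path.join(dirname, basename.replace(digits, following, 1))


def looks_like_sequence(filename):
    """Return True if the file that would follow filename exists on disk."""
    candidate = next_in_sequence(filename)
    return candidate is not None and os.path.exists(candidate)
```

Note that a name with a second run of digits (for example a digit in the extension, or shot2_0001.jpg) is rejected by the first condition, which is one reason the result is only a "maybe".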

Answer 2:

I'm assuming the problem is more about distinguishing sequenced files on disk than about extracting any particular information from the filenames themselves.

If that's the case, and what you're looking for is something that is smart enough to take a list like:

/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
...
/path/to/file_10.png
/path/to/image_1.png
/path/to/image_2.png
...
/path/to/image_10.png

and get back a result saying "I have 2 sequences of files: /path/to/file_#.png and /path/to/image_#.png", then you are going to need 2 passes: a first pass to determine valid expressions for the files, and a second pass to figure out which other files match those expressions.

You'll also need to decide whether you're going to support gaps (is the sequence required to be consecutive?):

/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
/path/to/file_5.png
/path/to/file_6.png
/path/to/file_7.png

Is this 1 sequence (/path/to/file_#.png) or 2 sequences (/path/to/file_1-3.png, /path/to/file_5-7.png)?

Also, how do you want to handle file names that contain other numbers besides the sequence number?
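If gaps should split a sequence, the frame numbers can be grouped into consecutive runs once they have been extracted. The following is a minimal sketch under that assumption; the function name split_into_ranges is hypothetical and the input is a plain list of integers.

```python
def split_into_ranges(frames):
    """Group frame numbers into (first, last) runs of consecutive values."""
    ranges = []
    for frame in sorted(set(frames)):
        if ranges and frame == ranges[-1][1] + 1:
            # The frame directly follows the current run, so extend it.
            ranges[-1] = (ranges[-1][0], frame)
        else:
            # A gap (or the very first frame) starts a new run.
            ranges.append((frame, frame))
    return ranges


print(split_into_ranges([1, 2, 3, 5, 6, 7]))  # [(1, 3), (5, 7)]
```

If gaps are allowed, the same list would simply be reported as a single sequence with frame 4 missing.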

/path/to/file2_1.png
/path/to/file2_2.png
/path/to/file2_3.png

etc.

With that in mind, this is how I would accomplish it:

    import functools
    import os.path
    import re

    import projex.sorting

    def find_sequences(filenames):
        """
        Parse a list of filenames into a dictionary of sequences.  Filenames not
        part of a sequence are returned in the None key

        :param      filenames | [<str>, ..]

        :return     {<str> sequence: [<str> filename, ..], ..}
        """
        sequence_patterns = {}
        sequences         = {None: []}

        # sort a copy of the files (by natural order) so we always generate a
        # pattern based on the first potential file in a sequence;
        # projex.sorting.natural is used as a cmp-style function here, so it
        # is wrapped with cmp_to_key to work as a sort key
        natural_key     = functools.cmp_to_key(projex.sorting.natural)
        local_filenames = sorted(filenames, key=natural_key)

        # create the expression to determine if a sequence is possible
        # we are going to assume that it's always going to be the
        # last set of digits that makes a sequence, i.e.
        #
        #    test2_1.png
        #    test2_2.png
        #
        # test2 will be treated as part of the name
        #
        #    test1.png
        #    test2.png
        #
        # whereas here the 1 and 2 are part of the sequence
        #
        # more advanced expressions would be needed to support
        #
        #    test_01_2.png
        #    test_02_2.png
        #    test_03_2.png

        # the prefix is matched lazily so that the digit group captures the
        # whole last run of digits (10 in test10.png, not just the 0)
        pattern_expr = re.compile(r'^(.*?)(\d+)(\D*)$')

        # process the sorted files for sequences
        for filename in local_filenames:
            # patterns are built from base names, so match base names and
            # only group files that live in the same directory
            dirname, basename = os.path.split(filename)

            # first, check to see if this filename matches a sequence
            found = False
            for key, (seq_dirname, pattern) in sequence_patterns.items():
                if dirname != seq_dirname or not pattern.match(basename):
                    continue

                sequences[key].append(filename)
                found = True
                break

            # if we've already been matched, then continue on
            if found:
                continue

            # next, see if this filename should start a new sequence
            pattern_match = pattern_expr.match(basename)
            if pattern_match:
                prefix = pattern_match.group(1)
                suffix = pattern_match.group(3)
                key    = os.path.join(dirname, '%s#%s' % (prefix, suffix))

                # create a new pattern based on the filename, escaping the
                # literal parts so that '.' only matches a dot
                opts = (re.escape(prefix), re.escape(suffix))
                sequence_pattern = re.compile(r'^%s\d+%s$' % opts)

                sequence_patterns[key] = (dirname, sequence_pattern)
                sequences[key] = [filename]
                continue

            # otherwise, add it to the list of non-sequences
            sequences[None].append(filename)

        # now that we have grouped everything, we'll merge back filenames
        # that were potential sequences, but only contain a single file to the
        # non-sequential list (iterate over a copy, since we pop while looping)
        for key, members in list(sequences.items()):
            if key is None or len(members) > 1:
                continue

            sequences.pop(key)
            sequences[None] += members

        return sequences

And an example usage:

>>> test = ['test1.png', 'test2.png', 'test3.png', 'test4.png', 'test2_1.png', 'test2_2.png', 'test2_3.png', 'test2_4.png']
>>> results = find_sequences(test)
>>> list(results.keys())
[None, 'test#.png', 'test2_#.png']

The function refers to a natural sorting method, which is a separate topic. I used the natural sort method from my projex library. It is open source, so if you want to use it or look at it, it's here: http://dev.projexsoftware.com/projects/projex

Natural sorting has been covered elsewhere on the forums, so I just used the method from the library.
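For readers without that library, natural ordering can be approximated with a sort key built from the standard library. This is a common idiom and not the projex implementation; the name natural_key is hypothetical.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit chunks so that
    embedded numbers compare by value rather than character by character."""
    parts = re.split(r'(\d+)', text)
    # With a capturing group, odd indexes always hold the digit runs.
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


names = ['file_10.png', 'file_2.png', 'file_1.png']
print(sorted(names, key=natural_key))
# ['file_1.png', 'file_2.png', 'file_10.png']
```

A plain sorted(names) would place file_10.png before file_2.png, which is why a natural order matters when the first file of a sequence is used to build its pattern.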

Source: https://stackoverflow.com/questions/11855801/whats-the-best-way-of-determining-if-an-image-is-part-of-a-sequence

Tags

python

regex

sequential