How to slice a generator object or iterator?

小鲜肉 2020-12-03 11:04

I would like to loop over a "slice" of an iterator. I'm not sure if this is possible, as I understand that it is not possible to slice an iterator. What I would like to do

4 answers
  • 2020-12-03 11:28

    Let's clarify something first. Suppose you want to extract the first values from your generator, based on the number of targets you specified on the left of the assignment. From this point we have a problem, because in Python there are two ways to unpack something.

    Let's discuss these alternatives using the following example. Imagine you have the following list l = [1, 2, 3]

    1) The first alternative is to NOT use the "star" expression

    a, b, c = l # a=1, b=2, c=3
    

    This works great if the number of arguments at the left of the expression (in this case, 3 arguments) is equal to the number of elements in the list. But, if you try something like this

    a, b = l # ValueError: too many values to unpack (expected 2)
    

    This is because the list contains more elements than there are targets on the left of the expression

    2) The second alternative is to use the "star" expression; this solves the previous error

    a, b, *c = l #  a=1, b=2, c=[3]
    

    The "star" target acts like a buffer list. The buffer can take three possible forms:

        a, b, *c = [1, 2] # a=1, b=2, c=[]
        a, b, *c = [1, 2, 3] # a=1, b=2, c=[3]
        a, b, *c = [1, 2, 3, 4, 5] # a=1, b=2, c=[3,4,5]
    

    Note that the list must contain at least 2 values (in the above example). If not, an error will be raised

    Now, jump to your problem. If you try something like this:

    a, b, c = generator
    

    This will work only if the generator yields exactly three values (the number of generator values must equal the number of targets on the left). Otherwise, an error will be raised.

    If you try something like this:

    a, b, *c = generator
    
    • If the number of values in the generator is lower than 2; an error will be raised because variables "a" and "b" must each have a value
    • If the number of values in the generator is exactly 2; then a=<val_1>, b=<val_2>, c=[]
    • If the number of values in the generator is 3; then a=<val_1>, b=<val_2>, c=[<val_3>]
    • If the number of values in the generator is greater than 3; then a=<val_1>, b=<val_2>, c=[<val_3>, ... ]. In this case, if the generator is infinite, the program will hang trying to consume the generator
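
    One way to avoid the blocking case in the last bullet is to bound the generator with itertools.islice before unpacking. This is only a minimal sketch: the name numbers and the use of itertools.count as an infinite source are assumptions made for illustration.

```python
from itertools import count, islice

# Hypothetical infinite source, used only for illustration: 1, 2, 3, ...
numbers = count(1)

# islice stops after two items, so the unpacking cannot hang
a, b = islice(numbers, 2)

# The underlying iterator keeps its position; later values stay lazy
c = next(numbers)
```

    Unlike a, b, *c = numbers, this never tries to collect the rest of the stream into a list.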

    What I propose for you is the following solution

    # Create a dummy generator for this example
    def my_generator():
        i = 0
        while i < 2:
            yield i
            i += 1
    
    # Our Generator Unpacker
    class GeneratorUnpacker:
        def __init__(self, generator):
            self.generator = generator
    
        def __iter__(self):
            return self
    
        def __next__(self):
            try:
                return next(self.generator)
            except StopIteration:
                return None # When the generator ends; we will return None as value
    
    if __name__ == '__main__':
        dummy_generator = my_generator()
        g = GeneratorUnpacker(dummy_generator)
        a, b, c = next(g), next(g), next(g)  # a=0, b=1, c=None
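
    A similar padding effect can probably be had from the standard library alone, without a wrapper class. The sketch below is an alternative to the class above, not a replacement the answer itself proposes; take_padded and two_values are hypothetical names introduced for illustration.

```python
from itertools import chain, islice, repeat

def take_padded(iterable, n, fill=None):
    """Return exactly n values, padding with `fill` if the input runs out."""
    # chain(...) appends an endless supply of `fill`; islice cuts it at n
    return list(islice(chain(iterable, repeat(fill)), n))

def two_values():
    # Same shape as my_generator: yields 0, then 1, then stops
    yield 0
    yield 1

a, b, c = take_padded(two_values(), 3)
```

    As with the class, a value of None cannot be told apart from "the generator ran out", so pick a different fill if None is a legitimate item.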
    
  • 2020-12-03 11:34

    You can't slice a generator object or iterator using normal slice operations. Instead, you need to use itertools.islice, as @jonrsharpe already mentioned in his comment.

    import itertools    
    
    for i in itertools.islice(x, 95):
        print(i)
    

    Also note that islice returns an iterator and consumes data from the underlying iterator or generator. So if you need to go back over the data, you will need to convert it to a list, create a new generator object, or use the little-known itertools.tee to create a copy of your generator.

    from itertools import tee
    
    
    first, second = tee(f())
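
    To make the consumption visible, here is a small sketch. The generator function f is a hypothetical stand-in (the snippet above does not define it), chosen so the values are easy to follow.

```python
from itertools import islice, tee

def f():
    # Hypothetical stand-in for the generator function used above
    yield from range(5)

gen = f()
head = list(islice(gen, 2))   # takes 0 and 1 from gen
rest = list(gen)              # only 2, 3, 4 are left

# tee gives two independent iterators over the same stream
first, second = tee(f())
first_head = list(islice(first, 2))
second_all = list(second)     # unaffected by what `first` consumed
```

    tee buffers items that one copy has seen and the other has not, so it can use a lot of memory if the copies drift far apart, and the original iterator should not be used directly once it has been passed to tee.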
    
  • 2020-12-03 11:38

    islice is the Pythonic way

    from itertools import islice    
    
    g = (i for i in range(100))
    
    for num in islice(g, 95, None):
        print(num)
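
    For reference, islice accepts either a single stop argument or start, stop and an optional step, roughly mirroring slice notation without negative indices. A brief sketch follows; the helper fresh is a hypothetical name that simply rebuilds the generator for each call.

```python
from itertools import islice

def fresh():
    # Hypothetical helper: a new generator each time, since islice consumes it
    return (i for i in range(100))

first_five = list(islice(fresh(), 5))          # like g[:5]
last_five = list(islice(fresh(), 95, None))    # like g[95:]
stepped = list(islice(fresh(), 10, 20, 5))     # like g[10:20:5]
```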
    
  • 2020-12-03 11:42

    In general, the answer is itertools.islice, but you should note that islice doesn't, and can't, actually skip values. It just grabs and throws away start values before it starts yield-ing values. So it's usually best to avoid islice if possible when you need to skip a lot of values and/or the values being skipped are expensive to acquire/compute. If you can find a way to not generate the values in the first place, do so. In your (obviously contrived) example, you'd just adjust the start index for the range object.
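
    The cost of skipping can be made visible with a generator that records every value it is asked to produce. This is an illustrative sketch only; squares and produced are hypothetical names, not part of the question.

```python
from itertools import islice

produced = []  # records every value the generator was asked to compute

def squares(start=0):
    n = start
    while True:
        produced.append(n)
        yield n * n
        n += 1

# Skipping with islice still computes the 1000 discarded values
skipped = list(islice(squares(), 1000, 1003))
count_with_islice = len(produced)

# Starting the generator at the right place computes only what is needed
produced.clear()
direct = list(islice(squares(start=1000), 3))
count_direct = len(produced)
```

    Both approaches return the same three values, but the first one does 1003 units of work and the second only 3.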

    In the specific case of trying to run on a file object, pulling a huge number of lines (particularly reading from a slow medium) may not be ideal. Assuming you don't need specific lines, one trick you can use to avoid actually reading huge blocks of the file, while still testing some distance into the file, is to seek to a guessed offset, read out to the end of the line (to discard the partial line you probably seeked to the middle of), then islice off however many lines you want from that point. For example:

    import itertools
    
    with open('myhugefile') as f:
        # Assuming roughly 80 characters per line, this seeks to somewhere roughly
        # around the 100,000th line without reading in the data preceding it
        f.seek(80 * 100000)
        next(f)  # Throw away the partial line you probably landed in the middle of
        for line in itertools.islice(f, 100):  # Process 100 lines
            # Do stuff with each line
            pass
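
    A self-contained variant of the same trick is sketched below. It assumes a throwaway file of fixed-width lines created just for the demonstration, and it opens the file in binary mode so that seek offsets are plain byte counts; both choices are assumptions of this sketch rather than part of the snippet above.

```python
import os
import tempfile
from itertools import islice

# Build a throwaway file of 1,000 lines, 12 bytes each (illustration only)
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for i in range(1000):
        tmp.write(b"line %06d\n" % i)
    path = tmp.name

with open(path, "rb") as f:
    f.seek(12 * 500 + 5)   # lands in the middle of line 500
    next(f)                # discard the rest of that partial line
    lines = [raw.decode().rstrip() for raw in islice(f, 3)]

os.remove(path)
```

    With real data the line length is only a guess, so the lines you get start near the target offset rather than at an exact line number.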
    

    For the specific case of files, you might also want to look at mmap which can be used in similar ways (and is unusually useful if you're processing blocks of data rather than lines of text, possibly randomly jumping around as you go).
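
    As a rough idea of what the mmap route might look like, the sketch below maps a throwaway file read-only, jumps to a guessed offset, and reads a few whole lines from there. The file contents and offsets are assumptions made for the demonstration.

```python
import mmap
import os
import tempfile

# Build a throwaway file of 1,000 lines, 12 bytes each (illustration only)
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    for i in range(1000):
        tmp.write(b"line %06d\n" % i)
    path = tmp.name

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        guess = 12 * 500 + 5                  # somewhere inside line 500
        start = mm.find(b"\n", guess) + 1     # first byte of the next full line
        mm.seek(start)
        lines = [mm.readline().decode().rstrip() for _ in range(3)]

os.remove(path)
```

    The mapped object also supports slicing (for example mm[start:start + 12]), which is what makes it convenient for jumping around in block-structured data.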

    Update: From your updated question, you'll need to look at your API docs and/or data format to figure out exactly how to skip around properly. It looks like skbio offers some features for skipping using seq_num, but that's still going to read, if not process, most of the file. If the data was written out with equal sequence lengths, I'd look at the docs on Alignment; aligned data may be loadable without processing the preceding data at all, e.g. by using Alignment.subalignment to create new Alignments that skip the rest of the data for you.
