Generator Expressions vs. List Comprehension

梦如初夏 2020-11-21 06:56

When should you use generator expressions and when should you use list comprehensions in Python?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]
9 answers
  • 2020-11-21 07:27

    Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.

    See Generator expressions and list comprehensions for more info.
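    A minimal sketch of the practical difference (names here are illustrative): a list can be iterated over repeatedly, while a generator is exhausted after a single pass.

```python
squares_list = [x * x for x in range(5)]  # built eagerly; all values stored
squares_gen = (x * x for x in range(5))   # lazy; values produced on demand

print(sum(squares_list))  # 30
print(sum(squares_list))  # 30 -- the list can be iterated again
print(sum(squares_gen))   # 30
print(sum(squares_gen))   # 0 -- the generator is now exhausted
```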

  • 2020-11-21 07:29

    The important point is that the list comprehension creates a new list. The generator expression creates an iterable object that will "filter" the source material on the fly as you consume it.

    Imagine you have a 2TB log file called "hugefile.txt", and you want the content and length for all the lines that start with the word "ENTRY".

    So you try starting out by writing a list comprehension:

    logfile = open("hugefile.txt","r")
    entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]
    

    This slurps up the whole file, processes each line, and stores the matching lines in a new list. The list could therefore hold up to 2TB of content. That's a lot of RAM, and probably not practical for your purposes.

    So instead we can use a generator to apply a "filter" to our content. No data is actually read until we start iterating over the result.

    logfile = open("hugefile.txt","r")
    entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))
    

    Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:

    long_entries = ((line,length) for (line,length) in entry_lines if length > 80)
    

    Still nothing has been read, but we've now specified two generators that will act on our data as we wish.

    Let's write out our filtered lines to another file:

    with open("filtered.txt", "a") as outfile:
        for entry, length in long_entries:
            outfile.write(entry)
    

    Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.

    So instead of "pushing" data to your output function in the form of a fully populated list, you're giving the output function a way to "pull" data only when it's needed. In our case this is much more efficient, but not quite as flexible. Generators are one-way, one-pass; the data from the log file gets discarded as soon as we've read it, so we can't go back to a previous line. On the other hand, we don't have to worry about keeping data around once we're done with it.
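    The whole pipeline can be tried out on a small in-memory sample; here io.StringIO stands in for the huge file (an assumption made only for the demo):

```python
import io

# Stand-in for "hugefile.txt" -- three lines, only one of which is an
# ENTRY line longer than 80 characters.
logfile = io.StringIO(
    "ENTRY " + "x" * 100 + "\n"
    "NOISE this line is skipped\n"
    "ENTRY short\n"
)

entry_lines = ((line, len(line)) for line in logfile if line.startswith("ENTRY"))
long_entries = ((line, length) for (line, length) in entry_lines if length > 80)

# Nothing has been read yet; the loop below pulls each line through
# both generators in turn.
for entry, length in long_entries:
    print(length)  # 107
```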

  • 2020-11-21 07:29

    I'm using the Hadoop Mincemeat module. I think this is a great example to take note of:

    import mincemeat

    def mapfn(k, v):
        for w in v:
            yield 'sum', w
            # yield 'count', 1

    def reducefn(k, v):
        r1 = sum(v)    # total
        r2 = len(v)    # count
        print(r2)
        m = r1 / r2    # mean
        std = 0
        for i in range(r2):
            std += pow(abs(v[i] - m), 2)
        res = pow(std / r2, 0.5)   # standard deviation
        return r1, r2, res
    

    Here the generator gets numbers out of a text file (as big as 15GB) and applies simple math to those numbers using Hadoop's map-reduce. If I had not used yield, but instead a list comprehension, it would have taken much longer to calculate the sums and the average (not to mention the space complexity).

    Hadoop is a great example for using all the advantages of Generators.
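    To see the space advantage directly, compare the in-memory footprint of a materialized list with that of an equivalent generator (a rough sketch; exact byte counts vary by platform and Python version):

```python
import sys

big_list = [n for n in range(1_000_000)]
big_gen = (n for n in range(1_000_000))

print(sys.getsizeof(big_list) > 1_000_000)  # True -- the list stores every element
print(sys.getsizeof(big_gen) < 1_000)       # True -- the generator is a small fixed-size object
print(sum(big_gen))  # 499999500000 -- the result is still computed, one element at a time
```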
