Generator Expressions vs. List Comprehension

梦如初夏  2020-11-21 06:56

When should you use generator expressions and when should you use list comprehensions in Python?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

9 Answers
  •  鱼传尺愫
    2020-11-21 07:29

    The important point is that a list comprehension creates a new list in memory. A generator expression creates an iterator that "filters" the source material on the fly, as you consume it.
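
    To see that difference concretely, compare a list comprehension and a generator expression over the same range (a minimal sketch; the variable names and sizes are just for illustration):

    import sys

    squares_list = [x * x for x in range(1000000)]  # the whole list is built right now
    squares_gen = (x * x for x in range(1000000))   # only a generator object is built

    print(sys.getsizeof(squares_list))  # several megabytes: one slot per element
    print(sys.getsizeof(squares_gen))   # a few hundred bytes on CPython, regardless of the range

    print(next(squares_gen))  # 0 -- values are computed one at a time, on demand
    print(next(squares_gen))  # 1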

    Imagine you have a 2TB log file called "hugefile.txt", and you want the content and length for all the lines that start with the word "ENTRY".

    So you start by writing a list comprehension:

    logfile = open("hugefile.txt","r")
    entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]
    

    This slurps up the whole file, processes each line, and stores the matching lines in a list. That list could therefore hold up to 2TB of content. That's a lot of RAM, and probably not practical for your purposes.

    So instead we can use a generator expression to apply a "filter" to our content. No data is actually read until we start iterating over the result.

    logfile = open("hugefile.txt","r")
    entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))
    

    Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:

    long_entries = ((line,length) for (line,length) in entry_lines if length > 80)
    

    Still nothing has been read, but we have now specified two generators that will act on our data as we consume it.
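
    You can watch that laziness happen by putting a side effect in the filter (a toy sketch; check() and the sample lines are made up for illustration):

    def check(line):
        print("examining:", line.rstrip())
        return line.startswith("ENTRY")

    sample = ["ENTRY one\n", "DEBUG noise\n", "ENTRY two\n"]
    filtered = (line for line in sample if check(line))

    print("generator created, nothing examined yet")
    print(next(filtered))  # only now does check() run, and it stops at the first match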

    Let's write our filtered lines out to another file:

    with open("filtered.txt","a") as outfile:
        for entry,length in long_entries:
            outfile.write(entry)
    

    Now we finally read the input file. As our for loop requests more lines, the long_entries generator pulls lines from the entry_lines generator, passing through only those whose length is greater than 80 characters. The entry_lines generator, in turn, requests lines (keeping only those that start with "ENTRY") from the logfile iterator, which actually reads the file.
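
    Here is the whole chain as a self-contained sketch you can run; io.StringIO stands in for the huge file, and the output is kept in memory too (both are placeholders for the real files):

    import io

    # Stand-in for open("hugefile.txt","r")
    logfile = io.StringIO(
        "ENTRY " + "x" * 100 + "\n"
        "DEBUG something unrelated\n"
        "ENTRY short\n"
    )

    entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))
    long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

    outfile = io.StringIO()  # stand-in for open("filtered.txt","a")
    for entry,length in long_entries:
        outfile.write(entry)

    print(outfile.getvalue())  # only the long ENTRY line made it through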

    So instead of "pushing" data to your output function in the form of a fully-populated list, you're giving the output function a way to "pull" data only when it's needed. In our case this is much more efficient, but not quite as flexible. Generators are one-way, one-pass; the data from the log file gets discarded as soon as we've read it, so we can't go back to a previous line. On the other hand, we don't have to worry about keeping data around once we're done with it.
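
    That one-way, one-pass behaviour is easy to demonstrate (a minimal sketch):

    gen = (x*2 for x in range(3))
    print(list(gen))  # [0, 2, 4]
    print(list(gen))  # [] -- the generator is exhausted after one pass

    lst = [x*2 for x in range(3)]
    print(lst)  # [0, 2, 4]
    print(lst)  # [0, 2, 4] -- the list can be reused, at the cost of keeping it in memory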
