How do python Set Comprehensions work?

问题

Q1 - Is the following a set() of a generator expression or a set comprehension? (Or are they same? If so, are list & dict comprehensions also corresponding type-cast on generators?)

my_set = {x for x in range(10)}

Q2 - Does the evaluation consider duplicate values & then remove them by applying set()?

dup_set = {x for x in [0, 1, 2, 0, 1, 2]}

Does the comprehension perform (speed-wise) better than regular for loops?

Update - I tried using timeit for speed comparisons. Am not sure if I am being just (fair) about it.

C:\>python -m timeit "s = set()" "for x in range(10):" "
  s.add(x)"
100000 loops, best of 3: 2.3 usec per loop

C:\>python -m timeit "s = {x for x in range(10)}"
1000000 loops, best of 3: 1.68 usec per loop

Now, using some conditionals

C:\>python -m timeit "s = set()" "for x in range(10):" "
  if x%2: s.add(x)"
100000 loops, best of 3: 2.27 usec per loop

C:\>python -m timeit "s = {x for x in range(10) if x%2}"
1000000 loops, best of 3: 1.83 usec per loop

So, there is quite some difference, is it due to the functionality being hardcoded in c?

回答1:

Q1 : Yes, yes, yes and yes. Or at least they behave like this. It's a little bit different if you're taking a look at the bytecode. Let's disassembly this code (Python 2.7) :

def list_comp(l):
    return [x+1 for x in l]

def dict_comp(l):
    return {x+1:0 for x in l}

def set_comp(l):
    return {x+1 for x in l}

def generator(l):
    return (x+1 for x in l)

This is what you get:

Disassembly of list_comp:
  2           0 BUILD_LIST              0
              3 LOAD_FAST               0 (l)
              6 GET_ITER            
        >>    7 FOR_ITER               16 (to 26)
             10 STORE_FAST              1 (x)
             13 LOAD_FAST               1 (x)
             16 LOAD_CONST              1 (1)
             19 BINARY_ADD          
             20 LIST_APPEND             2
             23 JUMP_ABSOLUTE           7
        >>   26 RETURN_VALUE
Disassembly of dict_comp:
  5           0 LOAD_CONST              1 (<code object <dictcomp> at 029DEE30)
              3 MAKE_FUNCTION           0
              6 LOAD_FAST               0 (l)
              9 GET_ITER            
             10 CALL_FUNCTION           1
             13 RETURN_VALUE  
Disassembly of set_comp:
  8           0 LOAD_CONST              1 (<code object <setcomp> at 029DECC8)
              3 MAKE_FUNCTION           0
              6 LOAD_FAST               0 (l)
              9 GET_ITER            
             10 CALL_FUNCTION           1
             13 RETURN_VALUE  
Disassembly of generator:
 11           0 LOAD_CONST              1 (<code object <genexpr> at 02A8FD58)
              3 MAKE_FUNCTION           0
              6 LOAD_FAST               0 (l)
              9 GET_ITER            
             10 CALL_FUNCTION           1
             13 RETURN_VALUE

The bytecode is barely the same for the dict comprenhension, the set comprehension and the generator. They all load a code object (<dictcomp>, <setcomp> or <genexpr>) and then make a callable function out of it. The list comprehension is different because it generates the bytecode corresponding to your list comprehension. This time it is interpreted and thus not native.

Q2 : It doesn't really consider duplicate values since it creates a comprehension with the list you gave. And then it creates the set with the comprehension.

About timing : List/Dict/Set comprehensions tend to be faster than anything else. Even if they're interpreted, the bytecode generated is optimized for most of the cases with special bytecode instructions like SET_ADD, LIST_APPEND or MAP_ADD.

来源：https://stackoverflow.com/questions/20496536/how-do-python-set-comprehensions-work

标签

python

set

generator

set-comprehension