Merging the results of itertools.product?

长情又很酷 2021-01-25 05:49

I am trying to create a list of numbers from 0-9999 using itertools.product. I am able to create a list from 0000-9999 by doing the following:
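(The question's code is cut off here; as a minimal sketch, the fixed-width version being described presumably looks something like this, with the variable name `numbers` borrowed from the answer below.)

    from itertools import product

    numbers = '0123456789'
    # Every fixed-width four-digit string: '0000', '0001', ..., '9999'
    four_digit = [''.join(p) for p in product(numbers, repeat=4)]
    print(four_digit[:3], four_digit[-1])  # ['0000', '0001', '0002'] 9999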

3 Answers
  •  醉话见心
    2021-01-25 06:14

    Performance improvement on existing answers:

    from itertools import chain, product

    numbers = '0123456789'

    # Strings of length 1 through 4: '0', '1', ..., '9', '00', ..., '9999'
    list(map(''.join, chain.from_iterable(product(numbers, repeat=i) for i in range(1, 5))))
    # Or on Python 3.5+ with the additional unpacking generalizations:
    [*map(''.join, chain.from_iterable(product(numbers, repeat=i) for i in range(1, 5)))]
    

    You can omit the list()/[*...] wrapping entirely if you're just iterating over the results.
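    For reference, a quick check of what the merged expression produces, assuming numbers = '0123456789' as in the timings below:

    from itertools import chain, product

    numbers = '0123456789'
    merged = [*map(''.join, chain.from_iterable(product(numbers, repeat=i) for i in range(1, 5)))]
    print(len(merged))             # 11110 strings: 10 + 100 + 1000 + 10000
    print(merged[:3], merged[-1])  # ['0', '1', '2'] 9999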

    The performance improves significantly (not so much in this case, but dramatically for larger products) on the CPython reference interpreter, for two implementation-specific reasons:

    1. It pushes the vast majority of the work to the C layer, avoiding bytecode interpreter loop overhead.
    2. product has an optimization that reuses the result tuple (including not needing to reset most of the values in it) if no outside references to it exist when the next result is requested. That optimization isn't available to list comprehensions and generator expressions: their loop structure keeps a reference to the previous tuple alive just long enough that a reference still exists when product is deciding whether it can reuse the tuple for the next result. map(''.join, ...) avoids that, because it only holds the reference to the tuple long enough to call the mapper function, discarding it before yielding the mapper's result (illustrated in the sketch just below).
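    A small sketch illustrating the reuse behaviour from point 2; the exact id() behaviour is a CPython implementation detail, so treat this as illustrative rather than guaranteed:

    from itertools import product

    # map() drops its reference to each tuple before requesting the next one,
    # so product can hand back the same (reused) tuple object every time:
    print(len(set(map(id, product('01', repeat=3)))))    # 1 on CPython

    # A comprehension's loop variable still references the previous tuple when
    # the next one is requested, forcing product to allocate fresh tuples:
    print(len({id(t) for t in product('01', repeat=3)}))  # > 1 on CPython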

    Even in this case, the speedup is significant in percentage terms, as demonstrated with IPython microbenchmarks (here, on a Linux x64 Python 3.6 install):

    >>> %timeit -r5 [''.join(p) for n in range(1, 5) for p in product(numbers, repeat=n)]
    24.9 μs ± 95.2 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)
    >>> %timeit -r5 list(map(''.join, chain.from_iterable(product(numbers, repeat=i) for i in range(1, 5))))
    18.2 μs ± 41.2 ns per loop (mean ± std. dev. of 5 runs, 100000 loops each)
    

    As noted, the gains are large only in percentage terms here (~27% runtime reduction); 6.7 μs is pretty trivial in the grand scheme of things. But if the range to cover gets larger, and/or the set of numbers to take the product over gets bigger, it matters more: for numbers = '0123456789' and range(1, 8), the reduction is from 2.54 s to 1.67 s. Asymptotically the savings settle at roughly a third, and when the total cost is measured in seconds, cutting it by a third is meaningful.
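    For readers without IPython, a rough equivalent using the stdlib timeit module (a sketch; absolute numbers will of course differ by machine and Python version):

    from itertools import chain, product
    from timeit import timeit

    numbers = '0123456789'

    def listcomp():
        return [''.join(p) for n in range(1, 5) for p in product(numbers, repeat=n)]

    def mapped():
        return list(map(''.join, chain.from_iterable(product(numbers, repeat=i) for i in range(1, 5))))

    # Both forms produce identical output
    assert listcomp() == mapped()

    print('listcomp:', timeit(listcomp, number=10000))
    print('map     :', timeit(mapped, number=10000))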
