Python's itertools product memory consumption

礼貌的吻别 2021-02-19 04:55

The documentation says that the Cartesian product function

the actual implementation does not build up intermediate results in memory.

How can

2 Answers
  • 2021-02-19 04:57

    Well, it also says:

    The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order.

    This is pretty much how the implementation works (Modules/itertoolsmodule.c).

    Here is the state object:

    typedef struct {
        PyObject_HEAD
        PyObject *pools;       /* tuple of pool tuples */
        Py_ssize_t *indices;   /* one index per pool */
        PyObject *result;      /* most recently returned result tuple */
        int stopped;           /* set to 1 when the product iterator is exhausted */
    } productobject;
    

    And the next item is returned by the function product_next, which uses this state and the algorithm described in the quote to generate the next state. See this answer to understand the memory requirements.
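The odometer scheme described above can be sketched in pure Python. This is only an illustration of the idea, not the CPython code: `odometer_product` is a hypothetical name, and unlike `product_next` it builds a fresh tuple for every result. The only state it keeps is the pools and one index per pool.

```python
import itertools


def odometer_product(*iterables):
    """Yield Cartesian product tuples using one index per pool (odometer style)."""
    # As in product_new(): copy every input into a tuple up front.
    pools = [tuple(it) for it in iterables]
    # If any pool is empty, the product is empty.
    if any(len(pool) == 0 for pool in pools):
        return
    indices = [0] * len(pools)
    while True:
        yield tuple(pool[i] for pool, i in zip(pools, indices))
        # Advance the rightmost index; on overflow, reset it and carry left.
        for pos in reversed(range(len(pools))):
            indices[pos] += 1
            if indices[pos] < len(pools[pos]):
                break
            indices[pos] = 0
        else:
            # Every index rolled over: the odometer is exhausted.
            return


# The sketch should agree with the real itertools.product.
assert list(odometer_product("ab", [1, 2, 3])) == list(itertools.product("ab", [1, 2, 3]))
```

The memory held between calls is the pools plus one integer per pool, no matter how many tuples have been produced so far.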

    For general education, you can read about how to create generators with state from C extensions here.

  • 2021-02-19 05:01

    Looking at the module's source code, itertools.product() actually converts every argument to a tuple:

    // product_new() in itertoolsmodule.c
    for (i=0; i < nargs ; ++i) {
        PyObject *item = PyTuple_GET_ITEM(args, i);
        PyObject *pool = PySequence_Tuple(item); //<==== Call tuple(arg)
        if (pool == NULL)
            goto error;
        PyTuple_SET_ITEM(pools, i, pool);
        indices[i] = 0;
    }
    

    In other words, itertools.product()'s memory consumption appears to be linear in the total size of the input arguments, not in the number of tuples it produces.
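One way to see the up-front conversion from Python is to pass a generator that records what it yields. This is a small sketch; `numbers` and `log` are hypothetical names introduced only for the demonstration.

```python
import itertools


def numbers(n, log):
    """Hypothetical helper: yield 0..n-1 and record each value as it is produced."""
    for i in range(n):
        log.append(i)
        yield i


log = []
prod = itertools.product(numbers(3, log), repeat=2)

# The generator has already been drained, before any product tuple is requested.
consumed_before_first_next = list(log)

first = next(prod)

assert consumed_before_first_next == [0, 1, 2]
assert first == (0, 0)
```

So the inputs are fully materialized when the product object is created, while the output tuples are still produced one at a time.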
