Convert two lists into a dictionary

囚心锁ツ 2020-11-21 04:35

Imagine that you have:

keys = ['name', 'age', 'food']
values = ['Monty', 42, 'spam']

What is the simplest way to produce the following dictionary?

{'name': 'Monty', 'age': 42, 'food': 'spam'}

18 answers
  • 2020-11-21 05:08
    • 2018-04-18

    The best solution is still:

    In [92]: keys = ('name', 'age', 'food')
    ...: values = ('Monty', 42, 'spam')
    ...: 
    
    In [93]: dt = dict(zip(keys, values))
    In [94]: dt
    Out[94]: {'age': 42, 'food': 'spam', 'name': 'Monty'}
    

    Transpose it (go from a list of pairs back to separate keys and values):

        lst = [('name', 'Monty'), ('age', 42), ('food', 'spam')]
        keys, values = zip(*lst)
        In [101]: keys
        Out[101]: ('name', 'age', 'food')
        In [102]: values
        Out[102]: ('Monty', 42, 'spam')
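
The two operations above are inverses of each other. A minimal sketch of the round trip, assuming Python 3.7+ (where dicts keep insertion order); `pairs`, `keys_again` and `values_again` are illustrative names, not from the answer:

```python
keys = ('name', 'age', 'food')
values = ('Monty', 42, 'spam')

# Forward: pair the two sequences up and build the dictionary.
dt = dict(zip(keys, values))

# Backward: take the (key, value) pairs and transpose them with zip(*...).
pairs = list(dt.items())
keys_again, values_again = zip(*pairs)

print(keys_again)    # ('name', 'age', 'food')
print(values_again)  # ('Monty', 42, 'spam')
```

Note that `zip(*pairs)` raises a `ValueError` on unpacking if `pairs` is empty, so an empty dictionary would need a separate check.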
    
  • 2020-11-21 05:10

    Imagine that you have:

    keys = ('name', 'age', 'food')
    values = ('Monty', 42, 'spam')
    

    What is the simplest way to produce the following dictionary?

    dict = {'name' : 'Monty', 'age' : 42, 'food' : 'spam'}
    

    Most performant, dict constructor with zip

    new_dict = dict(zip(keys, values))
    

    In Python 3, zip now returns a lazy iterator, and this is now the most performant approach.

    dict(zip(keys, values)) does require the one-time global lookup each for dict and zip, but it doesn't form any unnecessary intermediate data-structures or have to deal with local lookups in function application.
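
One behaviour of `dict(zip(keys, values))` that may be worth keeping in mind: `zip` stops at the shorter input, so a length mismatch drops items silently. The sketch below shows this; `dict_from_lists` is a hypothetical helper added for illustration, not something from the answer:

```python
keys = ['name', 'age', 'food']
values = ['Monty', 42]  # one value short, on purpose

# zip stops at the shorter input, so 'food' is silently dropped.
truncated = dict(zip(keys, values))
print(truncated)  # {'name': 'Monty', 'age': 42}


def dict_from_lists(keys, values):
    """Hypothetical helper: build a dict, refusing inputs of unequal length."""
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} keys but {len(values)} values")
    return dict(zip(keys, values))
```

The explicit length check only works for sized inputs such as lists and tuples; it is a design choice for this sketch, not a requirement of `dict` or `zip`.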

    Runner-up, dict comprehension:

    A close runner-up to using the dict constructor is to use the native syntax of a dict comprehension (not a list comprehension, as others have mistakenly put it):

    new_dict = {k: v for k, v in zip(keys, values)}
    

    Choose this when you need to map or filter based on the keys or values.
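
As a rough illustration of mapping and filtering with a dict comprehension (the particular filter and transformation are made up for this example):

```python
keys = ('name', 'age', 'food')
values = ('Monty', 42, 'spam')

# Filter: keep only the pairs whose value is a string.
only_strings = {k: v for k, v in zip(keys, values) if isinstance(v, str)}

# Map: transform keys and values while building the dict.
shouted = {k.upper(): str(v).upper() for k, v in zip(keys, values)}

print(only_strings)  # {'name': 'Monty', 'food': 'spam'}
print(shouted)       # {'NAME': 'MONTY', 'AGE': '42', 'FOOD': 'SPAM'}
```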

    In Python 2, zip returns a list. To avoid creating an unnecessary list, use izip instead (aliasing it to zip can reduce code changes when you move to Python 3).

    from itertools import izip as zip
    

    So that is still (2.7):

    new_dict = {k: v for k, v in zip(keys, values)}
    

    Python 2, ideal for <= 2.6

    izip from itertools becomes zip in Python 3. izip is better than zip for Python 2 (because it avoids the unnecessary list creation), and ideal for 2.6 or below:

    from itertools import izip
    new_dict = dict(izip(keys, values))
    

    Result for all cases:

    >>> new_dict
    {'age': 42, 'name': 'Monty', 'food': 'spam'}
    

    Explanation:

    If we look at the help on dict we see that it takes a variety of forms of arguments:

    
    >>> help(dict)
    
    class dict(object)
     |  dict() -> new empty dictionary
     |  dict(mapping) -> new dictionary initialized from a mapping object's
     |      (key, value) pairs
     |  dict(iterable) -> new dictionary initialized as if via:
     |      d = {}
     |      for k, v in iterable:
     |          d[k] = v
     |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
     |      in the keyword argument list.  For example:  dict(one=1, two=2)
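
The argument forms listed in the help text can be tried side by side. A small sketch, with variable names chosen only for illustration:

```python
pairs = [('name', 'Monty'), ('age', 42), ('food', 'spam')]

from_iterable = dict(pairs)                             # dict(iterable)
from_mapping = dict(from_iterable)                      # dict(mapping): a shallow copy
from_kwargs = dict(name='Monty', age=42, food='spam')   # dict(**kwargs)

# zip(keys, values) is just another iterable of (key, value) pairs.
from_zip = dict(zip(('name', 'age', 'food'), ('Monty', 42, 'spam')))

print(from_iterable == from_mapping == from_kwargs == from_zip)  # True
```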
    
    

    The optimal approach is to use an iterable while avoiding creating unnecessary data structures. In Python 2, zip creates an unnecessary list:

    >>> zip(keys, values)
    [('name', 'Monty'), ('age', 42), ('food', 'spam')]
    

    In Python 3, the equivalent would be:

    >>> list(zip(keys, values))
    [('name', 'Monty'), ('age', 42), ('food', 'spam')]
    

    and Python 3's zip merely creates an iterable object:

    >>> zip(keys, values)
    <zip object at 0x7f0e2ad029c8>
    

    Since we want to avoid creating unnecessary data structures, we usually want to avoid Python 2's zip (since it creates an unnecessary list).
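
One consequence of the lazy zip object in Python 3 is that it can be consumed only once. A short sketch, assuming Python 3:

```python
keys = ('name', 'age', 'food')
values = ('Monty', 42, 'spam')

pairs = zip(keys, values)  # no list is built here (Python 3)
first = dict(pairs)        # consumes the iterator
second = dict(pairs)       # nothing left to consume

print(first)   # {'name': 'Monty', 'age': 42, 'food': 'spam'}
print(second)  # {}
```

If the pairs are needed more than once, call `zip(keys, values)` again or materialize it with `list(...)`.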

    Less performant alternatives:

    This is a generator expression being passed to the dict constructor:

    generator_expression = ((k, v) for k, v in zip(keys, values))
    dict(generator_expression)
    

    or equivalently:

    dict((k, v) for k, v in zip(keys, values))
    

    And this is a list comprehension being passed to the dict constructor:

    dict([(k, v) for k, v in zip(keys, values)])
    

    In the first two cases, an extra layer of non-operative (thus unnecessary) computation is placed over the zip iterable, and in the case of the list comprehension, an extra list is unnecessarily created. I would expect all of them to be less performant, and certainly not more-so.
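
The variants discussed so far differ in cost, not in result. A quick check that they all build the same dictionary (the labels are just for printing):

```python
keys = ('name', 'age', 'food')
values = ('Monty', 42, 'spam')

candidates = {
    'dict(zip)': dict(zip(keys, values)),
    'dict comprehension': {k: v for k, v in zip(keys, values)},
    'generator expression': dict((k, v) for k, v in zip(keys, values)),
    'list comprehension': dict([(k, v) for k, v in zip(keys, values)]),
}

expected = {'name': 'Monty', 'age': 42, 'food': 'spam'}
for label, result in candidates.items():
    print(label, result == expected)  # every line prints True
```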

    Performance review:

    In 64 bit Python 3.8.2 provided by Nix, on Ubuntu 16.04, ordered from fastest to slowest:

    >>> min(timeit.repeat(lambda: dict(zip(keys, values))))
    0.6695233230129816
    >>> min(timeit.repeat(lambda: {k: v for k, v in zip(keys, values)}))
    0.6941362579818815
    >>> min(timeit.repeat(lambda: {keys[i]: values[i] for i in range(len(keys))}))
    0.8782548159942962
    >>> 
    >>> min(timeit.repeat(lambda: dict([(k, v) for k, v in zip(keys, values)])))
    1.077607496001292
    >>> min(timeit.repeat(lambda: dict((k, v) for k, v in zip(keys, values))))
    1.1840861019445583
    

    dict(zip(keys, values)) wins even with small sets of keys and values, but for larger sets, the differences in performance will become greater.

    A commenter said:

    min seems like a bad way to compare performance. Surely mean and/or max would be much more useful indicators for real usage.

    We use min because these algorithms are deterministic. We want to know the performance of the algorithms under the best conditions possible.

    If the operating system hangs for any reason, it has nothing to do with what we're trying to compare, so we need to exclude those kinds of results from our analysis.

    If we used mean, those kinds of events would skew our results greatly, and if we used max we will only get the most extreme result - the one most likely affected by such an event.
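
To see the three statistics next to each other, something like the following can be run. It is only a sketch: `repeat=5` and `number=10_000` are small, arbitrary values chosen to keep it quick, and the numbers printed will differ from machine to machine.

```python
import statistics
import timeit

keys = ('name', 'age', 'food')
values = ('Monty', 42, 'spam')

# Each entry is the total time for `number` calls in one repetition.
timings = timeit.repeat(lambda: dict(zip(keys, values)), repeat=5, number=10_000)

best = min(timings)                 # least disturbed repetition
average = statistics.mean(timings)  # pulled upward by any disturbed repetition
worst = max(timings)                # most disturbed repetition

print(f"min={best:.4f}s mean={average:.4f}s max={worst:.4f}s")
```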

    A commenter also says:

    In python 3.6.8, using mean values, the dict comprehension is indeed still faster, by about 30% for these small lists. For larger lists (10k random numbers), the dict call is about 10% faster.

    I presume we mean dict(zip(... with 10k random numbers. That does sound like a fairly unusual use case. It does make sense that the most direct calls would dominate in large datasets, and I wouldn't be surprised if OS hangs are dominating given how long it would take to run that test, further skewing your numbers. And if you use mean or max I would consider your results meaningless.

    Let's use a more realistic size on our top examples:

    import numpy
    import timeit
    l1 = list(numpy.random.random(100))
    l2 = list(numpy.random.random(100))
    

    And we see here that dict(zip(... does indeed run faster for larger datasets by about 20%.

    >>> min(timeit.repeat(lambda: {k: v for k, v in zip(l1, l2)}))
    9.698965263989521
    >>> min(timeit.repeat(lambda: dict(zip(l1, l2))))
    7.9965161079890095
    
  • 2020-11-21 05:12

    You can also use dictionary comprehensions in Python ≥ 2.7:

    >>> keys = ('name', 'age', 'food')
    >>> values = ('Monty', 42, 'spam')
    >>> {k: v for k, v in zip(keys, values)}
    {'food': 'spam', 'age': 42, 'name': 'Monty'}
    
  • 2020-11-21 05:13

    You may also try with one list which is a combination of two lists ;)

    a = [1,2,3,4]
    n = [5,6,7,8]
    
    x = []
    for i in a,n:
        x.append(i)
    
    print(dict(zip(x[0], x[1])))
    
  • 2020-11-21 05:15

    I ran into this question while solving a graph-related problem. I needed to define an adjacency list and initialize every node with an empty list, and I wondered whether building it with zip would be fast enough to be worth it compared with simply assigning each key-value pair in a loop. Since run time is often the deciding factor, I timed both approaches with timeit.

    import timeit
    def dictionary_creation(n_nodes):
        dummy_dict = dict()
        for node in range(n_nodes):
            dummy_dict[node] = []
        return dummy_dict
    
    
    def dictionary_creation_1(n_nodes):
        keys = list(range(n_nodes))
        values = [[] for i in range(n_nodes)]
        graph = dict(zip(keys, values))
        return graph
    
    
    def wrapper(func, *args, **kwargs):
        def wrapped():
            return func(*args, **kwargs)
        return wrapped
    
    n_nodes = 10_000_000  # the size used for the timings reported below
    iteration = wrapper(dictionary_creation, n_nodes)
    shorthand = wrapper(dictionary_creation_1, n_nodes)
    
    # Time each approach for 1 to 7 repetitions.
    for trials in range(1, 8):
        print(f'Iteration: {timeit.timeit(iteration, number=trials)}\nShorthand: {timeit.timeit(shorthand, number=trials)}')
    

    For n_nodes = 10,000,000 I get:

    Iteration: 2.825081646999024 Shorthand: 3.535717916001886

    Iteration: 5.051560923002398 Shorthand: 6.255070794999483

    Iteration: 6.52859034499852 Shorthand: 8.221581164998497

    Iteration: 8.683652416999394 Shorthand: 12.599181543999293

    Iteration: 11.587241565001023 Shorthand: 15.27298851100204

    Iteration: 14.816342867001367 Shorthand: 17.162912737003353

    Iteration: 16.645022411001264 Shorthand: 19.976680120998935

    In every run the explicit loop is faster than the zip shorthand. From the fifth run onward, n repetitions of the loop even take less time than n-1 repetitions of the shorthand.
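
For the adjacency-list case described in this answer, a dict comprehension is a third option that was not timed above, so no claim is made here about its speed. The sketch also shows a pitfall with `dict.fromkeys`, which looks like a shortcut but reuses a single list object; `n_nodes = 5` is just a small illustrative size.

```python
n_nodes = 5  # small illustrative size, not the 10,000,000 used for the timings

# Each key gets its own fresh list.
graph = {node: [] for node in range(n_nodes)}
graph[0].append(1)
print(graph)  # {0: [1], 1: [], 2: [], 3: [], 4: []}

# Pitfall: dict.fromkeys reuses the *same* list object for every key.
shared = dict.fromkeys(range(n_nodes), [])
shared[0].append(1)
print(shared)  # {0: [1], 1: [1], 2: [1], 3: [1], 4: [1]}
```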

  • 2020-11-21 05:16

    Try this:

    >>> import itertools
    >>> keys = ('name', 'age', 'food')
    >>> values = ('Monty', 42, 'spam')
    >>> adict = dict(itertools.izip(keys,values))
    >>> adict
    {'food': 'spam', 'age': 42, 'name': 'Monty'}
    

    Note that itertools.izip exists only in Python 2, where it is also more economical in memory consumption than zip. In Python 3, the built-in zip is already lazy.
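
If the same code has to run on both Python 2 and Python 3, one possible pattern is a guarded import. This is a sketch only; `lazy_zip` is a name invented for this example:

```python
try:
    # Python 2: itertools.izip is the lazy variant.
    from itertools import izip as lazy_zip
except ImportError:
    # Python 3: itertools.izip is gone; the built-in zip is already lazy.
    lazy_zip = zip

keys = ('name', 'age', 'food')
values = ('Monty', 42, 'spam')
adict = dict(lazy_zip(keys, values))
print(adict == {'name': 'Monty', 'age': 42, 'food': 'spam'})  # True
```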
