Creating a dictionary with list of lists in Python

夕颜 2020-12-01 18:32

I have a huge file (with around 200k inputs). The inputs are in the form:

A B C D
B E F
C A B D
D  

I am reading this file and storing it i

3 Answers
  • 2020-12-01 18:37

    A dictionary comprehension makes short work of this task:

    >>> s = [['A','B','C','D'], ['B','E','F'], ['C','A','B','D'], ['D']]
    >>> {t[0]:t[1:] for t in s}
    {'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
    
  • 2020-12-01 18:43

    Try using a slice:

    inlinkDict[docid] = adoc[1:]
    

    This gives you an empty list instead of a 0 when the line contains only the key. To get a 0 instead, use the or operator, which returns its first operand if that operand is truthy and its second operand otherwise:

    inlinkDict[docid] = adoc[1:] or 0
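
    As a quick check of that behaviour, here is a minimal sketch. The helper name links_or_zero is hypothetical, and adoc stands in for one already-split line of the input:

```python
# Hypothetical helper; `adoc` is one line of the file after .split().
def links_or_zero(adoc):
    # An empty slice is falsy, so `or` falls through to 0.
    return adoc[1:] or 0

print(links_or_zero(['A', 'B', 'C', 'D']))  # ['B', 'C', 'D']
print(links_or_zero(['D']))                 # 0
```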
    

    Easier way with a dict comprehension:

    >>> with open('/tmp/spam.txt') as f:
    ...     data = [line.split() for line in f]
    ... 
    >>> {d[0]: d[1:] for d in data}
    {'A': ['B', 'C', 'D'], 'C': ['A', 'B', 'D'], 'B': ['E', 'F'], 'D': []}
    >>> {d[0]: ' '.join(d[1:]) if d[1:] else 0 for d in data}
    {'A': 'B C D', 'C': 'A B D', 'B': 'E F', 'D': 0}
    

    Note: dict keys must be unique, so if you have, say, two lines beginning with 'C', the value from the first one will be overwritten by the second.
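
    A small sketch of that overwrite, using two made-up input lines that share the key 'C' (the lines list is an assumption for illustration, not the real file):

```python
# Two illustrative lines with the same leading key.
lines = ["C A B D", "C H I J"]
data = [line.split() for line in lines]

# The comprehension assigns 'C' twice; the later assignment wins.
result = {d[0]: d[1:] for d in data}
print(result)  # {'C': ['H', 'I', 'J']}
```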

  • 2020-12-01 18:44

    The accepted answer is correct, except that it reads the entire file into memory (may not be desirable if you have a large file), and it will overwrite duplicate keys.

    An alternative approach using defaultdict, which has been available since Python 2.5, solves both problems:

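    from collections import defaultdict

    d = defaultdict(list)  # a missing key starts out as an empty list
    with open('/tmp/spam.txt') as f:
        for line in f:  # iterate lazily, one line at a time
            parts = line.split()
            if not parts:  # skip blank lines, which would raise IndexError below
                continue
            d[parts[0]].extend(parts[1:])  # append to any values already stored for this key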
    

    Input:

    A B C D
    B E F
    C A B D
    D  
    C H I J
    

    Result:

    >>> d = defaultdict(list)
    >>> with open('/tmp/spam.txt') as f:
    ...    for line in f:
    ...      parts = line.strip().split()
    ...      d[parts[0]] += parts[1:]
    ...
    >>> d['C']
    ['A', 'B', 'D', 'H', 'I', 'J']
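
    To try the same idea without creating /tmp/spam.txt, here is a self-contained sketch that feeds the sample input through io.StringIO. The function name build_links is hypothetical, and the blank-line guard is an added assumption about what a large real file might contain:

```python
from collections import defaultdict
import io

def build_links(lines):
    """Group the tokens after the first one under the first token of each line."""
    d = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            # Assumed guard: ignore blank lines rather than fail on parts[0].
            continue
        d[parts[0]].extend(parts[1:])
    return d

# In-memory stand-in for the file; any iterable of lines works.
sample = io.StringIO("A B C D\nB E F\nC A B D\nD  \nC H I J\n")
links = build_links(sample)
print(links['C'])  # ['A', 'B', 'D', 'H', 'I', 'J']
print(links['D'])  # []
```

    Because the function only iterates over its argument, an open file object can be passed in directly and the file is still read one line at a time.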
    