python: group elements of a tuple having the same first element

予麋鹿 2021-01-19 01:27

I have a list of tuples like this:

[
(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate


        
4 Answers
  • 2021-01-19 01:40

    It's pretty simple with defaultdict: initialize it with list as the default factory, then append each item's remaining elements to the list stored under its first element:

    lst = [
        (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
        (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), 
        (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
    ]
    
    from collections import defaultdict
    d = defaultdict(list)
    
    for k, *v in lst:
        d[k].append(v)
    
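    list(d.items())
    # Before Python 3.7, dict order is not guaranteed, so the result may look like this:
    #[(4746004,
    #  [['it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2],
    #   ['it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3]]),
    # (379146591, [['it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1]])]
    

    If order is important, use an OrderedDict, which remembers insertion order. (From Python 3.7 onward, plain dicts and defaultdict preserve insertion order as well.)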

    from collections import OrderedDict
    d = OrderedDict()

    for k, *v in lst:
        d.setdefault(k, []).append(v)
    
    list(d.items())
    #[(379146591, [['it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1]]),
    # (4746004,
    #  [['it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2],
    #   ['it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3]])]
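
The loop above can be wrapped in a small reusable helper. This is only a sketch, not part of the original answer: the name `group_by_first` is hypothetical, and it assumes Python 3.7 or later, where a plain dict keeps insertion order, so neither defaultdict nor OrderedDict is strictly required.

```python
def group_by_first(rows):
    """Group rows by their first element, keeping first-seen key order.

    Hypothetical helper for illustration; it relies on dict insertion
    order, which is guaranteed from Python 3.7 onward.
    """
    groups = {}
    for key, *rest in rows:
        # setdefault creates the list the first time a key is seen
        groups.setdefault(key, []).append(tuple(rest))
    return groups


lst = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3),
]

grouped = group_by_first(lst)
print(list(grouped))          # keys in first-seen order
print(len(grouped[4746004]))  # number of rows sharing that key
```

Storing `tuple(rest)` rather than the list produced by `*rest` keeps each grouped row the same type as the input rows; drop the conversion if lists are fine.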
    
  • 2021-01-19 01:41

    You can use collections.defaultdict:

    data = [
        (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1), 
        (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), 
        (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
    ]

    from collections import defaultdict
    a = defaultdict(list)
    
    for d in data:
        a[d[0]].append(d[1:])
    
    for k, v in a.items():
        a[k] = tuple(v)  # freeze each group's list into a tuple
    
    print(dict(a))
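
The conversion loop can also be written as a single dict comprehension, which builds a new plain dict instead of reassigning values while iterating. This is a sketch of that variant on the same data; the names `groups` and `frozen` are illustrative and do not come from the answer.

```python
from collections import defaultdict

data = [
    (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
    (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
    (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3),
]

groups = defaultdict(list)
for row in data:
    # row[1:] is already a tuple because each row is a tuple
    groups[row[0]].append(row[1:])

# Build a regular dict whose values are tuples of tuples
frozen = {key: tuple(rows) for key, rows in groups.items()}
print(frozen)
```

Returning a plain dict also means a later lookup of a missing key raises KeyError instead of silently creating an empty list, as a defaultdict would.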
    
  • 2021-01-19 01:45

    You can use Python3 variable unpacking and OrderedDict to retain order:

    from collections import OrderedDict
    d = OrderedDict()
    l = [
        (379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),
        (4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2),
        (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)
    ]
    
    for a, *b in l:
        if a in d:
            d[a].append(b)
        else:
            d[a] = [b]
    
    final_data = [(a, tuple(map(tuple, b))) for a, b in d.items()]
    

    Output:

    [(379146591, (('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1),)), (4746004, (('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)))]
    
  • 2021-01-19 01:54

    Use itertools.groupby (with operator.itemgetter to pick out the first item). The only catch is that groupby merges adjacent items only, so the data must already be sorted so that the members of each group sit next to each other (the same idea as running sort before uniq in a shell). You can use sorted() for this.

    import operator
    from itertools import groupby
    
    data = [
        (379146591, "it", 55, 1, 1, "NON ENTRARE", "NonEntrate", 55, 1),
        (4746004, "it", 28, 2, 2, "NON ENTRARE", "NonEntrate", 26, 2),
        (4746004, "it", 28, 2, 2, "TheBestTroll Group", "TheBestTrollGroup", 2, 3),
    ]
    
    data = sorted(data, key=operator.itemgetter(0))  # unnecessary if equal keys are already adjacent
    for k, g in groupby(data, operator.itemgetter(0)):
        print(k, list(g))
    

    This will output:

    4746004 [(4746004, 'it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), (4746004, 'it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
    379146591 [(379146591, 'it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]
    

    In your case, you also need to remove the first element from your lists of values. Change the last two lines of the above to:

    for k, g in groupby(data, operator.itemgetter(0)):
        print(k, [item[1:] for item in g])
    

    Output:

    4746004 [('it', 28, 2, 2, 'NON ENTRARE', 'NonEntrate', 26, 2), ('it', 28, 2, 2, 'TheBestTroll Group', 'TheBestTrollGroup', 2, 3)]
    379146591 [('it', 55, 1, 1, 'NON ENTRARE', 'NonEntrate', 55, 1)]
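
To see why the sort matters, here is a small sketch on made-up data (the `rows` list is hypothetical, not from the question). groupby starts a new group every time the key changes, so a key that reappears later shows up as a second, separate group unless the input is sorted first.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample: key 1 appears, then key 2, then key 1 again.
rows = [(1, 'a'), (2, 'b'), (1, 'c')]
first = itemgetter(0)

# Without sorting, the two rows with key 1 land in different groups.
unsorted_keys = [k for k, _ in groupby(rows, key=first)]

# After sorting by the same key, each key forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=first), key=first)]

print(unsorted_keys)  # [1, 2, 1]
print(sorted_keys)    # [1, 2]
```

Note that sorting also reorders the groups by key value, which is why 4746004 is printed before 379146591 above; if the original first-seen order matters, a dict-based approach fits better.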
    