Sorting and Grouping Nested Lists in Python

没有蜡笔的小新 2020-11-30 20:05

I have the following data structure (a list of lists):

[
 ['4', '21', '1', '14', '2008-10-24 15:42:58'],
 ['3', '22', '4', '2somename', '2

8 Answers
  • 2020-11-30 20:44

    Use a function to reorder the list so that I can group by each item in the list. For example, I'd like to be able to group by the second column (so that all the 21's are together).

    Lists have a built-in sort method, and you can provide a function that extracts the sort key.

    >>> import pprint
    >>> l.sort(key=lambda ll: ll[1])
    >>> pprint.pprint(l)
    [['4', '21', '1', '14', '2008-10-24 15:42:58'],
     ['5', '21', '3', '19', '2008-10-24 15:45:45'],
     ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
     ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
     ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]
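
    One caveat with the key above: the fields are strings, so they compare lexicographically ('100' would sort before '21'). The following is a minimal sketch, assuming the second column always holds integer-like strings; the names rows, by_text, by_number and by_group_then_time are illustrative and do not come from the question.

```python
from operator import itemgetter

rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# sorted() returns a new list and leaves rows untouched;
# itemgetter(1) behaves like lambda r: r[1].
by_text = sorted(rows, key=itemgetter(1))

# Assumption: column 1 always parses as an integer, so compare numerically.
by_number = sorted(rows, key=lambda r: int(r[1]))

# Sort on several columns at once: second column first, then the timestamp.
by_group_then_time = sorted(rows, key=itemgetter(1, 4))
```

    Python's sort is stable, so rows that share a key keep their original relative order.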
    

    Use a function to only display certain values from each inner list. For example, I'd like to reduce this list to only contain the 4th field value of '2somename'.

    This looks like a job for list comprehensions:

    >>> [ll[3] for ll in l]
    ['14', '2somename', '19', '1somename', '2somename']
    
  • 2020-11-30 20:44

    It looks a lot like you're trying to use a list as a database.

    Python's standard library includes SQLite bindings (the sqlite3 module). If you don't need persistence, it's easy to create an in-memory SQLite database (see How do I create a sqllite3 in-memory database?).

    Then you can use SQL statements to do all this sorting and filtering without having to reinvent the wheel.
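
    As a rough sketch of that approach, using only the standard sqlite3 module: the table name items and the column names (id, grp, c3, name, ts) are made up for illustration, since the question does not say what the columns mean.

```python
import sqlite3

rows = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# ":memory:" creates a database that exists only while the connection is open.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: column names are assumptions, not from the question.
conn.execute("CREATE TABLE items (id TEXT, grp TEXT, c3 TEXT, name TEXT, ts TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", rows)

# Order by the second column so equal values end up next to each other.
ordered = conn.execute("SELECT * FROM items ORDER BY grp, ts").fetchall()

# Keep only the rows whose fourth column is '2somename'.
matches = conn.execute(
    "SELECT * FROM items WHERE name = ?", ("2somename",)
).fetchall()

conn.close()
```

    Each fetched row comes back as a tuple rather than a list.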

  • 2020-11-30 20:47

    If I understand your question correctly, the following code should do the job:

    l = [
     ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
     ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
     ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
     ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
     ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
    ]
    
    from functools import cmp_to_key

    def compareField(field):
       def c(l1, l2):
          # cmp() no longer exists in Python 3; this returns -1, 0 or 1
          a, b = l1[field], l2[field]
          return (a > b) - (a < b)
       return c
    
    # Use compareField(1) as the ordering criterion, i.e. sort only with
    # respect to the 2nd field
    l.sort(key=cmp_to_key(compareField(1)))
    for row in l: print(row)
    
    print()
    # Select only those sublists for which 4th field=='2somename'
    l2somename = [row for row in l if row[3]=='2somename']
    for row in l2somename: print(row)
    

    Output:

    ['4', '21', '1', '14', '2008-10-24 15:42:58']
    ['5', '21', '3', '19', '2008-10-24 15:45:45']
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49']
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03']
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
    
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03']
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
    
  • 2020-11-30 20:52

    For part (2), with x being your list, I think you want:

    [y for y in x if y[3] == '2somename']
    

    This returns a list of just the inner lists whose fourth value is '2somename'. That said, Kamil's suggestion to use SQL seems like the best advice.

  • 2020-11-30 20:52

    You're simply creating indexes on your structure, right?

    >>> from collections import defaultdict
    >>> def indexOn(things, pos):
    ...     inx = defaultdict(list)
    ...     for t in things:
    ...         inx[t[pos]].append(t)
    ...     return inx
    ... 
    >>> a=[
    ...  ['4', '21', '1', '14', '2008-10-24 15:42:58'], 
    ...  ['3', '22', '4', '2somename', '2008-10-24 15:22:03'], 
    ...  ['5', '21', '3', '19', '2008-10-24 15:45:45'], 
    ...  ['6', '21', '1', '1somename', '2008-10-24 15:45:49'], 
    ...  ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
    ... ]
    

    Here's your first request, grouped by position 1.

    >>> import pprint
    >>> pprint.pprint( dict(indexOn(a,1)) )
    {'21': [['4', '21', '1', '14', '2008-10-24 15:42:58'],
            ['5', '21', '3', '19', '2008-10-24 15:45:45'],
            ['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
     '22': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
            ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}
    

    Here's your second request, grouped by position 3.

    >>> dict(indexOn(a,3))
    {'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']], '14': [['4', '21', '1', '14', '2008-10-24 15:42:58']], '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], ['7', '22', '3', '2somename', '2008-10-24 15:45:51']], '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']]}
    >>> pprint.pprint(_)
    {'14': [['4', '21', '1', '14', '2008-10-24 15:42:58']],
     '19': [['5', '21', '3', '19', '2008-10-24 15:45:45']],
     '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
     '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
                   ['7', '22', '3', '2somename', '2008-10-24 15:45:51']]} 
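
    For comparison, the same grouping can be sketched with itertools.groupby. The helper name group_sorted is hypothetical and not part of this answer; note that groupby only merges adjacent rows with equal keys, so the data has to be sorted by the same key first.

```python
from itertools import groupby
from operator import itemgetter

a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

def group_sorted(things, pos):
    """Group rows by the value in column pos (hypothetical helper)."""
    key = itemgetter(pos)
    # Sort first: groupby() starts a new group whenever the key changes.
    ordered = sorted(things, key=key)
    return {k: list(g) for k, g in groupby(ordered, key=key)}

by_second = group_sorted(a, 1)
by_fourth = group_sorted(a, 3)
```

    The defaultdict version above makes a single pass and needs no sort, so it is usually the simpler choice unless the data is already ordered.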
    
  • 2020-11-30 20:56

    Assuming the list is assigned to a variable "a":

    Python 2.x:

    #1:

    a.sort(lambda x,y: cmp(x[1], y[1]))
    

    #2:

    filter(lambda x: x[3]=="2somename", a)
    

    Python 3:

    #1:

    a.sort(key=lambda x: x[1])
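
    The Python 3 part above stops at #1. In Python 3, filter() returns a lazy iterator instead of a list, so a sketch of #2 could look like this (the names matches and matches_lc are illustrative):

```python
a = [
    ['4', '21', '1', '14', '2008-10-24 15:42:58'],
    ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
    ['5', '21', '3', '19', '2008-10-24 15:45:45'],
    ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
    ['7', '22', '3', '2somename', '2008-10-24 15:45:51'],
]

# filter() is lazy in Python 3; wrap it in list() to materialize the rows.
matches = list(filter(lambda x: x[3] == "2somename", a))

# The equivalent list comprehension, which is often considered more readable.
matches_lc = [x for x in a if x[3] == "2somename"]
```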
    