I have the following data structure (a list of lists)
[
[\'4\', \'21\', \'1\', \'14\', \'2008-10-24 15:42:58\'],
[\'3\', \'22\', \'4\', \'2somename\', \'2
Use a function to reorder the list so that I can group by each item in the list. For example I'd like to be able to group by the second column (so that all the 21's are together)
Lists have a built in sort method and you can provide a function that extracts the sort key.
>>> import pprint
>>> l.sort(key = lambda ll: ll[1])
>>> pprint.pprint(l)
[['4', '21', '1', '14', '2008-10-24 15:42:58'],
['5', '21', '3', '19', '2008-10-24 15:45:45'],
['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
['7', '22', '3', '2somename', '2008-10-24 15:45:51']]
Use a function to only display certain values from each inner list. For example i'd like to reduce this list to only contain the 4th field value of '2somename'
This looks like a job for list comprehensions
>>> [ll[3] for ll in l]
['14', '2somename', '19', '1somename', '2somename']
It looks a lot like you're trying to use a list as a database.
Nowadays Python includes sqlite bindings in the core distribution. If you don't need persistence, it's really easy to create an in-memory sqlite database (see How do I create a sqllite3 in-memory database?).
Then you can use SQL statements to do all this sorting and filtering without having to reinvent the wheel.
If I understand your question correctly, the following code should do the job:
l = [
['4', '21', '1', '14', '2008-10-24 15:42:58'],
['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
['5', '21', '3', '19', '2008-10-24 15:45:45'],
['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
]
def compareField(field):
def c(l1,l2):
return cmp(l1[field], l2[field])
return c
# Use compareField(1) as the ordering criterion, i.e. sort only with
# respect to the 2nd field
l.sort(compareField(1))
for row in l: print row
print
# Select only those sublists for which 4th field=='2somename'
l2somename = [row for row in l if row[3]=='2somename']
for row in l2somename: print row
Output:
['4', '21', '1', '14', '2008-10-24 15:42:58']
['5', '21', '3', '19', '2008-10-24 15:45:45']
['6', '21', '1', '1somename', '2008-10-24 15:45:49']
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
['3', '22', '4', '2somename', '2008-10-24 15:22:03']
['7', '22', '3', '2somename', '2008-10-24 15:45:51']
For part (2), with x being your array, I think you want,
[y for y in x if y[3] == '2somename']
Which will return a list of just your data lists that have a fourth value being '2somename'... Although it seems Kamil is giving you the best advice with going for SQL...
You're simply creating indexes on your structure, right?
>>> from collections import defaultdict
>>> def indexOn( things, pos ):
... inx= defaultdict(list)
... for t in things:
... inx[t[pos]].append(t)
... return inx
...
>>> a=[
... ['4', '21', '1', '14', '2008-10-24 15:42:58'],
... ['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
... ['5', '21', '3', '19', '2008-10-24 15:45:45'],
... ['6', '21', '1', '1somename', '2008-10-24 15:45:49'],
... ['7', '22', '3', '2somename', '2008-10-24 15:45:51']
... ]
Here's your first request, grouped by position 1.
>>> import pprint
>>> pprint.pprint( dict(indexOn(a,1)) )
{'21': [['4', '21', '1', '14', '2008-10-24 15:42:58'],
['5', '21', '3', '19', '2008-10-24 15:45:45'],
['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
'22': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}
Here's your second request, grouped by position 3.
>>> dict(indexOn(a,3))
{'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']], '14': [['4', '21', '1', '14', '2008-10-24 15:42:58']], '2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'], ['7', '22', '3', '2somename', '2008-10-24 15:45:51']], '1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']]}
>>> pprint.pprint(_)
{'14': [['4', '21', '1', '14', '2008-10-24 15:42:58']],
'19': [['5', '21', '3', '19', '2008-10-24 15:45:45']],
'1somename': [['6', '21', '1', '1somename', '2008-10-24 15:45:49']],
'2somename': [['3', '22', '4', '2somename', '2008-10-24 15:22:03'],
['7', '22', '3', '2somename', '2008-10-24 15:45:51']]}
If you assigned it to var "a"...
python 2.x:
#1:
a.sort(lambda x,y: cmp(x[1], y[1]))
#2:
filter(lambda x: x[3]=="2somename", a)
python 3:
#1:
a.sort(key=lambda x: x[1])