In my method I have to return a list within a list. I would like to use a list comprehension for performance reasons, since the list takes about 5 minutes to create.
I felt the need to make @ted's answer more readable (imo) and to add some explanations.
Tidied up solution:
# Function to print the index, if the index is evenly divisible by 1000:
def report(index):
    if index % 1000 == 0:
        print(index)

# The function the user wants to apply on the list elements
def process(x, index, report):
    report(index)  # Call of the reporting function
    return 'something ' + x  # ! Just an example, replace with your desired application

# ! Just an example, replace with your list to iterate over
mylist = ['number ' + str(k) for k in range(5000)]

# Running a list comprehension
result = [process(x, index, report) for index, x in enumerate(mylist)]
Explanation of enumerate(mylist): using the function enumerate, it is possible to get indices in addition to the elements of an iterable object (cf. this question and its answers). For example:
[(index, x) for index, x in enumerate(["a", "b", "c"])]  # returns
[(0, 'a'), (1, 'b'), (2, 'c')]
Note: index and x are not reserved names, just names I found convenient; [(foo, bar) for foo, bar in enumerate(["a", "b", "c"])] yields the same result.
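enumerate also accepts an optional start argument (a real parameter of the built-in), which is handy when you want a 1-based "items processed so far" count instead of a 0-based index. A minimal sketch; the letters list is just sample data:

```python
letters = ["a", "b", "c"]

# Default: the counter starts at 0
zero_based = [(index, x) for index, x in enumerate(letters)]

# start=1 makes the counter read as "number of items seen so far"
one_based = [(count, x) for count, x in enumerate(letters, start=1)]

print(zero_based)  # [(0, 'a'), (1, 'b'), (2, 'c')]
print(one_based)   # [(1, 'a'), (2, 'b'), (3, 'c')]
```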
1: Pass a reporting function
def report(index):
    if index % 1000 == 0:
        print(index)

def process(token, index, report=None):
    if report:
        report(index)
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [process(token, i, report) for i, token in enumerate(l1)]
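Because report is passed in as an argument, process does not care what the callback does. As a sketch (record_milestone and the milestones list are hypothetical helpers, not part of the answer above), you can swap printing for a callback that collects the milestone indices, which also makes the behaviour easy to check:

```python
def process(token, index, report=None):
    # Call the optional reporting callback, then do the real work
    if report:
        report(index)
    return token['text']

milestones = []  # hypothetical: collects the indices that would have been printed

def record_milestone(index):
    # Same condition as report(), but stores the index instead of printing it
    if index % 1000 == 0:
        milestones.append(index)

l1 = [{'text': k} for k in range(5000)]
l2 = [process(token, i, record_milestone) for i, token in enumerate(l1)]

print(milestones)  # [0, 1000, 2000, 3000, 4000]
print(l2[:3])      # [0, 1, 2]
```

Leaving report at its default of None, e.g. [process(token, i) for i, token in enumerate(l1)], turns reporting off entirely.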
2: Use and / or expressions
def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [(i % 1000 == 0 and print(i)) or process(token) for i, token in enumerate(l1)]
3: Use both (a report function built on and, combined with or)
def process(token):
    return token['text']

def report(i):
    i % 1000 == 0 and print(i)

l1 = [{'text': k} for k in range(5000)]
l2 = [report(i) or process(token) for i, token in enumerate(l1)]
All 3 methods print:
0
1000
2000
3000
4000
How 2 works
i % 1000 == 0 and print(i): and only evaluates its second operand if the first one is True, so print(i) only runs when i % 1000 == 0.
or process(token): x or y evaluates to x if x is truthy, and otherwise evaluates and returns y. If i % 1000 != 0, the first operand is False, so process(token) is added to the list. If i % 1000 == 0, the first operand is None (because print returns None), so likewise the or expression adds process(token) to the list.
How 3 works
Similar to 2: because report(i) does not return anything, it evaluates to None, and or adds process(token) to the list.
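The behaviour described above is ordinary short-circuit evaluation: and and or return one of their operands, and the right operand is only evaluated when needed. A small sketch that logs every evaluation (the noisy helper is hypothetical, just for illustration):

```python
calls = []

def noisy(value):
    # Record that this operand was actually evaluated, then return it unchanged
    calls.append(value)
    return value

r1 = noisy(False) and noisy("right")   # and stops at a falsy left operand
r2 = noisy(True) and noisy("right")    # and goes on to evaluate the right operand
r3 = noisy(None) or noisy("token")     # or moves past a falsy left operand
r4 = noisy("first") or noisy("token")  # or stops at a truthy left operand

print(r1, r2, r3, r4)  # False right token first
print(calls)           # [False, True, 'right', None, 'token', 'first']
```

In methods 2 and 3 the left operand is always either False or None, so or always falls through to process(token).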
doc_collection = [[1, 2],
                  [3, 4],
                  [5, 6]]
result = [print(progress) or
          [str(token) for token in document]
          for progress, document in enumerate(doc_collection)]
print(result)  # [['1', '2'], ['3', '4'], ['5', '6']]
I don't consider this good or readable code, but the idea is fun.
It works because print always returns None, so print(progress) or x will always be x (by the definition of or).
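Two quick checks of that claim, as a sketch: print really returns None, and because or returns its second operand whenever the first is falsy, x comes through unchanged even when x is itself falsy (for example an empty document):

```python
# print writes its argument and then returns None
returned = print(0)
print(returned is None)  # True

# None is falsy, so `None or x` evaluates to x, even when x is itself falsy
empty = print(1) or []
tokens = print(2) or ["1", "2"]
print(empty, tokens)  # [] ['1', '2']
```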
def show_progress(it, milestones=1):
    for i, x in enumerate(it):
        yield x
        processed = i + 1
        if processed % milestones == 0:
            print('Processed %s elements' % processed)
Simply apply this function to anything you're iterating over. It doesn't matter whether you use a loop or a list comprehension, and it's easy to use anywhere with almost no code changes. For example:
doc_collection = [[1, 2],
                  [3, 4],
                  [5, 6]]
result = [[str(token) for token in document]
          for document in show_progress(doc_collection)]
print(result)  # [['1', '2'], ['3', '4'], ['5', '6']]
If you only want to show progress every 100 documents, write:
show_progress(doc_collection, 100)
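If the iterable has a known length, the same generator idea can report a percentage as well. A minimal sketch, assuming the input supports len(); show_percent and its every parameter are hypothetical names, not part of the answer above:

```python
def show_percent(seq, every=1):
    # Assumption: seq supports len(), so the total is known up front
    total = len(seq)
    for i, x in enumerate(seq):
        yield x
        processed = i + 1
        # Report on every `every`-th element and always on the last one
        if processed % every == 0 or processed == total:
            print('Processed %d/%d (%.0f%%)' % (processed, total, 100 * processed / total))

doc_collection = [[1, 2], [3, 4], [5, 6]]
result = [[str(token) for token in document]
          for document in show_percent(doc_collection, 2)]
print(result)  # [['1', '2'], ['3', '4'], ['5', '6']]
```

With every=2 and three documents this prints "Processed 2/3 (67%)" and then "Processed 3/3 (100%)", because the last element is always reported.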
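Here is my approach, using the progressbar2 package. Install it with:
pip install progressbar2
Then wrap the iterable in progressbar (your_function and old_list stand for your own function and list):
from progressbar import progressbar
new_list = [your_function(list_item) for list_item in progressbar(old_list)]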
You will see a progress bar while running the code block above.
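If you would rather not add a dependency, a rough text bar can be hand-rolled with the standard library. This is a minimal sketch of the general idea, not how progressbar2 works internally; text_bar, width and stream are hypothetical names, and it assumes the input supports len():

```python
import sys

def text_bar(seq, width=30, stream=sys.stderr):
    # Assumption: seq supports len(); the bar is redrawn in place using '\r'
    total = len(seq)
    for i, x in enumerate(seq, start=1):
        yield x
        # Redraw after the consumer has finished with item i
        filled = width * i // total
        stream.write('\r[%s%s] %d/%d' % ('#' * filled, '.' * (width - filled), i, total))
        stream.flush()
    stream.write('\n')

old_list = list(range(200))
new_list = [item * 2 for item in text_bar(old_list)]
print(new_list[:5])  # [0, 2, 4, 6, 8]
```

Writing to stderr by default keeps the bar from mixing with anything the comprehension itself prints to stdout.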