Question
I am trying to build a list of all possible string combinations and then iterate over it. I run out of memory executing the line below, which makes sense: 37 characters at lengths 0 through 6 works out to roughly 2.6 billion strings.
data = list(map(''.join, chain.from_iterable(product(string.digits + string.ascii_lowercase + '/', repeat=i) for i in range(0, 7))))
So I think, rather than creating this massive list up front, I should create it and execute against it in waves, with some kind of "holding string" that I save to disk and can restart from whenever I want. I.e., generate and iterate over a million rows, then save the holding string to a file; then start up again with the next million rows, resuming the mapping/iteration at the holding string (or the row after it). I have no clue how to do that, and I think I might have to abandon the chain.from_iterable(product(...)) construction I had implemented. If that idea is not clear (or is clear but stupid), let me know.
Also, another option, rather than working around the memory issue, would be to somehow shrink the iterable list itself; I'm not sure how I would do that either. I'm trying to map an API that has no existing documentation. While I don't know that a non-exhaustive list is the route to take, I'm certainly open to suggestions.
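To illustrate what I mean by "waves", here is a rough, untested sketch of the idea where the full list is never materialized at all: a generator yields candidates lazily and only an integer offset gets checkpointed to disk. The progress.txt name, the million-row chunk size, and the helper names are all invented for illustration:

import string
from itertools import product, islice

ALPHABET = string.digits + string.ascii_lowercase + '/'
CHECKPOINT = 'progress.txt'  # invented checkpoint filename

def candidates(max_len=6):
    # Lazily yield every string of length 0..max_len over ALPHABET,
    # in the same order as the chained product one-liner above.
    for length in range(max_len + 1):
        for combo in product(ALPHABET, repeat=length):
            yield ''.join(combo)

def load_offset():
    # Return the last saved position, or 0 on a fresh run.
    try:
        with open(CHECKPOINT) as f:
            return int(f.read())
    except (OSError, ValueError):
        return 0

def save_offset(n):
    with open(CHECKPOINT, 'w') as f:
        f.write(str(n))

start = load_offset()
for idx, kw in enumerate(islice(candidates(), start, None), start=start):
    # ... call the API with kw here ...
    if idx % 1_000_000 == 0:  # checkpoint every million rows
        save_offset(idx + 1)  # next run resumes at the following row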
Here is the code chunk I've been using:
import csv
import string
from itertools import product, chain

# Open stringfile; if it doesn't exist, create it.
try:
    with open(stringfile) as f:
        reader = csv.reader(f, delimiter=',')
        data = list(reader)
except OSError:
    data = list(map(''.join, chain.from_iterable(
        product(string.digits + string.ascii_lowercase + '/', repeat=i)
        for i in range(0, 6))))
    with open(stringfile, 'w') as f:
        f.write('\n'.join(data))

# Iterate against it
...
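One thing I did notice: even when the generation fits in memory, the '\n'.join(data) write builds a second full-size copy of the data just to create the file. Streaming each row straight to disk should avoid that (same stringfile variable and character set as above; untested sketch):

import string
from itertools import product, chain

chars = string.digits + string.ascii_lowercase + '/'

# Write each combination straight to disk instead of joining the whole
# list into one giant string first.
with open(stringfile, 'w') as f:
    for combo in chain.from_iterable(product(chars, repeat=i)
                                     for i in range(0, 6)):
        f.write(''.join(combo) + '\n')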
EDIT: Further poking at this led me to this thread, which covers a similar topic. There is discussion of using islice, which helps me post-mapping (the script crashed last night while doing the API calls due to an error in my exception handling; I just restarted it at the 400k-th iterable).
Can I use islice within a product? So for the generator, generate items 10 million through 12 million (for example) and operate on just those items as a way to preserve memory?
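In other words, something like this, assuming islice can wrap the chained generator directly (the 10M-12M window is just the example range above; untested):

import string
from itertools import product, chain, islice

chars = string.digits + string.ascii_lowercase + '/'
gen = chain.from_iterable(product(chars, repeat=i) for i in range(0, 7))

# Only items 10,000,000 through 11,999,999 ever get joined into strings;
# islice still advances past the first 10M tuples, but never stores them.
for combo in islice(gen, 10_000_000, 12_000_000):
    kw = ''.join(combo)
    # ... operate on kw ...

My understanding is that the skipped items still get consumed one by one, so the skip costs time but not memory.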
Here is the most recent snippet of what I'm doing. You can see I plugged islice into the actual iteration further down, but I want to islice in the actual generation (the data = line).
import csv
import string
import time
from itertools import product, chain, islice
import pandas as pd

# Open stringfile; if it doesn't exist, create it.
try:
    with open(stringfile) as f:
        reader = csv.reader(f, delimiter=',')
        data = list(reader)
except OSError:
    data = list(map(''.join, chain.from_iterable(
        product(string.digits + string.ascii_lowercase + '/', repeat=i)
        for i in range(3, 5))))
    with open(stringfile, 'w') as f:
        f.write('\n'.join(data))

# substart, substop, attcd, and cdtimer are defined earlier in the script.
print("Total items: " + str(len(data) - substart))
fdf = pd.DataFrame()
sdf = pd.DataFrame()
qdf = pd.DataFrame()
attctr = 0
# Iterate through the string combination list
for idx, kw in islice(enumerate(data), substart, substop):
    # Attempt the API call; cool down every attcd attempts.
    if idx % 1000 == 0:
        print("Iteration " + str(idx) + " of " + str(len(data)))
    attctr += 1
    if attctr == attcd:
        print("Cooling down!")
        time.sleep(cdtimer)
        attctr = 0
    try:
        ....
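Since the crash came from my exception handling around the API calls, I'm also thinking about isolating retries per keyword so one bad response doesn't kill the whole run. A rough sketch (the callable, attempt count, and backoff values are placeholders):

import time

def call_with_retry(func, kw, attempts=3, backoff=30):
    # Retry a flaky API call a few times instead of letting one error
    # abort the entire multi-million-row loop.
    for attempt in range(1, attempts + 1):
        try:
            return func(kw)
        except Exception as exc:  # narrow this to the API's real error type
            print("Attempt " + str(attempt) + " failed for " + kw + ": " + str(exc))
            time.sleep(backoff)
    return None  # give up on this keyword and move on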
Source: https://stackoverflow.com/questions/65743187/running-out-of-memory-on-python-product-iteration-chain