I would like to run this process in batches because of the data volume.
Here's my code:
getconn = conexiones()
con = getconn.mysqlDWconnect()
with con:
You can use
SELECT id, date, product_id, sales FROM sales LIMIT X OFFSET Y;
where X is the batch size you need and Y is the current offset (for example, X times the number of batches already fetched).
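As a small illustration of that arithmetic, here is a sketch that computes the X and Y values for the first few iterations. The helper name `limit_offset` is made up for this example and the batch size of 10 is arbitrary.

```python
def limit_offset(batch_size, iteration):
    """Return the (X, Y) pair for a zero-based iteration number."""
    # Y is X times the number of batches already fetched.
    return batch_size, batch_size * iteration


pairs = [limit_offset(10, i) for i in range(3)]
print(pairs)  # [(10, 0), (10, 10), (10, 20)]
```

Each pair can then be passed as the query parameters for `LIMIT` and `OFFSET`.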
Thank you, here's how I implemented it with your suggestions:
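import MySQLdb as mdb  # assumed: mdb is the MySQLdb module

dataset = []
batch_size = 10
index = 0
# Open the connection once instead of once per batch.
getconn = conexiones()
con = getconn.mysqlDWconnect()
with con:
    cur = con.cursor(mdb.cursors.DictCursor)
    while True:
        # Pass LIMIT and OFFSET as parameters rather than concatenating strings.
        cur.execute(
            "SELECT id, date, product_id, sales FROM sales LIMIT %s OFFSET %s",
            (batch_size, batch_size * index))
        rows = cur.fetchall()
        if not rows:
            # An empty batch means every row has been read.
            break
        dataset.extend(rows)
        index += 1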
To expand on akalikin's answer, you can use a stepped iteration to split the query into chunks, and then use LIMIT and OFFSET to fetch each chunk.
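cur = con.cursor(mdb.cursors.DictCursor)
# Alias the count so the DictCursor row has a predictable key.
cur.execute("SELECT COUNT(*) AS total FROM sales")
total = cur.fetchone()["total"]
# Step through the table 5 rows at a time; i is the offset of each chunk.
for i in range(0, total, 5):
    cur2 = con.cursor(mdb.cursors.DictCursor)
    cur2.execute("SELECT id, date, product_id, sales FROM sales LIMIT %s OFFSET %s", (5, i))
    rows = cur2.fetchall()
    print(rows)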
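For anyone who wants to try the stepped iteration without a MySQL server, here is a minimal sketch against an in-memory SQLite table. Everything in it is an assumption for illustration: the `sqlite3` module stands in for MySQLdb (so the placeholders are `?` instead of `%s`), the table is filled with made-up rows, and `iter_batches` is a hypothetical helper name. The `ORDER BY id` is added because, without an explicit ordering, the database does not guarantee that consecutive LIMIT/OFFSET chunks are consistent with each other.

```python
import sqlite3


def iter_batches(con, batch_size):
    """Yield one list of rows per LIMIT/OFFSET chunk of the sales table."""
    total = con.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    for offset in range(0, total, batch_size):
        # ORDER BY keeps the chunks stable from one query to the next.
        yield con.execute(
            "SELECT id, date, product_id, sales FROM sales "
            "ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()


# Demo data: 12 made-up rows in an in-memory table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, date TEXT, product_id INTEGER, sales REAL)"
)
con.executemany(
    "INSERT INTO sales (date, product_id, sales) VALUES (?, ?, ?)",
    [("2020-01-%02d" % day, day % 3, day * 1.5) for day in range(1, 13)],
)

batches = list(iter_batches(con, 5))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [5, 5, 2]
```

Because `iter_batches` is a generator, only one chunk is held in memory at a time when you loop over it directly instead of calling `list()` on it.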
First point: a Python DB-API cursor is an iterator (an optional extension in the spec, but most drivers implement it), so unless you really need to load a whole batch into memory at once, you can start by using this feature, i.e. instead of:
cursor.execute("SELECT * FROM mytable")
rows = cursor.fetchall()
for row in rows:
    do_something_with(row)
you could just write:
cursor.execute("SELECT * FROM mytable")
for row in cursor:
    do_something_with(row)
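If you do want fixed-size batches but would rather keep a single query, the standard DB-API `fetchmany(size)` method is a middle ground between `fetchall()` and row-by-row iteration. The sketch below is only an illustration: it uses the `sqlite3` module and a made-up table so it can run anywhere, and `iter_chunks` is a hypothetical helper name.

```python
import sqlite3


def iter_chunks(cursor, size):
    """Yield lists of at most `size` rows from an already executed cursor."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            # fetchmany returns an empty list once the result set is exhausted.
            break
        yield rows


# Demo data: 7 made-up rows in an in-memory table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (n INTEGER)")
con.executemany("INSERT INTO mytable (n) VALUES (?)", [(n,) for n in range(7)])

cursor = con.cursor()
cursor.execute("SELECT n FROM mytable ORDER BY n")
chunks = [[row[0] for row in rows] for rows in iter_chunks(cursor, 3)]
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Whether this actually saves memory depends on the driver: some connectors buffer the entire result set on the client as soon as `execute()` returns.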
Then if your db connector's implementation still doesn't make proper use of this feature, it will be time to add LIMIT and OFFSET to the mix:
# py2 / py3 compat
try:
    # xrange is defined in py2 only
    xrange
except NameError:
    # py3 range is actually py2 xrange
    xrange = range

cursor.execute("SELECT count(*) FROM mytable")
count = cursor.fetchone()[0]
batch_size = 42  # whatever
for offset in xrange(0, count, batch_size):
    cursor.execute(
        "SELECT * FROM mytable LIMIT %s OFFSET %s",
        (batch_size, offset))
    for row in cursor:
        do_something_with(row)