Retrieving Data from MySQL in batches via Python

日久生厌 2021-01-31 11:11

I would like to run this process in batches because of the data volume.

Here's my code:

 getconn = conexiones()
 con = getconn.mysqlDWconnect()
 with con:         


        
4 answers
  • 2021-01-31 11:50

    You can use

    SELECT id, date, product_id, sales FROM sales LIMIT X OFFSET Y;
    

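    where X is the batch size you need and Y is the current offset (for example, X times the number of batches already fetched). Without an ORDER BY clause the row order is not guaranteed, so sort on a unique column such as id if consecutive batches must not overlap or skip rows.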
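
The LIMIT/OFFSET loop described above could be sketched as a generator, roughly as follows. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module with an in-memory table as a stand-in for the MySQL connection, the helper name `iter_batches` and the demo rows are hypothetical, and sqlite3 uses `?` placeholders where MySQL drivers such as MySQLdb use `%s`.

```python
import sqlite3


def iter_batches(con, batch_size):
    """Yield lists of rows from `sales`, batch_size rows at a time."""
    offset = 0
    while True:
        # ORDER BY a unique column so consecutive pages neither overlap nor skip rows.
        # sqlite3 uses "?" placeholders; MySQL drivers such as MySQLdb use "%s".
        rows = con.execute(
            "SELECT id, date, product_id, sales FROM sales "
            "ORDER BY id LIMIT ? OFFSET ?",
            (batch_size, offset),
        ).fetchall()
        if not rows:
            break
        yield rows
        offset += batch_size


# Hypothetical demo data: 25 rows in an in-memory table (stand-in for MySQL).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sales "
    "(id INTEGER PRIMARY KEY, date TEXT, product_id INTEGER, sales REAL)"
)
con.executemany(
    "INSERT INTO sales (id, date, product_id, sales) VALUES (?, ?, ?, ?)",
    [(i, "2021-01-31", i % 3, float(i)) for i in range(1, 26)],
)

batch_sizes = [len(batch) for batch in iter_batches(con, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]
```

Stopping on the first empty batch avoids a separate COUNT(*) query, at the cost of one extra round trip at the end.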

  • 2021-01-31 11:52

    Thank you, here's how I implemented it with your suggestions:

    dataset = []
    batch_size = 10
    index = 0
    # Open the connection once instead of reconnecting for every batch
    getconn = conexiones()
    con = getconn.mysqlDWconnect()
    cur = con.cursor(mdb.cursors.DictCursor)
    while True:
        # Pass LIMIT and OFFSET as query parameters instead of concatenating strings
        cur.execute(
            "SELECT id, date, product_id, sales FROM sales LIMIT %s OFFSET %s",
            (batch_size, batch_size * index))
        rows = cur.fetchall()
        if not rows:
            break  # an empty batch means every row has been read
        dataset.extend(rows)
        index += 1
    
  • 2021-01-31 11:54

    To expand on akalikin's answer, you can use a stepped iteration to split the query into chunks, and then use LIMIT and OFFSET to execute the query.

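    cur = con.cursor(mdb.cursors.DictCursor)
    # Alias the count so the DictCursor row has a predictable key
    cur.execute("SELECT COUNT(*) AS total FROM sales")
    total = cur.fetchone()["total"]

    for i in range(0, total, 5):
        cur2 = con.cursor(mdb.cursors.DictCursor)
        # Let the driver substitute LIMIT and OFFSET as query parameters
        cur2.execute("SELECT id, date, product_id, sales FROM sales LIMIT %s OFFSET %s", (5, i))
        rows = cur2.fetchall()
        print(rows)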
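
To make the stepped iteration concrete, here is a small sketch of the offsets that `range` produces for a given row count and batch size. The helper name `batch_offsets` and the numbers are made up for illustration.

```python
def batch_offsets(total_rows, batch_size):
    """Return the OFFSET value for each batch needed to cover total_rows."""
    return list(range(0, total_rows, batch_size))


offsets = batch_offsets(12, 5)
print(offsets)  # [0, 5, 10]
```

With 12 rows and a batch size of 5, the last query (OFFSET 10) simply returns the remaining 2 rows.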
    
  • 2021-01-31 12:00

    First point: a Python DB-API cursor is an iterator, so unless you really need to load a whole batch into memory at once, you can start by using this feature, i.e. instead of:

    cursor.execute("SELECT * FROM mytable")
    rows = cursor.fetchall()
    for row in rows:
       do_something_with(row)
    

    you could just:

    cursor.execute("SELECT * FROM mytable")
    for row in cursor:
       do_something_with(row)
    

    Then if your db connector's implementation still doesn't make proper use of this feature, it will be time to add LIMIT and OFFSET to the mix:

    # py2 / py3 compat
    try:
        # xrange is defined in py2 only
        xrange
    except NameError:
        # py3 range is actually py2 xrange
        xrange = range
    
    cursor.execute("SELECT count(*) FROM mytable")
    count = cursor.fetchone()[0]
    batch_size = 42 # whatever
    
    for offset in xrange(0, count, batch_size):
        cursor.execute(
            "SELECT * FROM mytable LIMIT %s OFFSET %s",
            (batch_size, offset))
        for row in cursor:
            do_something_with(row)
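
A middle ground between iterating the cursor and paging with LIMIT/OFFSET is the DB-API `fetchmany()` method, which pulls a fixed number of rows per call from a single query. The sketch below is an assumption-laden illustration: it uses sqlite3 with an in-memory table as a stand-in for MySQL, and the helper name `iter_rows` and the demo rows are hypothetical. Whether this actually bounds memory depends on the driver; with MySQLdb, for instance, the default cursor buffers the whole result on the client, so a server-side cursor class such as `MySQLdb.cursors.SSCursor` would likely be needed as well.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one by one, pulling them from the cursor batch_size at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row


# Hypothetical demo data: 7 rows in an in-memory table (stand-in for MySQL).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, value TEXT)")
con.executemany(
    "INSERT INTO mytable (id, value) VALUES (?, ?)",
    [(i, "row %d" % i) for i in range(1, 8)],
)

cursor = con.cursor()
cursor.execute("SELECT id, value FROM mytable ORDER BY id")
ids = [row[0] for row in iter_rows(cursor, batch_size=3)]
print(ids)  # [1, 2, 3, 4, 5, 6, 7]
```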
    