How can I do a batch insert into an Oracle database using Python?

删除回忆录丶 submitted on 2019-11-27 14:04:57

Here's what I've come up with which appears to work well (but please comment if there's a way to improve this):

import datetime
import cx_Oracle

# build rows for each date and add to a list of rows we'll use to insert as a batch
rows = []
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = ''
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = connection.cursor()
# the table name cannot be bound, so it is concatenated; the values use positional binds
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()

Use Cursor.prepare() and Cursor.executemany().

From the cx_Oracle documentation:

Cursor.prepare(statement[, tag])

This can be used before a call to execute() to define the statement that will be executed. When this is done, the prepare phase will not be performed when the call to execute() is made with None or the same string object as the statement. [...]

Cursor.executemany(statement, parameters)

Prepare a statement for execution against a database and then execute it against all parameter mappings or sequences found in the sequence parameters. The statement is managed in the same way as the execute() method manages it.

Thus, using the above two functions, your code becomes:

import datetime
import cx_Oracle

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.connect(connection_string)
cursor = connection.cursor()
station_id = 'STATION_1'
start_year = 2000

temps = [1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3]
precips = [2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8]
number_of_years = len(temps) // 12  # integer division, so range() accepts the result

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() should be a sequence of mappings, one per row:
# [{'station_id': ..., 'record_date': ..., 'temp': ..., 'precip': ...}, ...]
dict_sequence = [{'station_id': station_id, 'record_date': date_values[i], 'temp': temps[i], 'precip': precips[i]}
                 for i in range(len(temps))]

# bind station_id like the other values instead of formatting it into the SQL text
sql_insert = 'insert into my_table (id, date_column, temp, precip) values (:station_id, :record_date, :temp, :precip)'
cursor.prepare(sql_insert)
cursor.executemany(None, dict_sequence)
connection.commit()

Also see Oracle's Mastering Oracle+Python series of articles.
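
The row-building step in both snippets above can be isolated into a small pure-Python helper, which makes it easy to check before touching the database. The following is only a minimal sketch: the function name build_monthly_rows is hypothetical, and it assumes temps and precips hold one value per month, starting in January of start_year.

```python
import datetime


def build_monthly_rows(station_id, start_year, temps, precips):
    """Return one (id, date, temp, precip) tuple per month of data."""
    if len(temps) != len(precips):
        raise ValueError("temps and precips must have the same length")
    rows = []
    for index, (temp, precip) in enumerate(zip(temps, precips)):
        # divmod splits the running month index into (years elapsed, month 0-11)
        year_offset, month_offset = divmod(index, 12)
        record_date = datetime.date(start_year + year_offset, month_offset + 1, 1)
        rows.append((station_id, record_date, temp, precip))
    return rows


# two years of made-up sample values
rows = build_monthly_rows("STATION_1", 2000, list(range(24)), list(range(24)))
```

The resulting list of tuples can be passed as the second argument to cursor.executemany() together with a statement that uses positional binds (:1, :2, :3, :4).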

I would create one large SQL insert statement using union all (a plain union would silently drop duplicate rows):

insert into mytable(col1, col2, col3)
select a, b, c from dual union all
select d, e, f from dual union all
select g, h, i from dual

You can build the string in Python and pass it to Oracle as a single statement to execute.
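
As a rough sketch of that string-building step, the helper below generates the statement with bind placeholders rather than literal values, so the data still travels as bind parameters. The names build_union_insert and flatten are hypothetical, and the table and column names are assumed to come from trusted code, since identifiers cannot be bound. It joins the selects with union all so that identical rows are kept.

```python
def build_union_insert(table, columns, row_count):
    """Build 'insert into ... select ... from dual union all ...' for row_count rows."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    width = len(columns)
    selects = []
    for row in range(row_count):
        # positional binds are numbered consecutively across the whole statement
        binds = ", ".join(f":{row * width + col + 1}" for col in range(width))
        selects.append(f"select {binds} from dual")
    return (f"insert into {table} ({', '.join(columns)})\n"
            + "\nunion all\n".join(selects))


def flatten(rows):
    """Flatten a list of row tuples into one flat list of bind values."""
    return [value for row in rows for value in row]


sql = build_union_insert("mytable", ["col1", "col2", "col3"], 2)
params = flatten([("a", "b", "c"), ("d", "e", "f")])
```

With an open cursor, this would then be run as cursor.execute(sql, params).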

As one of the comments says, consider using INSERT ALL. Supposedly it'll be significantly faster than using executemany().

For example:

INSERT ALL
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual
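
If you want to try this from Python, the statement can be generated with bind placeholders. The following is a minimal sketch under stated assumptions: the function name build_insert_all is hypothetical, and the table and column names are assumed to be trusted values, since identifiers cannot be bound.

```python
def build_insert_all(table, columns, row_count):
    """Build an Oracle INSERT ALL statement with positional binds for row_count rows."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    width = len(columns)
    column_list = ", ".join(columns)
    lines = ["INSERT ALL"]
    for row in range(row_count):
        # positional binds are numbered consecutively across the whole statement
        binds = ", ".join(f":{row * width + col + 1}" for col in range(width))
        lines.append(f"  INTO {table} ({column_list}) VALUES ({binds})")
    # INSERT ALL requires a subquery; selecting from dual yields exactly one row
    lines.append("SELECT * FROM dual")
    return "\n".join(lines)


statement = build_insert_all("mytable", ["column1", "column2"], 3)
```

Every row adds another INTO clause and another set of binds, so the statement text grows with the batch size.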

FYI, my test results:

I inserted 5000 rows, with 3 columns per row.

  1. Running a single-row insert 5000 times took 1.24 minutes.
  2. Running executemany took 0.125 seconds.
  3. Running a single insert all statement took 4.08 minutes.

For the third case, the Python code builds SQL like insert all into t(a,b,c) select :1, :2, :3 from dual union all select :4, :5, :6 from dual...

Building this long SQL string in Python took 0.145329 seconds.

I ran the tests on a very old Sun machine (CPU: 1415 MHz).

In the third case I checked the database side: the wait event was "SQL*Net more data from client", which means the server was waiting for more data from the client.

I would not have believed the result of the third method without running the test.

So my short suggestion is to just use executemany.
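
To repeat a comparison like this on your own setup, a small timing harness is enough. The sketch below is a stand-in, not the code used for the numbers above: it uses Python's built-in sqlite3 module (with its ? placeholders) only so that it runs without an Oracle instance, and the table t and the function names are hypothetical. Against Oracle, the same two functions would take a cx_Oracle connection and use :1, :2, :3 binds. The sqlite3 timings say nothing about Oracle, since an in-memory database has no client/server round trips.

```python
import sqlite3
import time

INSERT_SQL = "insert into t (a, b, c) values (?, ?, ?)"


def time_row_by_row(connection, rows):
    """Insert rows with one execute() call per row; return elapsed seconds."""
    cursor = connection.cursor()
    start = time.perf_counter()
    for row in rows:
        cursor.execute(INSERT_SQL, row)
    connection.commit()
    return time.perf_counter() - start


def time_executemany(connection, rows):
    """Insert all rows with a single executemany() call; return elapsed seconds."""
    cursor = connection.cursor()
    start = time.perf_counter()
    cursor.executemany(INSERT_SQL, rows)
    connection.commit()
    return time.perf_counter() - start


# sqlite3 stand-in: an in-memory database, not Oracle
connection = sqlite3.connect(":memory:")
connection.execute("create table t (a integer, b integer, c integer)")
rows = [(i, i + 1, i + 2) for i in range(5000)]

elapsed_loop = time_row_by_row(connection, rows)
elapsed_batch = time_executemany(connection, rows)
row_count = connection.execute("select count(*) from t").fetchone()[0]
print(f"row by row: {elapsed_loop:.4f}s, executemany: {elapsed_batch:.4f}s")
```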
