Question
I have ~100,000 to 1,000,000 rows to insert into an Oracle 18c database. I'm quite new to Oracle and to this order of magnitude of data. I reckon there must be some optimal way to do it, but for now I've only managed to implement a line-by-line insertion:
def insertLines(connection, tableName, column_names, rows):
    cursor = connection.cursor()
    if tableExists(connection, tableName):
        for row in rows:
            sql = 'INSERT INTO {} ({}) VALUES ({})'.format(tableName, column_names, row)
            cursor.execute(sql)
    cursor.close()
Is there a clean way in Oracle to batch the rows for better efficiency using cx_Oracle (the Python Oracle library)?
EDIT: I read the data from a CSV file.
Answer 1:
If your data is already in Python, then use executemany(). With so many rows, you would probably still make multiple calls, each inserting a batch of records. See https://blogs.oracle.com/opal/efficient-and-scalable-batch-statement-execution-in-python-cx_oracle
data = [
    (60, "Parent 60"),
    (70, "Parent 70"),
    (80, "Parent 80"),
    (90, "Parent 90"),
    (100, "Parent 100")
]
cursor.executemany("""
    insert into ParentTable (ParentId, Description)
    values (:1, :2)""", data)
As pointed out by others:
- Avoid using string interpolation in statements because it is a security risk; it is also generally a scalability problem. Use bind variables. Where you do need string interpolation for things like column names, make sure you sanitize any values (see the sketch after this list).
- If the data is already on disk, then using something like SQL*Loader or Data Pump will be better than reading it into cx_Oracle and then sending it to the DB.
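For the column-name case mentioned above, one possible safeguard (illustrative only; the helper name below is hypothetical) is to check the requested names against the data dictionary before interpolating them into the statement:

def safe_column_names(cursor, table_name, requested_columns):
    # Fetch the real column names of the target table from USER_TAB_COLUMNS.
    cursor.execute(
        "select column_name from user_tab_columns where table_name = :t",
        t=table_name.upper())
    allowed = {row[0] for row in cursor}
    unknown = [c for c in requested_columns if c.upper() not in allowed]
    if unknown:
        raise ValueError("Unknown columns: {}".format(unknown))
    return requested_columns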
Answer 2:
I don't know what format your data is in, but SQL*Loader is a command-line utility specifically created for loading large amounts of data into Oracle.
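As a rough sketch of that approach (the control file contents, file names, and connect string are placeholders, not from the answer), SQL*Loader can be driven from Python by writing a control file and invoking the sqlldr executable:

import subprocess

# Placeholder control file describing the CSV layout; APPEND allows loading
# into a table that already contains rows.
control_file = """\
LOAD DATA
INFILE 'rows.csv'
APPEND INTO TABLE ParentTable
FIELDS TERMINATED BY ','
(ParentId, Description)
"""

with open("load.ctl", "w") as f:
    f.write(control_file)

# Invoke the SQL*Loader command-line client (credentials are placeholders).
subprocess.run(["sqlldr", "userid=user/password@host/service_name",
                "control=load.ctl", "log=load.log"], check=True)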
Answer 3:
In terms of performance and ease, the optimal way would be to create an External Table over your CSV file and then use SQL to do the insert.
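A rough sketch of that approach from Python follows; the directory object, file name, and column definitions are assumptions, and DATA_DIR must be an existing Oracle DIRECTORY object pointing at a folder on the database server that holds the CSV:

import cx_Oracle

connection = cx_Oracle.connect("user", "password", "host/service_name")
cursor = connection.cursor()

# One-time DDL: an external table that reads the CSV file in place.
cursor.execute("""
    create table rows_ext (
        parent_id   number,
        description varchar2(100)
    )
    organization external (
        type oracle_loader
        default directory data_dir
        access parameters (
            records delimited by newline
            fields terminated by ','
        )
        location ('rows.csv')
    )""")

# A single set-based insert loads everything server-side.
cursor.execute(
    "insert into ParentTable (ParentId, Description) "
    "select parent_id, description from rows_ext")
connection.commit()
cursor.close()
connection.close()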
Source: https://stackoverflow.com/questions/55271615/how-to-insert-1-million-rows-into-oracle-database-with-python