Bulk Insert to Oracle using .NET


Question


What is the fastest way to do a bulk insert into Oracle using .NET? I need to transfer about 160K records from .NET to Oracle. Currently, I'm using an INSERT statement and executing it 160K times, which takes about 25 minutes to complete. The source data is stored in a DataTable, as the result of a query against another database (MySQL).

Is there any better way to do this?

EDIT: I'm currently using System.Data.OracleClient, but I'm willing to accept solutions using another provider (ODP.NET, DevArt, etc.).


Answer 1:


I'm loading 50,000 records in 15 seconds or so using array binding in ODP.NET.

It works by repeatedly invoking a stored procedure you specify (and in which you can do updates/inserts/deletes), but it passes the multiple parameter values from .NET to the database in bulk.

Instead of specifying a single value for each parameter of the stored procedure, you specify an array of values for each parameter.

Oracle passes the parameter arrays from .NET to the database in one go, and then invokes the stored procedure once for each set of parameter values.
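
For illustration, here is a minimal sketch of this pattern with ODP.NET (Oracle.DataAccess.Client); the stored procedure name and parameters are hypothetical, and the essential pieces are the array-typed parameter values and ArrayBindCount:

using System.Data;
using Oracle.DataAccess.Client;

// Minimal array-binding sketch; assumes an open OracleConnection "connection"
// and a hypothetical procedure insert_customer(p_id NUMBER, p_name VARCHAR2).
int[] ids = { 1, 2, 3 };
string[] names = { "alpha", "beta", "gamma" };

using (OracleCommand cmd = connection.CreateCommand())
{
    cmd.CommandText = "insert_customer";
    cmd.CommandType = CommandType.StoredProcedure;

    // One stored-procedure invocation per array element, sent in a single round trip.
    cmd.ArrayBindCount = ids.Length;
    cmd.Parameters.Add("p_id", OracleDbType.Int32, ids, ParameterDirection.Input);
    cmd.Parameters.Add("p_name", OracleDbType.Varchar2, names, ParameterDirection.Input);

    cmd.ExecuteNonQuery();
}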

http://www.oracle.com/technetwork/issue-archive/2009/09-sep/o59odpnet-085168.html

/Damian




Answer 2:


I recently discovered a specialized class in ODP.NET that is great for bulk inserts: Oracle.DataAccess.Client.OracleBulkCopy. It takes a DataTable as a parameter, and then you call its WriteToServer method. It is very fast and effective. Good luck!
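
A minimal usage sketch, assuming an open OracleConnection and a DataTable whose columns line up with the target table (the table name and batch size below are illustrative):

using System.Data;
using Oracle.DataAccess.Client;

// Bulk-copy a DataTable into an existing Oracle table.
using (OracleBulkCopy bulkCopy = new OracleBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "MY_TARGET_TABLE"; // hypothetical table name
    bulkCopy.BatchSize = 5000;                         // tune for your data
    bulkCopy.WriteToServer(dataTable);                 // dataTable holds the source rows
}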




Answer 3:


The solution from Rob Stevenson-Legget is slow because it doesn't bind its values; it builds the SQL with string.Format() instead.

When you ask Oracle to execute a SQL statement, it starts by calculating the hash value of that statement. It then looks in a hash table to see whether it already knows the statement. If it does, it can retrieve the execution path from that hash table and execute the statement really fast, because Oracle has executed it before. This is called the library cache, and it doesn't work properly if you don't bind your SQL statements.

For example, don't do this:

int n;
for (n = 0; n < 100000; n++)
{
    mycommand.CommandText = String.Format("INSERT INTO MyTable (MyId) VALUES ({0})", n + 1);
    mycommand.ExecuteNonQuery();
}

but do:

OracleParameter myparam = new OracleParameter();
int n;

mycommand.CommandText = "INSERT INTO MyTable (MyId) VALUES (:myId)";
myparam.ParameterName = "myId";
mycommand.Parameters.Add(myparam);

for (n = 0; n < 100000; n++)
{
    myparam.Value = n + 1;
    mycommand.ExecuteNonQuery();
}

Not using parameters also leaves you open to SQL injection.




Answer 4:


SQL Server's SqlBulkCopy is blindingly fast. Unfortunately, I found OracleBulkCopy to be far slower, and it has other problems:

  • You must be very sure that your input data is clean if you plan to use OracleBulkCopy. If a primary key violation occurs, an ORA-26026 is raised and it appears to be unrecoverable: rebuilding the index does not help, and any subsequent insert on the table fails, including normal inserts.
  • Even when the data is clean, I found that OracleBulkCopy sometimes gets stuck inside WriteToServer. The problem seems to depend on the batch size: with my test data it happened at exactly the same point every time I repeated the test, and using a larger or smaller batch size made it go away. The speed is also more irregular with larger batch sizes, which points to problems related to memory management.

In fact, System.Data.OracleClient.OracleDataAdapter is faster than OracleBulkCopy if you want to fill a table with many small rows. You need to tune the batch size, though; the optimum batch size for OracleDataAdapter is smaller than for OracleBulkCopy.
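
For reference, here is a minimal sketch of the adapter-based approach (table and column names are illustrative; it assumes System.Data.OracleClient's adapter supports batched updates via UpdateBatchSize, which is what the timings above rely on):

using System.Data;
using System.Data.OracleClient;   // deprecated provider, but the one used in this test

// dataTable holds the rows to insert (all in the Added state).
OracleCommand insert = new OracleCommand(
    "INSERT INTO MY_TABLE (ID, PAYLOAD) VALUES (:id, :payload)", connection);
insert.Parameters.Add("id", OracleType.Number).SourceColumn = "ID";
insert.Parameters.Add("payload", OracleType.VarChar, 102).SourceColumn = "PAYLOAD";
insert.UpdatedRowSource = UpdateRowSource.None;   // required for batched updates

OracleDataAdapter adapter = new OracleDataAdapter();
adapter.InsertCommand = insert;
adapter.UpdateBatchSize = 100;    // the sweet spot found in the test below

adapter.Update(dataTable);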

I ran my test on a Windows 7 machine with an x86 executable and the 32-bit ODP.NET client 2.112.1.0. The OracleDataAdapter is part of System.Data.OracleClient 2.0.0.0. My test set is about 600,000 rows with a maximum record size of 102 bytes (average 43 characters). The data source is a 25 MB text file, read line by line as a stream.

In my test I built up the input DataTable to a fixed size and then used either OracleBulkCopy or OracleDataAdapter to copy that block of data to the server. I left BatchSize at 0 in OracleBulkCopy (so the current table contents are copied as one batch) and set it to the table size in OracleDataAdapter (which should also create a single batch internally). Best results:

  • OracleBulkCopy: table size = 500, total duration 4'22"
  • OracleDataAdapter: table size = 100, total duration 3'03"

For comparison:

  • SqlBulkCopy: table size = 1000, total duration 0'15"
  • SqlDataAdapter: table size = 1000, total duration 8'05"

Same client machine; the test server is SQL Server 2008 R2. For SQL Server, bulk copy is clearly the way to go: not only is it by far the fastest, but server load is also lower than with the data adapter. It is a pity that OracleBulkCopy does not offer quite the same experience; the BulkCopy API is much easier to use than the DataAdapter approach.




Answer 5:


A really fast way to solve this problem is to create a database link from the Oracle database to the MySQL database; Oracle can create database links to non-Oracle databases (this is called heterogeneous connectivity). Once the database link exists, you can pull your data from the MySQL database with a create table mydata as select * from ... statement, so your .NET application doesn't have to move the data at all.
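
As a sketch, once such a link exists (the link name mysql_link below is hypothetical and requires gateway/ODBC configuration on the Oracle side), the copy can be kicked off from .NET with a single statement and no rows flow through the client:

using Oracle.DataAccess.Client;

// Assumes an open OracleConnection "connection" and an already-configured
// heterogeneous database link named mysql_link.
using (OracleCommand cmd = connection.CreateCommand())
{
    cmd.CommandText = "CREATE TABLE mydata AS SELECT * FROM source_table@mysql_link";
    cmd.ExecuteNonQuery();   // the data is copied entirely server-side
}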

Another way is to use ODP.NET. In ODP.NET you can use the OracleBulkCopy class.

But I don't think that inserting 160K records into an Oracle table with System.Data.OracleClient should take 25 minutes. I suspect you commit too often. Also, do you bind your values to the insert statement with parameters, or do you concatenate them into the SQL? Binding is much faster.




Answer 6:


To follow up on Theo's suggestion with my findings (apologies, I don't currently have enough reputation to post this as a comment):

First, this is how to use several named parameters:

String commandString = "INSERT INTO Users (Name, Desk, UpdateTime) VALUES (:Name, :Desk, :UpdateTime)";
using (OracleCommand command = new OracleCommand(commandString, _connection, _transaction))
{
    command.Parameters.Add("Name", OracleType.VarChar, 50).Value = strategy;
    command.Parameters.Add("Desk", OracleType.VarChar, 50).Value = deskName ?? OracleString.Null;
    command.Parameters.Add("UpdateTime", OracleType.DateTime).Value = updated;
    command.ExecuteNonQuery();
}

However, I saw no variation in speed between:

  • constructing a new commandString for each row (String.Format)
  • constructing a new parameterized commandString for each row
  • using a single commandString and changing the parameters

I'm using System.Data.OracleClient, deleting and inserting 2,500 rows inside a transaction.




Answer 7:


Finding the linked examples somewhat confusing, I worked out some code that demonstrates a working array insert into a test table (jkl_test). Here is the table:

create table jkl_test (id number(9));

Here is the .NET code for a simple console application that connects to Oracle using ODP.NET and inserts an array of five integers:

using Oracle.DataAccess.Client;

namespace OracleArrayInsertExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Open a connection using ODP.Net
            var connection = new OracleConnection("Data Source=YourDatabase; Password=YourPassword; User Id=YourUser");
            connection.Open();

            // Create an insert command
            var command = connection.CreateCommand();
            command.CommandText = "insert into jkl_test values (:ids)";

            // Set up the parameter and provide values
            var param = new OracleParameter("ids", OracleDbType.Int32);
            param.Value = new int[] { 22, 55, 7, 33, 11 };

            // This is critical to the process; in order for the command to 
            // recognize and bind arrays, an array bind count must be specified.
            // Set it to the length of the array.
            command.ArrayBindCount = 5;
            command.Parameters.Add(param);
            command.ExecuteNonQuery();
        }
    }
}



Answer 8:


Oracle says (http://www.oracle.com/technology/products/database/utilities/htdocs/sql_loader_overview.html):

SQL*Loader is the primary method for quickly populating Oracle tables with data from external files

My experience is that their loader loads their tables faster than anything else.
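
Since the question is about .NET, one possible way to use it from code is to write the data out to a flat file and shell out to sqlldr; a rough sketch follows (the control file name, connect string, and paths are placeholders for your own setup):

using System.Diagnostics;

// Runs SQL*Loader against a previously written data/control file.
var psi = new ProcessStartInfo
{
    FileName = "sqlldr",
    Arguments = "userid=myUser/myPassword@myTNS control=load_mytable.ctl log=load_mytable.log",
    UseShellExecute = false
};

using (Process sqlldr = Process.Start(psi))
{
    sqlldr.WaitForExit();
    // A non-zero exit code indicates rejected rows or errors; check the log file.
}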




Answer 9:


If you are using the unmanaged Oracle client (Oracle.DataAccess), then the fastest way is to use OracleBulkCopy, as pointed out by Tarik.

If you are using the newer managed Oracle client (Oracle.ManagedDataAccess), then the fastest way is to use array binding, as pointed out by Damien. If you wish to keep your application code clean of array-binding specifics, you can write your own OracleBulkCopy-style implementation on top of array binding.

Here is a usage example from a real project:

var bulkWriter = new OracleDbBulkWriter();
bulkWriter.Write(
    connection,
    "BULK_WRITE_TEST",
    Enumerable.Range(1, 10000).Select(v => new TestData { Id = v, StringValue = v.ToString() }).ToList());

10K records are inserted in 500ms!

Here is the implementation:

public class OracleDbBulkWriter : IDbBulkWriter
{
    public void Write<T>(IDbConnection connection, string targetTableName, IList<T> data, IList<ColumnToPropertyMapping> mappings = null)
    {
        if (connection == null)
        {
            throw new ArgumentNullException(nameof(connection));
        }
        if (string.IsNullOrEmpty(targetTableName))
        {
            throw new ArgumentNullException(nameof(targetTableName));
        }
        if (data == null)
        {
            throw new ArgumentNullException(nameof(data));
        }
        if (mappings == null)
        {
            mappings = GetGenericMappings<T>();
        }

        mappings = GetUniqueMappings<T>(mappings);
        Dictionary<string, Array> parameterValues = InitializeParameterValues<T>(mappings, data.Count);
        FillParameterValues(parameterValues, data);

        using (var command = CreateCommand(connection, targetTableName, mappings, parameterValues))
        {
            command.ExecuteNonQuery();
        }
    }

    private static IDbCommand CreateCommand(IDbConnection connection, string targetTableName, IList<ColumnToPropertyMapping> mappings, Dictionary<string, Array> parameterValues)
    {
        var command = (OracleCommandWrapper)connection.CreateCommand();
        command.ArrayBindCount = parameterValues.First().Value.Length;

        foreach(var mapping in mappings)
        {
            var parameter = command.CreateParameter();
            parameter.ParameterName = mapping.Column;
            parameter.Value = parameterValues[mapping.Property];

            command.Parameters.Add(parameter);
        }

        command.CommandText = $@"insert into {targetTableName} ({string.Join(",", mappings.Select(m => m.Column))}) values ({string.Join(",", mappings.Select(m => $":{m.Column}")) })";
        return command;
    }

    private IList<ColumnToPropertyMapping> GetGenericMappings<T>()
    {
        var accessor = TypeAccessor.Create(typeof(T));

        var mappings = accessor.GetMembers()
            .Select(m => new ColumnToPropertyMapping(m.Name, m.Name))
            .ToList();

        return mappings;
    }

    private static IList<ColumnToPropertyMapping> GetUniqueMappings<T>(IList<ColumnToPropertyMapping> mappings)
    {
        var accessor = TypeAccessor.Create(typeof(T));
        var members = new HashSet<string>(accessor.GetMembers().Select(m => m.Name));

        mappings = mappings
                        .Where(m => m != null && members.Contains(m.Property))
                        .GroupBy(m => m.Column)
                        .Select(g => g.First())
                        .ToList();
        return mappings;
    }

    private static Dictionary<string, Array> InitializeParameterValues<T>(IList<ColumnToPropertyMapping> mappings, int numberOfRows)
    {
        var values = new Dictionary<string, Array>(mappings.Count);
        var accessor = TypeAccessor.Create(typeof(T));
        var members = accessor.GetMembers().ToDictionary(m => m.Name);

        foreach(var mapping in mappings)
        {
            var member = members[mapping.Property];

            values[mapping.Property] = Array.CreateInstance(member.Type, numberOfRows);
        }

        return values;
    }

    private static void FillParameterValues<T>(Dictionary<string, Array> parameterValues, IList<T> data)
    {
        var accessor = TypeAccessor.Create(typeof(T));
        for (var rowNumber = 0; rowNumber < data.Count; rowNumber++)
        {
            var row = data[rowNumber];
            foreach (var pair in parameterValues)
            {
                Array parameterValue = pair.Value;
                var propertyValue = accessor[row, pair.Key];
                parameterValue.SetValue(propertyValue, rowNumber);
            }
        }
    }
}

NOTE: this implementation uses the FastMember package for optimized access to properties (much faster than reflection). The IDbBulkWriter, ColumnToPropertyMapping and OracleCommandWrapper types referenced above are project-specific and are not shown here.




Answer 10:


I guess that OracleBulkCopy is one of the fastest ways. I had some trouble learning that I needed a newer ODAC version; cf. Where is type [Oracle.DataAccess.Client.OracleBulkCopy]?

Here is the complete PowerShell code to copy from a query into a suitable existing Oracle table. I tried SQL Server as the data source, but other valid OLE DB sources should work too.

if ($ora_dll -eq $null)
{
    "Load Oracle dll"
    $ora_dll = [System.Reflection.Assembly]::LoadWithPartialName("Oracle.DataAccess") 
    $ora_dll
}

# Source connection (OLE DB); this example uses SQL Server
$ConnectionString ="server=localhost;database=myDatabase;trusted_connection=yes;Provider=SQLNCLI10;"

# Oracle destination
$oraClientConnString = "Data Source=myTNS;User ID=myUser;Password=myPassword"

$tableName = "mytable"
$sql = "select * from $tableName"

$OLEDBConn = New-Object System.Data.OleDb.OleDbConnection($ConnectionString)
$OLEDBConn.open()
$readcmd = New-Object system.Data.OleDb.OleDbCommand($sql,$OLEDBConn)
$readcmd.CommandTimeout = '300'
$da = New-Object system.Data.OleDb.OleDbDataAdapter($readcmd)
$dt = New-Object system.Data.datatable
[void]$da.fill($dt)
$OLEDBConn.close()
#Write-Output $dt

if ($dt)
{
    try
    {
        $bulkCopy = new-object ("Oracle.DataAccess.Client.OracleBulkCopy") $oraClientConnString
        $bulkCopy.DestinationTableName = $tableName
        $bulkCopy.BatchSize = 5000
        $bulkCopy.BulkCopyTimeout = 10000
        $bulkCopy.WriteToServer($dt)
        $bulkcopy.close()
        $bulkcopy.Dispose()
    }
    catch
    {
        $ex = $_.Exception
        Write-Error "Write-DataTable$($connectionName):$($ex.Message)"
        continue
    }
}

BTW: I use this to copy a table with CLOB columns. I didn't get that to work using linked servers (cf. my question on dba). I haven't retried linked servers with the new ODAC.



Source: https://stackoverflow.com/questions/343299/bulk-insert-to-oracle-using-net
