Question
I use a boxing approach to read data from a database via a type-switch helper class described here. The boxing approach is mainly used for custom types that I derive from/to default types (e.g. I have a DB_Image element stored as an int32 value, which corresponds to the index in an image list).
I discovered how bad this approach was in terms of performance. The question is: can I do better?
Examples welcome.
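For context, a rough illustration of that kind of mapping (DB_Image, imageList and the column name here are only illustrative, not the actual helper):

// Hypothetical illustration: the int32 stored in the database is an
// index into an in-memory image list.
int imgIndex = (int)row["imageIdx"];    // unbox the stored int32
DB_Image image = imageList[imgIndex];   // resolve the custom type from the index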
Answer 1:
OK, so I just wrote a little test program to compare different approaches to reading data from a DataTable.
I use the following 4 methods to set int, string, double and DateTime properties:
Set Convert: myObj.IntValue = Convert.ToInt32(row["int"]);
Set cast: myObj.IntValue = (int)row["int"];
Set Field: myObj.IntValue = row.Field<int>("int");
Set TypeSwitch: myObj.IntValue = TypeSwitch.GetValue(typeof(int), row["int"]);
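The TypeSwitch helper itself isn't listed here, but a minimal dictionary-based sketch matching the GetValue(typeof(int), ...) shape above could look like this (an assumption about its internals, not the original class; the caller casts the returned object back to the target type):

using System;
using System.Collections.Generic;

static class TypeSwitch {
    // One conversion delegate per supported target type; the dictionary
    // lookup below is the "dictionary access price" measured in the tests.
    static readonly Dictionary<Type, Func<object, object>> converters =
        new Dictionary<Type, Func<object, object>> {
            { typeof(int),      v => Convert.ToInt32(v) },
            { typeof(string),   v => Convert.ToString(v) },
            { typeof(double),   v => Convert.ToDouble(v) },
            { typeof(DateTime), v => Convert.ToDateTime(v) },
        };

    public static object GetValue(Type t, object boxed) {
        return converters[t](boxed);
    }
}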
Test 1: data table without DBNull values
Table with 10 rows, setting the 4 properties (int, string, double and DateTime) for each row, performed 1 million times:
Set Convert: 9002 ms
Set cast: 8504 ms
Set Field: 9312 ms
Set TypeSwitch: 10779 ms
The difference between the first 3 methods is not huge (+/-10%), but TypeSwitch pays the dictionary-access price (27% slower than casting)...
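For reference, the timing loop behind these numbers presumably looks something like this (a sketch using Stopwatch; the harness, table setup and the non-int property/column names are assumptions, since the actual program isn't shown):

// Assumes a populated DataTable "table" and a target object "myObj".
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int n = 0; n < 1000000; n++) {
    foreach (DataRow row in table.Rows) {
        // one of the four variants, here the cast method:
        myObj.IntValue  = (int)row["int"];
        myObj.StrValue  = (string)row["string"];
        myObj.DblValue  = (double)row["double"];
        myObj.DateValue = (DateTime)row["date"];
    }
}
sw.Stop();
Console.WriteLine("Set cast: " + sw.ElapsedMilliseconds + " ms");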
Test 2: data table with DBNull values
Now, of course, there are DBNull.Value problems, so I need to revise methods 1 to 3, which would otherwise crash.
Table with 10 rows (5 with only non-null values, 5 with only null values), setting the 4 properties (int, string, double and DateTime) for each row, performed 1 million times.
I therefore add a test as follows to methods 1 through 3:
row["int"] != DBNull.Value ? xxxx : default(int)
where xxxx is as in the previous tests.
For TypeSwitch, the test for DBNull.Value is also done. If the value is not null, it returns the Convert result; otherwise it uses another switch and returns a default value, like the other methods.
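In the sketch above, that null handling could look like this (again an assumption about the helper's internals; defaults would be a second dictionary holding default(int), null, 0.0, ...):

public static object GetValue(Type t, object boxed) {
    if (boxed == DBNull.Value)
        return defaults[t];          // per-type default from a second dictionary
    return converters[t](boxed);     // Convert-based conversion as before
}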
Here are the results:
Set Convert: 16995 ms
Set cast: 16487 ms
Set Field: 17204 ms
Set TypeSwitch: 10652 ms
Now, big surprise: we're actually much faster with TypeSwitch than with the other methods... Actually, this is because methods 1-3 access the row's field twice, so I use this revised version for methods 1 & 2:
var val = row["int"];
int i = val != DBNull.Value ? xxxx : default(int);
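For the cast method, for example, the xxxx placeholder resolves to:

var val = row["int"];
int i = val != DBNull.Value ? (int)val : default(int);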
Here are the revised results (still 5 non-empty and 5 null rows, executed 1 million times):
Set Convert: 9256.1556 ms
Set cast: 8895.1208 ms
Set Field: 13549.9371 ms
Set TypeSwitch: 10935.5278 ms
OK, we're back where we were: casting is the fastest, TypeSwitch is now 18% slower, and Field is a clear loser.
Now, for the sake of completeness, let's look at the results when we take the row field access (var val = row["int"];) out of the time measurements:
Set Convert: 630.113 ms
Set cast: 314.9697 ms
Set Field: 5149.6852 ms
Set TypeSwitch: 2354.9869 ms
Kaboooom! Clearly, what takes most of the time is actually the random access to the DataTable's columns, not the boxing conversions.
So I went on to try accessing columns by index instead of by column name (row access times re-added to the measurements):
Set Convert: 1480.0548 ms
Set cast: 1139.0841 ms
Set Field: 5928.8186 ms
Set TypeSwitch: 2923.8451 ms
Now we're at the bottom line of this test: cast is now the clear winner!
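The by-index variant caches the column ordinal once outside the loop via DataColumnCollection.IndexOf (variable names here are mine):

int intOrdinal = table.Columns.IndexOf("int");  // name -> ordinal, resolved once
foreach (DataRow row in table.Rows) {
    var val = row[intOrdinal];                  // positional access, no per-row name lookup
    myObj.IntValue = val != DBNull.Value ? (int)val : default(int);
}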
Conclusion
For sure, it is a bad idea to use r.Field<int>("aId") to set values, and TypeSwitch suffers the overhead of the dictionary access.
Also, accessing fields by name instead of by index is bad for performance.
UPDATE
I've written another test to compare DataTable.Fill, FbDataReader and Dapper.Query<T> for filling a list of ObjectA items declared as follows:
public class ObjectA : ObjectBase {
    public int aID;
    public string aStr;
    public double aDbl;
    public DateTime aDate;
}
I ran a sample 20 times, reading the data from a Firebird 2.5 database with 322759 rows and building the corresponding List<ObjectA>. I added a check that all three methods load the same number of items and that all items have matching properties.
For DataTable and FbDataReader I used the cast method described above, accessing elements by their index for DataTable. I realized that for FbDataReader, access times don't change much whether using the field name or the column index, so I kept the field name as the accessor.
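For illustration, the Dapper variant reduces to something like this (connection string and query text are placeholders, not from the post; Query<T> maps result columns to ObjectA members by name):

using Dapper;
using FirebirdSql.Data.FirebirdClient;

using (var conn = new FbConnection(connectionString)) {
    // Dapper materializes each row directly into an ObjectA instance.
    List<ObjectA> items = conn.Query<ObjectA>(
        "select aID, aStr, aDbl, aDate from OBJECT_A").AsList();
}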
Here are the results:
DataTable: 42742.922 ms
FbDataReader: 31088.6197 ms
Dapper.Query<T>: 28626.5392 ms
So the winner is clearly Dapper, and DataTable is the loser, as expected. Note that the times are for loading over 6 million records in total (20 runs × 322759 rows), which in all three cases is pretty impressive...
Source: https://stackoverflow.com/questions/40812841/improve-efficiency-of-database-row-reading-without-boxing