Question
I use a boxing approach to read data from a database via a type-switch helper class described here. The boxing approach is mainly used for custom types that I derive from/to default types (e.g. I have a DB_Image element stored as an int32 value, which corresponds to the index in an image list).
I discovered how bad this approach was in terms of performance. The question is: can I do better?
Examples welcome.
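For context, a rough illustration of that kind of mapping (DB_Image, imageList and the column name here are only illustrative, not the actual helper):

// Hypothetical illustration: the int32 stored in the database is an
// index into an in-memory image list.
int imgIndex = (int)row["imageIdx"];    // unbox the stored int32
DB_Image image = imageList[imgIndex];   // resolve the custom type from the index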
Answer 1:
OK, so I just wrote a little test program to compare different approaches to reading data from a DataTable.
I use the following 4 methods to set int, string, double and DateTime properties:
Set Convert: myObj.IntValue = Convert.ToInt32(row["int"]);
Set cast: myObj.IntValue = (int)row["int"];
Set Field: myObj.IntValue = row.Field<int>("int");
Set TypeSwitch: myObj.IntValue = TypeSwitch.GetValue(typeof(int), row["int"]);
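The TypeSwitch helper itself isn't listed here, but a minimal dictionary-based sketch matching the GetValue(typeof(int), ...) shape above could look like this (an assumption about its internals, not the original class; the caller casts the returned object back to the target type):

using System;
using System.Collections.Generic;

static class TypeSwitch {
    // One conversion delegate per supported target type; the dictionary
    // lookup below is the "dictionary access price" measured in the tests.
    static readonly Dictionary<Type, Func<object, object>> converters =
        new Dictionary<Type, Func<object, object>> {
            { typeof(int),      v => Convert.ToInt32(v) },
            { typeof(string),   v => Convert.ToString(v) },
            { typeof(double),   v => Convert.ToDouble(v) },
            { typeof(DateTime), v => Convert.ToDateTime(v) },
        };

    public static object GetValue(Type t, object boxed) {
        return converters[t](boxed);
    }
}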
Test 1: data table without DBNull values
Table with 10 rows, setting the 4 properties (int, string, double and DateTime) for each row, performed 1 million times:
Set Convert: 9002 ms
Set cast: 8504 ms
Set Field: 9312 ms
Set TypeSwitch: 10779 ms
The difference between the first 3 methods is not huge (+/-10%), but TypeSwitch pays the dictionary-access price (27% slower than casting)...
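For reference, the timing loop behind these numbers presumably looks something like this (a sketch using Stopwatch; the harness, table setup and the non-int property/column names are assumptions, since the actual program isn't shown):

// Assumes a populated DataTable "table" and a target object "myObj".
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int n = 0; n < 1000000; n++) {
    foreach (DataRow row in table.Rows) {
        // one of the four variants, here the cast method:
        myObj.IntValue  = (int)row["int"];
        myObj.StrValue  = (string)row["string"];
        myObj.DblValue  = (double)row["double"];
        myObj.DateValue = (DateTime)row["date"];
    }
}
sw.Stop();
Console.WriteLine("Set cast: " + sw.ElapsedMilliseconds + " ms");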
Test 2: data table with DBNull values
Now, of course, there are DBNull.Value problems, so I need to revise methods 1 to 3, which would otherwise crash.
Table with 10 rows (5 with only non-null values, 5 with only null values), setting the 4 properties (int, string, double and DateTime) for each row, performed 1 million times.
I therefore add a test as follows to methods 1 through 3:
row["int"] != DBNull.Value ? xxxx : default(int)
where xxxx is as in the previous tests.
For TypeSwitch, the test for DBNull.Value is also done. If the value is not null, it returns the Convert result; otherwise it uses another switch and returns a default value, like the other methods.
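In the sketch above, that null handling could look like this (again an assumption about the helper's internals; defaults would be a second dictionary holding default(int), null, 0.0, ...):

public static object GetValue(Type t, object boxed) {
    if (boxed == DBNull.Value)
        return defaults[t];          // per-type default from a second dictionary
    return converters[t](boxed);     // Convert-based conversion as before
}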
Here are the results:
Set Convert: 16995 ms
Set cast: 16487 ms
Set Field: 17204 ms
Set TypeSwitch: 10652 ms
Now, big surprise: we're actually much faster with TypeSwitch than with the other methods... Actually, this is because methods 1-3 access the row's field twice, so I use this revised version for methods 1 & 2:
var val = row["int"];
int i = val != DBNull.Value ? xxxx : default(int);
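For the cast method, for example, the xxxx placeholder resolves to:

var val = row["int"];
int i = val != DBNull.Value ? (int)val : default(int);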
Here are the revised results (still 5 non-empty and 5 null rows, executed 1 million times):
Set Convert: 9256.1556 ms
Set cast: 8895.1208 ms
Set Field: 13549.9371 ms
Set TypeSwitch: 10935.5278 ms
OK, we're back where we were: casting is the fastest, TypeSwitch is now 18% slower, and Field is a clear loser.
Now, for the sake of completeness, let's look at the results when we take the row field access (var val = row["int"];) out of the time measurements:
Set Convert: 630.113 ms
Set cast: 314.9697 ms
Set Field: 5149.6852 ms
Set TypeSwitch: 2354.9869 ms
Kaboooom! Clearly, what takes most of the time is actually the random access to the DataTable's columns, not the boxing conversions.
So I went on to try accessing columns by index instead of by column name (row access times re-added to the measurements):
Set Convert: 1480.0548 ms
Set cast: 1139.0841 ms
Set Field: 5928.8186 ms
Set TypeSwitch: 2923.8451 ms
Now we're at the bottom line of this test: cast is now the clear winner!
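The by-index variant caches the column ordinal once outside the loop via DataColumnCollection.IndexOf (variable names here are mine):

int intOrdinal = table.Columns.IndexOf("int");  // name -> ordinal, resolved once
foreach (DataRow row in table.Rows) {
    var val = row[intOrdinal];                  // positional access, no per-row name lookup
    myObj.IntValue = val != DBNull.Value ? (int)val : default(int);
}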
Conclusion
For sure, it is a bad idea to use r.Field<int>("aId") to set values, and TypeSwitch suffers the overhead of the dictionary access.
Also, accessing fields by name instead of by index is bad for performance.
UPDATE
I've written another test to compare DataTable.Fill, FbDataReader and Dapper.Query<T> for filling a list of ObjectA items declared as follows:
public class ObjectA : ObjectBase {
    public int aID;
    public string aStr;
    public double aDbl;
    public DateTime aDate;
}
I ran a sample 20 times, reading the data from a Firebird 2.5 database with 322759 rows and building the corresponding List<ObjectA>. I added a check that all three methods load the same number of items and that all items have matching properties.
For DataTable and FbDataReader I used the cast method described above, accessing elements by their index for DataTable. I realized that for FbDataReader, access times don't change much whether using the field name or the column index, so I kept the field name as the accessor.
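For illustration, the Dapper variant reduces to something like this (connection string and query text are placeholders, not from the post; Query<T> maps result columns to ObjectA members by name):

using Dapper;
using FirebirdSql.Data.FirebirdClient;

using (var conn = new FbConnection(connectionString)) {
    // Dapper materializes each row directly into an ObjectA instance.
    List<ObjectA> items = conn.Query<ObjectA>(
        "select aID, aStr, aDbl, aDate from OBJECT_A").AsList();
}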
Here are the results:
DataTable: 42742.922 ms
FbDataReader: 31088.6197 ms
Dapper.Query<T>: 28626.5392 ms
So the winner is clearly Dapper, and DataTable is the loser, as expected. Note that the times are for loading over 6 million records in total (20 runs × 322759 rows), which in all three cases is pretty impressive...
Source: https://stackoverflow.com/questions/40812841/improve-efficiency-of-database-row-reading-without-boxing