When returning data from a DataReader, I would typically use the ordinal reference on the DataReader to grab the relevant column.
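The ordinal-based access pattern looks like this (a minimal sketch; it uses an in-memory DataTableReader as a stand-in for a real provider's reader, since the original snippet was truncated):

```csharp
using System;
using System.Data;

class OrdinalAccessDemo
{
    static void Main()
    {
        // Stand-in for a real command.ExecuteReader(); a DataTableReader
        // behaves like any other IDataReader for this purpose.
        var table = new DataTable();
        table.Columns.Add("Column1", typeof(string));
        table.Rows.Add("hello");

        using (IDataReader dr = table.CreateDataReader())
        {
            while (dr.Read())
            {
                // Ordinal access: fast, but tied to column position.
                Console.WriteLine(dr.GetString(0));
            }
        }
    }
}
```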
The string-name lookup is much more expensive than the ordinal call, but more maintainable and less "fragile" than hard-coding the ordinals. So here's how I've been doing it; it's the best of both worlds. I don't have to remember the ordinal values or care if the column order changes, but I get the performance benefit of using ordinals.
var dr = command.ExecuteReader();
if (dr.HasRows)
{
    // Get your ordinals here, before you run through the reader.
    int ordinalColumn1 = dr.GetOrdinal("Column1");
    int ordinalColumn2 = dr.GetOrdinal("Column2");
    int ordinalColumn3 = dr.GetOrdinal("Column3");

    while (dr.Read())
    {
        // Now access your columns by ordinal inside the Read loop.
        // This is faster than doing a string column-name lookup every time.
        Console.WriteLine("Column1 = " + dr.GetString(ordinalColumn1));
        Console.WriteLine("Column2 = " + dr.GetString(ordinalColumn2));
        Console.WriteLine("Column3 = " + dr.GetString(ordinalColumn3));
    }
}
Note: this only really makes sense for readers you expect to contain a decent number of rows. The GetOrdinal() calls are extra work, and they only pay for themselves if the combined savings from calling GetString(int ordinal) in the loop are greater than the cost of calling GetOrdinal().
Edit: I missed the second part of this question. Regarding DBNull values, I've started writing extension methods that deal with that possibility, for example dr.GetDatetimeSafely().
Within those extension methods, you can do whatever you need to feel confident you get the expected value back.
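The answer doesn't show the body of such a method, so here is one plausible sketch of the pattern (the GetDatetimeSafely name comes from the answer; GetStringSafely is an extra example of the same idea):

```csharp
using System;
using System.Data;

public static class DataReaderExtensions
{
    // Returns null instead of throwing when the column holds DBNull.
    public static DateTime? GetDatetimeSafely(this IDataRecord dr, string name)
    {
        int ordinal = dr.GetOrdinal(name);
        return dr.IsDBNull(ordinal) ? (DateTime?)null : dr.GetDateTime(ordinal);
    }

    // Same pattern for strings (reference type, so plain null works).
    public static string GetStringSafely(this IDataRecord dr, string name)
    {
        int ordinal = dr.GetOrdinal(name);
        return dr.IsDBNull(ordinal) ? null : dr.GetString(ordinal);
    }
}
```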
I do think that indexed fields are the better approach, if only to avoid field-name changes in the underlying database, which would require a recompile of your application because you hardcoded the field name.
For each of the fields, you will need to manually check for nulls.
var dr = command.ExecuteReader();
if (dr.HasRows) {
    var customer = new Customer();
    // I'm assuming that you know each field's position returned from your query,
    // hence the importance of writing out all of your selected fields rather than "SELECT *".
    customer.Id = dr.IsDBNull(0) ? (int?)null : dr.GetInt32(0);
    ...
}
In addition, this approach would allow you to use reflection and make this "GetData()" method more generic, providing a typeof(T) and getting the proper constructor for the proper type. Binding to the order of the columns is the one thing some wish to avoid, but in this case it becomes worthwhile.
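A rough sketch of that reflective "GetData()" idea (the names here are hypothetical, and it assumes the SELECT list order matches the constructor's parameter order, with a single constructor on T):

```csharp
using System;
using System.Data;

public static class ReaderMapper
{
    // Map the current row's columns, in order, onto the parameters of
    // T's constructor. Assumes T has exactly one constructor whose
    // parameters line up with the SELECT list.
    public static T GetData<T>(IDataRecord dr)
    {
        var ctor = typeof(T).GetConstructors()[0];
        var parameters = ctor.GetParameters();
        var args = new object[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
        {
            // Translate DBNull to null so nullable parameters work.
            args[i] = dr.IsDBNull(i) ? null : dr.GetValue(i);
        }
        return (T)ctor.Invoke(args);
    }
}
```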
The problem with ordinals is that if the order of the columns changes, you are forced to modify the consumer code for the DataReader, unlike when using column names. I don't think there is a meaningful performance gain either way; it is more a matter of best practice, coding standards, and code maintainability.
Indexing the data reader by name is slightly more expensive than indexing it by ordinal, mainly because the name has to be translated to an ordinal on each access.
However, in most cases the performance penalty incurred by looking up a field by name is dwarfed by the amount of time it takes for the database to execute the command. Do not let the performance penalty dictate your choice between name and numeric indexing.
Despite the slight performance penalty, I normally choose name indexing anyway: it is more readable, and it does not break when the column order changes.
If you feel the performance penalty of name indexing is becoming problematic (maybe the command executes quickly but returns a lot of rows), then look up the numeric index once by name, save it away, and use it for the remaining rows.
It is possible to argue both sides in this situation. As already pointed out by others, using the name is more readable and will not break if someone changes the order of columns in the underlying database. But one might also argue that using an ordinal has the advantage of not breaking if someone changes a column name in the underlying database. I prefer the former argument, though, and think the readability argument for column names trumps the second argument in general. An additional argument for names is that they can "self-detect" errors: if someone does change a field name, the code has a better chance of breaking outright rather than harboring the subtle bug of appearing to work while it reads the wrong field.
It seems obvious, but it may be worth mentioning a use case that has both the self-detecting errors and the performance of ordinals. If you specify the SELECT list explicitly in the SQL, then using ordinals won't be a problem, because the statement in the code guarantees the order:
SELECT name, address, phone from mytable
In this case, it is fairly safe to use ordinals to access the data. It doesn't matter if someone moves fields around in the table, and if someone changes a column name, the SQL statement will produce an error when it runs.
And one final point. I just ran a test on a provider I helped write. The test read 1 million rows and accessed the "lastname" field on each record (comparing it against a value). Using rdr["lastname"] took 3301 milliseconds, while rdr.GetString(1) took 2640 milliseconds (approximately a 25% speedup). In this particular provider, the name lookup uses a sorted lookup to translate the name to an ordinal.
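The shape of that comparison can be sketched against an in-memory DataTableReader (timings from this are illustrative only; the numbers above came from the answerer's own provider, which will behave differently):

```csharp
using System;
using System.Data;
using System.Diagnostics;

class ReaderBenchmark
{
    static void Main()
    {
        // In-memory stand-in for a real provider's reader.
        var table = new DataTable();
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("lastname", typeof(string));
        for (int i = 0; i < 100_000; i++) table.Rows.Add(i, "Smith");

        long byName = Time(table, rdr => (string)rdr["lastname"]);
        long byOrdinal = Time(table, rdr => rdr.GetString(1));
        Console.WriteLine($"by name: {byName} ms, by ordinal: {byOrdinal} ms");
    }

    // Reads every row, comparing the field against a value, and
    // returns the elapsed milliseconds.
    static long Time(DataTable table, Func<IDataReader, string> read)
    {
        var sw = Stopwatch.StartNew();
        using (IDataReader rdr = table.CreateDataReader())
        {
            while (rdr.Read())
            {
                if (read(rdr) == "Jones") { /* no-op: comparison only */ }
            }
        }
        return sw.ElapsedMilliseconds;
    }
}
```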
I always go with the string-name approach, just because reading the code is cleaner. Having to mentally map an index back to a column name is horrible.