How to force ADO.NET to use only the System.String DataType in the reader's TableSchema

Posted by 别来无恙 on 2019-12-01 14:12:03

Question


I am using an OleDbConnection to query an Excel 2007 spreadsheet. I want to force the OleDbDataReader to use only string as the column data type.

The system is looking at the first 8 rows of data and inferring the data type to be Double. The problem is that on row 9 I have a string in that column and the OleDbDataReader is returning a Null value since it could not be cast to a Double.

I have used these connection strings:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 12.0;IMEX=1;HDR=No"

Provider=Microsoft.Jet.OLEDB.4.0;Data Source="ExcelFile.xlsx";Persist Security Info=False;Extended Properties="Excel 8.0;HDR=No;IMEX=1"

Looking at reader.GetSchemaTable().Rows[7].ItemArray[5], its DataType is Double.

Row 7 in this schema corresponds to the specific Excel column I am having issues with. ItemArray[5] is its DataType column.
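Indexing the schema with ItemArray[5] is easy to misread. As a minimal sketch (the class and method names here are hypothetical, not part of the original code), the same information can be collected by column name, which works for any IDataReader that exposes a schema table with "ColumnName" and "DataType" columns:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SchemaInspector
{
    // Maps each column name reported by the reader to the .NET type
    // listed in the "DataType" column of its schema table.
    public static Dictionary<string, Type> GetReportedTypes(IDataReader reader)
    {
        var types = new Dictionary<string, Type>();
        DataTable schema = reader.GetSchemaTable();
        foreach (DataRow row in schema.Rows)
        {
            types[(string)row["ColumnName"]] = (Type)row["DataType"];
        }
        return types;
    }
}
```

With HDR=No the Excel provider names the columns F1, F2, and so on, so the eighth column would be looked up as GetReportedTypes(reader)["F8"]. This only reports what the provider inferred; it does not change the inference.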

Is it possible to create a custom TableSchema for the reader so when accessing the ExcelFiles, I can treat all cells as text instead of letting the system attempt to infer the datatype?


I found some good info at this page: Tips for reading Excel spreadsheets using ADO.NET

The main quirk about the ADO.NET interface is how datatypes are handled. (You'll notice I've been carefully avoiding the question of which datatypes are returned when reading the spreadsheet.) Are you ready for this? ADO.NET scans the first 8 rows of data, and based on that guesses the datatype for each column. Then it attempts to coerce all data from that column to that datatype, returning NULL whenever the coercion fails!

Thank you,
Keith


Here is a reduced version of my code:

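// Requires: using System.Data; using System.Data.OleDb;
using (OleDbConnection connection = new OleDbConnection(BuildConnectionString(dataMapper).ToString()))
{
    connection.Open();
    using (OleDbCommand cmd = new OleDbCommand())
    {
        cmd.Connection = connection;
        // The query must be a string literal; [Sheet1$] addresses the worksheet named "Sheet1".
        cmd.CommandText = "SELECT * FROM [Sheet1$]";
        using (OleDbDataReader reader = cmd.ExecuteReader())
        {
            // Not wrapped in a using block: the table outlives this scope once added to the DataSet.
            DataTable dataTable = new DataTable("TestTable");
            dataTable.Load(reader);
            base.SourceDataSet.Tables.Add(dataTable);
        }
    }
}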

Answer 1:


As you have discovered, OLEDB uses Jet which is limited in the manner in which it can be tweaked. If you are set on using an OleDbConnection to read from an Excel file, then you need to set the HKLM\...\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows value to zero so that the system will scan the entire resultset.

That said, if you are open to using an alternative engine to read from an Excel file, you might consider trying the ExcelDataReader. It reads all columns as strings but will let you use dataReader.Getxxx methods to get typed values. Here's a sample that fills a DataSet:

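// Requires: using System.Data; using System.IO; plus the ExcelDataReader namespace.
// Written against the older ExcelDataReader API; IsFirstRowAsColumnNames was removed in 3.x.
DataSet result;
const string path = @"....\Test.xlsx";
using ( var fileStream = new FileStream( path, FileMode.Open, FileAccess.Read ) )
{
    // CreateOpenXmlReader handles the .xlsx (Open XML) format.
    using ( var excelReader = ExcelReaderFactory.CreateOpenXmlReader( fileStream ) )
    {
        // Treat the first row as column headers rather than data.
        excelReader.IsFirstRowAsColumnNames = true;
        result = excelReader.AsDataSet();
    }
}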



Answer 2:


Note: on a 64-bit OS, the registry key is here:

My Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel



Answer 3:


Check out the final answer on this page.


Just noticed the page you refer to says the same thing ...


Update:

The problem seems to be with the JET engine itself and not ADO. Once JET decides on the type, it sticks to it. Anything done after that has no effect; like casting the values to string in the SQL (e.g. Cstr([Column])) just results in an empty string being returned.

At this point (if there are no other answers) I'd opt for other methods: modifying the spreadsheet; modifying the registry (not ideal, since you will be changing the settings for every other app that uses JET); Excel automation; or a third-party component that does not use JET.

If the Automation option is too slow, then maybe just use it to save the spreadsheet in a different format that is easier to handle.




Answer 4:


I have faced the same issue and determined that this is something that many people commonly experience. Here are a number of solutions that have been suggested, many of which I have attempted to implement:


  1. Add the following to your connection string (Source):

TypeGuessRows=0;ImportMixedTypes=Text

  2. Add the following to your connection string (Source, More Discussion, Even More):

IMEX=1;HDR=NO;

  3. Edit the following registry settings: disable "TypeGuessRows" and set "ImportMixedTypes" to "Text" (Source, Not Recommended, More Documentation):

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/TypeGuessRows
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes

  4. Consider using an alternative library for reading the Excel file:

    • EPPlus
    • ExcelDataReader (also suggested by @Thomas)
    • OpenXml
  5. Format all data in the source file as Text (at least the first 8 rows), though I understand that's typically impractical (Source; this is in relation to SSIS, but the same concepts apply).

  6. Use a Schema.ini file to define the data types before importing the file. I found this in relation to using "Jet.OleDb" directly, so it may require modifying your connection string. This may only be applicable to CSVs; I have not tried this approach. (Source, Related Post)


None of these have worked for me (though I believe they have worked for others). I share the opinion expressed by @Asher that there is really no good solution to this problem. In my software I simply display an error message to the user (if any required column contains empty values) instructing them to format all columns as "Text".

Honestly, I think this book is more applicable to the situation. The issue, already stated multiple times, is:
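The "warn the user" fallback described above can be sketched roughly as follows. This is an illustration only: the class name, method name, and message format are assumptions, and it simply scans an already loaded DataTable for null or blank cells in the columns the caller marks as required.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class RequiredColumnCheck
{
    // Returns one description per required cell that came back as DBNull
    // or as a blank string after the table was loaded.
    public static List<string> FindEmptyCells(DataTable table, params string[] requiredColumns)
    {
        var problems = new List<string>();
        for (int r = 0; r < table.Rows.Count; r++)
        {
            foreach (string name in requiredColumns)
            {
                object value = table.Rows[r][name];
                bool isBlank = value == DBNull.Value
                    || (value is string s && s.Trim().Length == 0);
                if (isBlank)
                {
                    // Row numbers are reported 1-based for display purposes.
                    problems.Add($"Row {r + 1}, column {name}");
                }
            }
        }
        return problems;
    }
}
```

If the returned list is non-empty, the application can show it alongside the instruction to format the affected columns as "Text". Note that this cannot tell a cell that was genuinely empty from one that Jet nulled out after a failed coercion.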

  • "The data type at the destination is varchar but the assumed data type of "double" nullifies any data that doesn't fit."(Source)

  • "But the problem is actually with the OLEDBDataReader. The problem is that if it sees mostly numbers in a column, it assumes everything is a number - if a row item being read is not a number, it simply sets it to null! Ouch!"(Source)

  • "The problem seems to be with the JET engine itself and not ADO. Once JET decides on the type, it sticks to it."(@Asher)

While I haven't found any of this documented in an official capacity, I think it's very clear that this is an intentional design decision and simply how the Jet Database Library works. I hesitate to call this library entirely useless, because I think some of these solutions do work for many people. But so far, for my project, I have come to the conclusion that this library cannot read multiple data types in a single column and is ill-suited for general data retrieval.
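If downstream code expects every column to be a string regardless of what the provider inferred, the loaded table can at least be normalized after the fact. The following is a minimal sketch with hypothetical names. It does not recover values that Jet has already returned as NULL; it only makes the column types uniform.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class StringTable
{
    // Builds a copy of the source table in which every column is typed as
    // System.String. DBNull cells stay DBNull; other values are converted
    // with the invariant culture so numbers format consistently.
    public static DataTable ToStringTable(DataTable source)
    {
        var result = new DataTable(source.TableName);
        foreach (DataColumn column in source.Columns)
        {
            result.Columns.Add(column.ColumnName, typeof(string));
        }

        foreach (DataRow row in source.Rows)
        {
            DataRow copy = result.NewRow();
            foreach (DataColumn column in source.Columns)
            {
                object value = row[column];
                copy[column.ColumnName] = value == DBNull.Value
                    ? (object)DBNull.Value
                    : Convert.ToString(value, CultureInfo.InvariantCulture);
            }
            result.Rows.Add(copy);
        }
        return result;
    }
}
```

The invariant culture is a deliberate choice here so that a Double such as 1.5 is not rendered with a locale-specific decimal separator; a project that displays the values directly to users might prefer the current culture instead.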



Source: https://stackoverflow.com/questions/2567673/how-to-force-ado-net-to-use-only-the-system-string-datatype-in-the-readers-table
