问题
I am working with a client to import a rather larger Excel file (over 37K rows) into a custom system and utilizing the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% in and dug a little further. The reason it fails is the majority of records (with associated dates ranging 2011 - 2015) are normal, e.g. 1/3/2015
, however starting in 2016, the structure changes to look like this: '1/4/2016
(note the "tick" at the beginning of the date) and LinqToExcel starts returning a DBNull for that column.
Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row
value and where that column index is, it's empty.
Edit
Here is the code I am using to read in the file:
var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
data.Add(this.FillEntity(row));
}
The problem I'm referring to is inside the row
variable, which is a LinqToExcel.Row
instance and contains the raw data from Excel. The values inside row
all line up, with the exception of the column for the date which is empty.
** Edit 2 **
I downloaded the LinqToExcel code from GitHub and connected it to my project and it looks like the issue is even deeper than this library. It uses an IDataReader
to read in all of the values and the cells in question that aren't being read are empty from that level. Here is the block of code from the
LinqToExcel.ExcelQueryExecutor
class that is failing:
private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
{
var results = new List<object>();
var columnIndexMapping = new Dictionary<string, int>();
for (var i = 0; i < columns.Count(); i++)
columnIndexMapping[columns.ElementAt(i)] = i;
while (data.Read())
{
IList<Cell> cells = new List<Cell>();
for (var i = 0; i < columns.Count(); i++)
{
var value = data[i];
//I added this in, since the worksheet has over 37K rows and
//I needed to snag right before it hit the values I was looking for
//to see what the IDataReader was exposing. The row inside the
//IDataReader relevant to the column I'm referencing is null,
//even though the data definitely exists in the Excel file
if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
{
}
value = TrimStringValue(value);
cells.Add(new Cell(value));
}
results.CallMethod("Add", new Row(cells, columnIndexMapping));
}
return results.AsEnumerable();
}
Since their class uses an OleDbDataReader
to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.
回答1:
Found it! Once I traced down that it was the OleDbDataReader
that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader
(as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the '
in front of the date meant it couldn't be parsed into a date, so the value was null.
To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.
Source: What is IMEX in the OLEDB connection string?
来源:https://stackoverflow.com/questions/45314971/linqtoexcel-not-parsing-date