I am using the below code to trim all cells in my DataTable.
The problem is, that I am doing it through a loop, and depending on what I fill the DataTable with, if it has
OleDb Objects
Actually what I meant is, to get formatted/trimmed string values from the Excel Sheet and create a DataTable with DataColumn objects of string type only, use the forward-only OleDbDataReader to create both, DataColumn and DataRow objects as it reads. Doing so, the data will be modified and filled in one step hence no need to call another routine to loop again and waste some more time. Also, consider using asynchronous calls to speed up the process and avoid freezing the UI while executing the lengthy task.
Something might help you to go:
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var conString = string.Empty;
var msg = "Loading... Please wait.";
try
{
switch (ofd.FilterIndex)
{
case 1: //xlsx
conString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={ofd.FileName};Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";
break;
case 2: //xls
conString = $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={ofd.FileName};Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
break;
default:
throw new FileFormatException();
}
var sheetName = "sheet1";
var dt = new DataTable();
//Optional: a label to show the current status
//or maybe show a ProgressBar with ProgressBarStyle = Marquee
lblStatus.Text = msg;
await Task.Run(() =>
{
using (var con = new OleDbConnection(conString))
using (var cmd = new OleDbCommand($"SELECT * From [{sheetName}$]", con))
{
con.Open();
using (var r = cmd.ExecuteReader())
while (r.Read())
{
if (dt.Columns.Count == 0)
for (var i = 0; i < r.FieldCount; i++)
dt.Columns.Add(r.GetName(i).Trim(), typeof(string));
object[] values = new object[r.FieldCount];
r.GetValues(values);
dt.Rows.Add(values.Select(x => x?.ToString().Trim()).ToArray());
}
}
});
//If you want...
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
lblStatus.Text = msg;
}
}
}
Here's a demo, reading sheets with 8 columns and 5000 rows from both xls
and xlsx
files:
Less than a second. Not bad.
However, this will not work correctly if the Sheet has mixed-types columns like your case where the third column has string
and int
values in different rows. That because the data type of a column is guessed in Excel by examining the first 8 rows by default. Changing this behavior requires changing the registry value of TypeGuessRows
in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\x.0\Engines\Excel
from 8 to 0 to force checking all the rows instead of just the first 8. This action will dramatically slow down the performance.
Office Interop Objects
Alternatively, you could use the Microsoft.Office.Interop.Excel
objects to read the Excel Sheet, get and format the values of the cells regardless of their types.
using Excel = Microsoft.Office.Interop.Excel;
//...
private async void TheCaller()
{
using (var ofd = new OpenFileDialog
{
Title = "Select File",
Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
AutoUpgradeEnabled = true,
})
{
if (ofd.ShowDialog() != DialogResult.OK) return;
var msg = "Loading... Please wait.";
Excel.Application xlApp = null;
Excel.Workbook xlWorkBook = null;
try
{
var dt = new DataTable();
lblStatus.Text = msg;
await Task.Run(() =>
{
xlApp = new Excel.Application();
xlWorkBook = xlApp.Workbooks.Open(ofd.FileName, Type.Missing, true);
var xlSheet = xlWorkBook.Sheets[1] as Excel.Worksheet;
var xlRange = xlSheet.UsedRange;
dt.Columns.AddRange((xlRange.Rows[xlRange.Row] as Excel.Range)
.Cells.Cast<Excel.Range>()
.Where(h => h.Value2 != null)
.Select(h => new DataColumn(h.Value2.ToString()
.Trim(), typeof(string))).ToArray());
foreach (var r in xlRange.Rows.Cast<Excel.Range>().Skip(1))
dt.Rows.Add(r.Cells.Cast<Excel.Range>()
.Take(dt.Columns.Count)
.Select(v => v.Value2 is null
? string.Empty
: v.Value2.ToString().Trim()).ToArray());
});
(dataGridView1.DataSource as DataTable)?.Dispose();
dataGridView1.DataSource = null;
dataGridView1.DataSource = dt;
msg = "Loading Completed";
}
catch (FileFormatException)
{
msg = "Unknown Excel file!";
}
catch (Exception ex)
{
msg = ex.Message;
}
finally
{
xlWorkBook?.Close(false);
xlApp?.Quit();
Marshal.FinalReleaseComObject(xlWorkBook);
Marshal.FinalReleaseComObject(xlApp);
xlWorkBook = null;
xlApp = null;
GC.Collect();
GC.WaitForPendingFinalizers();
lblStatus.Text = msg;
}
}
}
Note: You need to add reference to the mentioned library.
Not fast especially with a big number of cells but it gets the desired output.