Trimming all cells in DataTable

说谎 2021-01-28 17:02

I am using the below code to trim all cells in my DataTable.

The problem is, that I am doing it through a loop, and depending on what I fill the DataTable with, if it has

1 Answer
  • 2021-01-28 17:44

    OleDb Objects


    What I actually meant is this: to get trimmed string values from the Excel sheet into a DataTable whose DataColumn objects are all of type string, use the forward-only OleDbDataReader and create both the DataColumn and DataRow objects as it reads. The data is then trimmed and loaded in a single pass, so there is no need for a second routine that loops over the table again. Also consider running the work asynchronously so the UI does not freeze while the lengthy task executes.

    Something like this might help you get started:

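    using System.Data;
    using System.Data.OleDb;
    using System.IO;            // FileFormatException (WindowsBase assembly)
    using System.Linq;
    using System.Threading.Tasks;
    using System.Windows.Forms;
    //...
    
    private async void TheCaller()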
    {
        using (var ofd = new OpenFileDialog
        {
            Title = "Select File",
            Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
            AutoUpgradeEnabled = true,
        })
        {
            if (ofd.ShowDialog() != DialogResult.OK) return;
    
            var conString = string.Empty;
            var msg = "Loading... Please wait.";
    
            try
            {
                switch (ofd.FilterIndex)
                {
                    case 1: //xlsx
                        conString = $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={ofd.FileName};Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";                            
                        break;
                    case 2: //xls
                        conString = $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={ofd.FileName};Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
                        break;
                    default:
                        throw new FileFormatException();
                }
    
                var sheetName = "sheet1";
                var dt = new DataTable();
    
                //Optional: a label to show the current status
                //or maybe show a ProgressBar with ProgressBarStyle = Marquee
                lblStatus.Text = msg;
    
                await Task.Run(() =>
                {
                    using (var con = new OleDbConnection(conString))
                    using (var cmd = new OleDbCommand($"SELECT * From [{sheetName}$]", con))
                    {
                        con.Open();
    
                        using (var r = cmd.ExecuteReader())
                            while (r.Read())
                            {
                                if (dt.Columns.Count == 0)
                                    for (var i = 0; i < r.FieldCount; i++)
                                        dt.Columns.Add(r.GetName(i).Trim(), typeof(string));
    
                                object[] values = new object[r.FieldCount];
    
                                r.GetValues(values);
                                dt.Rows.Add(values.Select(x => x?.ToString().Trim()).ToArray());
                            }
                    }
                });
    
                //If you want...
                dataGridView1.DataSource = null;
                dataGridView1.DataSource = dt;
    
                msg = "Loading Completed";
            }
            catch (FileFormatException)
            {
                msg = "Unknown Excel file!";
            }
            catch (Exception ex)
            {
                msg = ex.Message;
            }
            finally
            {
                lblStatus.Text = msg;
            }
        }
    }
    

    In a demo reading sheets with 8 columns and 5000 rows from both xls and xlsx files, loading took less than a second. Not bad.

    However, this will not work correctly if the sheet has mixed-type columns, as in your case, where the third column holds string and int values in different rows. That is because the OLE DB provider guesses each column's data type by examining only the first 8 rows by default. Changing this behavior requires changing the registry value TypeGuessRows under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\x.0\Engines\Excel from 8 to 0, which forces it to check all the rows instead of just the first 8. Doing so slows loading down dramatically.
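    One side note on the OleDb example: its switch keys on ofd.FilterIndex, so choosing the "All Files" filter always lands in the default case, even for a valid workbook. A possible variation, sketched here as a hypothetical helper (ExcelConnection.Build is not part of the original code), picks the provider from the file extension instead. It reuses the same two connection strings and throws NotSupportedException rather than FileFormatException only to stay free of the WindowsBase reference.

```csharp
using System;
using System.IO;

public static class ExcelConnection
{
    // Hypothetical helper: chooses the OLE DB provider from the file
    // extension instead of the dialog's FilterIndex.
    public static string Build(string path)
    {
        // Extension comparison is case-insensitive (.XLSX == .xlsx).
        var ext = Path.GetExtension(path).ToLowerInvariant();

        switch (ext)
        {
            case ".xlsx":
                return $"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={path};Extended Properties='Excel 12.0;HDR=Yes;IMEX=1;'";
            case ".xls":
                return $"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={path};Extended Properties='Excel 8.0;HDR=Yes;IMEX=1;'";
            default:
                // Assumption: any other extension is treated as unsupported.
                throw new NotSupportedException($"Not an Excel workbook: '{ext}'");
        }
    }
}
```

    In the caller, the whole switch would then shrink to conString = ExcelConnection.Build(ofd.FileName); and the existing catch (Exception ex) branch would report an unsupported file.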

    Office Interop Objects


    Alternatively, you could use the Microsoft.Office.Interop.Excel objects to read the Excel Sheet, get and format the values of the cells regardless of their types.

    using System.Runtime.InteropServices; // Marshal
    using Excel = Microsoft.Office.Interop.Excel;
    //...
    
    private async void TheCaller()
    {
        using (var ofd = new OpenFileDialog
        {
            Title = "Select File",
            Filter = "Excel WorkBook|*.xlsx|Excel WorkBook 97 - 2003|*.xls|All Files(*.*)|*.*",
            AutoUpgradeEnabled = true,
        })
        {
            if (ofd.ShowDialog() != DialogResult.OK) return;
    
            var msg = "Loading... Please wait.";
            Excel.Application xlApp = null;
            Excel.Workbook xlWorkBook = null;
    
            try
            {
                var dt = new DataTable();
    
                lblStatus.Text = msg;
    
                await Task.Run(() =>
                {
                    xlApp = new Excel.Application();
                    xlWorkBook = xlApp.Workbooks.Open(ofd.FileName, Type.Missing, true);
    
                    var xlSheet = xlWorkBook.Sheets[1] as Excel.Worksheet;
                    var xlRange = xlSheet.UsedRange;
    
                    // Rows is indexed relative to the range, so the header row is Rows[1].
                    dt.Columns.AddRange((xlRange.Rows[1] as Excel.Range)
                    .Cells.Cast<Excel.Range>()
                    .Where(h => h.Value2 != null)
                    .Select(h => new DataColumn(h.Value2.ToString()
                    .Trim(), typeof(string))).ToArray());
    
                    foreach (var r in xlRange.Rows.Cast<Excel.Range>().Skip(1))
                        dt.Rows.Add(r.Cells.Cast<Excel.Range>()
                            .Take(dt.Columns.Count)
                            .Select(v => v.Value2 is null
                            ? string.Empty
                            : v.Value2.ToString().Trim()).ToArray());
                });
    
                (dataGridView1.DataSource as DataTable)?.Dispose();
                dataGridView1.DataSource = null;
                dataGridView1.DataSource = dt;
    
                msg = "Loading Completed";
            }
            catch (FileFormatException)
            {
                msg = "Unknown Excel file!";
            }
            catch (Exception ex)
            {
                msg = ex.Message;
            }
            finally
            {
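                xlWorkBook?.Close(false);
                xlApp?.Quit();
    
                // FinalReleaseComObject throws on null, so guard each call.
                if (xlWorkBook != null) Marshal.FinalReleaseComObject(xlWorkBook);
                if (xlApp != null) Marshal.FinalReleaseComObject(xlApp);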
    
                xlWorkBook = null;
                xlApp = null;
    
                GC.Collect();
                GC.WaitForPendingFinalizers();
    
                lblStatus.Text = msg;
            }
        }
    }
    

    Note: You need to add a reference to the Microsoft.Office.Interop.Excel library.

    It is not fast, especially with a large number of cells, but it produces the desired output.
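    Much of that cost comes from one COM call per cell. A commonly used alternative, shown here only as a sketch and not as part of the original answer, is to read xlRange.Value2 once, which for a multi-cell range returns a 1-based object[,], and build the table from that array in managed code. RangeValues.ToTrimmedTable is a hypothetical helper name. Unlike the code above, it assumes the first row holds the headers and names empty header cells Column1, Column2, and so on instead of skipping them.

```csharp
using System;
using System.Data;

public static class RangeValues
{
    // Hypothetical helper: turns a block of cell values (first row = headers)
    // into a DataTable whose columns are all strings, trimming every value.
    public static DataTable ToTrimmedTable(object[,] cells)
    {
        var dt = new DataTable();

        // Use the array's own bounds: Excel returns 1-based arrays.
        int r0 = cells.GetLowerBound(0), r1 = cells.GetUpperBound(0);
        int c0 = cells.GetLowerBound(1), c1 = cells.GetUpperBound(1);

        for (var c = c0; c <= c1; c++)
        {
            var name = cells[r0, c]?.ToString().Trim();
            // Assumption: an empty header cell gets a positional fallback name.
            dt.Columns.Add(string.IsNullOrEmpty(name) ? $"Column{c - c0 + 1}" : name, typeof(string));
        }

        for (var r = r0 + 1; r <= r1; r++)
        {
            var row = new object[c1 - c0 + 1];
            for (var c = c0; c <= c1; c++)
                row[c - c0] = cells[r, c]?.ToString().Trim() ?? string.Empty; // empty cell -> ""
            dt.Rows.Add(row);
        }

        return dt;
    }
}
```

    Inside the Task.Run body this could be called as var dt = RangeValues.ToTrimmedTable((object[,])xlRange.Value2);. Note that the cast fails when the used range is a single cell, because Value2 then returns a scalar rather than an array, and that duplicate header names still make DataTable throw.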
