How to open a huge Excel file efficiently

佛祖请我去吃肉 2021-01-30 21:29

I have a 150MB one-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:

# using python
import xlrd
wb = xlrd.open_         


        
11 answers
  •  囚心锁ツ
    2021-01-30 22:10

    Most programming languages that work with Office products go through some middle layer, and this is usually where the bottleneck is. Good examples are the PIAs/Interop and the Open XML SDK.

    One way to get the data at a lower level (bypassing the middle layer) is to use a driver, here the OLE DB provider.

    Your baseline: a 150MB one-sheet Excel file that takes about 7 minutes.

    The best I could do is a 130MB file in 135 seconds, roughly 3 times faster:

    using System.Data;
    using System.Data.OleDb;
    using System.Diagnostics;

    Stopwatch sw = new Stopwatch();
    sw.Start();
    
    DataSet excelDataSet = new DataSet();
    
    string filePath = @"c:\temp\BigBook.xlsx";
    
    // For .xlsx files use Provider=Microsoft.ACE.OLEDB.12.0; for .xls files use
    // Provider=Microsoft.Jet.OLEDB.4.0 with Extended Properties="Excel 8.0;HDR=YES;".
    string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties=\"Excel 12.0;HDR=YES;\"";
    
    using (OleDbConnection conn = new OleDbConnection(connectionString))
    {
        conn.Open();
        // Read the whole first sheet through the OLE DB provider.
        OleDbDataAdapter objDA = new OleDbDataAdapter("select * from [Sheet1$]", conn);
        objDA.Fill(excelDataSet);
        //dataGridView1.DataSource = excelDataSet.Tables[0];
    }
    sw.Stop();
    Debug.Print("Load XLSX took: " + sw.ElapsedMilliseconds + " millisecs. Records = " + excelDataSet.Tables[0].Rows.Count);
    

    Test machine: Windows 7 x64, Intel i5 at 2.3 GHz, 8 GB RAM, 250 GB SSD.
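
    To compare loaders on your own machine, the Stopwatch pattern from the C# snippet can be mirrored in Python. This is only a minimal sketch: the names `stopwatch` and `speedup` are made up for illustration, and the `sum(...)` call is a stand-in for whatever load call you actually want to time. The last line reuses the figures quoted here (about 7 minutes versus 135 seconds).

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results):
    """Store the wall-clock seconds spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def speedup(baseline_seconds, new_seconds):
    """How many times faster the new measurement is than the baseline."""
    return baseline_seconds / new_seconds


results = {}
with stopwatch("placeholder load", results):
    sum(range(100_000))  # stand-in for the real load call

print(f"placeholder load: {results['placeholder load']:.4f} s")
# Figures from the question and this answer: ~7 minutes vs 135 seconds.
print(f"speedup: {speedup(7 * 60, 135):.1f}x")
```

    Note that the two measurements quoted here were taken on different files (150MB and 130MB) and different machines, so the ratio is only a rough indication.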

    If I may recommend a hardware solution as well: if you're using standard HDDs, try an SSD.

    Note: I can't download your example Excel spreadsheet, as I'm behind a corporate firewall.

    PS. See MSDN, "Fastest Way to import xlsx files with 200 MB of Data"; the consensus there is that OleDb is the fastest.

    PS 2. Here's how you can do it with Python: http://code.activestate.com/recipes/440661-read-tabular-data-from-excel-spreadsheets-the-fast/
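
    For orientation, the same OLE DB idea could look roughly like this in Python. It is a sketch under assumptions, not the code from the linked recipe: it assumes Windows, the pywin32 package (`win32com.client`) and an installed ACE OLE DB provider, and the helper names `build_connection_string` and `read_sheet` are made up for illustration. The connection string is the one from the C# snippet.

```python
def build_connection_string(path, header=True):
    """Build the ACE OLE DB connection string used in the C# snippet."""
    hdr = "YES" if header else "NO"
    return (
        "Provider=Microsoft.ACE.OLEDB.12.0;"
        f"Data Source='{path}';"
        f'Extended Properties="Excel 12.0;HDR={hdr};"'
    )


def read_sheet(path, sheet="Sheet1"):
    """Return the sheet's rows as a list of tuples via ADO (Windows only)."""
    # Imported lazily so this module still loads where pywin32 is absent.
    import win32com.client

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(build_connection_string(path))
    try:
        rs = win32com.client.Dispatch("ADODB.Recordset")
        # 0 = adOpenForwardOnly, 1 = adLockReadOnly
        rs.Open(f"SELECT * FROM [{sheet}$]", conn, 0, 1)
        try:
            # GetRows() hands the data back column by column; transpose to rows.
            # Assumption: the sheet is not empty (GetRows fails at end of data).
            columns = rs.GetRows()
            return list(zip(*columns))
        finally:
            rs.Close()
    finally:
        conn.Close()


if __name__ == "__main__":
    print(build_connection_string(r"c:\temp\BigBook.xlsx"))
```

    Calling `read_sheet(r"c:\temp\BigBook.xlsx")` loads the whole sheet into memory at once, just like `objDA.Fill(excelDataSet)` does, so memory use grows with the file.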
