I have a 150MB, one-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:
# using python
import xlrd
wb = xlrd.open_
The C# and OLE solution still has some bottlenecks, so I tested it with C++ and ADO.
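    // pRec is assumed to be an ADO _RecordsetPtr; makeConnStr and sqlSelectSheet are helpers defined elsewhere.
    _bstr_t connStr(makeConnStr(excelFile, header).c_str());
    TESTHR(pRec.CreateInstance(__uuidof(Recordset)));
    TESTHR(pRec->Open(sqlSelectSheet(connStr, sheetIndex).c_str(), connStr, adOpenStatic, adLockOptimistic, adCmdText));
    // Walk every row and copy each cell into num[] or str[] according to its variant type.
    while(!pRec->adoEOF)
    {
        for(long i = 0; i < pRec->Fields->GetCount(); ++i)
        {
            _variant_t v = pRec->Fields->GetItem(i)->Value;
            if(v.vt == VT_R8)
                num[i] = v.dblVal;   // numeric cell
            else if(v.vt == VT_BSTR)
                str[i] = v.bstrVal;  // text cell
            ++cellCount;
        }
        pRec->MoveNext();
    }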
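The makeConnStr helper is not shown above. As a rough sketch of what such a helper might look like (this is an assumption, not the code I actually ran), the function below builds an ACE OLE DB connection string. The name makeConnStrSketch is hypothetical, and the provider and the "Excel 12.0 Xml" / "Excel 8.0" tokens are the usual ones for xlsx and xls, which may need adjusting for your installed driver.

```cpp
#include <string>

// Hypothetical stand-in for the makeConnStr helper used above (not the original code).
// Builds an ACE OLE DB connection string for an .xls or .xlsx workbook.
std::string makeConnStrSketch(const std::string& excelFile, bool header)
{
    // Choose the Extended Properties token from the file extension.
    // Assumption: the extension is lower case; anything else is treated as .xls.
    const std::string ext = ".xlsx";
    const bool isXlsx = excelFile.size() >= ext.size() &&
        excelFile.compare(excelFile.size() - ext.size(), ext.size(), ext) == 0;
    const std::string format = isXlsx ? "Excel 12.0 Xml" : "Excel 8.0";

    // HDR=YES tells the provider that the first row holds column names.
    return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelFile +
           ";Extended Properties=\"" + format +
           ";HDR=" + (header ? "YES" : "NO") + "\";";
}
```

The returned string can be wrapped in a _bstr_t the same way connStr is built above.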
On a machine with an i5-4460 and an HDD, I found that reading 500 thousand cells from an xls file takes 1.5s, while the same data in xlsx takes 2.829s. So it should be possible to process your data in under 30s.
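To make the 30s estimate concrete, here is a small sketch that extrapolates from the two timings. It assumes read time grows linearly with the number of cells, which I have not verified for a file as large as yours, and the function names are only illustrative.

```cpp
#include <cstdint>

// Cells read per second for one measured run.
double cellsPerSecond(std::int64_t cells, double seconds)
{
    return static_cast<double>(cells) / seconds;
}

// Number of cells that fit into a time budget,
// ASSUMING throughput stays constant as the file grows.
double cellsWithinBudget(std::int64_t cells, double seconds, double budgetSeconds)
{
    return cellsPerSecond(cells, seconds) * budgetSeconds;
}
```

With my numbers, cellsWithinBudget(500000, 1.5, 30.0) gives about 10 million cells for xls, and cellsWithinBudget(500000, 2.829, 30.0) gives about 5.3 million cells for xlsx. If your sheet has fewer cells than that, 30s looks reachable under the linear assumption.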
If you really need to get under 30s, use a RAM drive to reduce file I/O. It should significantly speed up the process. I cannot download your data to test it, so please tell me the result.