I am trying to replace a bunch of strings in an .xlsx sheet (~70k rows, 38 columns). I have a list of the strings to be searched and replaced in a file, formatted as below:-
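For reading and writing .xls files with Python, use xlrd (reading) and xlwt (writing); see http://www.python-excel.org/. Note that xlrd 2.0 and later only reads the old .xls format, so for an .xlsx file you would need either an older xlrd or another library from that page, such as openpyxl.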
A simple xlrd example:
from xlrd import open_workbook

wb = open_workbook('simple.xls')
for s in wb.sheets():
    print('Sheet:', s.name)
    for row in range(s.nrows):
        # Collect the values of one row, then print them together.
        values = []
        for col in range(s.ncols):
            values.append(s.cell(row, col).value)
        print(values)
For replacing the target text, use a dict:
replace = {
    'bird produk': 'bird product',
    'pig': 'pork',
    'ayam': 'chicken',
    # ...
    'kuda': 'horse'
}
A dict gives you O(1) time complexity (most of the time, if keys don't collide) when checking membership using 'text' in replace. For an exact-match lookup there is no asymptotically better option than that.
Since I don't know what your bunch of strings looks like, this answer may be inaccurate or incomplete.
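As a rough sketch of the dict lookup described above: it assumes each cell holds exactly one of the strings to be replaced (a whole-cell match) and that the sheet has already been read into a list of rows. The helper name `replace_cells` and the sample rows are made up for illustration.

```python
# Mapping built from the example strings above.
replace = {
    'bird produk': 'bird product',
    'pig': 'pork',
    'ayam': 'chicken',
    'kuda': 'horse',
}

def replace_cells(rows, mapping):
    """Return a copy of rows with whole-cell matches swapped via mapping."""
    # mapping.get(cell, cell) is one hash lookup per cell; cells that are
    # not keys of the mapping come back unchanged.
    return [[mapping.get(cell, cell) for cell in row] for row in rows]

rows = [['ayam', 12, 'fresh'], ['kuda', 3, 'pig']]  # hypothetical sheet data
print(replace_cells(rows, replace))
```

With ~70k rows and 38 columns this is a few million dict lookups, which is usually quick in practice. It will not touch cells where the string is only part of a longer text.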
Similar idea to @coder_A's, but use a dictionary to do the "translation" for you, where the keys are the original words and the value for each key is what it gets translated to.
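If the words can appear inside longer cell text, one possible way to apply such a translation dictionary is a single regular-expression pass per cell. This is only a sketch: the word-boundary matching, the `translate` helper and the sample sentence are assumptions, not something taken from the question.

```python
import re

# Keys are the original words, values are their translations.
translation = {
    'bird produk': 'bird product',
    'pig': 'pork',
    'ayam': 'chicken',
    'kuda': 'horse',
}

# Try longer keys first so a long phrase wins over a shorter key inside it.
# \b restricts matches to whole words, so 'pig' does not match in 'pigeon'.
_keys = sorted(translation, key=len, reverse=True)
_pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in _keys) + r')\b')

def translate(text):
    """Replace every known word in text with its translation, in one pass."""
    return _pattern.sub(lambda m: translation[m.group(0)], text)

print(translate('ayam and pig, bird produk'))
```

Because all keys are combined into one pattern, each cell is scanned once no matter how many words are in the dictionary. The match here is case-sensitive.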
Make 2 arrays:
A[bird produk, pig, ayam, kuda] // words to be changed
B[bird product, pork, chicken, horse] // result after changing the word
Now check each row of your Excel sheet and compare it with every element of A. If it matches, replace it with the corresponding element of B.
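For example (not actual code, something like pseudocode):
for (i = 1 to no. of rows)
{
    for (j = 1 to 200) // 200 here stands for the number of words in A
    {
        if (contents of row[i] == A[j])
        {
            contents of row[i] = B[j];
            break; // only leave the inner loop once a match was replaced
        }
    }
}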
To make it fast, you have to stop the inner loop as soon as the word is replaced and move on to the next row.
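The two-array loop with the early exit could look roughly like this in Python. It is a sketch that assumes the sheet is already in memory as a list of rows and that a cell must equal a word in A exactly; `replace_in_rows` is an illustrative name.

```python
A = ['bird produk', 'pig', 'ayam', 'kuda']        # words to be changed
B = ['bird product', 'pork', 'chicken', 'horse']  # replacements, same order as A

def replace_in_rows(rows):
    """Replace matching cells in place and return the same list of rows."""
    for row in rows:
        for col, cell in enumerate(row):
            for old, new in zip(A, B):
                if cell == old:
                    row[col] = new
                    break  # stop scanning A once this cell has been replaced
    return rows

print(replace_in_rows([['pig', 1], ['x', 'kuda']]))
```

Even with the break, each cell costs up to len(A) comparisons, so on a large sheet this will generally be slower than a dictionary lookup.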
I would copy the contents of your text file into a new worksheet in the Excel file and name that sheet "Lookup". Then use Text to Columns to get the data into the first two columns of this new sheet, starting in the first row.
Paste the following code into a module in Excel and run it:
Sub Replacer()
    Dim w1 As Worksheet
    Dim w2 As Worksheet
    Dim i As Long

    'The sheet with the words from the text file:
    Set w1 = ThisWorkbook.Sheets("Lookup")

    'The sheet with all of the data:
    Set w2 = ThisWorkbook.Sheets("Data")

    'Column A holds the text to find, column B the replacement.
    For i = 1 To w1.Range("A1").CurrentRegion.Rows.Count
        w2.Cells.Replace What:=w1.Cells(i, 1).Value, Replacement:=w1.Cells(i, 2).Value, LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False
    Next i
End Sub
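For comparison, the same idea (apply each lookup pair in order, matching part of a cell and ignoring case) can be sketched in Python. This assumes the sheet is already loaded as a list of rows and the lookup pairs as a list of tuples; `replace_all` is an illustrative name, and the sketch does not claim to reproduce every detail of Excel's Replace.

```python
import re

def replace_all(rows, lookup):
    """Apply each (old, new) pair in order to every string cell, in place."""
    for old, new in lookup:
        if not old:
            continue  # an empty search string would match everywhere
        pattern = re.compile(re.escape(old), re.IGNORECASE)
        for row in rows:
            for col, cell in enumerate(row):
                if isinstance(cell, str):
                    # A function replacement keeps backslashes in `new` literal.
                    row[col] = pattern.sub(lambda m, new=new: new, cell)
    return rows

lookup = [('bird produk', 'bird product'), ('pig', 'pork')]  # hypothetical pairs
data = [['Fresh PIG meat', 4], ['Bird Produk', 'n/a']]       # hypothetical rows
print(replace_all(data, lookup))
```

As with the macro, the pairs are applied one after another, so the output of an earlier replacement can be changed again by a later pair.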