Find and replace strings in Excel (.xlsx) using Python

没有蜡笔的小新 2021-01-07 02:00

I am trying to replace a number of strings in an .xlsx sheet (~70k rows, 38 columns). The strings to be searched for and their replacements are listed in a file, formatted as below:

4 Answers
  • 2021-01-07 02:47

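    For reading and writing .xls files with Python, use xlrd and xlwt; see http://www.python-excel.org/. Note that xlrd 2.0 and later reads only .xls, so for the .xlsx file in the question a library such as openpyxl is needed instead.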

    A simple xlrd example:

    from xlrd import open_workbook

    # Open the workbook and print every cell of every sheet
    wb = open_workbook('simple.xls')

    for s in wb.sheets():
        print('Sheet:', s.name)
        for row in range(s.nrows):
            for col in range(s.ncols):
                print(s.cell(row, col).value)
    

    For replacing the target text, use a dict:

    replace = {
        'bird produk': 'bird product',
        'pig': 'pork',
        'ayam': 'chicken',
        # ...
        'kuda': 'horse',
    }
    

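    A dict gives you O(1) average-case time when checking membership with 'text' in replace (hash collisions can make an individual lookup slower). Each cell therefore needs only one lookup, no matter how many replacement pairs there are, and for exact matches you cannot do asymptotically better than that.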

    Since I don't know what your bunch of strings look like, this answer may be inaccurate or incomplete.
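
    As a rough sketch of how the dict lookup could be applied to an .xlsx file, assuming openpyxl is installed and that a cell should be replaced only when its entire text equals a key. The names replace_exact and replace_in_workbook are hypothetical helpers, and the dict holds only the four pairs shown above:

```python
def replace_exact(value, mapping):
    """Return the replacement if the whole cell text is a key, else the value itself."""
    # Only strings are looked up; numbers, dates and None pass through unchanged.
    if isinstance(value, str) and value in mapping:
        return mapping[value]
    return value


def replace_in_workbook(src_path, dst_path, mapping):
    """Apply replace_exact to every cell of every sheet and save to dst_path."""
    # Imported here so replace_exact stays usable without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(src_path)
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                new_value = replace_exact(cell.value, mapping)
                # Write only when something changed (also skips empty merged cells).
                if new_value != cell.value:
                    cell.value = new_value
    wb.save(dst_path)


replace = {
    'bird produk': 'bird product',
    'pig': 'pork',
    'ayam': 'chicken',
    'kuda': 'horse',
}

row = ['ayam', 'pig', 42, None, 'sapi']
print([replace_exact(v, replace) for v in row])
# ['chicken', 'pork', 42, None, 'sapi']
```

    Saving to a separate dst_path keeps the original workbook untouched, which seems prudent for a 70k-row sheet.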

  • 2021-01-07 02:48

    Similar idea to @coder_A's, but use a dictionary to do the "translation" for you: the keys are the original words, and the value for each key is what it gets translated to.

  • 2021-01-07 02:50

    Make two arrays:

    A[bird produk, pig, ayam, kuda]          // words to be changed
    B[bird product, pork, chicken, horse]    // result after changing the word

    Now check each row of your Excel sheet and compare it with every element of A. If it matches, replace it with the corresponding element of B.

    For example (not actual code, just pseudocode):

    for (i = 1 to no. of rows)
    {
        for (j = 1 to no. of words in A)
        {
            if (contents of row[i] == A[j])
            {
                contents of row[i] = B[j];
                break;
            }
        }
    }
    

    To make it fast you have to stop the current iteration as soon as the word is replaced and check the next row.
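
    In Python, the two-array loop with the early exit might look like the sketch below. It assumes the cell values have already been read into a plain list; replace_with_lists is a hypothetical helper name.

```python
A = ['bird produk', 'pig', 'ayam', 'kuda']        # words to be changed
B = ['bird product', 'pork', 'chicken', 'horse']  # replacements, same order as A


def replace_with_lists(cells, old_words, new_words):
    """Replace each cell that equals an old word, stopping at the first match."""
    result = []
    for cell in cells:
        for old, new in zip(old_words, new_words):
            if cell == old:
                cell = new
                break  # word replaced, move on to the next cell
        result.append(cell)
    return result


print(replace_with_lists(['pig', 'kuda', 'sapi'], A, B))
# ['pork', 'horse', 'sapi']
```

    Even with the break, each cell may be compared against every word in A in the worst case, so the cost grows with the length of the word list.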

  • 2021-01-07 02:52

    I would copy the contents of your text file into a new worksheet in the Excel file and name that sheet "Lookup". Then use Text to Columns to split the data into the first two columns of this new sheet, starting in the first row.

    Paste the following code into a module in Excel and run it:

    Sub Replacer()
        Dim w1 As Worksheet
        Dim w2 As Worksheet
        Dim i As Long
    
        'The sheet with the words from the text file:
        Set w1 = ThisWorkbook.Sheets("Lookup")
        'The sheet with all of the data:
        Set w2 = ThisWorkbook.Sheets("Data")
    
        For i = 1 To w1.Range("A1").CurrentRegion.Rows.Count
            w2.Cells.Replace What:=w1.Cells(i, 1), Replacement:=w1.Cells(i, 2), LookAt:=xlPart, _
            SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
            ReplaceFormat:=False
        Next i
    
    End Sub
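
    For comparison, here is a rough Python approximation of what the Replace call does with LookAt:=xlPart and MatchCase:=False: each pair is applied in order, matching anywhere inside the text and ignoring case. The helper replace_part_ignore_case is hypothetical, and Excel's wildcard handling (*, ?, ~) is not emulated.

```python
import re


def replace_part_ignore_case(text, pairs):
    """Apply each (old, new) pair in order, matching substrings case-insensitively."""
    for old, new in pairs:
        # re.escape makes old a literal pattern; the lambda inserts new verbatim
        # so backslashes in the replacement are not interpreted.
        text = re.sub(re.escape(old), lambda m, new=new: new, text,
                      flags=re.IGNORECASE)
    return text


pairs = [('bird produk', 'bird product'), ('ayam', 'chicken')]
print(replace_part_ignore_case('Fresh AYAM and Bird Produk', pairs))
# Fresh chicken and bird product
```

    Because the pairs are applied one after another, the output of an earlier replacement can be matched by a later pair, which is also how the VBA loop behaves.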
    