Unable to remove rows with specific cell value python openpyxl

北战南征 submitted on 2021-01-28 08:26:47

Question


I'm having a weird issue where the logic and the code tell me it should work, but it does not.

My code is below:

import shutil, sys
from distutils.version import StrictVersion
import openpyxl
from openpyxl import Workbook
from openpyxl import load_workbook

wb = load_workbook('testing.xlsx')
ws = wb.get_sheet_by_name('Sheet1')
x = ws.max_row
y = ws.max_column

for r in range(1,x+1):
        for j in range(1, y+1):
                d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
                ws.delete_rows(r)

wb.save("test_1.xlsx")

The Excel sheet has 5 columns, A B C D E. The first row holds titles, so it can be ignored. A2 has the time, B2 the name, C2 the username, D2 the path, and E2 contains a value of either TRUE or FALSE.

The point of my script is to look at all cells and, if the value FALSE is found, remove that row. So, for example, row 10:

01/01/1999 John Smith JohnS /path/ FALSE

This should be removed because it contains FALSE, or more specifically because E10 has FALSE. The TRUE and FALSE values only appear in column E, so for the sake of speed we could check only column E in every row. I have done that in another version.

To the problem: my test workbook has a total of 25 rows and columns A B C D E as stated above, but the script only removes 5 rows that had the value FALSE. It also seems that the script removes the rows that contain FALSE at random. In my test workbook there are a total of 10 rows with a FALSE cell. Their usernames, in order, are t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, but the script just now removed t1, t3, t5, t6, t7, t9. Looking at it now, it seems I have an issue with the logic and it's checking odd numbers.

EDIT: It seems that if I repeat the loop enough times, it will remove all rows that contain FALSE.

Current code that's working:

import shutil, sys
from distutils.version import StrictVersion
import openpyxl
from openpyxl import Workbook
from openpyxl import load_workbook

wb = load_workbook('testing.xlsx')
ws = wb.get_sheet_by_name('Sheet1')
x = ws.max_row
y = ws.max_column

for r in range(1,x+1):
        for j in range(1, y+1):
                d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
                ws.delete_rows(r)


for r in range(1,x+1):
        for j in range(1, y+1):
                d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
                ws.delete_rows(r)

for r in range(1,x+1):
        for j in range(1, y+1):
                d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
                ws.delete_rows(r)

for r in range(1,x+1):
        for j in range(1, y+1):
                d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
                ws.delete_rows(r)

wb.save("test_1.xlsx")

It's not pretty, so any tips will be appreciated.


Answer 1:


I think the problem is the indentation. Try this:

from openpyxl import load_workbook

wb = load_workbook('testing.xlsx')
ws = wb['Sheet1']  # replaces the deprecated wb.get_sheet_by_name()
x = ws.max_row
y = ws.max_column

# x+1-r counts down from the last row to the first
for r in range(1,x+1):
    for j in range(1, y+1):
        d=ws.cell(row=x+1-r,column=j)
        if str(d.value).lower() == "false":
            ws.delete_rows(x+1-r)
            # the row is gone, so stop scanning its columns
            break

wb.save("test_1.xlsx")

I also changed the row number from r to x+1-r, which iterates over the rows from the last to the first, so that deleting one row does not shift the rows that are still to be checked. It is also necessary to break out of the inner loop: once the current row has been deleted, you can't keep looping over it.
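To see why the direction matters, here is a minimal sketch that uses a plain Python list as a stand-in for the worksheet rows (no openpyxl involved). The function names `delete_forward` and `delete_backward` are made up for this illustration. The forward version mimics the question's loop: it reads the row count once and deletes while counting up, so the row that slides into a deleted slot is never inspected.

```python
def delete_forward(flags):
    """Mimic the question's loop: count upward while deleting in place."""
    rows = list(flags)
    original_count = len(rows)  # like x = ws.max_row, read only once
    for r in range(1, original_count + 1):
        # Past the shrunken end there is nothing left to inspect.
        if r <= len(rows) and rows[r - 1] == "FALSE":
            del rows[r - 1]  # every row below shifts up by one
    return rows


def delete_backward(flags):
    """Count downward, so a deletion never moves a row not yet visited."""
    rows = list(flags)
    for r in range(len(rows), 0, -1):
        if rows[r - 1] == "FALSE":
            del rows[r - 1]
    return rows


flags = ["TRUE", "FALSE", "FALSE", "TRUE", "FALSE", "FALSE"]
print(delete_forward(flags))   # ['TRUE', 'FALSE', 'TRUE', 'FALSE']
print(delete_backward(flags))  # ['TRUE', 'TRUE']
```

In the forward pass, the second FALSE of each adjacent pair moves into the slot that was just checked and survives, which matches the "repeat the loop until it works" behaviour described in the question.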




Answer 2:


You can try something like this:

from openpyxl import load_workbook
from openpyxl.workbook import Workbook

# open workbook and get active worksheet
wb = load_workbook('original.xlsx')
ws = wb.active

# extract headers from row 1
headers = [cell.value for row in ws.iter_rows(min_row=1, max_row=1) for cell in row]

# want to keep headers by default
new_rows = [headers]

# go through every row(>=2) except headers
for row in ws.iter_rows(min_row=2):
    values = [cell.value for cell in row]

    # create dictionary of row 
    row_dict = dict(zip(headers, values))

    # only append if 'enabled' is True
    if row_dict['enabled']:
        new_rows.append(values)

# create a new workbook and sheet to write to
new_wb = Workbook()
new_ws = new_wb.active

# iterate through rows and columns of nested list
for row, line in enumerate(new_rows, start=1):
    for column, cell in enumerate(line, start=1):

        # write new cell to output worksheet
        new_ws.cell(row=row, column=column).value = cell

# save output workbook
new_wb.save('output.xlsx')

This gives a new output.xlsx file with all rows containing FALSE in the enabled column removed. Note that the code assumes the header of the TRUE/FALSE column is named enabled.

It first creates a dictionary for each row and keeps the row only if the value under the key enabled is truthy. At the end, it iterates through all the kept rows and writes them cell by cell to the output file.
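One thing to watch with the dictionary-based filter: a plain truthiness check only works if the cells hold real booleans. If the column stores the text "FALSE" instead, a non-empty string is truthy and the row would be kept. The sketch below shows the filter step on plain lists, without openpyxl. The helper names `is_enabled` and `keep_enabled_rows`, and the header names in the sample data, are assumptions made for this illustration.

```python
def is_enabled(value):
    """Treat boolean True or the text 'true' (any case) as enabled."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"


def keep_enabled_rows(headers, rows, flag_column="enabled"):
    """Return the header row followed by every row whose flag is enabled."""
    kept = [list(headers)]
    for values in rows:
        row_dict = dict(zip(headers, values))
        if is_enabled(row_dict[flag_column]):
            kept.append(list(values))
    return kept


# Hypothetical sample data shaped like the sheet in the question.
headers = ["time", "name", "username", "path", "enabled"]
rows = [
    ["01/01/1999", "John Smith", "JohnS", "/path/", "FALSE"],
    ["01/01/1999", "Jane Doe", "JaneD", "/path/", True],
    ["01/01/1999", "Tom Test", "t1", "/path/", False],
    ["01/01/1999", "Ann Other", "AnnO", "/path/", "true"],
]

for row in keep_enabled_rows(headers, rows):
    print(row)
```

The returned nested list has the same shape as new_rows in the answer, so it could be written to a new worksheet with the same pair of enumerate loops.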



Source: https://stackoverflow.com/questions/51510669/unable-to-remove-rows-with-specific-cell-value-python-openpyxl
