Question
I'm having a weird issue where the logic and code tell me it should work, but it does not.
My code is below:
```python
import shutil, sys
from distutils.version import StrictVersion
import openpyxl
from openpyxl import Workbook
from openpyxl import load_workbook
wb = load_workbook('testing.xlsx')
ws = wb.get_sheet_by_name('Sheet1')
x = ws.max_row
y = ws.max_column
for r in range(1,x+1):
    for j in range(1, y+1):
        d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
            ws.delete_rows(r)
wb.save("test_1.xlsx")
```
The Excel sheet has 5 columns, A to E. The first row holds titles, so it can be ignored. From row 2 onward, column A has a time, B a name, C a username, D a path, and E a value of either TRUE or FALSE.
The point of my script is to look at all cells and, if the value FALSE is found, remove that row. For example, row 10:
`01/01/1999 John Smith JohnS /path/ FALSE` should be removed because it contains FALSE, or more specifically, E10 has FALSE. The TRUE/FALSE values only appear in column E, so for the sake of speed we could check only column E (but every row). I have done that in another version.
To the problem: my test workbook has 25 rows in total, with columns A to E as described above, but the script removes only 5 of the rows that have FALSE. It also seems to remove the FALSE rows at random. In my test workbook there are 10 rows with a FALSE cell in total, with usernames t1, t2, t3, t4, t5, t6, t7, t8, t9, t10 in order, but the script just now removed t1, t3, t5, t6, t7, t9. Looking at it now, it seems I have a logic issue and it is only checking odd numbers.
EDIT: It seems that if I repeat the loop enough times, it removes all the rows that contain FALSE.
Current code that's working:
```python
import shutil, sys
from distutils.version import StrictVersion
import openpyxl
from openpyxl import Workbook
from openpyxl import load_workbook
wb = load_workbook('testing.xlsx')
ws = wb.get_sheet_by_name('Sheet1')
x = ws.max_row
y = ws.max_column
for r in range(1,x+1):
    for j in range(1, y+1):
        d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
            ws.delete_rows(r)
for r in range(1,x+1):
    for j in range(1, y+1):
        d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
            ws.delete_rows(r)
for r in range(1,x+1):
    for j in range(1, y+1):
        d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
            ws.delete_rows(r)
for r in range(1,x+1):
    for j in range(1, y+1):
        d=ws.cell(row=r,column=j)
        if str(d.value).lower() == "false":
            ws.delete_rows(r)
wb.save("test_1.xlsx")
```
It's not pretty, so any tips would be appreciated.
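The pattern described in the question (every other FALSE row surviving a pass, and repeated passes eventually clearing them) can be reproduced without Excel. The sketch below is a hypothetical plain-Python model, not openpyxl code: a list of `(username, flag)` tuples stands in for the sheet, `del rows[i]` stands in for `ws.delete_rows(r)`, and the sample data is made up. As with `range(1, x+1)`, the indexes are fixed before the loop starts, so after a deletion the next row slides into the slot that was just checked and the loop steps past it.

```python
def delete_forward(rows):
    """Model of the question's loop: scan indexes upward, deleting in place."""
    rows = list(rows)                   # work on a copy
    for i in range(len(rows)):          # bounds fixed up front, like range(1, x+1)
        # Past the current end there is nothing to match, like an empty cell
        if i < len(rows) and str(rows[i][1]).lower() == "false":
            del rows[i]                 # every later row moves up one slot,
                                        # and the next iteration skips it
    return rows


# Four consecutive FALSE rows followed by one TRUE row (made-up data)
sheet = [("t1", False), ("t2", False), ("t3", False), ("t4", False), ("ok", True)]

one_pass = delete_forward(sheet)
print([name for name, _ in one_pass])    # ['t2', 't4', 'ok']

# Within a run of consecutive FALSE rows each pass removes every other one,
# which is why repeating the loop enough times eventually clears them all.
two_passes = delete_forward(one_pass)
print([name for name, _ in two_passes])  # ['t4', 'ok']
```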
Answer 1:
I think it's a problem with the indentation; try this:
```python
from openpyxl import load_workbook

wb = load_workbook('testing.xlsx')
ws = wb['Sheet1']
x = ws.max_row
y = ws.max_column
# walk the rows from the last (x) up to the first (1)
for r in range(1, x+1):
    for j in range(1, y+1):
        d = ws.cell(row=x+1-r, column=j)
        if str(d.value).lower() == "false":
            ws.delete_rows(x+1-r)
            break  # this row is gone, so stop scanning its columns
wb.save("test_1.xlsx")
```
I also changed the row number from `r` to `x+1-r`, which iterates over the rows from the last to the first, so that when one row is deleted, the rows that still have to be checked are not shifted. It's also necessary to break out of the inner loop: the row being scanned has just been deleted, so there is nothing more to check in it.
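Answer 1's reasoning can be checked with a hypothetical plain-Python model (a list of lists standing in for the worksheet, not openpyxl itself). Walking from the last row to the first means a deletion only shifts rows that have already been checked. Because the question says the TRUE/FALSE values only appear in column E, this sketch also looks at that single column instead of every cell, which removes the need for the inner loop and the `break`. The column index `FLAG_COL`, the header text, and the sample rows are assumptions based on the question's description.

```python
FLAG_COL = 4  # column E as a 0-based list index (assumed layout from the question)


def is_false(value):
    """Same test as the question: catches a boolean False and the text 'FALSE'."""
    return str(value).lower() == "false"


def delete_false_rows(rows):
    """Delete data rows whose column-E value is FALSE, scanning bottom-up.

    rows[0] is the header row and is never checked. A rough openpyxl
    equivalent on a worksheet ws (rows are 1-based there) would be:

        for r in range(ws.max_row, 1, -1):
            if is_false(ws.cell(row=r, column=5).value):
                ws.delete_rows(r)
    """
    rows = [list(row) for row in rows]        # work on a copy
    for i in range(len(rows) - 1, 0, -1):     # from the last row up to index 1
        if is_false(rows[i][FLAG_COL]):
            del rows[i]  # only rows below i move, and they were already checked
    return rows


# Hypothetical sample in the question's A-E layout
sheet = [
    ["Time", "Name", "Username", "Path", "Flag"],
    ["01/01/1999", "John Smith", "t1", "/path/", False],
    ["01/01/1999", "John Smith", "t2", "/path/", False],
    ["01/01/1999", "John Smith", "u1", "/path/", True],
    ["01/01/1999", "John Smith", "t3", "/path/", "FALSE"],
]
print([row[2] for row in delete_false_rows(sheet)])  # ['Username', 'u1']
```

A single bottom-up pass handles consecutive FALSE rows, so the loop never needs to be repeated.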
Answer 2:
You can try something like this:
```python
from openpyxl import load_workbook
from openpyxl.workbook import Workbook

# open workbook and get active worksheet
wb = load_workbook('original.xlsx')
ws = wb.active

# extract headers from row 1
headers = [cell.value for row in ws.iter_rows(min_row=1, max_row=1) for cell in row]

# want to keep headers by default
new_rows = [headers]

# go through every row (>= 2) except headers
for row in ws.iter_rows(min_row=2):
    values = [cell.value for cell in row]

    # create dictionary of row
    row_dict = dict(zip(headers, values))

    # only append if 'enabled' is True
    if row_dict['enabled']:
        new_rows.append(values)

# create a new workbook and sheet to write to
new_wb = Workbook()
new_ws = new_wb.active

# iterate through rows and columns of nested list
for row, line in enumerate(new_rows, start=1):
    for column, cell in enumerate(line, start=1):
        # write new cell to output worksheet
        new_ws.cell(row=row, column=column).value = cell

# save output workbook
new_wb.save('output.xlsx')
```
This produces a new `output.xlsx` file in which every row with `FALSE` in the `enabled` column has been removed.
It first creates a dictionary for each row and keeps that row only if its `enabled` key is set to `True`. At the end, it iterates through all the kept rows and writes them, cell by cell, to the new output file.
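Copying the rows to keep, instead of deleting in place, sidesteps the row-shifting problem entirely, because nothing is removed from the sheet being read. One caveat worth checking: openpyxl returns Excel's TRUE/FALSE cells as Python booleans, so `if row_dict['enabled']` works for them, but a FALSE stored as text would be a non-empty string and therefore truthy. The plain-Python sketch below keeps Answer 2's header/dictionary idea and swaps in an explicit check that accepts either form; the `enabled` header name comes from Answer 2 and is an assumption about the asker's sheet, and the sample rows are made up.

```python
def keep_enabled(rows, flag_header="enabled"):
    """Return the header row plus every data row whose flag is not FALSE.

    rows[0] holds the headers. With openpyxl, a list in this shape could come
    from list(ws.iter_rows(values_only=True)).
    """
    headers = list(rows[0])
    kept = [headers]                      # always keep the headers
    for values in rows[1:]:
        row_dict = dict(zip(headers, values))
        flag = row_dict[flag_header]
        # Drop a boolean False or the text "FALSE"; unlike a plain
        # truthiness test, an empty cell (None) is kept.
        if str(flag).lower() != "false":
            kept.append(list(values))
    return kept


# Hypothetical sample; "enabled" is the header name Answer 2 assumes
sheet = [
    ["time", "name", "username", "path", "enabled"],
    ["01/01/1999", "John Smith", "t1", "/path/", False],
    ["01/01/1999", "John Smith", "u1", "/path/", True],
    ["01/01/1999", "John Smith", "t2", "/path/", "FALSE"],
]
print([row[2] for row in keep_enabled(sheet)])  # ['username', 'u1']
```

Each kept row can then be written to the new sheet with `new_ws.append(row)`, which adds it after the last used row, instead of the nested `enumerate` loops.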
Source: https://stackoverflow.com/questions/51510669/unable-to-remove-rows-with-specific-cell-value-python-openpyxl