How to use field name or column header in openpyxl?

情深已故 2021-02-15 14:42

See my code below. This code works very well, but I would like to do two things. First, the if statement with `or` in this example is much shorter than the actual one. I have many colum

4 answers
  • 2021-02-15 15:11

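    Since `sheet.rows` returns a generator, you can easily extract the headers in the first iteration, treat them as you need, and then continue to consume it. Every access to `sheet.rows` creates a new generator that starts again at row 1, so keep a reference to a single one. For instance: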

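    rows = sheet.iter_rows()  # keep ONE generator; each access to sheet.rows restarts at row 1
    headers = [cell.value for cell in next(rows)]
    # find indexes of targeted columns by their header names
    cols = [headers.index(header) for header in ['H', 'I', 'L', 'M']]
    
    conv = {'OldValue1': 1, 'OldValue2': 2}
    
    for row in rows:  # continues with the first data row
        for col in cols:
            cell = row[col]
            if cell.value in conv:  # leave values without a mapping untouched
                cell.value = conv[cell.value]  # write back to the cell, not to a copy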
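    A self-contained sketch of the same idea, assuming the headers are in row 1. The helper name `replace_in_named_columns` and the sample data are hypothetical; the sheet is built in an in-memory `Workbook` so it runs without a file:

```python
from openpyxl import Workbook


def replace_in_named_columns(ws, names, mapping):
    """Hypothetical helper: replace values found in `mapping` inside the
    columns whose row-1 header is in `names`. Returns how many cells changed."""
    rows = ws.iter_rows()                          # a single generator over all rows
    header = [cell.value for cell in next(rows)]   # first row = header names
    targets = [i for i, name in enumerate(header) if name in names]  # 0-based positions
    changed = 0
    for row in rows:                               # continues with row 2
        for i in targets:
            cell = row[i]
            if cell.value in mapping:
                cell.value = mapping[cell.value]
                changed += 1
    return changed


# Sample data (made up for the example)
wb = Workbook()
ws = wb.active
ws.append(['Name', 'Status', 'Type'])
ws.append(['a', 'OldValue1', 'OldValue2'])
ws.append(['b', 'OldValue2', 'other'])

changed = replace_in_named_columns(ws, {'Status', 'Type'}, {'OldValue1': 1, 'OldValue2': 2})
print(changed)                    # 3
print([c.value for c in ws[2]])   # ['a', 1, 2]
```

    Because the header row and the data rows come from the same generator, the header is never fed through the replacement mapping.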
    
  • 2021-02-15 15:15

    You can access cells in the first row and column using the `sheet.cell(row=#, column=#)` syntax (rows and columns are 1-based). For example:

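    for row in sheet.iter_rows(min_row=2):  # skip the header row
        for j, cellObj in enumerate(row, start=1):
            header_cell = sheet.cell(row=1, column=j)  # header of this column
    
            # cell.column is an int in openpyxl >= 2.6; column_letter is 'H', 'AA', ...
            if cellObj.column_letter in ['H', 'I', 'L', 'M', 'AA', 'AB']:
                print(header_cell.value, cellObj.value)
                value = str(cellObj.value).upper()  # upper() gives 'OLDVALUE1', not 'OldValue1'
                if value == 'OLDVALUE1':
                    cellObj.value = 1
                    print(cellObj.value)
                elif value == 'OLDVALUE2':
                    cellObj.value = 2
                    print(cellObj.value)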
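    If you prefer to keep column letters, a sketch that converts them once to integer indexes with `openpyxl.utils.column_index_from_string` and compares them against `cell.column` (an int in openpyxl 2.6+), looking up each column's header with `sheet.cell(row=1, column=...)`. The sample sheet and the letters `['B', 'C']` are made up for illustration:

```python
from openpyxl import Workbook
from openpyxl.utils import column_index_from_string

wb = Workbook()
sheet = wb.active
sheet.append(['id', 'Status', 'Type'])        # made-up header row
sheet.append([1, 'OldValue1', 'OldValue2'])
sheet.append([2, 'oldvalue2', 'keep me'])

# Convert the letters once: 'B' -> 2, 'C' -> 3
wanted = {column_index_from_string(letter) for letter in ['B', 'C']}
conv = {'OLDVALUE1': 1, 'OLDVALUE2': 2}       # upper-case keys for a case-insensitive match

seen = []
for row in sheet.iter_rows(min_row=2):        # data rows only
    for cell in row:
        if cell.column in wanted:
            header = sheet.cell(row=1, column=cell.column).value
            seen.append((header, cell.row, cell.value))
            key = str(cell.value).upper()
            if key in conv:
                cell.value = conv[key]

print(seen[0])                        # ('Status', 2, 'OldValue1')
print([c.value for c in sheet[3]])    # [2, 2, 'keep me']
```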
    
  • 2021-02-15 15:24

    There are many ways to do this. Here are some approaches I have used:

    1. Brute force

    Assuming "sheet" and "workbook" are defined.

    max_row, max_col = sheet.max_row, sheet.max_column # read before the scan below creates empty cells
    header = [cell for cell in sheet['A1:XFD1'][0] if cell.value is not None and str(cell.value).strip() != ''] # all non-empty header cells
    target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] # filter list
    target_header = [cell.column for cell in header if cell.value in target_values] # column indexes
    
    data = {'OldValue1': 1, 'OldValue2': 2}
    
    for row in sheet.iter_rows(max_row=max_row, max_col=max_col):
        for cell in row:
            if cell.column in target_header and cell.value in data:
                cell.value = data[cell.value]
    

    In this case, the brute force is in `sheet['A1:XFD1']`: the first time, we have to check every column of the first row, but in return we get references to all the header cells. After that, we define `target_values` (our column names) and build a list of the matching column indexes (`target_header`). Finally, we iterate over the sheet: if a cell's column is in `target_header` and its value is a key of `data`, we replace the value.

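    Downside: if there are cells containing stray whitespace outside the "data area", `max_row` and `max_column` include them, so the loop also iterates over blank cells. Also, reading `sheet['A1:XFD1']` creates empty cells up to column XFD, so `sheet.max_column` reports 16384 afterwards; read it before that line if you need the real width.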
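    A quick way to see the cost of the brute-force scan, sketched on a made-up in-memory sheet: `sheet[1]` returns only the populated part of row 1, while reading `'A1:XFD1'` creates empty cells that `max_column` then counts.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(['NameOfField', 'Other', 'NameOfField1'])  # made-up header row
sheet.append(['OldValue1', 'x', 'OldValue2'])

before = sheet.max_column          # only the populated columns

# sheet[1] is the first row, limited to the current max_column
header = [cell for cell in sheet[1] if cell.value is not None and str(cell.value).strip() != '']

_ = sheet['A1:XFD1']               # touching a cell creates it, even without a value
after = sheet.max_column           # now counts every created cell up to XFD

print(before, len(header), after)  # 3 3 16384
```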

    2. Check for boundaries

    You can compute your own max row and max column if the data has a table form (no empty space between columns, and a key column such as "id" that is never empty or whitespace).

    from openpyxl.utils import get_column_letter 
    
    def find_limit_sheet(direction):
        # count consecutive non-empty cells starting at index 1; str() also handles numeric ids
        max_limit_value = 1
        while (direction(max_limit_value).value is not None) and (str(direction(max_limit_value).value).strip() != ''):
            max_limit_value = max_limit_value + 1
        return (max_limit_value - 1) if max_limit_value != 1 else 1
    
    
    max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
    max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))
    
    header = [cell for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1']] # the header row inside the data area
    target_values = ['NameOfField', 'NameOfField1', 'NameOfField2'] # filter list
    target_header = [cell.column for cell in header[0] if cell.value in target_values] # column indexes
    
    data = {'OldValue1': 1, 'OldValue2': 2}
    
    for row in sheet.iter_rows(max_row=max_qrow, max_col=max_qcolumn):
        for cell in row:
            if cell.column in target_header and cell.value in data:
                cell.value = data[cell.value]
    

    In this case we only iterate inside the "data area".
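    The boundary search can also be written as a small standalone function. `data_bounds` and `is_blank` are hypothetical names, and the sketch assumes the same table shape (no gaps in column A or in the header row). The made-up sheet includes a stray whitespace cell to show that it is ignored, unlike `max_row`/`max_column`:

```python
from openpyxl import Workbook


def is_blank(value):
    # None and whitespace-only strings count as empty; numbers do not
    return value is None or str(value).strip() == ''


def data_bounds(ws):
    """Return (last_row, last_col) of the contiguous table starting at A1.

    Walks down column A and across row 1 until the first blank cell.
    Note: ws.cell() creates the probed cell, so one empty cell past each edge is created.
    """
    last_row = 1
    while not is_blank(ws.cell(row=last_row + 1, column=1).value):
        last_row += 1
    last_col = 1
    while not is_blank(ws.cell(row=1, column=last_col + 1).value):
        last_col += 1
    return last_row, last_col


wb = Workbook()
sheet = wb.active
sheet.append(['id', 'NameOfField', 'Other'])   # made-up table
sheet.append([1, 'OldValue1', 'a'])
sheet.append([2, 'OldValue2', 'b'])
sheet['E5'] = '   '                            # stray whitespace outside the data area

print(data_bounds(sheet))                      # (3, 3)
print(sheet.max_row, sheet.max_column)         # 5 5
```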

    3. Optional: Using Pandas

    If you need more complex operations on Excel data (at work I have to read a lot of Excel files as a data source :( ), I prefer to convert the sheet to a pandas DataFrame -> do the operations -> save the result.

    In this case we use all the data.

    from openpyxl.utils import get_column_letter 
    import pandas as pd
    
    def find_limit_sheet(direction):
        max_limit_value = 1
        while (direction(max_limit_value).value is not None) and (str(direction(max_limit_value).value).strip() != ''):
            max_limit_value = max_limit_value + 1
        return (max_limit_value - 1) if max_limit_value != 1 else 1
    
    
    max_qrow = find_limit_sheet(direction=lambda increment: sheet.cell(row=increment, column=1))
    max_qcolumn = find_limit_sheet(direction=lambda increment: sheet.cell(column=increment, row=1))
    
    header = [cell.value for cell in sheet[f'A1:{get_column_letter(max_qcolumn)}1'][0]] # header names
    raw_data = []
    for row in sheet.iter_rows(min_row=2, max_row=max_qrow, max_col=max_qcolumn): # data rows only
        row_data = [cell.value for cell in row]
        raw_data.append(dict(zip(header, row_data)))
    
    df = pd.DataFrame(raw_data) # the dict keys become the column names
    

    You can also use a subset of columns, using `target_header` from example 2:

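    ...
    header_cells = sheet[f'A1:{get_column_letter(max_qcolumn)}1'][0] # header row inside the data area
    target_header = [cell.column for cell in header_cells if cell.value in target_values] # column indexes
    target_names = [cell.value for cell in header_cells if cell.value in target_values] # matching names, same order
    ...
    raw_data = []
    for row in sheet.iter_rows(min_row=2, max_row=max_qrow, max_col=max_qcolumn): # data rows only
        row_data = [cell.value for cell in row if cell.column in target_header]
        raw_data.append(dict(zip(target_names, row_data))) # pair each kept value with its own header
    
    df = pd.DataFrame(raw_data)
    ...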
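    openpyxl can also hand plain values to pandas directly: `iter_rows(values_only=True)` yields tuples instead of `Cell` objects, so the first tuple can serve as the column names. A sketch on a made-up in-memory sheet; selecting the subset by header name and the `replace` mapping are illustrative:

```python
import pandas as pd
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.append(['id', 'NameOfField', 'Other'])   # made-up table
sheet.append([1, 'OldValue1', 'x'])
sheet.append([2, 'OldValue2', 'y'])

rows = sheet.iter_rows(values_only=True)             # tuples of cell values, not Cell objects
header = next(rows)                                  # first tuple = column names
df = pd.DataFrame(list(rows), columns=list(header))  # remaining tuples = data

# Keep only some columns by header name, then map old values to new ones
subset = df[['id', 'NameOfField']]
subset = subset.replace({'NameOfField': {'OldValue1': 1, 'OldValue2': 2}})
print(subset)
```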
    

    INFO

    • openpyxl: 2.6.2
    • pandas: 0.24.2
    • python: 3.7.3
    • Data Structures: List Comprehensions doc
    • lambda expr: lambda expression
  • 2021-02-15 15:35

    EDIT

    Assuming these are the header names you are looking for:

    colnames = ['Header1', 'Header2', 'Header3']
    

    Find the indices for these columns:

    col_indices = {n for n, cell in enumerate(sheet[1]) if cell.value in colnames}  # sheet[1] is the header row
    

    Now iterate over the remaining rows:

    for row in sheet.iter_rows(min_row=2):  # sheet.rows is a generator and cannot be sliced
        for index, cell in enumerate(row):
            if index in col_indices:
                value = str(cell.value).upper()  # upper() gives 'OLDVALUE1', not 'OldValue1'
                if value == 'OLDVALUE1':
                    cell.value = 1
                    print(cell.value)
                elif value == 'OLDVALUE2':
                    cell.value = 2
                    print(cell.value)
    

    Use a dictionary instead of a set to keep the column names around:

    col_indices = {n: cell.value for n, cell in enumerate(sheet[1])
                   if cell.value in colnames}
    
    for row in sheet.iter_rows(min_row=2):
        for index, cell in enumerate(row):
            if index in col_indices:
                print('col: {}, row: {}, content: {}'.format(
                       col_indices[index], cell.row, cell.value))
                value = str(cell.value).upper()
                if value == 'OLDVALUE1':
                    cell.value = 1
                elif value == 'OLDVALUE2':
                    cell.value = 2
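    Once you know which column a header sits in, you can also fetch just that column instead of scanning every cell. A sketch, assuming headers in row 1; `column_cells` is a hypothetical helper and the sheet is made up:

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter


def column_cells(ws, header_name):
    """Hypothetical helper: cells below the row-1 header `header_name`, or () if absent."""
    for cell in ws[1]:                               # ws[1] is the header row
        if cell.value == header_name:
            letter = get_column_letter(cell.column)  # e.g. 3 -> 'C'
            return ws[letter][1:]                    # whole column, minus the header cell
    return ()


wb = Workbook()
sheet = wb.active
sheet.append(['Header1', 'Skip', 'Header3'])         # made-up data
sheet.append(['OldValue1', 'OldValue1', 'x'])
sheet.append(['y', 'OldValue2', 'OldValue2'])

colnames = ['Header1', 'Header2', 'Header3']         # 'Header2' is absent and simply skipped
conv = {'OldValue1': 1, 'OldValue2': 2}
for name in colnames:
    for cell in column_cells(sheet, name):
        if cell.value in conv:
            cell.value = conv[cell.value]

print([[c.value for c in row] for row in sheet.iter_rows(min_row=2)])
# [[1, 'OldValue1', 'x'], ['y', 'OldValue2', 2]]
```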
    

    Old answer

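    This makes your if statement shorter (in openpyxl 2.6+, `cell.column` is an integer, so compare `cell.column_letter`):

    if cellObj.column_letter in 'HILM':
        print(cellObj.value)
    

    For multi-letter column coordinates you need to use a list, because `in` on a string is a substring test:

    if cellObj.column_letter in ['H', 'AA', 'AB', 'AD']:
        print(cellObj.value)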
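    A small check of the two points above, on a throwaway in-memory sheet: `cell.column` is the integer index (openpyxl 2.6+) while `cell.column_letter` is the letter, and `in` on a string matches substrings, which is why a list is safer.

```python
from openpyxl import Workbook

ws = Workbook().active
cell = ws['AA3']                         # accessing a coordinate creates the cell

print(cell.column, cell.column_letter)   # 27 AA
print('IL' in 'HILM')                    # True: substring test on a string
print('IL' in ['H', 'I', 'L', 'M'])      # False: membership test on a list
```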
    