Reading an excel with pandas basing on columns' colors

前端 未结 2 819
一整个雨季
一整个雨季 2020-12-17 03:55

I have an xlsx file, with columns with various coloring.

I want to read only the white columns of this excel in python using pandas, but I have no clues on hot to d

相关标签:
2条回答
  • 2020-12-17 04:19

    (Disclosure: I'm one of the authors of the library I'm going to suggest)

    With StyleFrame (that wraps pandas) you can read an excel file into a dataframe without loosing the style data.

    Consider the following sheet:

    And the following code:

    from styleframe import StyleFrame, utils
    # from StyleFrame import StyleFrame, utils (if using version < 3.X)
    
    sf = StyleFrame.read_excel('test.xlsx', read_style=True)
    print(sf)
    
    #          b  p                  y
    #     0  nan  3             1000.0
    #     1  3.0  4                2.0
    #     2  4.0  5  42902.72396767148
    
    sf = sf[[col for col in sf.columns
             if col.style.fill.fgColor.rgb in ('FFFFFFFF', utils.colors.white)]]
             # "white" can be represented as 'FFFFFFFF' or
             # '00FFFFFF' (which is what utils.colors.white is set to)
    print(sf)
    
    #          b
    #    0   nan
    #    1   3.0
    #    2   4.0
    
    0 讨论(0)
  • 2020-12-17 04:41

    This can not be done in pandas. You will need to use other library to read the xlsx file and determine what columns are white. I'd suggest using openpyxl library.

    Then your script will follow this steps:

    1. Open xlsx file
    2. Read and filter the data (you can access the cell color) and save the results
    3. Create pandas dataframe

    Edit: Switched xlrd to openpyxl as xlrd is no longer actively maintained

    0 讨论(0)
提交回复
热议问题