I have an xlsx file, with columns with various coloring.
I want to read only the white columns of this excel in python using pandas, but I have no clues on hot to d
(Disclosure: I'm one of the authors of the library I'm going to suggest)
With StyleFrame (that wraps pandas) you can read an excel file into a dataframe without loosing the style data.
Consider the following sheet:
And the following code:
from styleframe import StyleFrame, utils
# from StyleFrame import StyleFrame, utils (if using version < 3.X)
sf = StyleFrame.read_excel('test.xlsx', read_style=True)
print(sf)
# b p y
# 0 nan 3 1000.0
# 1 3.0 4 2.0
# 2 4.0 5 42902.72396767148
sf = sf[[col for col in sf.columns
if col.style.fill.fgColor.rgb in ('FFFFFFFF', utils.colors.white)]]
# "white" can be represented as 'FFFFFFFF' or
# '00FFFFFF' (which is what utils.colors.white is set to)
print(sf)
# b
# 0 nan
# 1 3.0
# 2 4.0
This can not be done in pandas. You will need to use other library to read the xlsx file and determine what columns are white. I'd suggest using openpyxl
library.
Then your script will follow this steps:
Edit: Switched xlrd
to openpyxl
as xlrd
is no longer actively maintained