I have the following code, and I am trying to write a data frame into an "existing" worksheet of an Excel file (referred to here as test.xlsx). Sheet3 is the targeted
You can use openpyxl as the engine when you are creating an instance of pd.ExcelWriter.
import pandas as pd
import openpyxl

df1 = pd.DataFrame({'A':[1, 2, -3],'B':[1,2,6]})
book = openpyxl.load_workbook('examples/ex1.xlsx') #Already existing workbook
writer = pd.ExcelWriter('examples/ex1.xlsx', engine='openpyxl') #Using openpyxl

#Migrating the already existing worksheets to writer
#(assigning writer.book / writer.sheets only works on older pandas, before 2.0)
writer.book = book
writer.sheets = {x.title: x for x in book.worksheets}

df1.to_excel(writer, sheet_name='sheet4')
#close() saves the file; writer.save() was removed in pandas 2.0
writer.close()
Hope this works for you.
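On recent pandas versions the same idea can be written without touching writer.book at all, by opening the writer in append mode. Here is a minimal sketch, not a drop-in replacement: the temporary file path and the starter sheet "Sheet1" are made up for the demo, and it assumes openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical demo file in a temporary directory (not the asker's real path).
path = os.path.join(tempfile.mkdtemp(), "ex1.xlsx")

# Create a workbook with one sheet so there is something to append to.
pd.DataFrame({"X": [10, 20]}).to_excel(
    path, sheet_name="Sheet1", index=False, engine="openpyxl"
)

df1 = pd.DataFrame({"A": [1, 2, -3], "B": [1, 2, 6]})

# mode="a" opens the existing file and keeps its sheets;
# the context manager saves and closes the workbook on exit.
with pd.ExcelWriter(path, engine="openpyxl", mode="a") as writer:
    df1.to_excel(writer, sheet_name="sheet4")

# Read the sheet names back to confirm both sheets are present.
with pd.ExcelFile(path) as xls:
    sheet_names = xls.sheet_names

print(sheet_names)
```

The existing sheet is left untouched and the new one is added after it.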
openpyxl has support for Pandas dataframes so you're best off using it directly. See http://openpyxl.readthedocs.io/en/latest/pandas.html for more details.
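A minimal sketch of that direct route, assuming the dataframe_to_rows helper from openpyxl.utils.dataframe. The workbook here is created in memory for the demo; with a real file you would use load_workbook() and wb.save() instead.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({"A": [1, 2, -3], "B": [1, 2, 6]})

# In-memory workbook for the demo; a real file would come from load_workbook().
wb = Workbook()
ws = wb.create_sheet("Sheet3")

# dataframe_to_rows yields the header row first, then one list per data row.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

print(ws.max_row)       # header + 3 data rows
print(ws["A1"].value)   # first header cell
```

Because ws.append() always writes below the last used row, calling the loop again with header=False is one way to append more data to the same sheet.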
Here is a helper function:
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None
    """
    from openpyxl import load_workbook

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl')

    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()
Usage:
append_df_to_excel('test.xlsx', df, sheet_name="Sheet3", startcol=0, startrow=20)
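The helper above relies on assigning writer.book and on writer.save(), which as far as I know no longer work on pandas 2.0 and later. Below is a rough sketch of the same "append below the last row" idea for newer pandas, assuming the if_sheet_exists="overlay" option (pandas 1.4 or later) and openpyxl. The name append_rows and the demo file are hypothetical, and the truncate_sheet behaviour is not covered.

```python
import os
import tempfile

import pandas as pd


def append_rows(path, df, sheet_name="Sheet1"):
    """Hypothetical helper: write df below the last used row of sheet_name."""
    if not os.path.exists(path):
        # No file yet: create it, including the header row.
        df.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
        return

    # mode="a" keeps existing sheets; "overlay" writes into an existing sheet
    # instead of raising an error or replacing it.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        exists = sheet_name in writer.book.sheetnames
        # max_row is the last used row (1-based), so it is also the
        # 0-based index of the first free row.
        startrow = writer.book[sheet_name].max_row if exists else 0
        # Write the header only when the sheet is new.
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=not exists)


# Demo on a temporary file: two appends end up in one continuous table.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
append_rows(path, pd.DataFrame({"A": [1, 2], "B": [5, 6]}), sheet_name="Sheet3")
append_rows(path, pd.DataFrame({"A": [3, 4], "B": [7, 8]}), sheet_name="Sheet3")

result = pd.read_excel(path, sheet_name="Sheet3", engine="openpyxl")
print(result)
```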
Some details:
**to_excel_kwargs - used in order to pass additional named parameters to df.to_excel(). As I did in the example above: the parameter startcol is unknown to append_df_to_excel(), so it will be treated as a part of the **to_excel_kwargs parameter (dictionary).
writer.sheets = {ws.title:ws for ws in writer.book.worksheets} is used in order to copy existing sheets to the writer openpyxl object. I can't explain why it's not done automatically when creating writer = pd.ExcelWriter(filename, engine='openpyxl') - you should ask the authors of the openpyxl module about that...
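To illustrate the **to_excel_kwargs point without needing an Excel file, here is a small self-contained sketch. Both functions are made-up stand-ins (to_excel_stub plays the role of df.to_excel()); only the keyword-forwarding mechanism is the same as in the helper.

```python
def to_excel_stub(sheet_name, startrow=0, startcol=0):
    """Stand-in for DataFrame.to_excel(); just reports what it received."""
    return {"sheet_name": sheet_name, "startrow": startrow, "startcol": startcol}


def append_stub(sheet_name="Sheet1", startrow=None, **to_excel_kwargs):
    """Mimics the helper's signature: unknown keywords land in to_excel_kwargs."""
    # Drop 'engine' if it was passed, like the check in the helper.
    to_excel_kwargs.pop("engine", None)
    if startrow is None:
        startrow = 0
    # startcol is not a named parameter here, so it travels inside the dict
    # and is unpacked again when calling the underlying function.
    return to_excel_stub(sheet_name, startrow=startrow, **to_excel_kwargs)


received = append_stub(sheet_name="Sheet3", startcol=0, startrow=20,
                       engine="xlsxwriter")
print(received)
```

The engine keyword is silently dropped, while startcol reaches the inner function unchanged.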