how to concatenate multiple excel sheets from the same file?

前端 未结 4 1384
鱼传尺愫
鱼传尺愫 2020-12-17 03:59

I have a big excel file that contains many different sheets. All the sheets have the same structure like:

Name
col1  col2  col3  col4
1     1     2     4
4           


        
相关标签:
4条回答
  • 2020-12-17 04:30

    Taking a note from this question:

    import pandas as pd
    
    file = pd.ExcelFile('file.xlsx')
    
    names = file.sheet_names  # see all sheet names
    
    df = pd.concat([file.parse(name) for name in names])
    

    Results:

    df
    Out[6]: 
       A  B
    0  1  3
    1  2  4
    0  5  6
    1  7  8
    

    Then you can run df.reset_index(), to, well, reset the index.

    Edit: pandas.ExcelFile.parse is, according to the pandas docs:

    Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters

    0 讨论(0)
  • 2020-12-17 04:33

    First add parameter sheetname=None for dict of DataFrames and skiprows=1 for omit first row and then use concat for MultiIndex DataFrame.

    Last use reset_index for column from first level:

    df = pd.concat(pd.read_excel('multiple_sheets.xlsx', sheetname=None, skiprows=1))
    df = df.reset_index(level=1, drop=True).rename_axis('filenames').reset_index()
    
    0 讨论(0)
  • 2020-12-17 04:36

    Try this:

    dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)
    

    this will return you a dictionary of DFs, which you can easily concatenate using pd.concat(dfs) or as @jezrael has already posted in his answer:

    df = pd.concat(pd.read_excel(filename, sheet_name=None, skiprows=1))
    

    sheet_name: None -> All sheets as a dictionary of DataFrames

    UPDATE:

    Is there a way to create a variable in the resulting dataframe that identifies the sheet name from which the data comes from?

    dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)
    

    assuming we've got the following dict:

    In [76]: dfs
    Out[76]:
    {'d1':    col1  col2  col3  col4
     0     1     1     2     4
     1     4     3     2     1, 'd2':    col1  col2  col3  col4
     0     3     3     4     6
     1     6     5     4     3}
    

    Now we can add a new column:

    In [77]: pd.concat([df.assign(name=n) for n,df in dfs.items()])
    Out[77]:
       col1  col2  col3  col4 name
    0     1     1     2     4   d1
    1     4     3     2     1   d1
    0     3     3     4     6   d2
    1     6     5     4     3   d2
    
    0 讨论(0)
  • 2020-12-17 04:42
    file_save_location='myfolder'                                
    file_name='filename'
    
    location = ''myfolder1'
    os.chdir(location)
    files_xls = glob.glob("*.xls*")
    excel_names=[f for f in files_xls]
    sheets = pd.ExcelFile(files_xls[0]).sheet_names
    def combine_excel_to_dfs(excel_names, sheet_name):
        sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_names]
        combined_df = pd.concat(sheet_frames).reset_index(drop=True)
        return combined_df
    
    i = 0
    
    while i < len(sheets):
        process = sheets[i]
        consolidated_file= combine_excel_to_dfs(excel_names, process)
        consolidated_file.to_csv(file_save_location+file_name+'.csv')
        i = i+1
    else:
        "we done on consolidation part"
    
    0 讨论(0)
提交回复
热议问题