How can I combine the first two rows in 52 files as a header row and combine all the files in a single one in Power Query?

南笙酒味 提交于 2021-01-29 19:05:27

问题


I have 52 files with >50 columns. The first two rows in each of them are header rows, so they need to be combined into a single row and I want to combine them all into a single file.

There are some files with a different number of columns or different column names, so if PowerQuery identifies those, I can deal with them seperately.

The only way I can think of is opening each file, running a macro to combine the first two rows in all of the files, but is there a more efficient way to do that in Power query on all the files?

Thanks!


回答1:


If the file has the same structure and you need to combine them, you can use the 'Load from Folder' option. You go through the load from folder options as normal and then select 'Combine & Transform Data'.

In the query editor, look for the 'Helper Queries'.

This query uses a selected example of the file, to determine metadata for the rest of the file loading. If you look at the 'Transform Sample File' you can add your in steps to combine the rows. It will apply any steps here to every file in the folder when loading. You can then 'Use first row as headers' to promote you combined rows to the header. You may need to remove the automatically applied promoted headers step if Power Query has done that already.

Once that is done, it will combine the files into one table in Power Query. It is best to do this on a sample set of files to see it if works, then add in the rest of the files. For those files that are of different structures it can get a bit messy, so it would be best to create different folders for them and load them that way.




回答2:


let Source = Folder.Files("C:\directory\subdirectory"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),

// get all data from all XLS files
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Name", "Content"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "GetFileData", each Excel.Workbook([Content],true)),
#"Expanded GetFileData" = Table.ExpandTableColumn(#"Added Custom", "GetFileData", {"Data", "Hidden", "Item", "Kind", "Name"}, {"Data", "Hidden", "Item", "Kind", "Sheet"}),
// filter for specific sheet
#"Filtered Rows1" = Table.SelectRows(#"Expanded GetFileData", each ([Sheet] = "Sheet1")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"Content", "Hidden", "Item", "Kind"}),

//demote, then combine first two rows of each column, then recombine with original data
#"Added Custom1" = Table.AddColumn(#"Removed Columns","Top", each Table.TransformColumnTypes(Table.Transpose(Table.FirstN(Table.DemoteHeaders([Data]),2)),{{"Column1", type text}, {"Column2", type text}})),
#"Added Custom2" = Table.AddColumn(#"Added Custom1" ,"Bottom", each Table.Transpose(Table.CombineColumns([Top] ,{"Column1", "Column2"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged"))),
#"Added Custom3" = Table.AddColumn(#"Added Custom2" ,"Data3", each Table.PromoteHeaders( [Bottom]  & Table.Skip(Table.DemoteHeaders([Data]),2))),

// expand all columns and remove extras
List = List.Union(List.Transform(#"Added Custom3"[Data3], each Table.ColumnNames(_))),
#"Expanded Data" = Table.ExpandTableColumn(#"Added Custom3", "Data3", List,List),
#"Removed Columns1" = Table.RemoveColumns(#"Expanded Data",{"Data", "Top", "Bottom"})
in #"Removed Columns1"

This is how I'd do it

  1. Opens all the xlsx files in the directory (Change extension and path as needed)

  2. Reads in all columns on all tabs in each file

  3. In each data table, combines first two rows of each column then combines with the data under that

  4. Then expands all tables



来源:https://stackoverflow.com/questions/63453719/how-can-i-combine-the-first-two-rows-in-52-files-as-a-header-row-and-combine-all

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!