Reading the first n rows of a csv without parsing the whole file in Power Query

∥☆過路亽.° 提交于 2020-01-25 08:55:07

问题


I have these csv data files with my relevant data in the first five rows and a bunch of mal formmatted data below it. When I use filter rows it still reads in all of the data below which causes problems for me. This is problematic because I am reading in a whole folder's worth and some of these files have a different number of columns below the rows I'm interested in. Those columns or data below those first five rows I do not need, but power query throws errors when it is looking for the same number of columns as the last file. I would like it to just read in the first n rows which are uniform between all files in the folder. 

Is there a way to do this or bypass the error? Let me know if there is anything else I can provide to help my question be better understood. 

I have already tried filter rows, but that still reads the whole document and throws some errors.

This is similar to what I'm looking for, but it's not clear how I can edit this to achieve what I want.

Skip 6 rows before "reading" into powerquery

Ultimately I'm going to be reading in the first 5 rows of all documents in the file folder. This is much easier in Pandas, but I need an Excel solution for a coworker. The error I get is "unexpected number of columns." I have confirmed this with doing a subset of files that has the same number of columns for the lower extra data I don't need. I would like a solution robust enough to handle all of the files.


回答1:


When you load a CSV into the Query Editor, it will likely generate M code like this:

let
    Source = Csv.Document(File.Contents("C:\FilePath\FileName.csv"), [Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Column1", type text}, {"Column2", type text}, {"Column3", type text}})
in
    #"Changed Type"

Delete the last step, #Changed Type, and change Columns=3 in the first step to the number you actually want instead of what it automatically detected.




回答2:


If you're loading from a folder, then you can write a query along these lines:

let
    Source = Folder.Files("C:\FilePath\FolderName"),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".csv")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content", "Name"}),
    #"Extracted First 5 Lines" = Table.TransformColumns(#"Removed Other Columns", {{"Content", each Table.FirstN(Csv.Document(_),5)}}),
    #"Expanded Content" = Table.ExpandTableColumn(#"Extracted First 5 Lines", "Content", {"Column1", "Column2", "Column3"})
in
    #"Expanded Content"

With the key part being this transformation: each Table.FirstN(Csv.Document(_),5)

When you create that last step, it will expand to as many columns as it sees. If you want to make it more dynamic, then try something like this instead of the #"Expanded Content" line:

MaxColumns = List.Max(List.Transform(#"Extracted First 5 Lines"[Content], each Table.ColumnCount(_))),
#"Expanded Content" = Table.ExpandTableColumn(#"Extracted First 5 Lines", "Content", List.Transform({1..MaxColumns}, each "Column" & Number.ToText(_)))

This makes sure you have enough columns for the widest table in the #"Extracted First 5 Lines" step.



来源:https://stackoverflow.com/questions/57777373/reading-the-first-n-rows-of-a-csv-without-parsing-the-whole-file-in-power-query

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!