powerquery split column by variable field lengths

ぃ、小莉子 提交于 2020-01-13 13:09:52

问题


In PowerQuery I need to import a fixed width txt file (each line is the concatenation of a number of fields, each field has a fixed specific length). When I import it I get a table with one single column that contains the txt lines, e.g. in the following format:

AAAABBCCCCCDDD

I want to add more columns in this way:

Column1: AAAA
Column2: BB
Column3: CCCCC
Column4: DDD

In other words the fields composing the source column are of known length, but this length is not the same for all fields (in the example above the lengths are: 4,2,5,3).

I'd like to use the "Split Column">"By number of character" utility but I can only insert one single length at a time, and to get the desired output I'd have to repeat the process 3 times, adding one column each time and using the "Once, as far left as possible" option for the "Split Column">"By number of character" utility.

My real life case has many different line types (files) to import and convert, each with more then 20 fields, so a less manual approach is needed; I'd like to somehow specify the record structure (the length of each field) and get the lines split automagically :)

There would probably be the need for some M code, which I know nothing about: can anybody point me to the right direction?

Thanks!


回答1:


Create a query with the formula below. Let's call this query SplitText:

let
    SplitText = (text, lengths) => 
    let
        LengthsCount = List.Count(lengths),
        // Keep track of the index in the lengths list and the position in the text to take the next characters from. Use this information to get the next segment and put it into a list.
        Split = List.Generate(() => {0, 0}, each _{0} < LengthsCount, each {_{0} + 1, _{1} + lengths{_{0}}}, each Text.Range(text, _{1}, lengths{_{0}}))
    in
        Split,
    // Convert the list to a record to 
    ListToRecord = (text, lengths) => 
    let
        List = SplitText(text, lengths),
        Record = Record.FromList(List, List.Transform({1 .. List.Count(List)}, each Number.ToText(_)))
    in
        Record
in
    ListToRecord

Then, in your table, add a custom column that uses this formula:

each SplitText([Column1], {4, 2, 5, 3})

The first argument is the text to split, and the second argument is a list of lengths to split by.

Finally, expand the column to get the split text values into your table. You may want to rename the columns since they will be named 1, 2, etc.



来源:https://stackoverflow.com/questions/32247086/powerquery-split-column-by-variable-field-lengths

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!