fill time gaps with power query

試著忘記壹切 提交于 2020-01-11 11:59:40

问题


I have following data

   start        stop       status
+-----------+-----------+-----------+
| 09:01:10  | 09:01:40  |  active   |
| 09:02:30  | 09:04:50  |  active   |
| 09:10:01  | 09:11:50  |  active   |
+-----------+-----------+-----------+

I want to fill in the gaps with "passive"

   start        stop       status
+-----------+-----------+-----------+
| 09:01:10  | 09:01:40  |  active   |
| 09:01:40  | 09:02:30  |  passive  |
| 09:02:30  | 09:04:50  |  active   |
| 09:04:50  | 09:10:01  |  passive  |
| 09:10:01  | 09:11:50  |  active   |
+-----------+-----------+-----------+

How can I do this in M Query language?


回答1:


I think I may have a better performing solution.

From your source table (assuming it's sorted), add an index column starting from 0 and an index column starting from 1 and then merge the table with itself doing a left outer join on the index columns and expand the start column.

Remove columns except for stop, status, and start.1 and filter out nulls.

Rename columns to start, status, and stop and replace "active" with "passive".

Finally, append this table to your original table.

let
    Source = Table.RenameColumns(#"Removed Columns",{{"Column1.2", "start"}, {"Column1.3", "stop"}, {"Column1.4", "status"}}),
    Add1Index = Table.AddIndexColumn(Source, "Index", 1, 1),
    Add0Index = Table.AddIndexColumn(Add1Index, "Index.1", 0, 1),
    SelfMerge = Table.NestedJoin(Add0Index,{"Index"},Add0Index,{"Index.1"},"Added Index1",JoinKind.LeftOuter),
    ExpandStart1 = Table.ExpandTableColumn(SelfMerge, "Added Index1", {"start"}, {"start.1"}),
    RemoveCols = Table.RemoveColumns(ExpandStart1,{"start", "Index", "Index.1"}),
    FilterNulls = Table.SelectRows(RemoveCols, each ([start.1] <> null)),
    RenameCols = Table.RenameColumns(FilterNulls,{{"stop", "start"}, {"start.1", "stop"}}),
    ActiveToPassive = Table.ReplaceValue(RenameCols,"active","passive",Replacer.ReplaceText,{"status"}),
    AppendQuery = Table.Combine({Source, ActiveToPassive}),
    #"Sorted Rows" = Table.Sort(AppendQuery,{{"start", Order.Ascending}})
in
    #"Sorted Rows"

This should be O(n) complexity with similar logic to @chillin, but I think should be faster than using a custom function since it will be using a built-in merge which is likely to be highly optimized.




回答2:


You could try something like the below (my first two steps someTable and changedTypes are just to re-create your sample data on my end):

let
    someTable = Table.FromColumns({{"09:01:10", "09:02:30", "09:10:01"}, {"09:01:40", "09:04:50", "09:11:50"}, {"active", "active", "active"}}, {"start","stop","status"}),
    changedTypes = Table.TransformColumnTypes(someTable, {{"start", type duration}, {"stop", type duration}, {"status", type text}}),
    listOfRecords = Table.ToRecords(changedTypes),
    transformList = List.Accumulate(List.Skip(List.Positions(listOfRecords)), {listOfRecords{0}}, (listState, currentIndex) =>
        let
            previousRecord = listOfRecords{currentIndex-1},
            currentRecord = listOfRecords{currentIndex},
            thereIsAGap = currentRecord[start] <> previousRecord[stop],
            recordsToAdd = if thereIsAGap then {[start=previousRecord[stop], stop=currentRecord[start], status="passive"], currentRecord} else {currentRecord},
            append = listState & recordsToAdd
        in
            append
    ),
    backToTable = Table.FromRecords(transformList, type table [start=duration, stop=duration, status=text])
in
    backToTable

This is what I start off with (at the changedTypes step):

This is what I end up with:

To integrate with your existing M code, you'll probably need to:

  • remove someTable and changedTypes from my code (and replace with your existing query)
  • change changedTypes in the listOfRecords step to whatever your last step is called (otherwise you'll get an error if you don't have a changedTypes expression in your code).

Edit:

Further to my answer, what I would suggest is:

Try changing this line in the code above:

listOfRecords = Table.ToRecords(changedTypes),

to

listOfRecords = List.Buffer(Table.ToRecords(changedTypes)),

I found that storing the list in memory reduced my refresh time significantly (maybe ~90% if quantified). I imagine there are limits and drawbacks (e.g. if the list can't fit), but might be okay for your use case.

Do you experience similar behaviour? Also, my basic graph indicates non-linear complexity of the code overall unfortunately.

Final note: I found that generating and processing 100k rows resulted in a stack overflow whilst refreshing the query (this might have been due to the generation of input rows and may not the insertion of new rows, don't know). So clearly, this approach has limits.




回答3:


I would approach this as follows:

  1. Duplicate the first table.
  2. Replace "active" with "passive".
  3. Remove the start column.
  4. Rename stop to start.
  5. Create a new stop column by looking up the earliest start time from your original table that occurs after the current stop time.
  6. Filter out nulls in this new column.
  7. Append this table to the original table.

The M code will look something like this:

let
    Source = <...your starting table...>
    PassiveStatus = Table.ReplaceValue(Source,"active","passive",Replacer.ReplaceText,{"status"}),
    RemoveStart = Table.RemoveColumns(PassiveStatus,{"start"}),
    RenameStart = Table.RenameColumns(RemoveStart,{{"stop", "start"}}),
    AddStop = Table.AddColumn(RenameStart, "stop", (C) => List.Min(List.Select(Source[start], each _ > C[start])), type time),
    RemoveNulls = Table.SelectRows(AddStop, each ([stop] <> null)),
    CombineTables = Table.Combine({Source, RemoveNulls}),
    #"Sorted Rows" = Table.Sort(CombineTables,{{"start", Order.Ascending}})
in
    #"Sorted Rows"

The only tricky bit above is the custom column part where I define the new column like this:

(C) => List.Min(List.Select(Source[start], each _ > C[start]))

This takes each item in the column/list Source[start] and compares it to the time in the current row. It selects only the ones that occur after the time in the current row and then take the min over that list to find the earliest one.



来源:https://stackoverflow.com/questions/54313405/fill-time-gaps-with-power-query

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!