问题
I have following data
start stop status
+-----------+-----------+-----------+
| 09:01:10 | 09:01:40 | active |
| 09:02:30 | 09:04:50 | active |
| 09:10:01 | 09:11:50 | active |
+-----------+-----------+-----------+
I want to fill in the gaps with "passive"
start stop status
+-----------+-----------+-----------+
| 09:01:10 | 09:01:40 | active |
| 09:01:40 | 09:02:30 | passive |
| 09:02:30 | 09:04:50 | active |
| 09:04:50 | 09:10:01 | passive |
| 09:10:01 | 09:11:50 | active |
+-----------+-----------+-----------+
How can I do this in M Query language?
回答1:
I think I may have a better performing solution.
From your source table (assuming it's sorted), add an index column starting from 0
and an index column starting from 1
and then merge the table with itself doing a left outer join on the index columns and expand the start
column.
Remove columns except for stop
, status
, and start.1
and filter out nulls.
Rename columns to start
, status
, and stop
and replace "active"
with "passive"
.
Finally, append this table to your original table.
let
Source = Table.RenameColumns(#"Removed Columns",{{"Column1.2", "start"}, {"Column1.3", "stop"}, {"Column1.4", "status"}}),
Add1Index = Table.AddIndexColumn(Source, "Index", 1, 1),
Add0Index = Table.AddIndexColumn(Add1Index, "Index.1", 0, 1),
SelfMerge = Table.NestedJoin(Add0Index,{"Index"},Add0Index,{"Index.1"},"Added Index1",JoinKind.LeftOuter),
ExpandStart1 = Table.ExpandTableColumn(SelfMerge, "Added Index1", {"start"}, {"start.1"}),
RemoveCols = Table.RemoveColumns(ExpandStart1,{"start", "Index", "Index.1"}),
FilterNulls = Table.SelectRows(RemoveCols, each ([start.1] <> null)),
RenameCols = Table.RenameColumns(FilterNulls,{{"stop", "start"}, {"start.1", "stop"}}),
ActiveToPassive = Table.ReplaceValue(RenameCols,"active","passive",Replacer.ReplaceText,{"status"}),
AppendQuery = Table.Combine({Source, ActiveToPassive}),
#"Sorted Rows" = Table.Sort(AppendQuery,{{"start", Order.Ascending}})
in
#"Sorted Rows"
This should be O(n) complexity with similar logic to @chillin, but I think should be faster than using a custom function since it will be using a built-in merge which is likely to be highly optimized.
回答2:
You could try something like the below (my first two steps someTable
and changedTypes
are just to re-create your sample data on my end):
let
someTable = Table.FromColumns({{"09:01:10", "09:02:30", "09:10:01"}, {"09:01:40", "09:04:50", "09:11:50"}, {"active", "active", "active"}}, {"start","stop","status"}),
changedTypes = Table.TransformColumnTypes(someTable, {{"start", type duration}, {"stop", type duration}, {"status", type text}}),
listOfRecords = Table.ToRecords(changedTypes),
transformList = List.Accumulate(List.Skip(List.Positions(listOfRecords)), {listOfRecords{0}}, (listState, currentIndex) =>
let
previousRecord = listOfRecords{currentIndex-1},
currentRecord = listOfRecords{currentIndex},
thereIsAGap = currentRecord[start] <> previousRecord[stop],
recordsToAdd = if thereIsAGap then {[start=previousRecord[stop], stop=currentRecord[start], status="passive"], currentRecord} else {currentRecord},
append = listState & recordsToAdd
in
append
),
backToTable = Table.FromRecords(transformList, type table [start=duration, stop=duration, status=text])
in
backToTable
This is what I start off with (at the changedTypes
step):
This is what I end up with:
To integrate with your existing M
code, you'll probably need to:
- remove
someTable
andchangedTypes
from my code (and replace with your existing query) - change
changedTypes
in thelistOfRecords
step to whatever your last step is called (otherwise you'll get an error if you don't have achangedTypes
expression in your code).
Edit:
Further to my answer, what I would suggest is:
Try changing this line in the code above:
listOfRecords = Table.ToRecords(changedTypes),
to
listOfRecords = List.Buffer(Table.ToRecords(changedTypes)),
I found that storing the list in memory reduced my refresh time significantly (maybe ~90% if quantified). I imagine there are limits and drawbacks (e.g. if the list can't fit), but might be okay for your use case.
Do you experience similar behaviour? Also, my basic graph indicates non-linear complexity of the code overall unfortunately.
Final note: I found that generating and processing 100k rows resulted in a stack overflow whilst refreshing the query (this might have been due to the generation of input rows and may not the insertion of new rows, don't know). So clearly, this approach has limits.
回答3:
I would approach this as follows:
- Duplicate the first table.
- Replace "active" with "passive".
- Remove the
start
column. - Rename
stop
tostart
. - Create a new
stop
column by looking up the earlieststart
time from your original table that occurs after the currentstop
time. - Filter out nulls in this new column.
- Append this table to the original table.
The M code will look something like this:
let
Source = <...your starting table...>
PassiveStatus = Table.ReplaceValue(Source,"active","passive",Replacer.ReplaceText,{"status"}),
RemoveStart = Table.RemoveColumns(PassiveStatus,{"start"}),
RenameStart = Table.RenameColumns(RemoveStart,{{"stop", "start"}}),
AddStop = Table.AddColumn(RenameStart, "stop", (C) => List.Min(List.Select(Source[start], each _ > C[start])), type time),
RemoveNulls = Table.SelectRows(AddStop, each ([stop] <> null)),
CombineTables = Table.Combine({Source, RemoveNulls}),
#"Sorted Rows" = Table.Sort(CombineTables,{{"start", Order.Ascending}})
in
#"Sorted Rows"
The only tricky bit above is the custom column part where I define the new column like this:
(C) => List.Min(List.Select(Source[start], each _ > C[start]))
This takes each item in the column/list Source[start]
and compares it to the time in the current row. It selects only the ones that occur after the time in the current row and then take the min over that list to find the earliest one.
来源:https://stackoverflow.com/questions/54313405/fill-time-gaps-with-power-query