问题
I'm trying to produce a Kusto query to measure the "own" duration of the requests (subtracting out durations of dependencies). However, I can't really figure out how to work this out through a pure Kusto query.
To better understand what would would expected, below a sample case:
High level view (where R is the request and Dx the dependencies)
R =============================== (31ms)
D1 ******* (7ms)
D2 ******** (8ms)
D3 ****** (6ms)
D4 ** (2ms)
D5 **** (4ms)
Proj ==*************======******====
D1
overlapsD2
during 2msD5
andD4
shouldn't be taken into account as completely overlapped by other dependenciesProj
being a projection of a potential intermediate step where only meaningful dependencies segments are shown
Given the following testbed dataset
let reqs = datatable (timestamp: datetime, id:string, duration: real)
[
datetime("2020-12-15T08:00:00.000Z"), "r1", 31 // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.002Z"), "r1", 7, // D1
datetime("2020-12-15T08:00:00.007Z"), "r1", 8, // D2
datetime("2020-12-15T08:00:00.021Z"), "r1", 6, // D3
datetime("2020-12-15T08:00:00.023Z"), "r1", 2, // D4
datetime("2020-12-15T08:00:00.006Z"), "r1", 4, // D5
];
In this particular case, the Kusto query, joining the two data tables, should be able to retrieve 12
(duration of the request, removing all dependencies), ie.
Expected total duration = 31 - (7 + 8 - 2) - (6) = 12
Any help to move this forward would be greatly appreciated <3
回答1:
I succeeded to solve that using that using row_window_session(). This is a Window function. You can read more about it at Window functions overview.
The solution is:
let reqs = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.000Z"), "r1", 31 // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.002Z"), "r1", 7, // D1
datetime("2020-12-15T08:00:00.007Z"), "r1", 8, // D2
datetime("2020-12-15T08:00:00.021Z"), "r1", 6, // D3
datetime("2020-12-15T08:00:00.006Z"), "r1", 4, // D5
datetime("2020-12-15T08:00:00.023Z"), "r1", 2, // D4
];
deps
| extend endTime = timestamp + totimespan(duration * 10000)
| sort by timestamp asc
| serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
| summarize max(endTime) by operation_ParentId, SessionStarted
| extend diff = max_endTime - SessionStarted
| summarize todouble(sum(diff)) by operation_ParentId
| join reqs on operation_ParentId
| extend diff = duration - sum_diff / 10000
| project diff
The idea here is to sort the entries by the open time, and as long as the next previous end time is later than the current start time, we don't open a new session. Let's explain each line of this query to see how this is being done:
- Calculate the
endTime
based on the duration. To normalize the data I'll multiply by 10000 the duration:| extend endTime = timestamp + totimespan(duration * 10000)
- Sort by start time:
| sort by timestamp asc
- This is the key of this solution. It is calculated on the
timestamp
column. The next two parameters are limits when to start new buckets. Since we don't want to seal a bucket based on time that have passed, I provided 1 hour which will not hit with this input. The forth argument helps us to create a new session based on the data. As long as there are more rows that will result intimestamp > prev(endTime)
they will have the same start time.| serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
- Now we have multiple rows per session start. So we want to keep only the latest time per session. We also keep
operation_ParentId
to later on join on that key:| summarize max(endTime) by operation_ParentId, SessionStarted
- Calculate the time of each session:
| extend diff = max_endTime - SessionStarted
- Sum up all session times:
| summarize todouble(sum(diff)) by operation_ParentId
- Join on
req
to get the total starting time:| join reqs on operation_ParentId
- Calculate the diff between the total time and the session times. Unnormalize the data:
| extend diff = duration - sum_diff / 10000
- Project the final result:
| project diff
You can find this query running at Kusto Samples open database.
Having said that, please note that this is a linear operation. Meaning that if there are 2 following segments, that should be under the same segment, but they do not intersect, it will fail. For example, adding the following into deps
:
datetime("2020-12-15T08:00:00.026Z"), "r1", 1, // D6
which should not add anything to the calculation, cause it to misbehave. This is because d4
is the previous point, and it has no point of contact with d6
, although d3
covers them both.
To solve that, you need to repeat the same logic of steps 3-5. Unfortunately Kusto does not have recursions, therefore you cannot solve this for any kind of input. But assuming there are no really depth such cases that breaks this logic, I think it is good enough.
回答2:
Take a look at the query below to see if it can meet your requirement:
let reqs = datatable (timestamp: datetime, id:string, duration: real, key1:string)
[
datetime("2020-12-15T08:00:00.000Z"), "r1", 31 , "k1" // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real,name:string)
[
datetime("2020-12-15T08:00:00.002Z"), "r1", 7, "D1",
datetime("2020-12-15T08:00:00.007Z"), "r1", 8, "D2",
datetime("2020-12-15T08:00:00.021Z"), "r1", 6, "D3",
datetime("2020-12-15T08:00:00.023Z"), "r1", 2, "D4",
datetime("2020-12-15T08:00:00.006Z"), "r1", 4, "D5"
];
let d2 = deps
| where name !in ("D4","D5")
| summarize a=sum(duration)-2
| extend key1="k1";
reqs
| join d2 on key1
| extend result = duration - a
| project result
Test result:
来源:https://stackoverflow.com/questions/65303372/how-to-evaluate-application-insights-requests-own-duration-without-considerin