Question
For every 20-minute interval, I am trying to find and rank the unique IP addresses, with their corresponding port numbers, that generate the highest volume of traffic in Mbps (megabits per second), in descending order.
Each IP address may or may not be recorded more than once in each 20-minute period. Each time an IP address is recorded within a 20-minute period, it may or may not have the same port number listed.
For example, in the table below, the IP address 192.168.10.1 shows up four times during the period listed as 12:20, with port numbers 443, 80, 80 and 80 respectively. In another scenario, the IP address 192.168.10.2 shows up twice during the period 12:40, both times with port number 443, but with different values in the mbps (bandwidth) column.
If an IP address shows up more than once in a specific period, check its corresponding port; if the same port is listed more than once, select only the instance that generated the most traffic. No duplicate IP/port pairs are allowed within a 20-minute period.
The table is partitioned by data ingestion time. Each 20-minute interval contains millions of rows.
The query must be in standard SQL. The data is captured in bytes, so the query also needs to convert it to Mbps.
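To make the conversion concrete, here is a minimal sketch of the arithmetic in Python. It assumes that each recorded byte count is the total transferred over one full 20-minute interval (1200 seconds) and that a megabit is 10^6 bits; neither assumption is confirmed above, and the function name `bytes_to_mbps` is purely illustrative.

```python
# Assumption: each row's byte count covers one full 20-minute period.
SECONDS_PER_INTERVAL = 20 * 60


def bytes_to_mbps(total_bytes, seconds=SECONDS_PER_INTERVAL):
    """Convert a byte count accumulated over `seconds` into megabits per second.

    Assumes 1 megabit = 1,000,000 bits (decimal, not 2**20).
    """
    bits = total_bytes * 8
    return bits / 1_000_000 / seconds


# 150,000,000 bytes over 20 minutes is 1,200 megabits in 1,200 seconds.
print(bytes_to_mbps(150_000_000))
```

In a query, the same arithmetic would presumably be a single expression along the lines of `byte_count * 8 / 1000000 / 1200`, where `byte_count` is a hypothetical column name.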
Original table:
Row  time              ip_address    port  mbps
1    01/01/2019 00:00  192.168.10.1  443   100
2    01/01/2019 00:00  192.168.10.1  443   150
3    01/01/2019 00:00  192.168.10.1  80    120
4    01/01/2019 00:00  192.168.10.1  80    123
5    01/01/2019 00:20  192.168.10.2  80    200
6    01/01/2019 00:20  192.168.10.1  80    100
7    01/01/2019 00:20  192.168.10.2  80    210
8    01/01/2019 00:20  192.168.10.1  80    110
9    01/01/2019 00:40  192.168.10.2  443   200
10   01/01/2019 00:40  192.168.10.3  443   300
11   01/01/2019 00:40  192.168.10.2  443   220
12   01/01/2019 00:40  192.168.10.1  443   300
13   01/01/2019 00:00  192.168.10.3  443   90
14   01/01/2019 00:00  192.168.10.2  80    100
15   01/01/2019 00:00  192.168.10.1  443   500
Passing the above through a query, I would like to get the following results:
Row  time              ip_address    port  mbps
1    01/01/2019 00:00  192.168.10.1  443   150
2    01/01/2019 00:00  192.168.10.1  80    123
3    01/01/2019 00:20  192.168.10.1  80    110
4    01/01/2019 00:20  192.168.10.2  80    200
5    01/01/2019 00:40  192.168.10.1  443   300
6    01/01/2019 00:40  192.168.10.2  443   220
7    01/01/2019 00:40  192.168.10.3  443   300
8    01/01/2019 00:00  192.168.10.1  443   500
9    01/01/2019 00:00  192.168.10.2  80    100
10   01/01/2019 00:00  192.168.10.3  443   90
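For reference, the de-duplication rule (one row per period, IP address and port, keeping the highest mbps value) can be sketched with a plain `GROUP BY` and `MAX`. The snippet below is only an illustration under stated assumptions: it runs the query against an in-memory SQLite database through Python's `sqlite3` module so that it is self-contained, the table name `traffic` and the alias `max_mbps` are hypothetical, and it uses only rows 1-4 and 9-12 from the original table. It has not been tested on BigQuery or at the data volumes described, and it does not include the bytes-to-Mbps conversion.

```python
import sqlite3

# Rows 1-4 and 9-12 of the original table: (time, ip_address, port, mbps).
ROWS = [
    ("01/01/2019 00:00", "192.168.10.1", 443, 100),
    ("01/01/2019 00:00", "192.168.10.1", 443, 150),
    ("01/01/2019 00:00", "192.168.10.1", 80, 120),
    ("01/01/2019 00:00", "192.168.10.1", 80, 123),
    ("01/01/2019 00:40", "192.168.10.2", 443, 200),
    ("01/01/2019 00:40", "192.168.10.3", 443, 300),
    ("01/01/2019 00:40", "192.168.10.2", 443, 220),
    ("01/01/2019 00:40", "192.168.10.1", 443, 300),
]

# One row per (time, ip_address, port), keeping the largest mbps value,
# ranked by traffic in descending order within each period.
QUERY = """
SELECT time, ip_address, port, MAX(mbps) AS max_mbps
FROM traffic
GROUP BY time, ip_address, port
ORDER BY time, max_mbps DESC, ip_address
"""


def top_traffic(rows):
    """Load rows into an in-memory table and return the de-duplicated ranking."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE traffic (time TEXT, ip_address TEXT, port INTEGER, mbps INTEGER)"
        )
        conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in top_traffic(ROWS):
    print(row)
```

The tie-break on `ip_address` is an arbitrary choice so that rows with equal mbps come out in a stable order. Sorting on the `time` text column only works here because all sample rows share one date; a real timestamp column would be needed for ordering across days.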
I have tried several queries to achieve the above, with no luck. Any help or pointers in the right direction would be appreciated. Thanks!
Source: https://stackoverflow.com/questions/54583510/bigquery-rank-rows-by-desc-order-based-on-values-in-one-of-the-columns-remov