BigQuery - rank rows by desc order, based on values in one of the columns, removing duplicates

Submitted by 喜夏-厌秋 on 2019-12-23 04:40:53

Question


For every 20-minute interval, I am trying to rank the unique IP addresses, together with their corresponding port numbers, that generate the highest volume of traffic in mbps (megabits per second), in descending order.

Each IP address may be recorded more than once in each 20-minute period. Each time an IP address is recorded within a period, it may or may not have the same port number.

For example, in rows 1-4 of the table below, the IP address 192.168.10.1 shows up four times during the 00:00 period, with port numbers 443, 443, 80 and 80 respectively. In another scenario, the IP address 192.168.10.2 shows up twice during the 00:40 period (rows 9 and 11), both times with port number 443 but with different values in the mbps (bandwidth) column.

If an IP address shows up more than once in a given period, check its corresponding port: if the same port is listed more than once, select only the record with the highest traffic for that port. No duplicate IP/port pairs are allowed within a 20-minute period.
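
The rule above could be sketched roughly as follows. This is only an illustration, not a tested solution: the table name `my_project.my_dataset.traffic` is hypothetical, and the columns time, ip_address, port and mbps are assumed to match the sample table in this question. The idea is to collapse each (time, ip_address, port) group to its largest mbps value first, and only then rank the surviving rows within each period.

```sql
-- Sketch only: `my_project.my_dataset.traffic` is a hypothetical table name.
-- Column names are taken from the sample table in the question.
WITH deduped AS (
  SELECT
    time,
    ip_address,
    port,
    MAX(mbps) AS mbps  -- keep only the busiest record per (time, ip_address, port)
  FROM `my_project.my_dataset.traffic`
  GROUP BY time, ip_address, port
)
SELECT
  time,
  ip_address,
  port,
  mbps,
  -- Rank 1 is the highest-traffic ip/port pair within each period.
  RANK() OVER (PARTITION BY time ORDER BY mbps DESC) AS traffic_rank
FROM deduped
ORDER BY time, traffic_rank;
```

Grouping with MAX is enough here because only the traffic value differs between duplicates. If other columns had to be carried along from the winning row, a ROW_NUMBER() window partitioned by the same three columns and filtered to 1 would be one alternative.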

The table is partitioned by data ingestion time. Each 20-minute interval contains millions of rows.
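
With an ingestion-time partitioned table, filtering on the `_PARTITIONTIME` pseudo-column is the usual way to limit how many partitions a query scans. A minimal sketch, assuming daily partitions, a hypothetical table name, and an arbitrary example date range:

```sql
-- Sketch only: hypothetical table name and date range.
-- _PARTITIONTIME reflects when a row was ingested, which may differ from
-- the time of the traffic itself, so the range may need some slack.
SELECT
  time,
  ip_address,
  port,
  mbps
FROM `my_project.my_dataset.traffic`
WHERE _PARTITIONTIME >= TIMESTAMP('2019-01-01')
  AND _PARTITIONTIME <  TIMESTAMP('2019-01-02');
```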

The query must be in standard SQL. The data is captured in bytes, so the query also needs to convert it to mbps.
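
As a rough illustration of the conversion, the sketch below assumes a raw table with a hypothetical TIMESTAMP column `event_ts` and a hypothetical INT64 column `bytes_transferred`, and assumes each record's byte count covers the whole 20-minute (1200-second) window. Under those assumptions, mbps = bytes × 8 / 1,000,000 / 1200. For example, 15,000,000,000 bytes in one window works out to 100 mbps.

```sql
-- Sketch only: table and column names (raw_traffic, event_ts,
-- bytes_transferred) are hypothetical, not from the original question.
SELECT
  -- Floor the event timestamp to the start of its 20-minute (1200 s) window.
  TIMESTAMP_SECONDS(1200 * DIV(UNIX_SECONDS(event_ts), 1200)) AS time,
  ip_address,
  port,
  -- bytes -> bits (x 8) -> megabits (/ 1e6) -> per second (/ 1200 s)
  bytes_transferred * 8 / 1e6 / 1200 AS mbps
FROM `my_project.my_dataset.raw_traffic`;
```

If a record's byte count covers a shorter measurement window, the divisor 1200 would have to be replaced by that window's length in seconds.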

Original table:

Row  time                ip_address          port        mbps

1    01/01/2019 00:00    192.168.10.1        443         100
2    01/01/2019 00:00    192.168.10.1        443         150
3    01/01/2019 00:00    192.168.10.1        80          120
4    01/01/2019 00:00    192.168.10.1        80          123
5    01/01/2019 00:20    192.168.10.2        80          200
6    01/01/2019 00:20    192.168.10.1        80          100
7    01/01/2019 00:20    192.168.10.2        80          210
8    01/01/2019 00:20    192.168.10.1        80          110
9    01/01/2019 00:40    192.168.10.2        443         200
10   01/01/2019 00:40    192.168.10.3        443         300
11   01/01/2019 00:40    192.168.10.2        443         220
12   01/01/2019 00:40    192.168.10.1        443         300
13   01/01/2019 00:00    192.168.10.3        443         90
14   01/01/2019 00:00    192.168.10.2        80          100
15   01/01/2019 00:00    192.168.10.1        443         500

Passing the above through a query, I would like to get the following results:

Row  time                ip_address          port        mbps

1    01/01/2019 00:00    192.168.10.1        443         150
2    01/01/2019 00:00    192.168.10.1        80          123
3    01/01/2019 00:20    192.168.10.1        80          110
4    01/01/2019 00:20    192.168.10.2        80          210
5    01/01/2019 00:40    192.168.10.1        443         300
6    01/01/2019 00:40    192.168.10.2        443         220
7    01/01/2019 00:40    192.168.10.3        443         300
8    01/01/2019 00:00    192.168.10.1        443         500
9    01/01/2019 00:00    192.168.10.2        80          100
10   01/01/2019 00:00    192.168.10.3        443         90

I tried several queries to achieve the above, with no luck. Any help or pointers in the right direction would be appreciated. Thanks!

Source: https://stackoverflow.com/questions/54583510/bigquery-rank-rows-by-desc-order-based-on-values-in-one-of-the-columns-remov
