How to do a between join clause in KDB?

Submitted by |▌冷眼眸甩不掉的悲伤 on 2020-01-15 10:21:50

Question


Suppose I have a table A with the columns bucket_start_date and bucket_end_date:

A
bucket_start_date | bucket_end_date
2015.05.02        | 2015.05.08
2015.05.08        | 2015.05.12

Also suppose I have a table B with the columns date and coins:

B
date        | coins
2015.05.02  | 5
2015.05.06  | 11     
2015.05.09  | 32

How do I do a join in kdb that logically looks like this SQL:

SELECT A.bucket_start_date, A.bucket_end_date, SUM(B.coins)
FROM A JOIN B ON B.date BETWEEN A.bucket_start_date AND A.bucket_end_date
GROUP BY A.bucket_start_date, A.bucket_end_date

So I want the result to look like this:

bucket_start_date | bucket_end_date | sum(coins) 
2015.05.02        | 2015.05.08      | 16 
2015.05.08        | 2015.05.12      | 32
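To try the answers below, the question's two tables can be built in q roughly as follows. This sketch only transcribes the data shown above; B is kept in ascending date order, since a window join expects the table being aggregated to be sorted on its time column.

```q
/ table A: one row per bucket, start and end dates as in the question
A:([]bucket_start_date:2015.05.02 2015.05.08;bucket_end_date:2015.05.08 2015.05.12)
/ table B: coin counts per date, in ascending date order
B:([]date:2015.05.02 2015.05.06 2015.05.09;coins:5 11 32)
```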

Answer 1:


A window join is a natural way of achieving this result. Below is a wj1 call that gets what you are after:

q)wj1[A`bucket_start_date`bucket_end_date;`date;A;(B;(sum;`coins))]
bucket_start_date bucket_end_date coins
---------------------------------------
2015.05.02        2015.05.08      16
2015.05.08        2015.05.12      32

The first argument is a pair of lists of dates: the first list holds the window start dates and the second holds the window end dates.

The second argument names the common column(s). In this case you want the date column, since you are looking for the window each date falls into.

The third argument is table A, and the fourth is a list holding table B followed by (sum;`coins), which pairs the function with the column it is applied to. Again, in this case you are summing the coins column within each window.
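Written out step by step, the same call can be sketched as below. The intermediate names w, c, agg and r are hypothetical and exist only to label each argument; the tables are the question's data.

```q
A:([]bucket_start_date:2015.05.02 2015.05.08;bucket_end_date:2015.05.08 2015.05.12)
B:([]date:2015.05.02 2015.05.06 2015.05.09;coins:5 11 32)
/ w (hypothetical name): the pair (window start dates; window end dates)
w:A`bucket_start_date`bucket_end_date
/ c (hypothetical name): the column used to place B rows into windows
c:`date
/ agg (hypothetical name): the table to aggregate, then a (function;column) pair
agg:(B;(sum;`coins))
/ r (hypothetical name): A with a coins column holding each window's sum
r:wj1[w;c;A;agg]
show r
```

Indexing A with two column names returns a 2-item list of date vectors, which is exactly the (starts; ends) shape wj1 expects for its first argument.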

wj considers the prevailing values on entry to each interval, whilst wj1 considers only values occurring within each interval. You can change wj1 to wj in the call to see the difference.
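A minimal sketch of that comparison, again on the question's data; r1 and r2 are hypothetical names. With this data the first window is expected to come out the same either way, because the row prevailing on entry to it (2015.05.02) already lies inside it. The second window can differ under wj, since the 2015.05.06 row is the one prevailing on entry at 2015.05.08. Check the printed results on your own kdb+ version rather than relying on this description.

```q
A:([]bucket_start_date:2015.05.02 2015.05.08;bucket_end_date:2015.05.08 2015.05.12)
B:([]date:2015.05.02 2015.05.06 2015.05.09;coins:5 11 32)
w:A`bucket_start_date`bucket_end_date
/ wj1: only B rows whose date lies inside each window
r1:wj1[w;`date;A;(B;(sum;`coins))]
/ wj: also counts the B row prevailing on entry to each window
r2:wj[w;`date;A;(B;(sum;`coins))]
show r1
show r2
```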




Answer 2:


Firstly, it is good convention not to use _ in names, as _ is also the drop operator in q.

q)data:([]bucketSt:2015.05.02 2015.05.08;bucketEnd:2015.05.08 2015.05.12)
q)daterange:([]date:2015.05.02 2015.05.06 2015.05.09; coins: 5 11 32)
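If the tables already exist with underscore names, as in the question, one way to follow this convention without rebuilding them is xcol, which renames a table's leading columns by position. A sketch, with the target names taken from this answer:

```q
A:([]bucket_start_date:2015.05.02 2015.05.08;bucket_end_date:2015.05.08 2015.05.12)
/ xcol renames columns positionally: bucket_start_date -> bucketSt, bucket_end_date -> bucketEnd
data:`bucketSt`bucketEnd xcol A
show data
```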

But without a window join, the question can be solved with a fairly straightforward q-sql statement:

update coins:({exec sum coins from daterange where date within x} each get each data) from data

Working from the inside of the () brackets outward:

q)get each data
2015.05.02 2015.05.08
2015.05.08 2015.05.12

returns the start and end dates for each row. For each of those pairs, a simple exec statement with aggregation sums the coins in the daterange table whose date falls within the pair. Finally, an update statement adds these sums to the original table, returning:

bucketSt   bucketEnd  coins
---------------------------
2015.05.02 2015.05.08 16
2015.05.08 2015.05.12 32
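Note that within is inclusive at both ends, and the buckets in the question share the boundary 2015.05.08. The sketch below adds a hypothetical row dated 2015.05.08 (coins 7, not part of the question's data) to show that such a row is counted in both buckets. It also shows a half-open variant (start inclusive, end exclusive) in case each date should land in exactly one bucket; which convention is right depends on how the buckets are meant to be defined.

```q
data:([]bucketSt:2015.05.02 2015.05.08;bucketEnd:2015.05.08 2015.05.12)
/ hypothetical extra row on the shared boundary 2015.05.08
daterange:([]date:2015.05.02 2015.05.06 2015.05.08 2015.05.09;coins:5 11 7 32)
/ inclusive [start;end]: the 2015.05.08 row is summed into both buckets
r1:update coins:({exec sum coins from daterange where date within x} each get each data) from data
/ half-open [start;end): each date is summed into at most one bucket
r2:update coins:({exec sum coins from daterange where date>=x 0, date<x 1} each get each data) from data
show r1
show r2
```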

A window join is also possible and is more efficient, but this approach should be easier to understand. Hope it helps!



Source: https://stackoverflow.com/questions/57845802/how-to-do-a-between-join-clause-in-kdb
