Question
Suppose I have a table A with the columns bucket_start_date and bucket_end_date:
A
bucket_start_date | bucket_end_date
2015.05.02 | 2015.05.08
2015.05.08 | 2015.05.12
Also suppose I have a table B with the columns date and coins:
B
date | coins
2015.05.02 | 5
2015.05.06 | 11
2015.05.09 | 32
How do I do a join in kdb that logically looks like
select A.bucket_start_date, A.bucket_end_date, sum(coins) from A join B where B.date BETWEEN A.bucket_start_date and A.bucket_end_date group by A.bucket_start_date, A.bucket_end_date
So I want the result to look like
bucket_start_date | bucket_end_date | sum(coins)
2015.05.02 | 2015.05.08 | 16
2015.05.08 | 2015.05.12 | 32
Answer 1:
A window join is a natural way of achieving this result. The wj1 call below gets what you are after:
q)wj1[A`bucket_start_date`bucket_end_date;`date;A;(B;(sum;`coins))]
bucket_start_date bucket_end_date coins
---------------------------------------
2015.05.02 2015.05.08 16
2015.05.08 2015.05.12 32
The first argument is a pair of lists of dates: the window start dates first, the window end dates last.
The second argument is the common column(s); in this case use the date column, since you are finding which window each date falls in.
The third and fourth arguments contain the simple tables to join. Within the fourth, (sum;`coins) pairs the function with the column it is applied to; here you are summing the coins column within each window.
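For reference, a minimal sketch of the same call with the question's tables built inline and the window pair given a name. The table definitions and the names w and r are assumptions for illustration, and B is assumed to be sorted by date, as window joins expect:

```q
/ Tables from the question (definitions assumed; the answer uses A and B as given)
A:([] bucket_start_date:2015.05.02 2015.05.08; bucket_end_date:2015.05.08 2015.05.12)
B:([] date:2015.05.02 2015.05.06 2015.05.09; coins:5 11 32)

/ w is a hypothetical name for the window pair: (start dates; end dates)
w:A`bucket_start_date`bucket_end_date

/ sum coins over the rows of B whose date falls inside each window
r:wj1[w; `date; A; (B; (sum; `coins))]
```

Naming the window pair separately keeps the wj1 call short and makes it easier to swap in different bucket boundaries.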
wj considers prevailing values on entry to each interval, whilst wj1 considers only values occurring in each interval. You can change wj1 to wj in the call above to see the difference.
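One way to try that comparison is to run both joins against the same tables. This is a sketch only: the table definitions are assumed from the question, and inWindow, withPrev and the output column names are hypothetical. With wj, the second window, which opens on 2015.05.08, should also pick up the 2015.05.06 row as its prevailing value:

```q
A:([] bucket_start_date:2015.05.02 2015.05.08; bucket_end_date:2015.05.08 2015.05.12)
B:([] date:2015.05.02 2015.05.06 2015.05.09; coins:5 11 32)
w:A`bucket_start_date`bucket_end_date

/ only rows dated inside each window
inWindow:wj1[w; `date; A; (B; (sum; `coins))]

/ also the prevailing row on entry to each window
withPrev:wj[w; `date; A; (B; (sum; `coins))]

/ show the two coins columns side by side
show ([] wj1coins:inWindow`coins; wjcoins:withPrev`coins)
```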
Answer 2:
Firstly, it is good convention to avoid _ in names, as _ is also the drop operator in q.
q)data:([]bucketSt:2015.05.02 2015.05.08;bucketEnd:2015.05.08 2015.05.12)
q)daterange:([]date:2015.05.02 2015.05.06 2015.05.09; coins: 5 11 32)
The question can also be solved without a window join, using a fairly straightforward update statement:
q)update coins:({exec sum coins from daterange where date within x} each get each data) from data
Starting from the inside of the () brackets:
q)get each data
2015.05.02 2015.05.08
2015.05.08 2015.05.12
returns the start and end dates for each row. A simple exec with an aggregation then gets the matching sum from the daterange table for each pair of dates. Finally, the update statement attaches the new values to the original table, returning the following:
bucketSt bucketEnd coins
---------------------------
2015.05.02 2015.05.08 16
2015.05.08 2015.05.12 32
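The inner lambda can also be checked on one bucket at a time. A small sketch using the daterange table defined above (repeated here so the block stands alone); bucketSum is a hypothetical name. Note that within is inclusive at both ends, so a row dated exactly on a shared boundary such as 2015.05.08 would be counted in both buckets:

```q
daterange:([] date:2015.05.02 2015.05.06 2015.05.09; coins:5 11 32)

/ the lambda from the update, pulled out under a hypothetical name
bucketSum:{exec sum coins from daterange where date within x}

bucketSum 2015.05.02 2015.05.08   / first bucket: 5+11
bucketSum 2015.05.08 2015.05.12   / second bucket: 32
```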
A window join is also possible and is more efficient, but this version should be easy to understand. Hope it helps!
Source: https://stackoverflow.com/questions/57845802/how-to-do-a-between-join-clause-in-kdb