问题
I am new to couchbase and I have been going through couchbase documents and other online resources for a while but I could't get my query working. Below is the data structure and my query:
Table1:
{
"jobId" : "101",
"jobName" : "abcd",
"jobGroup" : "groupa",
"created" : " "2018-05-06T19:13:43.318Z",
"region" : "dev"
},
{
"jobId" : "102",
"jobName" : "abcd2",
"jobGroup" : "groupa",
"created" : " "2018-05-06T22:13:43.318Z",
"region" : "dev"
},
{
"jobId" : "103",
"jobName" : "abcd3",
"jobGroup" : "groupb",
"created" : " "2018-05-05T19:11:43.318Z",
"region" : "test"
}
I need to get the jobId which has the latest job information (max on created timestamp) for a given jobGroup and region (group by jobGroup and region).
My sql query doesn't help me using self-join on jobId.
Query:
select * from (select max(DATE_FORMAT_STR(j.created,'1111-11-11T00:00:00+00:00')) as latest, j.jobGroup, j.region from table1 j
group by jobGroup, region) as viewtable
join table t
on keys meta(t).id
where viewtable.latest in t.created and t.jobGroup = viewtable.jobGroup and
viewtable.region = t.region
Error Result: No result displayed
Desired result :
{
"jobId" : "102",
"jobName":"abcd2",
"jobGroup":"groupa",
"latest" :"2018-05-06T22:13:43.318Z",
"region":"dev"
},
{
"jobId" : "103",
"jobName" : "abcd3",
"jobGroup" : "groupb",
"created" : " "2018-05-05T19:11:43.318Z",
"region" : "test"
}
回答1:
If I understand your query correctly, this can be answered using 'group by' and no join. I tried entering your sample data and the following query gives the correct result:
select max([created,d])[1] max_for_group_region
from default d
group by jobGroup, region;
How does it work? It uses 'group by' to group documents by jobGroup and region, then creates a two-element array holding, for every document in the group:
- the 'created' timestamp field
- the document where the timestamp came from
It then applies the max function on the set of 2-element arrays. The max of a set of arrays looks for the maximum value in the first array position, and if there's a tie look at the second position, and so on. In this case we are getting the two-element array with the max timestamp.
Now we have an array [ timestamp, document ], so we apply [1] to extract just the document.
回答2:
I'm seeing some inconsistencies and invalid JSON in your examples, so I'm going to do the best I can. First off, I'm using Couchbase Server 5.5 which provides the new ANSI JOIN syntax. There might be a way to do this in an earlier version of Couchbase Server.
Next, I created an index on the created
field: CREATE INDEX ix_created ON bucketname(created)
.
Then, I use a subquery to get the latest date, aggregated by jobGroup and region. I then join the latest date from this query to the entire bucket and select the fields that (I think) you want in your desired result:
SELECT k.jobId, k.jobName, k.jobGroup, k.created AS latest, k.region
FROM (
SELECT j.jobGroup, j.region, MAX(j.created) as latestDate
FROM so j
GROUP BY j.jobGroup, j.region
) dt
LEFT JOIN so k ON k.created = dt.latestDate;
Problems with this approach:
- If two documents have the exact same date, this isn't a reliable way to determine the latest. You can add a
LIMIT 1
to the subquery, which would just pick one arbitrarily, or you couldORDER BY
whatever your preference is. - Subquery performance: I don't know how large your data set is, but this could be pretty slow.
- Requires Couchbase Server 5.5, which is currently in beta.
If you are using a different version of Couchbase Server, you may want to consider asking in the Couchbase N1QL Forums for a more expert answer.
来源:https://stackoverflow.com/questions/50223030/couchbase-n1ql-query-select-with-non-group-by-fields