Couchbase N1QL query: select with non-group-by fields

Question

I am new to Couchbase. I have been going through the Couchbase documentation and other online resources for a while, but I couldn't get my query working. Below are the data structure and my query:

Table1:
{
    "jobId" : "101",
    "jobName" : "abcd",
    "jobGroup" : "groupa",
    "created" : "2018-05-06T19:13:43.318Z",
    "region" : "dev"
},
{
    "jobId" : "102",
    "jobName" : "abcd2",
    "jobGroup" : "groupa",
    "created" : "2018-05-06T22:13:43.318Z",
    "region" : "dev"
},
{
    "jobId" : "103",
    "jobName" : "abcd3",
    "jobGroup" : "groupb",
    "created" : "2018-05-05T19:11:43.318Z",
    "region" : "test"
}

I need to get the jobId which has the latest job information (max on created timestamp) for a given jobGroup and region (group by jobGroup and region).

My SQL-style query, which uses a self-join on jobId, doesn't work.
Query:

/* The idea is to pull out the job that was executed latest for every combination of group and region, and print the details of that particular job */

select * from (select max(DATE_FORMAT_STR(j.created,'1111-11-11T00:00:00+00:00')) as latest, j.jobGroup, j.region from table1 j 
group by jobGroup, region) as viewtable
join table t
on keys meta(t).id
where viewtable.latest in t.created and t.jobGroup = viewtable.jobGroup and 
viewtable.region = t.region

Error Result: No result displayed

Desired result:

{
    "jobId" : "102",
    "jobName" : "abcd2",
    "jobGroup" : "groupa",
    "latest" : "2018-05-06T22:13:43.318Z",
    "region" : "dev"
},
{
    "jobId" : "103",
    "jobName" : "abcd3",
    "jobGroup" : "groupb",
    "created" : "2018-05-05T19:11:43.318Z",
    "region" : "test"
}

Answer 1:

If I understand your query correctly, this can be answered using 'group by' and no join. I tried entering your sample data and the following query gives the correct result:

select max([created,d])[1] max_for_group_region from default d group by jobGroup, region;

How does it work? It uses 'group by' to group documents by jobGroup and region, then creates a two-element array holding, for every document in the group:

the 'created' timestamp field
the document where the timestamp came from

It then applies the max function to the set of two-element arrays. The max of a set of arrays compares the values in the first array position; if there is a tie, it compares the second position, and so on. In this case we get the two-element array with the max timestamp.

Now we have an array [ timestamp, document ], so we apply [1] to extract just the document.
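The grouping and array-max steps described above can be sketched outside the database. The following is a rough Python emulation, not N1QL: the function name latest_per_group is hypothetical, and the sample documents are the ones from the question with the stray quotes removed. It assumes all timestamps are ISO 8601 strings in the same format and time zone, so that string order matches chronological order.

```python
from collections import defaultdict

# Sample documents from the question (timestamps cleaned up).
docs = [
    {"jobId": "101", "jobName": "abcd", "jobGroup": "groupa",
     "created": "2018-05-06T19:13:43.318Z", "region": "dev"},
    {"jobId": "102", "jobName": "abcd2", "jobGroup": "groupa",
     "created": "2018-05-06T22:13:43.318Z", "region": "dev"},
    {"jobId": "103", "jobName": "abcd3", "jobGroup": "groupb",
     "created": "2018-05-05T19:11:43.318Z", "region": "test"},
]


def latest_per_group(documents):
    """Emulate: select max([created, d])[1] ... group by jobGroup, region."""
    groups = defaultdict(list)
    for d in documents:
        # Pair each document with its timestamp, like [created, d] in the query.
        groups[(d["jobGroup"], d["region"])].append([d["created"], d])
    # The key compares only the timestamp (position 0). Python cannot order
    # dicts, so unlike N1QL this sketch does not fall through to position 1
    # on a tie; it keeps the first pair seen instead.
    return {
        key: max(pairs, key=lambda pair: pair[0])[1]  # [1] extracts the document
        for key, pairs in groups.items()
    }


latest = latest_per_group(docs)
for (job_group, region), doc in sorted(latest.items()):
    print(job_group, region, doc["jobId"], doc["created"])
```

For the sample data this selects job 102 for (groupa, dev) and job 103 for (groupb, test), which matches the desired result in the question.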

Answer 2:

I'm seeing some inconsistencies and invalid JSON in your examples, so I'm going to do the best I can. First off, I'm using Couchbase Server 5.5 which provides the new ANSI JOIN syntax. There might be a way to do this in an earlier version of Couchbase Server.

Next, I created an index on the created field: CREATE INDEX ix_created ON bucketname(created).

Then, I use a subquery to get the latest date, aggregated by jobGroup and region. I then join the latest date from this query to the entire bucket and select the fields that (I think) you want in your desired result:

SELECT k.jobId, k.jobName, k.jobGroup, k.created AS latest, k.region
FROM (
  SELECT j.jobGroup, j.region, MAX(j.created) as latestDate
  FROM so j
  GROUP BY j.jobGroup, j.region
) dt
LEFT JOIN so k ON k.created = dt.latestDate
  AND k.jobGroup = dt.jobGroup
  AND k.region = dt.region;

Problems with this approach:

If two documents have the exact same date, this isn't a reliable way to determine the latest. You can add a LIMIT 1 to the subquery, which would just pick one arbitrarily, or you could ORDER BY whatever your preference is.
Subquery performance: I don't know how large your data set is, but this could be pretty slow.
Requires Couchbase Server 5.5, which is currently in beta.
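
To make the tie problem from the first point concrete, here is a small Python sketch (again an emulation, not N1QL). The two documents and the function name pick_latest are hypothetical; using jobId as the tie-breaker is only one possible preference, assumed here for illustration.

```python
# Two hypothetical documents in the same group with identical timestamps.
tied_docs = [
    {"jobId": "201", "jobGroup": "groupa", "region": "dev",
     "created": "2018-05-06T22:13:43.318Z"},
    {"jobId": "202", "jobGroup": "groupa", "region": "dev",
     "created": "2018-05-06T22:13:43.318Z"},
]


def pick_latest(documents):
    # Compare (created, jobId) tuples: created decides first,
    # and jobId breaks the tie deterministically.
    return max(documents, key=lambda d: (d["created"], d["jobId"]))


winner = pick_latest(tied_docs)
print(winner["jobId"])
```

Without the second tuple element, which of the two documents wins depends on input order; with it, the choice is the same on every run.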

If you are using a different version of Couchbase Server, you may want to consider asking in the Couchbase N1QL Forums for a more expert answer.

Source: https://stackoverflow.com/questions/50223030/couchbase-n1ql-query-select-with-non-group-by-fields

Tags

couchbase

n1ql