I have a question. Consider the following example as intermediate data after performing some aggregation on a sample dataset:
fileid field contains the id of a file, and
This is a somewhat involved solution. The idea is to first use the DB to build the population of possible user pairs, then turn around and ask the DB to find those pairs in the _user
field. Beware that thousands of users will create a very large pairing list: n distinct users produce n(n-1)/2 pairs. We use $addFields
in case the input records contain more than the example shows; if they do not, replace it with $project
to cut down the amount of material flowing through the pipeline.
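As a rough sketch of the $project variant mentioned above, the pair-matching stage of the second pipeline further down could be built like this. The helper name buildPairFlagStage is hypothetical and not part of the original answer. It relies on $project keeping _id by default, so "$_id.fileid" is still available to the later $group, while every other input field (including _user) is dropped.

```javascript
// Hypothetical helper (not in the original answer): builds the pair-flagging
// stage with $project instead of $addFields.
// `pairs` is the array of [userA, userB] pairs produced by Stage 1.
function buildPairFlagStage(pairs) {
  return {
    $project: {
      // _id is included by default in $project, so only _x needs listing.
      _x: {
        $map: {
          input: pairs,
          as: "z",
          in: {
            n: "$$z",                               // the pair itself
            q: { $setIsSubset: ["$$z", "$_user"] }  // true if both users are in _user
          }
        }
      }
    }
  };
}
```

Assuming the variable u holds the Stage 1 output as in the code below, the stage would be dropped in as the first element of the second pipeline, i.e. db.foo.aggregate([buildPairFlagStage(u), ...]) with the remaining stages unchanged.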
//
// Stage 1: Get unique set of username pairs.
//
c=db.foo.aggregate([
{$unwind: "$_user"}
// Create single deduped list of users:
,{$group: {_id:null, u: {$addToSet: "$_user"} }}
// Nice little double map here creates the pairs, effectively doing this:
//   for index in range(0, len(list)):
//     first = list[index]
//     for p2 in range(index+1, len(list)):
//       pairs.append([first,list[p2]])
//
,{$addFields: {u:
{$map: {
input: {$range:[0,{$size:"$u"}]},
as: "z",
in: {
$map: {
input: {$range:[{$add:[1,"$$z"]},{$size:"$u"}]},
as: "z2",
in: [
{$arrayElemAt:["$u","$$z"]},
{$arrayElemAt:["$u","$$z2"]}
]
}
}
}}
}}
// Turn the array of arrays of pairs into a nice single array of pairs:
,{$addFields: {u: {$reduce:{
input: "$u",
initialValue:[],
in:{$concatArrays: [ "$$value", "$$this"]}
}}
}}
]);
// Stage 2: Find pairs and tally up the fileids
doc = c.next(); // Get single output from Stage 1 above.
u = doc['u'];
c2=db.foo.aggregate([
{$addFields: {_x: {$map: {
input: u,
as: "z",
in: {
n: "$$z",
q: {$setIsSubset: [ "$$z", "$_user" ]}
}
}
}
}}
,{$unwind: "$_x"}
,{$match: {"_x.q": true}}
// Nice use of grouping by an ARRAY here:
,{$group: {_id: "$_x.n", v: {$push: "$_id.fileid"}, n: {$sum:1} }}
// Keep only the pairs that appear together in more than one document:
,{$match: {"n": {"$gt":1}}}
]);
c2.forEach(printjson); // Print each matching pair with its fileids and count.
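To sanity-check the logic without a database, the two stages can be approximated in plain JavaScript. This is only an in-memory sketch: the sample documents are made up, and the document shape (fileid under _id, an array in _user) is assumed from the field paths used in the pipeline above.

```javascript
// Plain-JavaScript sketch of the two stages; no MongoDB required.
// The sample documents below are invented for illustration only.
const docs = [
  { _id: { fileid: 1 }, _user: ["a", "b", "c"] },
  { _id: { fileid: 2 }, _user: ["a", "b"] },
  { _id: { fileid: 3 }, _user: ["b", "c"] },
  { _id: { fileid: 4 }, _user: ["d"] }
];

// Stage 1 equivalent: deduplicate users, then build every unordered pair.
function uniquePairs(documents) {
  const users = [...new Set(documents.flatMap(d => d._user))];
  const pairs = [];
  for (let i = 0; i < users.length; i++) {
    for (let j = i + 1; j < users.length; j++) {
      pairs.push([users[i], users[j]]);
    }
  }
  return pairs;
}

// Stage 2 equivalent: for each pair, collect the fileids whose _user array
// contains both members, and keep only pairs seen in more than one document.
function sharedFiles(documents, pairs) {
  const result = [];
  for (const pair of pairs) {
    const v = documents
      .filter(d => pair.every(name => d._user.includes(name)))
      .map(d => d._id.fileid);
    if (v.length > 1) {
      result.push({ _id: pair, v: v, n: v.length });
    }
  }
  return result;
}

const pairs = uniquePairs(docs);
const shared = sharedFiles(docs, pairs);
console.log(JSON.stringify(shared));
```

With this sample data, four distinct users give six pairs, and only ["a","b"] (files 1 and 2) and ["b","c"] (files 1 and 3) survive the final filter. In the real pipeline the order of elements produced by $addToSet is not guaranteed, so a pair may come back as ["b","a"] rather than ["a","b"].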