Perform Aggregation/Set intersection on MongoDB

悲哀的现实 2021-01-27 08:31

I have a query; consider the following example as intermediate data after performing some aggregation on a sample dataset:

fileid field contains the id of a file, and

1 answer
  • 2021-01-27 08:59

    This is a somewhat involved solution. The idea is to first use the DB to build the full population of possible user pairs, then turn around and ask the DB which of those pairs appear together in each document's _user field. Beware that thousands of users will create a pretty darn big pairing list, since n users yield n(n-1)/2 pairs. The pipelines use $addFields just in case there's more to the input records than we see in the example; if not, replace it with $project for efficiency, to cut down the amount of material flowing through the pipe.
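To put a number on the pairing-list warning, here is a minimal sketch in plain JavaScript (no database needed). The function name `pairCount` is made up for illustration. Keep in mind that Stage 1 below returns the whole pair list inside a single document, so MongoDB's 16 MB BSON document limit also applies to it.

```javascript
// Number of unordered pairs among n distinct users: n * (n - 1) / 2.
// pairCount is a hypothetical helper, not part of the answer's pipeline.
function pairCount(n) {
  return (n * (n - 1)) / 2;
}

for (const n of [10, 100, 1000, 5000]) {
  console.log(n + " users -> " + pairCount(n) + " pairs");
}
```

With 1000 users this is already 499500 pairs, and the Stage 2 `$map` evaluates every one of them against every input document.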

    //
    // Stage 1:  Get unique set of username pairs.
    //
    c=db.foo.aggregate([
    {$unwind: "$_user"}
    
    // Create single deduped list of users:
    ,{$group: {_id:null, u: {$addToSet: "$_user"} }}
    
    // Nice little double map here creates the pairs, effectively doing this:
    //    for index in range(0, len(list)):
    //      first = list[index]
    //      for p2 in range(index+1, len(list)):
    //        pairs.append([first,list[p2]])
    // 
    ,{$addFields: {u: 
      {$map: {
        input: {$range:[0,{$size:"$u"}]},
        as: "z",
        in: {
            $map: {
                input: {$range:[{$add:[1,"$$z"]},{$size:"$u"}]},
                as: "z2",
                in: [
                {$arrayElemAt:["$u","$$z"]},
                {$arrayElemAt:["$u","$$z2"]}
                ]
            }
        }
        }}
    }}
    
    // Turn the array of arrays of pairs into a single flat array of pairs:
    ,{$addFields: {u: {$reduce:{
            input: "$u",
            initialValue:[],
            in:{$concatArrays: [ "$$value", "$$this"]}
            }}
        }}
              ]);
    
    
    // Stage 2:  Find pairs and tally up the fileids
    
    doc = c.next(); // Get single output from Stage 1 above.                       
    
    u = doc['u'];
    
    c2=db.foo.aggregate([
    {$addFields: {_x: {$map: {
                    input: u,
                    as: "z",
                    in: {
                        n: "$$z",
                        q: {$setIsSubset: [ "$$z", "$_user" ]}
                    }
                }
            }
        }}
    ,{$unwind: "$_x"}
    ,{$match: {"_x.q": true}}
    //  Nice use of grouping by an ARRAY here:
    ,{$group: {_id: "$_x.n", v: {$push: "$_id.fileid"}, n: {$sum:1} }}
    ,{$match: {"n": {"$gt":1}}}
                         ]);
    
    // Print each surviving pair with its fileids (v) and count (n).
    c2.forEach(printjson);
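For checking the pipeline's output on a small sample, the same two stages can be sketched in plain JavaScript. This is only an in-memory illustration, not the answer's method: the sample documents are invented, their shape (`_id.fileid`, `_user`) is assumed from the fields the pipeline reads, and `uniquePairs` and `sharedFiles` are hypothetical helper names.

```javascript
// Hypothetical sample documents. The shape (_id.fileid, _user) is assumed
// from the fields the pipeline reads; the values are made up.
const docs = [
  { _id: { fileid: 1 }, _user: ["A", "B", "C"] },
  { _id: { fileid: 2 }, _user: ["A", "B"] },
  { _id: { fileid: 3 }, _user: ["B", "C"] },
  { _id: { fileid: 4 }, _user: ["D"] },
];

// Stage 1 equivalent: deduped list of users, then every unordered pair.
function uniquePairs(docs) {
  const users = [...new Set(docs.flatMap((d) => d._user))];
  const pairs = [];
  for (let i = 0; i < users.length; i++) {
    for (let j = i + 1; j < users.length; j++) {
      pairs.push([users[i], users[j]]);
    }
  }
  return pairs;
}

// Stage 2 equivalent: for each pair, collect the fileids of documents whose
// _user array contains both users, and keep pairs found in more than one file.
function sharedFiles(docs) {
  const out = [];
  for (const pair of uniquePairs(docs)) {
    const v = docs
      .filter((d) => pair.every((u) => d._user.includes(u)))
      .map((d) => d._id.fileid);
    if (v.length > 1) {
      out.push({ _id: pair, v: v, n: v.length });
    }
  }
  return out;
}

const result = sharedFiles(docs);
console.log(JSON.stringify(result));
```

On this sample, only the pairs A/B (files 1 and 2) and B/C (files 1 and 3) survive the final filter. The in-memory version keeps pairs in insertion order, whereas `$addToSet` makes no ordering promise, so the order of pairs coming back from MongoDB may differ.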
    