Perform Aggregation/Set intersection on MongoDB

三世轮回 提交于 2019-12-02 16:54:27

问题


I have a query, consider the following example as a intermediate data after performing some aggregation on a sample dataset;

fileid field contains the id of a file, and the user array containing array of users, who made some changes to the respective file

{
   “_id” : {  “fileid” : 12  },
   “_user” : [ “a”,”b”,”c”,”d” ]
}
{
   “_id” : {  “fileid” : 13  },
   “_user” : [ “f”,”e”,”a”,”b” ]
}
{
   “_id” : {  “fileid” : 14  },
   “_user” : [ “g”,”h”,”m”,”n” ]
}
{
   “_id” : {  “fileid” : 15  },
   “_user” : [ “o”,”r”,”s”,”v” ]
}
{
   “_id” : {  “fileid” : 16  },
   “_user” : [ “x”,”y”,”z”,”a” ]
}
{
   “_id” : {  “fileid” : 17  },
   “_user” : [ “g”,”r”,”s”,”n” ]
}

I need to find solution for this -> any two users that did some changes to atleast two of the same file. So the output-result should be

{
   “_id” : {  “fileid” : [12,13]  },
   “_user” : [ “a”,”b”]
}
{
   “_id” : {  "fileid” : [14,17]  },
   “_user” : [ “g”,”n” ]
}
{
   “_id” : {  "fileid” : [15,17]  },
   “_user” : [ “r”,”s” ]
}

Your inputs are highly appreciated.


回答1:


This is a somewhat involved solution. The idea is to first use the DB to get the population of possible pairs, then turn around and ask the DB to find the pairs in the _user field. Beware that 1000s of users will create a pretty darn big pairing list. We use $addFields just in case there's more to the input records than we see in the example, but if not, for efficiency replace with $project to cut down the amount of material flowing through the pipe.

//
// Stage 1:  Get unique set of username pairs.
//
c=db.foo.aggregate([
{$unwind: "$_user"}

// Create single deduped list of users:
,{$group: {_id:null, u: {$addToSet: "$_user"} }}

// Nice little double map here creates the pairs, effectively doing this:
//    for index in range(0, len(list)):
//      first = list[index]
//      for p2 in range(index+1, len(list)):
//        pairs.append([first,list[p2]])
// 
,{$addFields: {u: 
  {$map: {
    input: {$range:[0,{$size:"$u"}]},
    as: "z",
    in: {
        $map: {
            input: {$range:[{$add:[1,"$$z"]},{$size:"$u"}]},
            as: "z2",
            in: [
            {$arrayElemAt:["$u","$$z"]},
            {$arrayElemAt:["$u","$$z2"]}
            ]
        }
    }
    }}
}}

// Turn the array of array of pairs in to a nice single array of pairs:
,{$addFields: {u: {$reduce:{
        input: "$u",
        initialValue:[],
        in:{$concatArrays: [ "$$value", "$$this"]}
        }}
    }}
          ]);


// Stage 2:  Find pairs and tally up the fileids

doc = c.next(); // Get single output from Stage 1 above.                       

u = doc['u'];

c2=db.foo.aggregate([
{$addFields: {_x: {$map: {
                input: u,
                as: "z",
                in: {
                    n: "$$z",
                    q: {$setIsSubset: [ "$$z", "$_user" ]}
                }
            }
        }
    }}
,{$unwind: "$_x"}
,{$match: {"_x.q": true}}
//  Nice use of grouping by an ARRAY here:
,{$group: {_id: "$_x.n", v: {$push: "$_id.fileid"}, n: {$sum:1} }}
,{$match: {"n": {"$gt":1}}}
                     ]);

show(c2);


来源:https://stackoverflow.com/questions/46634259/perform-aggregation-set-intersection-on-mongodb

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!