How to use MongoDB aggregation for general purpose set operations (union, intersection, difference)

Submitted by 醉酒当歌 on 2019-11-29 18:00:18

Question


I have come across some special-purpose implementations of set operations, but nothing for the general case. What is the general way to perform set operations (specifically intersection, union, and symmetric difference)? This is easier to figure out using JavaScript in a $where or map-reduce, but I want to know how to do it with the aggregation framework in order to get native performance.

The best way to illustrate this question is with an example. Say I have a document with two arrays (sets):

db.colors.insert({
    _id: 1,
    left : ['red', 'green'],
    right : ['green', 'blue']
});

I want to find the union, intersection, and symmetric difference of the 'left' and 'right' arrays. Concretely, I want:

Union --> ['red', 'green', 'blue']

Intersection --> ['green']

Symmetric Difference --> ['red', 'blue']
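To pin down the expected results, here is a minimal client-side sketch in plain JavaScript (Node.js assumed, no database involved). It treats each array as a set of strings; the helper names union, intersection, difference, and symmetricDifference are illustrative only, not MongoDB APIs.

```javascript
// Plain JavaScript (e.g. Node.js); no database involved. Each array is
// treated as a set of strings, and results keep first-seen order.
const doc = { _id: 1, left: ["red", "green"], right: ["green", "blue"] };

// Union: every value from either array, each kept once.
function union(a, b) {
  return [...new Set([...a, ...b])];
}

// Intersection: values of a that also occur in b.
function intersection(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((x) => inB.has(x));
}

// Relative complement: values of a that do not occur in b.
function difference(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((x) => !inB.has(x));
}

// Symmetric difference: values in exactly one of the two arrays.
function symmetricDifference(a, b) {
  return union(difference(a, b), difference(b, a));
}

console.log(union(doc.left, doc.right));               // [ 'red', 'green', 'blue' ]
console.log(intersection(doc.left, doc.right));        // [ 'green' ]
console.log(symmetricDifference(doc.left, doc.right)); // [ 'red', 'blue' ]
```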


Answer 1:


Version 2.6+ Only:

As of MongoDB 2.6, this has become much easier. You can now solve the problem as follows:

Union

db.colors.aggregate([
    {'$project': {  
                    union:{$setUnion:["$left","$right"]}
                 }
    }
]);

Intersection

db.colors.aggregate([
    {'$project': {  
                  int:{$setIntersection:["$left","$right"]}
                 }
    }
]);

Relative Complement

db.colors.aggregate([
    {'$project': {  
                    diff:{$setDifference:["$left","$right"]}
                 }
    }
]);

Symmetric Difference

db.colors.aggregate([
    {'$project': {  
                    diff:{$setUnion:[{$setDifference:["$left","$right"]}, {$setDifference:["$right","$left"]}]}
                 }
    }
]);

Note: There is a ticket requesting symmetric difference be added as a core feature rather than having to do the union of two set differences.
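For intuition about what the symmetric-difference $project computes per document, here is a plain-JavaScript sketch run over a small in-memory stand-in for the colors collection. The helpers setUnion and setDifference are hypothetical and only mimic the operators' set semantics for primitive values: duplicate entries are ignored, and since MongoDB leaves the order of $setUnion's output unspecified, the order produced here should not be relied on. The second document is made up to show duplicates collapsing.

```javascript
// Hypothetical helpers mimicking $setDifference / $setUnion on primitives.
// Duplicates within an input array are ignored, as with the set operators.
function setDifference(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((x) => !inB.has(x));
}

function setUnion(a, b) {
  return [...new Set([...a, ...b])];
}

// In-memory stand-in for the colors collection; document 2 is made up
// and contains a duplicate "red" to show it collapsing.
const colors = [
  { _id: 1, left: ["red", "green"], right: ["green", "blue"] },
  { _id: 2, left: ["red", "red", "green"], right: ["green"] },
];

// Per-document equivalent of the symmetric-difference $project stage;
// _id is carried along because $project includes it by default.
const projected = colors.map((doc) => ({
  _id: doc._id,
  diff: setUnion(
    setDifference(doc.left, doc.right),
    setDifference(doc.right, doc.left)
  ),
}));

console.log(JSON.stringify(projected));
// [{"_id":1,"diff":["red","blue"]},{"_id":2,"diff":["red"]}]
```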




Answer 2:


The easiest of these three to do with aggregation is the intersection. The general case can be handled like so:

Intersection:

db.colors.aggregate([
    {'$unwind' : "$left"},
    {'$unwind' : "$right"},
    {'$project': {  
                    value:"$left", 
                    same:{$cond:[{$eq:["$left","$right"]}, 1, 0]}
                 }
    },
    {'$group'  : { 
                    _id: {id:'$_id', val:'$value'}, 
                    doesMatch:{$max:"$same"}
                 }
    },
    {'$match'   :{doesMatch:1}},
]);
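To see why this works, here is a plain-JavaScript sketch that walks the question's single document through the same steps: the two $unwind stages become nested loops (a cross product), $project becomes a per-pair match flag, $group with $max keeps 1 for any left value that matched at least once, and $match filters on that flag. It assumes elements compare correctly with === (true for the question's strings); it is an illustration, not how the server executes the pipeline.

```javascript
// Walk the question's document through the intersection pipeline by hand.
// Assumes elements compare correctly with === (true for these strings).
const doc = { _id: 1, left: ["red", "green"], right: ["green", "blue"] };

// The two $unwind stages yield one row per (left, right) pair: a cross product.
// The $project stage keeps the left value and flags whether the pair matches.
const rows = [];
for (const l of doc.left) {
  for (const r of doc.right) {
    rows.push({ _id: doc._id, value: l, same: l === r ? 1 : 0 });
  }
}

// $group on {id, val} with $max of the flag: 1 if any pair matched.
// With a single document, keying on the value alone is equivalent.
const doesMatchByValue = new Map();
for (const row of rows) {
  const prev = doesMatchByValue.has(row.value) ? doesMatchByValue.get(row.value) : 0;
  doesMatchByValue.set(row.value, Math.max(prev, row.same));
}

// $match {doesMatch: 1}: keep only the left values that also occur in right.
const result = [];
for (const [val, doesMatch] of doesMatchByValue) {
  if (doesMatch === 1) {
    result.push({ _id: { id: doc._id, val: val }, doesMatch: doesMatch });
  }
}

console.log(JSON.stringify(result)); // [{"_id":{"id":1,"val":"green"},"doesMatch":1}]
```

Note the cost implied by the cross product: each document fans out into left.length × right.length intermediate rows before grouping, whereas the 2.6+ $setIntersection operator works within each document.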

The other two are trickier. To my knowledge there isn't a way (at the time of writing, before 2.6) to combine two separate array fields of the same document into one. It would be nice to have an array-merging $add, $combine, or $addToSet in the $project pipeline stage, but no such operator exists. So the best we can do is flag, for each element, whether it intersects the other set. Both remaining aggregations can start with the following base pipeline:

db.colors.aggregate([
    {'$unwind' : "$left"},
    {'$unwind' : "$right"},
    {'$project': {  
                    left:"$left",
                    right:'$right',
                    same:{$cond:[{$eq:["$left","$right"]}, 1, 0]}
                 }
    },
    {'$group'  : {
                    _id:{id:'$_id', left:'$left'},
                    right:{'$addToSet':'$right'},
                    sum: {'$sum':'$same'},
                 }
    },
    {'$project': {  
                    left:{val:"$_id.left",inter:"$sum"},
                    right:'$right',
                 }
    },
    {'$unwind' : "$right"},
    {'$project': {  
                    left:"$left",
                    right:'$right',
                    same:{$cond:[{$eq:["$left.val","$right"]}, 1, 0]}
                 }
    },
    {'$group'  : {
                    _id:{id:'$_id.id', right:'$right'},
                    left:{'$addToSet':'$left'},
                    sum: {'$sum':'$same'},
                 }
    },
    {'$project': {  
                    right:{val:"$_id.right",inter:"$sum"},
                    left:'$left',
                 }
    },
    {'$unwind' : "$left"},
    {'$group'  : {
                    _id:'$_id.id',
                    left:{'$addToSet':'$left'},
                    right: {'$addToSet':'$right'},
                 }
    },
]);

This aggregation on the sample provided in the question will give a result like this:

{
        "_id" : 1,
        "left" : [
                {
                        "val" : "green",
                        "inter" : 1
                },
                {
                        "val" : "red",
                        "inter" : 0
                }
        ],
        "right" : [
                {
                        "val" : "blue",
                        "inter" : 0
                },
                {
                        "val" : "green",
                        "inter" : 1
                }
        ]
}
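Each inter value counts how many elements on the other side equal that element. Here is a plain-JavaScript sketch of the same annotation, assuming neither array contains duplicates (as in the question), so every count is 0 or 1; the annotate helper is hypothetical. Its element order follows the input arrays, whereas $addToSet does not guarantee any order, which is why the output above lists green first.

```javascript
// Reproduce the {val, inter} annotation of the base aggregation for the
// question's document. Assumes no duplicates within either array.
const doc = { _id: 1, left: ["red", "green"], right: ["green", "blue"] };

// For each value on one side, count how many values on the other side equal
// it; this mirrors the {'$sum': '$same'} over the unwound pairs.
function annotate(side, other) {
  return side.map((val) => ({
    val: val,
    inter: other.filter((x) => x === val).length,
  }));
}

const annotated = {
  _id: doc._id,
  left: annotate(doc.left, doc.right),
  right: annotate(doc.right, doc.left),
};

console.log(JSON.stringify(annotated));
```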

From here we can get the intersection by adding the following to the aggregation:

    {'$project': {
                    left:"$left"
                 }
    },
    {'$unwind' : "$left"},
    {'$match'  : {'left.inter': 1}},
    {'$group'  : {
                    _id:'$_id',
                    left:{'$addToSet':'$left'},
                 }
    },

We can find the two relative complements (left minus right, and right minus left), which together make up the symmetric difference, by adding the following to the end of the base aggregation:

    {'$unwind' : "$left"},
    {'$match'  : {'left.inter': 0}},
    {'$unwind' : "$right"},
    {'$match'  : {'right.inter': 0}},
    {'$group'  : {
                    _id:'$_id',
                    left:{'$addToSet':'$left'},
                    right:{'$addToSet':'$right'},
                 }
    },
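Applied to the annotated document above, the two sets of follow-on stages amount to simple filters. A plain-JavaScript sketch of that, with the annotated result hard-coded (no database involved; the vals helper is hypothetical):

```javascript
// The annotated document printed above, hard-coded for illustration.
const annotated = {
  _id: 1,
  left: [{ val: "green", inter: 1 }, { val: "red", inter: 0 }],
  right: [{ val: "blue", inter: 0 }, { val: "green", inter: 1 }],
};

// Hypothetical helper: pull the plain values out of {val, inter} objects.
const vals = (entries) => entries.map((e) => e.val);

// Intersection stages: keep left entries with inter 1.
const intersection = vals(annotated.left.filter((e) => e.inter === 1));

// Difference stages: entries with inter 0 on each side are the two
// relative complements (left minus right, right minus left).
const leftOnly = vals(annotated.left.filter((e) => e.inter === 0));
const rightOnly = vals(annotated.right.filter((e) => e.inter === 0));

// The complements are disjoint, so concatenating them gives the
// symmetric difference without duplicates.
const symmetricDifference = leftOnly.concat(rightOnly);

console.log(intersection, leftOnly, rightOnly, symmetricDifference);
// [ 'green' ] [ 'red' ] [ 'blue' ] [ 'red', 'blue' ]
```

One difference from the pipeline version: here an empty leftOnly has no effect on rightOnly, but in the pipeline, if either side has no inter: 0 entry, the $match after that side's $unwind removes every row for the document, so the other side's complement is lost as well.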

Unfortunately there does not appear to be a good way to combine dissimilar items from different fields into a single array. To get the union, it seems best to compute it on the client; or, if you want filtering, apply it to each set individually.
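A minimal sketch of the client-side union, assuming the annotated document from the base aggregation is already in hand (hard-coded below with the shape printed earlier; clientUnion is a hypothetical helper, not a driver API). Every left value belongs to the union, and a right value only needs adding when its inter is 0, since matched values are already on the left.

```javascript
// Annotated document as returned by the base aggregation (hard-coded here).
const annotated = {
  _id: 1,
  left: [{ val: "green", inter: 1 }, { val: "red", inter: 0 }],
  right: [{ val: "blue", inter: 0 }, { val: "green", inter: 1 }],
};

// Every left value, plus the right values that matched nothing on the left;
// right values with inter > 0 are already present on the left side.
function clientUnion(doc) {
  const fromLeft = doc.left.map((e) => e.val);
  const rightOnly = doc.right.filter((e) => e.inter === 0).map((e) => e.val);
  return fromLeft.concat(rightOnly);
}

console.log(clientUnion(annotated)); // [ 'green', 'red', 'blue' ]
```

If the raw arrays are fetched instead, `[...new Set([...doc.left, ...doc.right])]` gives the same set of values.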



Source: https://stackoverflow.com/questions/17268770/how-to-use-mongodb-aggregation-for-general-purpose-set-operations-union-inters
