Mongo Group and sum with two fields

Front end · Unresolved · 2 answers · 414 views
Asked by 离开以前, 2021-01-19 09:13

I have documents like:

{
   "from": "abc@sss.ffffd",
   "to": "ssd@dff.dff",
   "email": "Hi hello"
}

How can we calculate count o

2 Answers
  • 2021-01-19 09:20

    Since you need to count the emails exchanged between two addresses, it makes sense to project a unified "between" field as follows:

    db.a.aggregate([
        { $match: {
            to: { $exists: true },
            from: { $exists: true },
            email: { $exists: true }
        }}, 
        { $project: {
            between: { $cond: { 
                if: { $lte: [ { $strcasecmp: [ "$to", "$from" ] }, 0 ] }, 
                then: [ { $toLower: "$to" }, { $toLower: "$from" } ], 
                else: [ { $toLower: "$from" }, { $toLower: "$to" } ] }
            } 
        }},
        { $group: {
             "_id": "$between",
             "count": { $sum: 1 } 
        }},
        { $sort :{ count: -1 } }
    ])
    

    The unification logic should be clear from the example: "between" is an alphabetically sorted array of both addresses, so a message from A to B and a message from B to A end up with the same key. The $match and $toLower parts are optional if you trust your data.
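    To see the unification step outside the database, here is a minimal plain-JavaScript sketch of the same idea. The helper names pairKey and countExchanged are made up for this illustration and are not part of MongoDB; the sketch lowercases both addresses and orders them with an ordinary string comparison, which approximates $strcasecmp but is not guaranteed to match it for every input.

```javascript
// Hypothetical helper: build an order-independent key for one message.
// Mirrors the $cond / $toLower projection: both addresses are lowercased
// and the smaller one is placed first.
function pairKey(doc) {
  const a = String(doc.to).toLowerCase();
  const b = String(doc.from).toLowerCase();
  return a <= b ? [a, b] : [b, a];
}

// Hypothetical helper: count messages per unordered pair of addresses.
function countExchanged(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Rough equivalent of the $match stage: skip documents missing a field.
    if (!("to" in doc) || !("from" in doc) || !("email" in doc)) continue;
    const key = JSON.stringify(pairKey(doc));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Shape the output like the $group result, then sort by count descending.
  return [...counts.entries()]
    .map(([key, count]) => ({ _id: JSON.parse(key), count }))
    .sort((x, y) => y.count - x.count);
}

// Sample data invented for the example.
const sample = [
  { from: "abc@sss.ffffd", to: "ssd@dff.dff", email: "Hi hello" },
  { from: "ssd@dff.dff", to: "ABC@sss.ffffd", email: "Re: Hi hello" },
  { from: "abc@sss.ffffd", to: "xyz@dff.dff", email: "Other thread" },
];
const result = countExchanged(sample);
console.log(result);
```

    The first two sample messages travel in opposite directions and differ in case, yet they land in the same group, which is the whole point of the "between" key.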

    Documentation for operators used in the example:

    • $match
    • $exists
    • $project
    • $cond
    • $lte
    • $strcasecmp
    • $toLower
    • $group
    • $sum
    • $sort
  • 2021-01-19 09:31

    You basically need to consider the _id for grouping as an "array" of the possible "to" and "from" values, and then of course "sort" them, so that in every document the combination is always in the same order.

    As a side note, when I deal with messaging systems like this, the "to" and "from" fields are typically both arrays to begin with, so that case usually forms the base from which the different variations on this statement come.

    First, the optimal MongoDB 3.2 statement for single addresses:

    db.collection.aggregate([
        // Join in array
        { "$project": {
            "people": [ "$to", "$from" ],
        }},
    
        // Unwind array
        { "$unwind": "$people" },
    
        // Sort array
        { "$sort": { "_id": 1, "people": 1 } },
    
        // Group document
        { "$group": {
            "_id": "$_id",
            "people": { "$push": "$people" }
        }},
    
        // Group people and count
        { "$group": {
            "_id": "$people",
            "count": { "$sum": 1 }
        }}
    ]);
    

    That's the basics; the only variations are in how the "people" array is constructed (stage 1 above).

    MongoDB 3.x and 2.6.x - Arrays

    { "$project": {
        "people": { "$setUnion": [ "$to", "$from" ] }
    }}
    

    MongoDB 3.x and 2.6.x - Fields to array

    { "$project": {
        "people": { 
            "$map": {
                "input": ["A","B"],
                "as": "el",
                "in": {
                   "$cond": [
                       { "$eq": [ "A", "$$el" ] },
                       "$to",
                       "$from"
                   ]
                }
            }
        }
    }}
    

    MongoDB 2.4.x and 2.2.x - from fields

    { "$project": {
        "to": 1,
        "from": 1,
        "type": { "$const": [ "A", "B" ] }
    }},
    { "$unwind": "$type" },
    { "$group": {
        "_id": "$_id",
        "people": {
            "$addToSet": {
                "$cond": [
                    { "$eq": [ "$type", "A" ] },
                    "$to",
                    "$from"
                ]
            }
        }
    }}
    

    But in all cases:

    1. Get all recipients into a distinct array.

    2. Sort the array into a consistent order.

    3. Group on the "always in the same order" list of recipients.

    Follow that and you cannot go wrong.
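    The three steps above can be sketched in plain JavaScript, independent of any MongoDB version. This is only an illustration under assumptions: the helper names recipientsKey and countByParticipants and the sample addresses are invented here, and "to" and "from" may each be a single string or an array of strings.

```javascript
// Hypothetical helper: steps 1 and 2 for a single message.
function recipientsKey(doc) {
  // 1. Collect every participant into a distinct array.
  //    [].concat flattens one level, so strings and arrays both work.
  const people = new Set([].concat(doc.to, doc.from));
  // 2. Put the array into a consistent (default string) order.
  return [...people].sort();
}

// Hypothetical helper: step 3, group on the ordered list and count.
function countByParticipants(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(recipientsKey(doc));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: JSON.parse(key), count }));
}

// Sample data invented for the example.
const messages = [
  { from: "a@x.io", to: ["b@x.io", "c@x.io"] },
  { from: "c@x.io", to: ["a@x.io", "b@x.io"] },
  { from: "a@x.io", to: "b@x.io" },
  { from: "b@x.io", to: "a@x.io" },
];
const grouped = countByParticipants(messages);
console.log(grouped);
```

    Whatever the sender or the order of recipients, messages among the same set of people collapse into one group, which mirrors what the aggregation pipelines do with the sorted "people" array.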
