Mongo and Pivot

野趣味 2021-01-06 03:32

I need help with MongoDB on this problem: I have a collection stats with the fields (UserId, EventId, Count, Date). The collection contains data in this form:

UserID | EventId | Count | Date

2 answers
  • 2021-01-06 04:07

    You have to do this with a MapReduce operation.

    Your map function would look like this (untested!):

    var mapFunction = function() {
                       var ret = {};
                       ret["Count_Event_" + this.EventId] = this.Count;
                       emit(this.UserId, ret);
                   };
    

    This emits a series of pairs consisting of the UserId and an object with a single, differently-named attribute with the count as a value.

    Your reduce function would then combine the results into one (untested). A counter that does not exist yet is undefined, and undefined plus a number is NaN, so each counter has to default to 0 before it is incremented:

    var reduceFunction = function(UserId, values_array) {
                       var ret = {};
    
                       for (var i = 0; i < values_array.length; i++) {
                           var values = values_array[i];
                           for (var key in values) {
                               // Start at 0 the first time a key is seen.
                               ret[key] = (ret[key] || 0) + values[key];
                           }
                       }
    
                       return ret;
                   };
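
    The two functions can be tried out without a server. The following is a minimal sketch in plain JavaScript, not MongoDB's actual implementation: simulateMapReduce and the emit stub are hypothetical helpers, the sample documents are made up for illustration, and the reduce step defaults unseen counters to 0.

    ```javascript
    // Hypothetical sample documents using the field names from the question.
    var docs = [
        { UserId: 1, EventId: 1, Count: 10 },
        { UserId: 1, EventId: 1, Count: 15 },
        { UserId: 1, EventId: 2, Count: 12 },
        { UserId: 2, EventId: 1, Count: 5 }
    ];

    // Map step: one { Count_Event_<id>: count } object per document.
    var mapFunction = function () {
        var ret = {};
        ret["Count_Event_" + this.EventId] = this.Count;
        emit(this.UserId, ret);
    };

    // Reduce step: sum the counters, treating a missing counter as 0.
    var reduceFunction = function (UserId, values_array) {
        var ret = {};
        for (var i = 0; i < values_array.length; i++) {
            var values = values_array[i];
            for (var key in values) {
                ret[key] = (ret[key] || 0) + values[key];
            }
        }
        return ret;
    };

    // Local stand-in for the server-side emit(): collect values per key.
    var grouped = {};
    function emit(key, value) {
        (grouped[key] = grouped[key] || []).push(value);
    }

    // Hypothetical helper: run map over every document, then reduce per key.
    // MongoDB does not call reduce for a key with a single value, so the
    // simulation passes such a value through unchanged as well.
    // Note: keys become strings here because they are used as object keys.
    function simulateMapReduce(documents) {
        grouped = {};
        documents.forEach(function (doc) { mapFunction.call(doc); });
        return Object.keys(grouped).map(function (key) {
            var values = grouped[key];
            return {
                _id: key,
                value: values.length > 1 ? reduceFunction(key, values) : values[0]
            };
        });
    }

    var results = simulateMapReduce(docs);
    ```

    Note that user 2 ends up with only Count_Event_1 in this simulation: an event a user never triggered produces no key at all.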
    

    You then call it like this (the options, including out, go into a single object as the third argument):

     db.yourCollection.mapReduce(
                     mapFunction,
                     reduceFunction,
                     { out: { inline: 1 } }
                   )
    

    The option out: { inline: 1 } returns the results directly to the caller instead of storing them, so in the shell they are printed to the console. Usually you use MapReduce to create a new collection with the results. See the documentation for more information.

  • 2021-01-06 04:16

    It is easier, and much faster, to do the job with an aggregate()!

    We will use a $project to create a counter field for each event, filling in the count from the document, if the event matches, zero otherwise. Then we will $group by user-id, summing up all the event counters.

    For the sake of explanation, let me first show what this looks like hard-coded for the two different events (1 and 2) in your example:

    db.xx.aggregate([
        { $project: { userid:1,
                      cnt_e1: { $cond: [ { $eq: [ "$event", 1 ] }, "$count", 0 ] },
                      cnt_e2: { $cond: [ { $eq: [ "$event", 2 ] }, "$count", 0 ] },
        } },
        { $group: { _id: "$userid", cnt_e1: { $sum: "$cnt_e1" }, cnt_e2: { $sum: "$cnt_e2" } } },  
        { $sort: { _id: 1 } },
    ])
    

    For the given collection:

    > db.xx.find({},{_id:0})
    { "userid" : 1, "event" : 1, "count" : 10 }
    { "userid" : 1, "event" : 1, "count" : 15 }
    { "userid" : 1, "event" : 2, "count" : 12 }
    { "userid" : 2, "event" : 1, "count" : 5 }
    { "userid" : 3, "event" : 2, "count" : 10 }
    

    the result is:

    {
        "result" : [
            {
                "_id" : 1,
                "cnt_e1" : 25,
                "cnt_e2" : 12
            },
            {
                "_id" : 2,
                "cnt_e1" : 5,
                "cnt_e2" : 0
            },
            {
                "_id" : 3,
                "cnt_e1" : 0,
                "cnt_e2" : 10
            }
        ],
        "ok" : 1
    }
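
    The numbers in this result can be checked by hand. Below is a minimal sketch in plain JavaScript that mirrors the two stages; it is not how the server evaluates the pipeline, and pivotCounts is a hypothetical helper name.

    ```javascript
    // The sample collection shown above.
    var docs = [
        { userid: 1, event: 1, count: 10 },
        { userid: 1, event: 1, count: 15 },
        { userid: 1, event: 2, count: 12 },
        { userid: 2, event: 1, count: 5 },
        { userid: 3, event: 2, count: 10 }
    ];

    // Hypothetical helper that pivots event counts into one row per user.
    function pivotCounts(documents, events) {
        var byUser = {};
        documents.forEach(function (doc) {
            // Like $group: one accumulator per user, every counter starting at 0.
            if (!byUser[doc.userid]) {
                byUser[doc.userid] = { _id: doc.userid };
                events.forEach(function (e) {
                    byUser[doc.userid]["cnt_e" + e] = 0;
                });
            }
            // Like $project with $cond: a document contributes its count to
            // the counter of its own event and 0 to all other counters.
            events.forEach(function (e) {
                byUser[doc.userid]["cnt_e" + e] += (doc.event === e) ? doc.count : 0;
            });
        });
        // Like $sort: order the rows by user id.
        return Object.keys(byUser)
            .map(function (k) { return byUser[k]; })
            .sort(function (a, b) { return a._id - b._id; });
    }

    var result = pivotCounts(docs, [1, 2]);
    ```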
    

    To get this done for variable events, we'll have to generate the projection and the grouping. We'll get an array of all possible events using the distinct() command (you might want to define an index on "event"). Then we create the two statements as JSON objects by looping over the array:

    project = {};
    project.$project = {};
    project.$project.userid = 1;
    
    group = {};
    group.$group = {};
    group.$group._id = '$userid';
    
    events = db.xx.distinct( "event" );
    events.forEach( function( e ) {
        field = "cnt_e" + e;
    
        eval("project.$project." + field + " = {}");
        eval("project.$project." + field + ".$cond = []");
        eval("project.$project." + field + ".$cond[0] = {}");
        eval("project.$project." + field + ".$cond[0].$eq = []");
        eval("project.$project." + field + ".$cond[0].$eq[0] = '$event'");
        eval("project.$project." + field + ".$cond[0].$eq[1] = " + e );
        eval("project.$project." + field + ".$cond[1] = '$count'");
        eval("project.$project." + field + ".$cond[2] = 0");
    
        eval("group.$group." + field + " = {}");
        eval("group.$group." + field + ".$sum = '$" + field + "'");
    });
    
    //printjson(project);
    //printjson(group);
    
    db.xx.aggregate([
        project,
        group,
        { $sort: { _id: 1 } },
    ])
    

    And the result is the same as above.

    Note: the above works for numerical events. If they were strings, you'd have to adapt the generator.
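
    One way to adapt the generator is to build the stages with bracket notation instead of eval, which also copes with string event values. This is a sketch under assumptions: buildPivotStages and its field-name sanitizing rule are hypothetical, and $literal is used so that a string value beginning with $ is compared as a value rather than read as a field path.

    ```javascript
    // Hypothetical helper: returns the pipeline stages for a list of events.
    function buildPivotStages(events) {
        var project = { $project: { userid: 1 } };
        var group = { $group: { _id: "$userid" } };

        events.forEach(function (e) {
            // Output field names should not contain "." or start with "$",
            // so anything other than letters, digits and "_" becomes "_".
            // Caveat: two different events can collide after this replacement.
            var field = "cnt_e" + String(e).replace(/[^A-Za-z0-9_]/g, "_");

            // Counter field: the document's count if the event matches, else 0.
            project.$project[field] = {
                $cond: [ { $eq: [ "$event", { $literal: e } ] }, "$count", 0 ]
            };
            // Sum the counter field per user.
            group.$group[field] = { $sum: "$" + field };
        });

        return [ project, group, { $sort: { _id: 1 } } ];
    }

    // Example with one numeric and one string event (made-up values).
    var stages = buildPivotStages([1, "sign-up"]);

    // In the shell this would be used roughly as:
    // db.xx.aggregate(buildPivotStages(db.xx.distinct("event")))
    ```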

    At first sight, this might look more complicated than @Philipp's mapReduce. But the mapReduce will not return all events for each user, only the ones that do have a count. For a complete vertical-to-horizontal mapping you would have to generate the map and the reduce functions as well.
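
    One possible way to get every event for every user without generating source code is to hand the list of events to the map function, for example through the scope option of mapReduce. This is a rough sketch, assuming the field names from the question (UserId, EventId, Count) and a collection called stats; the events variable and the emit stub exist here only so that the function can be tried outside the server.

    ```javascript
    // Assumption: on the server, `events` would be supplied through the
    // `scope` option of mapReduce. Here it is a plain variable for testing.
    var events = [1, 2];

    // Local stand-in for the server-side emit().
    var emitted = [];
    function emit(key, value) {
        emitted.push({ key: key, value: value });
    }

    // Map variant that emits a counter for every known event: 0 for all
    // events, then the document's own count for its own event.
    var mapAllEvents = function () {
        var ret = {};
        for (var i = 0; i < events.length; i++) {
            ret["Count_Event_" + events[i]] = 0;
        }
        ret["Count_Event_" + this.EventId] = this.Count;
        emit(this.UserId, ret);
    };

    // Try it on a single made-up document.
    mapAllEvents.call({ UserId: 2, EventId: 1, Count: 5 });

    // On a server the call might look like this (untested):
    // db.stats.mapReduce(mapAllEvents, reduceFunction,
    //     { out: { inline: 1 }, scope: { events: db.stats.distinct("EventId") } });
    ```

    Because every emitted object already carries all counters, a reduce function that simply sums matching keys needs no per-event code.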

    For more information on aggregate(), see http://docs.mongodb.org/manual/aggregation/
