Query in a MongoDB Map Reduce Function

北海茫月 2020-12-21 20:04

I have streamed and saved about 250k tweets into MongoDB. Here, as you can see, I am retrieving them based on a word, or keyword, present in the tweet.

Mon         


        
2 Answers
  • 2020-12-21 20:15

    You might want to try the following:

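        // Classes from the legacy MongoDB Java driver used below.
        import com.mongodb.DBObject;
        import com.mongodb.MapReduceCommand;
        import com.mongodb.MapReduceOutput;

        // Map: emit a count of 1 under the first category whose keyword matches the tweet.
        String map = "function() { " +
                     "    var regex1 = new RegExp('autobiography', 'i'); " +
                     "    var regex2 = new RegExp('book', 'i'); " +
                     "    if (regex1.test(this.tweet) ) " +
                     "         emit('Autobiography Tweet', 1); " +
                     "    else if (regex2.test(this.tweet) ) " +
                     "         emit('Book Tweet', 1); " +
                     "    else " +
                     "       emit('Uncategorized Tweet', 1); " +
                     "}";

        // Reduce: sum the counts emitted for each category.
        String reduce = "function(key, values) { " +
                        "    return Array.sum(values); " +
                        "}";

        // collection is the DBCollection holding the tweets.
        // Arguments: input collection, map, reduce, output collection (null for inline results),
        // output type, and an optional query filter (null means all documents are processed).
        MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce,
                 null, MapReduceCommand.OutputType.INLINE, null);
        MapReduceOutput out = collection.mapReduce(cmd);

        // Print one result document per category.
        try {
            for (DBObject o : out.results()) {
                System.out.println(o.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }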
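
    If you want to check the category logic before running it against the database, the same branching can be mirrored in plain Java. This is only a rough sketch with no MongoDB involved: the class name `TweetCategoryCount` and the sample tweets are made up for illustration, `categorize` stands in for the map function, and `count` stands in for the reduce step.

    ```java
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Pattern;

    // Hypothetical helper that mirrors the map and reduce functions in memory.
    class TweetCategoryCount {
        private static final Pattern AUTOBIOGRAPHY =
                Pattern.compile("autobiography", Pattern.CASE_INSENSITIVE);
        private static final Pattern BOOK =
                Pattern.compile("book", Pattern.CASE_INSENSITIVE);

        // Mirrors the map function: returns the key a tweet would be emitted under.
        static String categorize(String tweet) {
            if (AUTOBIOGRAPHY.matcher(tweet).find()) {
                return "Autobiography Tweet";
            }
            if (BOOK.matcher(tweet).find()) {
                return "Book Tweet";
            }
            return "Uncategorized Tweet";
        }

        // Mirrors the reduce function: adds up one count per tweet for each key.
        static Map<String, Integer> count(List<String> tweets) {
            Map<String, Integer> totals = new LinkedHashMap<>();
            for (String tweet : tweets) {
                totals.merge(categorize(tweet), 1, Integer::sum);
            }
            return totals;
        }

        public static void main(String[] args) {
            // Made-up sample tweets, one per category.
            List<String> sample = List.of(
                    "Reading an AUTOBIOGRAPHY tonight",
                    "New book out now",
                    "Lunch was great");
            System.out.println(count(sample));
        }
    }
    ```

    Note that the regular expressions match substrings, so a tweet that mentions "Facebook" is counted as a "Book Tweet" both here and in the map function.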
    
  • 2020-12-21 20:34

    Although you already accepted the answer by Kay and this one will likely be ignored, I would like to suggest an alternative solution.

    The MongoDB documentation has an article about how to perform full-text search in Mongo. To allow text fields to be searched quickly for individual words, it suggests preparing the documents by splitting each text field into an array of individual words, storing that array in the document together with the full text, and creating an index on the array.

    Afterwards you can very quickly find all documents that contain a specific word, because your search query (1) can use an index and (2) doesn't have to use a regular expression (which can be very expensive).
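
    The preparation step could look roughly like the sketch below. The class name `TweetKeywords` and the splitting rule (lowercase, split on anything that is not a letter or digit, drop duplicates) are my own assumptions, not something prescribed by the documentation.

    ```java
    import java.util.ArrayList;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Locale;
    import java.util.Set;

    // Hypothetical helper that turns tweet text into a list of keywords.
    class TweetKeywords {
        // Returns the distinct lowercase words of a tweet, in order of first appearance.
        static List<String> toKeywords(String tweet) {
            Set<String> words = new LinkedHashSet<>();
            // Split on any run of characters that is not a Unicode letter or digit.
            for (String word : tweet.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                // A leading delimiter produces an empty first element, so skip empties.
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            return new ArrayList<>(words);
        }

        public static void main(String[] args) {
            System.out.println(toKeywords("Just finished the book. Great BOOK!"));
        }
    }
    ```

    You would store the returned list in an array field next to the tweet text (say `keywords`, a name I made up) and index that field. With the legacy Java driver that could be `collection.createIndex(new BasicDBObject("keywords", 1))`, and a lookup becomes a plain equality query such as `collection.find(new BasicDBObject("keywords", "book"))`. Keep in mind that this matches whole words only, so unlike the regular expression it will not find "book" inside "facebook".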
