I have streamed and saved about 250k tweets into MongoDB, and here I am retrieving them, as you can see, based on a word (keyword) present in the tweet.
You might want to try the following:
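    // Classes from the legacy MongoDB Java driver used below.
    import com.mongodb.DBObject;
    import com.mongodb.MapReduceCommand;
    import com.mongodb.MapReduceOutput;

    // Map: emit one category per tweet, based on the first keyword that
    // matches (case-insensitive) in the document's "tweet" field.
    String map = "function() { " +
        "  var regex1 = new RegExp('autobiography', 'i'); " +
        "  var regex2 = new RegExp('book', 'i'); " +
        "  if (regex1.test(this.tweet)) " +
        "    emit('Autobiography Tweet', 1); " +
        "  else if (regex2.test(this.tweet)) " +
        "    emit('Book Tweet', 1); " +
        "  else " +
        "    emit('Uncategorized Tweet', 1); " +
        "}";

    // Reduce: add up the 1s emitted for each category.
    String reduce = "function(key, values) { " +
        "  return Array.sum(values); " +
        "}";

    // "collection" is assumed to be an existing com.mongodb.DBCollection.
    // INLINE returns the results directly instead of writing an output
    // collection; the final null means no query filter (all documents).
    MapReduceCommand cmd = new MapReduceCommand(collection, map, reduce,
        null, MapReduceCommand.OutputType.INLINE, null);
    MapReduceOutput out = collection.mapReduce(cmd);

    try {
        // Each result document holds a category (_id) and its count (value).
        for (DBObject o : out.results()) {
            System.out.println(o.toString());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }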
Although you already accepted the answer by Kay and this one will likely be ignored, I would like to suggest an alternative solution.
The MongoDB documentation has an article about how to perform full-text search in Mongo. To allow text fields to be searched quickly for individual words, it suggests preparing the documents by splitting the text fields into arrays of individual words, storing these arrays in the documents together with the full text, and creating an index over the array field.
Afterwards you can very quickly find all documents which contain a specific word, because your search query 1. can use an index and 2. doesn't have to use a regular expression (which can be very expensive).
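As a rough illustration of the preparation step, here is a minimal sketch of how the tweet text could be split into a keyword array before saving. The class name KeywordTokenizer and the field name keywords mentioned below are my own placeholders, not something from the MongoDB article or the driver, and the splitting rule (lowercase, break on anything that is not a letter or digit, drop duplicates) is just one reasonable assumption; for example, it breaks "don't" into "don" and "t".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of the MongoDB driver): turns tweet text into
// the keyword array that would be stored alongside the full text.
final class KeywordTokenizer {

    private KeywordTokenizer() {}

    // Lowercases the text, splits on anything that is not a letter or digit,
    // and drops empty tokens and duplicates while keeping first-seen order.
    static List<String> toKeywords(String text) {
        Set<String> unique = new LinkedHashSet<>();
        if (text != null) {
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    unique.add(token);
                }
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Prints: [reading, a, new, book, an, autobiography]
        System.out.println(toKeywords("Reading a new BOOK, an autobiography!"));
    }
}
```

You would store the returned list in each document (say, in a keywords field next to tweet) and create an ordinary index on that field. An exact-match query for a lowercase word such as { keywords: "book" } matches any document whose array contains that word, so it can be served from the index instead of running a regex over every tweet.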