I have some 25k documents (4 GB in raw JSON) of data that I want to perform a few JavaScript operations on to make it more accessible to my end data consumer (R).
I faced the same situation. I was able to accomplish this via a Mongo query and projection; see Mongo Query.
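For example, a projection that keeps only the fields the end consumer needs might look like this (a minimal sketch; the field names title and content are my own assumptions, not from the question):
db.collection.find(
    {},                         // empty filter: match all documents
    { title: 1, content: 1 }    // projection: return only these fields
)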
Once you have access to the mongo shell, it accepts some JavaScript commands and then it's simpler:
// Insert each document from the source collection into db.result.
map = function(item) {
    db.result.insert(item);
}
db.collection.find().forEach(map);
When using map/reduce you'll always end up with
{ "value" : { <reduced data> } }
In order to remove the value key you'll have to use a finalize function.
Here's the simplest thing you can do to copy data from one collection to another:
// Emit each document unchanged, keyed by its own _id.
map = function() { emit(this._id, this); }
// Every key is unique, so values always contains exactly one document.
reduce = function(key, values) { return values[0]; }
// Write each reduced document into the target collection.
finalize = function(key, value) { db.collection_2.insert(value); }
Then you run it as normal:
db.collection_1.mapReduce(map, reduce, { finalize: finalize });
Using only map, without reduce, is like copying a collection: http://www.mongodb.org/display/DOCS/Developer+FAQ#DeveloperFAQ-HowdoIcopyallobjectsfromonedatabasecollectiontoanother%3F
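For what it's worth, on MongoDB 2.6 and newer the aggregation framework's $out stage copies a collection directly, without map/reduce at all (a minimal sketch, reusing the collection names from above):
db.collection_1.aggregate([
    { $out: "collection_2" }   // write every document into collection_2
]);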
But that seems awkward and I don't know why it even works, since my emit call arguments in my mapper are not equivalent to the return argument of my reducer.
They are equivalent. The reduce function takes in an array of T values and should return a single value in the same T format. The format of T is defined by your map function. Your reduce function simply returns the first item in the values array, which will always be of type T. That's why it works :)
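To illustrate the contract with a hypothetical example (the count field and the grouping key are mine, purely for illustration): whatever shape map emits is the shape reduce both receives and must return.
map = function() { emit(this.category, { count: 1 }); }
reduce = function(key, values) {
    var total = 0;
    values.forEach(function(v) { total += v.count; });
    return { count: total };   // same { count: ... } shape as each emitted value
}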
You seem to be on the right track. I did some experimenting and it seems you cannot do a db.collection.save() from the map function, but you can do this from the reduce function. Your map function should simply construct the document format you need:
function map() {
    emit(this._id, { _id: this._id, heading: this.title, body: this.content });
}
The map function reuses the ID of the original document. This should prevent any re-reduce steps, since no values will share the same key.
The reduce function can simply return null. But in addition, you can write the value to a separate collection:
function reduce(key, values) {
    db.result.save(values[0]);   // each key is unique, so values holds exactly one document
    return null;
}
Now db.result should contain the transformed documents, without any additional map-reduce noise you'd have in the temporary collection. I haven't actually tested this on large amounts of data, but this approach should take advantage of the parallelized execution of map-reduce functions.
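Putting the pieces together, the invocation might look like this (a sketch; the output collection name tmp_mr is my own choice, and newer servers require an explicit out option even though the real results land in db.result):
db.collection.mapReduce(map, reduce, { out: "tmp_mr" });
db.result.find();   // the transformed documents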