This is my first day with MongoDB so please go easy with me :)
I can\'t understand the $unwind
operator, maybe because English is not my native language
consider the below example to understand this Data in a collection
{
"_id" : 1,
"shirt" : "Half Sleeve",
"sizes" : [
"medium",
"XL",
"free"
]
}
Query -- db.test1.aggregate( [ { $unwind : "$sizes" } ] );
output
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "medium" }
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "XL" }
{ "_id" : 1, "shirt" : "Half Sleeve", "sizes" : "free" }
$unwind
duplicates each document in the pipeline, once per array element.
So if your input pipeline contains one article doc with two elements in tags
, {$unwind: '$tags'}
would transform the pipeline to be two article docs that are the same except for the tags
field. In the first doc, tags
would contain the first element from the original doc's array, and in the second doc, tags
would contain the second element.
Let me explain in a way corelated to RDBMS way. This is the statement:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
to apply to the document / record:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
The $project / Select simply returns these field/columns as
SELECT author, title, tags FROM article
Next is the fun part of Mongo, consider this array tags : [ "fun" , "good" , "fun" ]
as another related table (can't be a lookup/reference table because values has some duplication) named "tags". Remember SELECT generally produces things vertical, so unwind the "tags" is to split() vertically into table "tags".
The end result of $project + $unwind:
Translate the output to JSON:
{ "author": "bob", "title": "this is my title", "tags": "fun"},
{ "author": "bob", "title": "this is my title", "tags": "good"},
{ "author": "bob", "title": "this is my title", "tags": "fun"}
Because we didn't tell Mongo to omit "_id" field, so it's auto-added.
The key is to make it table-like to perform aggregation.
As per mongodb official documentation :
$unwind Deconstructs an array field from the input documents to output a document for each element. Each output document is the input document with the value of the array field replaced by the element.
Explanation through basic example :
A collection inventory has the following documents:
{ "_id" : 1, "item" : "ABC", "sizes": [ "S", "M", "L"] }
{ "_id" : 2, "item" : "EFG", "sizes" : [ ] }
{ "_id" : 3, "item" : "IJK", "sizes": "M" }
{ "_id" : 4, "item" : "LMN" }
{ "_id" : 5, "item" : "XYZ", "sizes" : null }
The following $unwind operations are equivalent and return a document for each element in the sizes field. If the sizes field does not resolve to an array but is not missing, null, or an empty array, $unwind treats the non-array operand as a single element array.
db.inventory.aggregate( [ { $unwind: "$sizes" } ] )
or
db.inventory.aggregate( [ { $unwind: { path: "$sizes" } } ]
Above query output :
{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }
Why is it needed?
$unwind is very useful while performing aggregation. it breaks complex/nested document into simple document before performaing various operation like sorting, searcing etc.
To know more about $unwind :
https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
To know more about aggregation :
https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
First off, welcome to MongoDB!
The thing to remember is that MongoDB employs an "NoSQL" approach to data storage, so perish the thoughts of selects, joins, etc. from your mind. The way that it stores your data is in the form of documents and collections, which allows for a dynamic means of adding and obtaining the data from your storage locations.
That being said, in order to understand the concept behind the $unwind parameter, you first must understand what the use case that you are trying to quote is saying. The example document from mongodb.org is as follows:
{
title : "this is my title" ,
author : "bob" ,
posted : new Date () ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}
Notice how tags is actually an array of 3 items, in this case being "fun", "good" and "fun".
What $unwind does is allow you to peel off a document for each element and returns that resulting document. To think of this in a classical approach, it would be the equivilent of "for each item in the tags array, return a document with only that item".
Thus, the result of running the following:
db.article.aggregate(
{ $project : {
author : 1 ,
title : 1 ,
tags : 1
}},
{ $unwind : "$tags" }
);
would return the following documents:
{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"OK" : 1
}
Notice that the only thing changing in the result array is what is being returned in the tags value. If you need an additional reference on how this works, I've included a link here. Hopefully this helps, and good luck with your foray into one of the best NoSQL systems that I have come across thus far.
Let's understand it by an example
This is how the company document looks like:
The $unwind
allows us to take documents as input that have an array valued field and produces output documents, such that there's one output document for each element in the array. source
So let's go back to our companies examples, and take a look at the use of unwind stages. This query:
db.companies.aggregate([
{ $match: {"funding_rounds.investments.financial_org.permalink": "greylock" } },
{ $project: {
_id: 0,
name: 1,
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
} }
])
produces documents that have arrays for both amount and year.
Because we're accessing the raised amount and the funded year for every element within the funding rounds array. To fix this, we can include an unwind stage before our project stage in this aggregation pipeline, and parameterize this by saying that we want to unwind
the funding rounds array:
db.companies.aggregate([
{ $match: {"funding_rounds.investments.financial_org.permalink": "greylock" } },
{ $unwind: "$funding_rounds" },
{ $project: {
_id: 0,
name: 1,
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
} }
])
If we look at the funding_rounds
array, we know that for each funding_rounds
, there is a raised_amount
and a funded_year
field. So, unwind
will for each one of the documents that are elements of the funding_rounds
array produce an output document. Now, in this example, our values are string
s. But, regardless of the type of value for the elements in an array, unwind
will produce an output document for each one of these values, such that the field in question will have just that element. In the case of funding_rounds
, that element will be one of these documents as the value for funding_rounds
for every document that gets passed on to our project
stage. The result, then of having run this, is that now we get an amount
and a year
. One for each funding round for every company in our collection. What this means is that our match produced many company documents and each one of those company documents results in many documents. One for each funding round within every company document. unwind
performs this operation using the documents handed to it from the match
stage. And all of these documents for every company are then passed to the project
stage.
So, all documents where the funder was Greylock (as in the query example) will be split into a number of documents, equal to the number of funding rounds for every company that matches the filter $match: {"funding_rounds.investments.financial_org.permalink": "greylock" }
. And each one those resulting documents will then be passed along to our project
. Now, unwind
produces an exact copy for every one of the documents that it receives as input. All fields have the same key and value, with one exception, and that is that the funding_rounds
field rather than being an array of funding_rounds
documents, instead has a value that is a single document, which is an individual funding round. So, a company that has 4 funding rounds will result in unwind
creating 4 documents. Where every field is an exact copy, except for the funding_rounds
field, which will instead of being an array for each of those copies will instead be an individual element from the funding_rounds
array from the company document that unwind
is currently processing. So, unwind
has the effect of outputting to the next stage more documents than it receives as input. What that means is that our project
stage now gets a funding_rounds
field that again, is not an array, but is instead a nested document that has a raised_amount
and a funded_year
field. So, project
will receive multiple documents for each company match
ing the filter and can therefore process each of the documents individually and identify an individual amount and year for each funding round for each company.