Question
I have a collection whose documents look like this:
_id:5d0fe0dcfd8ea94eb4633222
Category:"Stripveiling (Nederlands)"
Category url:"https://www.catawiki.nl/a/11-stripveiling-nederlands"
Lot title:"Erwin Sels (Ersel) - Originele pagina"
Seller name:"Stripwereld"
Seller country:"Nederland"
Bids count:21
Winning bid:"€ 135"
Bid amount:"Closed"
Lot image:"https://assets.catawiki.nl/assets/2011/11/17/7/4/c/74c53540-f390-012e-..."
I need to change the "Winning bid" field to an int, that is, remove the currency sign and convert the value from a string to an int across the entire collection.
I could not find anywhere in the documentation how to do this. Do I really have to read every value with Python, strip the currency symbol, and write it back with an update call? I have almost 8,000,000 records, so that would take a long time.
How can I do this with a collection method? Or, failing that, what is the quickest way to do it with Python?
Answer 1:
If you want to convert the entire collection, you can do it with an aggregation pipeline.
In a $project stage, strip the currency prefix with a substring operator ($substrCP, which counts characters, or $substr, which counts bytes) and convert the remaining digits with $toInt (or $toDouble, or $convert, whichever suits your case). Then add $out as the last stage of the pipeline: $out writes the result of the aggregation pipeline to the collection with the given name.
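Why the byte/character distinction matters for this data: "€" occupies three bytes in UTF-8, so a byte offset of 2 points into the middle of it, and MongoDB's $substrBytes (which $substr is an alias for) rejects a start index that is a UTF-8 continuation byte. The plain Node.js sketch below does not touch MongoDB; it only uses the sample value "€ 135" from the question to compare byte offsets with character (code point) offsets.

```javascript
// Plain Node.js illustration of byte offsets vs code-point offsets.
// The sample value comes from the question; no MongoDB connection is used.
const winningBid = "€ 135";

// "€" is U+20AC, which UTF-8 encodes as three bytes (E2 82 AC).
const bytes = Buffer.from(winningBid, "utf8");
console.log(bytes.length); // 7: 3 bytes for "€", 1 for the space, 3 digits

// Byte 2 is the last byte of the euro sign, i.e. a UTF-8 continuation byte
// (bit pattern 10xxxxxx). A byte-based substring cannot start there.
console.log((bytes[2] & 0xc0) === 0x80); // true

// Counting code points instead ("€" and " " are one each), the digits start at 2.
const digits = Array.from(winningBid).slice(2).join("");
console.log(digits); // 135

// The equivalent byte-based extraction has to start at byte 4.
console.log(bytes.subarray(4).toString("utf8")); // 135

console.log(Number.parseInt(digits, 10)); // 135
```

This is why a code-point operator such as $substrCP with a start index of 2 matches the intent "skip the euro sign and the space", whereas a byte-based offset has to account for the multi-byte currency symbol.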
But be careful when using $out. According to the official MongoDB documentation:
Create New Collection
The $out operation creates a new collection in the current database if one does not already exist. The collection is not visible until the aggregation completes. If the aggregation fails, MongoDB does not create the collection.

Replace Existing Collection
If the collection specified by the $out operation already exists, then upon completion of the aggregation, the $out stage atomically replaces the existing collection with the new results collection. Specifically, the $out operation:
- Creates a temp collection.
- Copies the indexes from the existing collection to the temp collection.
- Inserts the documents into the temp collection.
- Calls db.collection.renameCollection with dropTarget: true to rename the temp collection to the destination collection.
The $out operation does not change any indexes that existed on the previous collection. If the aggregation fails, the $out operation makes no changes to the pre-existing collection.
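Try this:
db.collection_name.aggregate([
  {
    $project: {
      category: "$category",
      category_name: "$category_name",
      lot_title: "$lot_title",
      seller_name: "$seller_name",
      seller_country: "$seller_country",
      bid_count: "$bid_count",
      // "€ 135" -> "135" -> 135. $substrCP counts characters, so index 2 skips
      // the euro sign and the space; the length is the rest of the string.
      winning_bid: {
        $toInt: {
          $substrCP: ["$winning_bid", 2, { $subtract: [{ $strLenCP: "$winning_bid" }, 2] }]
        }
      },
      bid_amount: "$bid_amount",
      lot_image: "$lot_image"
    }
  },
  {
    // Must be the last stage: replaces collection_name with the results.
    $out: "collection_name"
  }
])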
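You can also pass allowDiskUse: true as an option, e.g. db.collection_name.aggregate([...], { allowDiskUse: true }). It lets blocking stages such as $sort or $group write temporary files when they exceed the 100 MB per-stage memory limit; it is unrelated to the 16 MB BSON document size limit. A pipeline made only of $project and $out streams documents, so it may not need the option, but it does no harm with this many documents.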
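If listing every field in $project is impractical, one possible variant is sketched below, assuming MongoDB 4.0 or later and the same placeholder names (winning_bid, collection_name) as the answer's pipeline. It swaps $project for $addFields, which keeps all other fields unchanged, and swaps the fixed-offset substring for $trim with an explicit chars list. It also uses $convert instead of $toInt: with onError set to the original value, a string that cannot be parsed stays a string instead of failing the whole aggregation (in which case $out would write nothing). The pipeline is a plain JavaScript array, so it can be printed and inspected before it is run.

```javascript
// Sketch, assuming MongoDB 4.0+ ($convert, $trim) and the placeholder
// field/collection names from the answer (winning_bid, collection_name).
const pipeline = [
  {
    // $addFields keeps every other field as-is, so nothing has to be listed.
    $addFields: {
      winning_bid: {
        $cond: [
          // Only touch string values; ints (e.g. from an earlier run) and
          // missing fields fall through to the else-branch unchanged.
          { $eq: [{ $type: "$winning_bid" }, "string"] },
          {
            $convert: {
              // Strip "€" and spaces from both ends: "€ 135" -> "135".
              input: { $trim: { input: "$winning_bid", chars: "€ " } },
              to: "int",
              // Keep the original string if it cannot be parsed, so the
              // aggregation does not abort and the value can be found later.
              onError: "$winning_bid",
            },
          },
          "$winning_bid",
        ],
      },
    },
  },
  { $out: "collection_name" },
];

// In the mongo shell / mongosh (not executed here):
// db.collection_name.aggregate(pipeline, { allowDiskUse: true });
console.log(JSON.stringify(pipeline, null, 2));
```

Afterwards, a query such as db.collection_name.countDocuments({ winning_bid: { $type: "string" } }) shows how many values were left unconverted and need a closer look.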
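Don't forget to replace collection_name with the actual collection name, and use the field names exactly as they are stored in your documents (the document in the question uses names such as "Winning bid", not winning_bid). Include every field you want to keep in the $project stage, because $project drops any field it does not list (except _id). And please check the results first, either by writing to a temporary collection or by removing the $out stage and inspecting the output of the aggregation pipeline.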
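The check suggested above can be scripted with two small helpers. previewPipeline and toTempCollection are hypothetical names, not MongoDB APIs, winning_bid_tmp is an illustrative scratch collection name, and the sample pipeline is a shortened stand-in for the answer's. The helpers only rearrange stages; the shell commands that actually run them are shown as comments.

```javascript
// Hypothetical helpers (not part of MongoDB) for checking a pipeline before
// its $out stage overwrites the real collection. Plain JavaScript: they only
// rearrange pipeline stages and never talk to the server themselves.

// Copy of the pipeline without $out, limited to the first n input documents,
// so the shell returns a few converted documents instead of writing anything.
function previewPipeline(pipeline, n = 5) {
  const stages = pipeline.filter((stage) => !("$out" in stage));
  return [{ $limit: n }, ...stages];
}

// Copy of the pipeline that writes to a scratch collection instead.
function toTempCollection(pipeline, tempName) {
  const stages = pipeline.filter((stage) => !("$out" in stage));
  return [...stages, { $out: tempName }];
}

// Shortened stand-in for the answer's pipeline (placeholder field names).
const pipeline = [
  { $project: { winning_bid: { $toInt: { $trim: { input: "$winning_bid", chars: "€ " } } } } },
  { $out: "collection_name" },
];

const preview = previewPipeline(pipeline, 3);
const scratch = toTempCollection(pipeline, "winning_bid_tmp");
console.log(JSON.stringify(preview));
console.log(JSON.stringify(scratch));

// In the mongo shell / mongosh (not executed here):
// db.collection_name.aggregate(preview)   // look at three converted documents
// db.collection_name.aggregate(scratch)   // full run into winning_bid_tmp
// db.winning_bid_tmp.countDocuments({ winning_bid: { $not: { $type: "int" } } })
```

Putting $limit first means the preview only processes a few input documents; once the scratch collection looks right, the original pipeline with $out: "collection_name" can be run.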
For details, see the official MongoDB documentation for $out, $toInt, $toDouble, $convert, $substrCP, $substr and allowDiskUse.
Source: https://stackoverflow.com/questions/56813094/how-to-convert-a-string-with-characters-in-the-int-for-the-entire-collection