Import csv data as array in mongodb using mongoimport

后端 未结 2 412
[愿得一人]
[愿得一人] 2021-01-13 07:31

I have been trying to import the csv data into mongodb using mongoimport. The collection is like this:

{
id:\"122234343\",
name: \"name1\",
children: [\"2223         


        
相关标签:
2条回答
  • 2021-01-13 08:17

    To me, the simplest way to work out how to format your "CSV" for mongoimport is to simply create a collection then use mongoexport on it to see what the CSV format should look like.

    So create your document from the shell:

    db.newcol.insert({
      id:"122234343",
      name: "name1",
      children: ["222334444","333344444"]
    })
    

    Then exit the shell and run mongoexport:

     mongoexport -d test -c testcol --fields id,name,children --type csv > out.csv
    

    Which will show you the output as:

    id,name,children
    122234343,name1,"[""222334444"",""333344444""]"
    

    Where the "array" is represented with a "string" and using the quotes "" in their escaped form.

    That now is a pretty clear place to use mongoimport from, so just "import" now to test:

    mongoimport -d test -c newcol --headerline --type csv out.csv
    

    Re-enter the shell and see the document(s) in the new collection:

    db.newcol.findOne()
    {
            "_id" : ObjectId("59476924d6eb0b7d6ac37e02"),
            "id" : 122234343,
            "name" : "name1",
            "children" : "[\"222334444\",\"333344444\"]"
    }
    

    So everything is there, BUT the children are listed as a "string" rather than an array. But this is not really a problem, since we got the data imported and now it's just up us to now actually transform it:

    var ops = [];
    db.testcol.find({ "children": { "$type": 2} }).forEach(doc => {
      var children = doc.children.split(',').map( e => e.replace(/"|\[|\]|\\/gm,'').toString() );
      ops.push({
        "updateOne": {
          "filter": { "_id": doc._id },
          "update": { "$set": { "children": children } }
        }
      });
    
      if ( ops.length >= 1000 ) {
        db.newcol.bulkWrite(ops);
        ops = [];
      }             
    });
    
    if ( ops.length > 0 ) {
      db.newcol.bulkWrite(ops);
      ops = [];
    }
    

    So that is going to iterate anything that was imported into the collection that has a BSON type of 2 which is "string" via the $type query operator.

    Then we take the string, split it as an array and strip the other characters to only leave the value you want.

    Using .bulkWrite() you commit those updates in an efficient way, rather than writing every single document per request. They are actually sent in batches of 1000 to the server.

    The end result is the document in the original wanted form:

    db.testcol.findOne()
    {
            "_id" : ObjectId("5947652ccb237bd6e4e902a5"),
            "id" : "122234343",
            "name" : "name1",
            "children" : [
                    "222334444",
                    "333344444"
            ]
    }
    

    So that is my "step by step" of how you can work out your CSV format, import it and then "transform" the data into the state that you need it.

    0 讨论(0)
  • 2021-01-13 08:22

    collection is like below:

       {
           "_id" : ObjectId("580100f4da893943d393e909"),
           "username" : "crystal",
           "likes" : [ "running", "pandas", "software development" ]
        }
    

    below command for csv to mongodb:

    mongoimport --db users --type csv --headerline --file /opt/backups/contacts.csv
    

    actually Mongodb is non relational db so no need of creating two csv to import

    0 讨论(0)
提交回复
热议问题