Mongoimport csv files with string _id and upsert

后端 未结 5 1990
清歌不尽
清歌不尽 2021-01-13 05:26

I\'m trying to use mongoimport to upsert data with string values in _id. Since the ids look like integers (even though they\'re in quotes), mongoimport treats them as intege

相关标签:
5条回答
  • 2021-01-13 05:56

    Unfortunately there is not now a way to force number-like strings to be interpreted as strings:

    https://jira.mongodb.org/browse/SERVER-3731

    You could write a script in Python or some other language with which you're comfortable, along the lines of:

    import csv, pymongo
    
    connection = pymongo.Connection()
    collection = connection.mydatabase.mycollection
    reader = csv.DictReader(open('myfile.csv'))
    for line in reader:
        print '_id', line['_id']
        upsert_fields = {
            '_id': line['_id'],
            'my_other_upsert_field': line['my_other_upsert_field']}
    
        collection.update(upsert_fields, line, upsert=True, safe=True)
    
    0 讨论(0)
  • 2021-01-13 06:00

    Just encountered this same issue and discovered an alternative. You can force Mongo to use string types for non-string values by converting your CSV to JSON and quoting the field. For example, if your CSV looks like this:

    key value
    123 foo
    abc bar
    

    Then you'll get an integer field for key 123 and a string field for key abc. If you convert that to JSON, making sure that all the keys are quoted, and then use --type json when you import, you'll end up with the desired behavior:

    {
        "123":"foo",
        "abc":"bar"
    }
    
    0 讨论(0)
  • 2021-01-13 06:03

    I was able to prefix the numeric string and that worked for me. Example:

    00012345 was imported as 12345 (Type Int) string00012345 was imported as string00012345 (Type String)

    My source was a SQL database so I just did

    select 'string'+column as name
    

    Of course, you also need to do a bit of post-processing to parse the string, but far less effort than converting a rather large tsv file to json.

    I also added +1 to the jira link above for the enhancement.

    0 讨论(0)
  • 2021-01-13 06:10

    As an alternative to @Jesse, you can do something similar in the mongo console, e.g.

    db.my_collection.find().forEach(function (obj) {
      db.my_collection.remove({_id: obj._id); // remove the old one
      obj._id = '' + obj._id; // change to string
      db.my_collection.save(obj); // resave
    });
    

    For non _id fields you can simply do:

    db.my_collection.find().forEach(function (obj) {
      obj.someField = '' + obj.someField; // change to string
      db.my_collection.save(obj); // resave
    });
    
    0 讨论(0)
  • 2021-01-13 06:12

    I encountered the same issue.

    I feel the simplest way is to convert the CSV file to a JSON file using an online tool and then import.

    This is the tool I used:

    http://www.convertcsv.com/csv-to-json.htm

    It lets you wrap the integer values of your CSV file in double quotes for your JSON file.

    If you have trouble importing this JSON file and encountering an error, just add --jsonArray to your import command. It will work for sure.

    mongoimport --host localhost --db mydb -c mycollection --type json --jsonArray --file <file_path>
    
    0 讨论(0)
提交回复
热议问题