I had tried a couple of different approaches to exporting to CSV using the steps outlined here and here, but I could not get either to work. So here is what I did instead (my largest table was about 2 GB). It works relatively quickly even though it looks like a lot of steps, and it beats spending hours fighting random code that Google may have since changed:
- Go into Cloud Storage and create two new buckets, "data_backup" and "data_export". You can skip this if you already have a bucket to store things in.
- Go into "My Console" > Google Datastore > Admin > Open Datastore Admin for the datastore you are trying to convert.
- Check off the entity kind or kinds that you want to back up and click "Backup Entities". I did one at a time, since I only had about 5 tables to export, rather than checking off all 5 at once.
- Indicate the Google Cloud Storage (gs) bucket you want to store the backups in (e.g. "data_backup").
- Now go to Google BigQuery (I had never used it before, but it was easy to get going).
- Click the little down arrow and select "Create a New Dataset" and give it a name.
- Then click the down arrow next to the new dataset you just created and select "Create New Table". Walk through the import steps, selecting "Cloud Datastore Backup" under the Select Data step. Then choose whichever backup you want to import into BigQuery so you can export it to CSV in the next step.
- Once the table imports (which was pretty quick for mine), click the down arrow next to the table name and select "Export". You can export directly to CSV, save the file to the Google Cloud Storage bucket you created for exports, and then download it from there.
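If you later want to script the BigQuery part of the steps above instead of clicking through the console, here is a minimal sketch using the `google-cloud-bigquery` Python client. It assumes the backup already exists in Cloud Storage, that you have credentials set up, and that the dataset already exists; the function names and the example paths are hypothetical. It loads a Datastore backup (the URI must point at the kind's `.backup_info` file) into a table, then extracts it to your export bucket. Two real constraints drive the helpers: CSV cannot hold nested or repeated fields, and BigQuery writes at most 1 GB per exported file, so a wildcard URI is used to let it shard a table like my 2 GB one.

```python
def destination_format(has_nested_fields: bool) -> str:
    """CSV cannot represent nested/repeated fields, so fall back to newline-delimited JSON."""
    return "NEWLINE_DELIMITED_JSON" if has_nested_fields else "CSV"


def export_uri(bucket: str, table: str, fmt: str) -> str:
    """Wildcard URI: BigQuery caps each exported file at 1 GB and shards larger tables."""
    ext = "csv" if fmt == "CSV" else "json"
    return f"gs://{bucket}/{table}-*.{ext}"


def load_and_export(backup_info_uri: str, table_id: str,
                    export_bucket: str, has_nested_fields: bool) -> str:
    """Hypothetical helper: load a Datastore backup into BigQuery, then export it to GCS.

    table_id is "project.dataset.table"; the dataset is assumed to exist already.
    """
    # Imported here so the pure helpers above work without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()

    # Equivalent of "Create New Table" with "Cloud Datastore Backup" as the source.
    load_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.DATASTORE_BACKUP)
    client.load_table_from_uri(backup_info_uri, table_id,
                               job_config=load_config).result()  # wait for the load

    # Equivalent of the table's "Export" menu item.
    fmt = destination_format(has_nested_fields)
    uri = export_uri(export_bucket, table_id.split(".")[-1], fmt)
    extract_config = bigquery.ExtractJobConfig(destination_format=fmt)
    client.extract_table(table_id, uri, job_config=extract_config).result()
    return uri
```

Usage would look something like `load_and_export("gs://data_backup/BACKUP_ID.MyKind.backup_info", "my-project.my_dataset.MyKind", "data_export", has_nested_fields=False)`, where every name in that call is a placeholder for your own project, dataset, kind, and backup file.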
Here are a few tips:
- If your data has nested or repeated fields, you will have to export to JSON rather than CSV (Avro format is also offered).
- I used json2csv to convert the exported JSON files that could not be saved as CSV. It runs a little slowly on big tables, but it gets the job done.
- I had to split the 2 GB file into two files because json2csv hit a Python memory error. I used GSplit to split the file and checked the option under Other Properties > Tags & Headers > Do not add Gsplit tags... (this ensured GSplit did not add any extra data to the split files).
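If you would rather avoid the split step, here is a minimal sketch of a different approach (not json2csv itself) that converts a BigQuery JSON export, which is newline-delimited with one object per line, into CSV without loading the whole file. It makes two streaming passes: the first collects column names, the second writes rows, so only the set of column names is held in memory. The flattening rules are my own assumptions: nested objects become dotted column names (e.g. `props.x`), and arrays are stored as a JSON string in a single cell. The function names are hypothetical.

```python
import csv
import json
from typing import Any, Dict, Iterator


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted column names; lists become JSON strings."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed object per non-blank line of a newline-delimited JSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def ndjson_to_csv(src_path: str, dst_path: str) -> int:
    """Convert newline-delimited JSON to CSV in two passes; returns the row count."""
    # Pass 1: collect every column name (a dict keeps first-seen order).
    columns: Dict[str, None] = {}
    for record in iter_records(src_path):
        for name in flatten(record):
            columns.setdefault(name, None)

    # Pass 2: stream rows out; columns missing from a record are left empty.
    count = 0
    with open(dst_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(columns), restval="")
        writer.writeheader()
        for record in iter_records(src_path):
            writer.writerow(flatten(record))
            count += 1
    return count
```

If BigQuery sharded the export into several files (e.g. `MyKind-000000000000.json`), you would run this once per file; note that each shard's header is built only from that shard's records, so the column sets can differ between shards.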
Like I said, this was actually pretty quick even though it takes a number of steps. I hope it saves someone the time spent trying to convert strange backup file formats or run code that may no longer work.