I'd like to flatten a nested JSON object, e.g. {"a":{"b":1}} to {"a.b":1}, in order to ingest it into Solr.
I have 11 TB of json files whi
As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d @json_file
does just this:
{
"a.b":[1],
"id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
"_version_":1535841499921514496
}
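For anyone who prefers to flatten on the client side before posting, here is a minimal sketch using jq. The function name flatten_json is my own, and the dotted-key convention only mimics what Solr produced above. It is not what Solr does internally: array elements end up with index-suffixed keys such as "a.0" rather than multi-valued fields, and empty objects or arrays are dropped.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr or jq): flatten nested JSON into
# a single-level object whose keys are the dotted paths to each leaf.
flatten_json() {
    jq -c '
      # Collect every path that ends in a non-container value.
      [ paths(type != "object" and type != "array") as $p
        # Join the path components with "." and pair them with the leaf value.
        | { key: ($p | map(tostring) | join(".")), value: getpath($p) } ]
      | from_entries
    ' "$@"
}
```

Usage: printf '%s' '{"a":{"b":1}}' | flatten_json, or flatten_json some_file.json to read from a file.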
EDIT 1: This is Solr 6.0.1, started with bin/solr -e cloud. The collection name is flat; everything else is left at the defaults (including the data-driven schema, which is also the default).
EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d @{} \;
EDIT 3: It is also possible to parallelize with xargs and to add the id field with jq: find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 -I {} sh -c "cat {} | jq '. + {id: .a.b}' | curl -XPOST 'http://localhost:8983/solr/collection/update/json/docs' -d @-"
where -P is the parallelism factor. I used jq to set an id so that uploading the same document more than once won't create duplicates in the collection (my repeated runs while searching for the optimal value of -P had created duplicates in the collection).
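To make the id step easier to check before sending 11 TB at Solr, the jq part can be pulled into a small function. This is only a sketch: with_id, upload_one, SOLR_URL and DRY_RUN are names I made up for illustration, the URL assumes the flat collection from EDIT 1, and the Content-Type header and --data-binary are my additions rather than what I originally ran. DRY_RUN defaults to 1, so nothing is posted unless it is switched off.

```shell
#!/bin/sh
# Assumed endpoint; adjust the collection name to match your setup.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/flat/update/json/docs}"

# Merge a deterministic "id" (taken from .a.b) into the top-level object,
# so re-uploading the same file overwrites instead of duplicating.
with_id() {
    jq -c '. + {id: .a.b}' "$1"
}

# Print the payload in dry-run mode, otherwise post it to Solr.
upload_one() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        with_id "$1"
    else
        with_id "$1" | curl -sS -XPOST "$SOLR_URL" \
            -H 'Content-Type: application/json' --data-binary @-
    fi
}
```

Running upload_one on a file containing {"a":{"b":1}} in dry-run mode prints {"a":{"b":1},"id":1}, which is the document Solr would receive.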