Flatten nested JSON using jq

独厮守ぢ 2021-02-04 09:58

I'd like to flatten a nested JSON object, e.g. {"a":{"b":1}} to {"a.b":1}, so that I can ingest it into Solr.
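A minimal sketch of the requested transformation, written in Python rather than jq: keys of nested objects are joined with a dot. The function name `flatten` and the choice to keep arrays as leaf values (rather than indexing into them) are assumptions for illustration only.

```python
import json
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested objects into one level, joining keys with `sep`.

    Only non-empty nested dicts are expanded; scalars, arrays and empty
    objects are kept as leaf values.
    """
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into the nested object, extending the dotted key path.
            flat.update(flatten(value, full_key, sep))
        else:
            # Leaf value: store it under the full dotted key.
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    print(json.dumps(flatten(json.loads('{"a":{"b":1}}'))))  # → {"a.b": 1}
```

A recursive walk keeps the sketch short; for very deeply nested input an explicit stack would avoid Python's recursion limit.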

I have 11 TB of json files whi

4 answers
  •  予麋鹿 (OP)
    2021-02-04 10:41

    As it turns out, curl -XPOST 'http://localhost:8983/solr/flat/update/json/docs' -d @json_file does exactly this; Solr stores the document as follows, adding the id and _version_ fields itself:

    {
        "a.b":[1],
        "id":"24e3e780-3a9e-4fa7-9159-fc5294e803cd",
        "_version_":1535841499921514496
    }
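For scripting this outside the shell, the curl call can be sketched in Python with only the standard library. This assumes the local `flat` collection described in EDIT 1; `build_update_request` is a hypothetical helper name. Note that plain `curl -d` sends `application/x-www-form-urlencoded`; setting `Content-Type: application/json` explicitly here is my own assumption, matching what the JSON update handler expects.

```python
import json
import urllib.request

# Assumed endpoint: the "flat" collection on a local Solr, as in EDIT 1.
SOLR_UPDATE_URL = "http://localhost:8983/solr/flat/update/json/docs"


def build_update_request(doc: dict, url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request({"a": {"b": 1}})
    print(req.get_method(), req.full_url)
    # To actually index the document (requires a running Solr):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

The request is built separately from sending it so the payload can be inspected without a Solr instance.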
    

    EDIT 1: Solr 6.0.1, started with bin/solr -e cloud. The collection is named flat; everything else uses the defaults (including the data-driven schema, which is also the default).

    EDIT 2: The final script I used: find . -name '*.json' -exec curl -XPOST 'http://localhost:8983/solr/collection1/update/json/docs' -d @{} \;.
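The find ... -exec loop posts matching files one at a time. A rough Python equivalent walks the tree with `pathlib.Path.rglob` and passes each parsed `*.json` file to an uploader. `upload_tree` and the `upload` callback are hypothetical names; in practice the callback would POST the document to the /update/json/docs handler, whereas here it just receives the parsed dict.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def upload_tree(root: Path, upload: Callable[[dict], None]) -> int:
    """Load every *.json file under `root` (recursively) and pass it to `upload`.

    Mirrors `find . -name '*.json' -exec ...`: one file, one upload, in sequence.
    Returns the number of files processed.
    """
    count = 0
    for path in sorted(root.rglob("*.json")):
        if path.is_file():  # skip directories whose names happen to end in .json
            upload(json.loads(path.read_text(encoding="utf-8")))
            count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub").mkdir()
        (root / "one.json").write_text('{"a": {"b": 1}}')
        (root / "sub" / "two.json").write_text('{"a": {"b": 2}}')
        sent: list[dict] = []
        print(upload_tree(root, sent.append), sent)
```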

    EDIT 3: It is also possible to parallelize the upload with xargs and to add an id field with jq:

    find . -name '*.json' -print0 | xargs -0 -n 1 -P 8 sh -c 'jq ". + {id: .a.b}" "$1" | curl -XPOST "http://localhost:8983/solr/collection/update/json/docs" -d @-' sh

    Here -P is the parallelism factor (the number of uploads running at once), and each file name is passed to sh as "$1" rather than pasted into the command string, so names containing spaces are handled. I used jq to set an id so that uploading the same document more than once won't create duplicates in the collection (while I was searching for the optimal value of -P, repeated uploads did create duplicates).
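The xargs -P 8 pipeline can be approximated in Python with a thread pool: each worker loads one file, copies a.b into id (the same choice as the jq filter . + {id: .a.b}), and hands the document to an uploader. `upload_parallel`, `with_id` and the `upload` callback are hypothetical names. Threads are a reasonable fit because the real work is mostly waiting on HTTP responses. Since id is the unique key in Solr's default schema, re-sending a document with the same id replaces the stored copy instead of adding a duplicate.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def with_id(doc: dict) -> dict:
    """Return a copy of `doc` with id taken from a.b, like jq '. + {id: .a.b}'.

    Assumes every document has an "a" object containing "b", as in the question.
    """
    return {**doc, "id": doc["a"]["b"]}


def upload_parallel(root: Path, upload: Callable[[dict], None], workers: int = 8) -> int:
    """Upload every *.json file under `root` with up to `workers` threads (like xargs -P)."""
    paths = [p for p in root.rglob("*.json") if p.is_file()]

    def handle(path: Path) -> None:
        doc = json.loads(path.read_text(encoding="utf-8"))
        upload(with_id(doc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes the results so an exception in any worker is re-raised here.
        list(pool.map(handle, paths))
    return len(paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for n in range(3):
            (root / f"doc{n}.json").write_text(json.dumps({"a": {"b": n}}))
        sent: list[dict] = []  # list.append is thread-safe in CPython
        upload_parallel(root, sent.append, workers=2)
        print(sorted(d["id"] for d in sent))
```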
