Splitting large JSON data using Unix command Split

Submitted by 和自甴很熟 on 2020-07-09 12:13:04

Question


Issue with the Unix split command for splitting large data: split -l 1000 file.json myfile. I want to split this file into multiple files of 1000 records each, but I'm getting the output as a single file, with no change.

P.S. File is created converting Pandas Dataframe to JSON.

Edit: It turns out that my JSON is formatted in such a way that it contains only one row; wc -l file.json returns 0.

Here is the sample: file.json

[
{"id":683156,"overall_rating":5.0,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel","title":"\u201cgreat hotel, great location\u201d","text":"The rooms here are not palatial","author_id":"C0F"},
{"id":692745,"overall_rating":5.0,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square","title":"\u201cabsolutely delightful\u201d","text":"I travelled from Spain...","author_id":"8C1"}
]

Answer 1:


I'd recommend splitting the JSON array with jq (see manual).

cat file.json | jq length              # get the length of the top-level array
cat file.json | jq -c '.[0:1000]'      # first 1000 items
cat file.json | jq -c '.[1000:2000]'   # next 1000 items
...

Note the -c flag for compact output (not pretty-printed).

For automation, you can write a simple bash script that splits your file into chunks based on the array length (jq length), as sketched below.
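A minimal sketch of such a script, assuming jq is installed and reusing the file name file.json and the chunk size of 1000 from the question (the output names chunk_0.json, chunk_1.json, ... are illustrative):

#!/usr/bin/env bash
# Split the top-level JSON array in file.json into chunks of at most 1000 items each.
chunk_size=1000
total=$(jq length file.json)    # number of items in the top-level array
start=0
i=0
while [ "$start" -lt "$total" ]; do
    jq -c ".[$start:$((start + chunk_size))]" file.json > "chunk_$i.json"
    start=$((start + chunk_size))
    i=$((i + 1))
done

Each output file then contains a compact JSON array of up to 1000 records. (As the next answer points out, this re-reads file.json once per chunk, so it is not the most efficient approach for very large files.)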




Answer 2:


Invoking jq once per partition plus once to determine the number of partitions would be extremely inefficient. The following solution suffices to achieve the partitioning deemed acceptable in your answer:

jq -c ".[]" file.json | split -l 1000

If, however, it is deemed necessary for each file to be pretty-printed, you could run jq -s . for each file, which would still be more efficient than running .[N:N+S] multiple times.
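For example, a minimal sketch of that post-processing step, assuming split was run with its default x* output prefix (the *.pretty.json names are illustrative):

for f in x*; do
    jq -s . "$f" > "$f.pretty.json"    # slurp the chunk's lines back into one pretty-printed JSON array
done

Note that jq -s also wraps each chunk in an enclosing array again, so each resulting file is itself a valid JSON array.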

If each partition should itself be a single JSON array, then see "Splitting / chunking JSON files with JQ in Bash or Fish shell?"




Answer 3:


After asking elsewhere, it turned out that the file was, in fact, a single line.

Reformatting with jq (in compact form) would enable the split, though processing the result would at least require deleting the file's first and last characters (or adding '[' and ']' to each of the split files).



Source: https://stackoverflow.com/questions/62609271/splitting-large-json-data-using-unix-command-split
