Question
I have an issue with the Unix split command when splitting a large data file: split -l 1000 file.json myfile
I want to split this file into multiple files of 1000 records each, but I get a single output file with no change.
P.S. The file was created by converting a Pandas DataFrame to JSON.
Edit: It turns out that my JSON is formatted in a way that it contains only one row: wc -l file.json returns 0.
Here is a sample of file.json:
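The behaviour described in the edit can be reproduced on a tiny stand-in file. This is only a sketch: it assumes the real file.json contains no newline characters (which is what a wc -l result of 0 suggests), and the three-record array and scratch directory are made up for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A made-up stand-in for file.json: a JSON array written without any newline.
printf '%s' '[{"id":1},{"id":2},{"id":3}]' > file.json

# wc -l counts newline characters, so a single unterminated line counts as 0.
line_count=$(wc -l < file.json | tr -d ' ')
echo "lines: $line_count"

# split -l cuts only at line boundaries, so everything lands in one piece.
split -l 2 file.json myfile
piece_count=$(ls myfile* | wc -l | tr -d ' ')
echo "pieces: $piece_count"
```

Because split -l has no line boundary to cut at, the number of records inside the array makes no difference to the result.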
[
{"id":683156,"overall_rating":5.0,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel","title":"\u201cgreat hotel, great location\u201d","text":"The rooms here are not palatial","author_id":"C0F"},
{"id":692745,"overall_rating":5.0,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square","title":"\u201cabsolutely delightful\u201d","text":"I travelled from Spain...","author_id":"8C1"}
]
Answer 1:
I'd recommend splitting the JSON array with jq (see the manual).
jq length file.json # get the length of the array
jq -c '.[0:1000]' file.json # first 1000 items (the end index is exclusive)
jq -c '.[1000:2000]' file.json # second 1000 items
...
Note the -c option, which gives compact output (not pretty-printed).
For automation, you can write a simple bash script that splits the file into chunks based on the array length (jq length).
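A script along those lines might look like the following. It is a minimal sketch, not a tuned implementation: the function name split_json_array, its three arguments, and the zero-padded output names are assumptions made for this example, and it runs jq once per chunk, so it rereads the whole input each time.

```shell
# Sketch only: split the top-level array of a JSON file into fixed-size chunks.
# split_json_array and the output naming scheme are hypothetical.
split_json_array() {
  local input=$1 size=$2 prefix=$3
  local total start=0 part=0
  # Number of elements in the top-level array.
  total=$(jq length "$input")
  while [ "$start" -lt "$total" ]; do
    # .[a:b] includes index a and excludes index b.
    jq -c --argjson a "$start" --argjson b "$((start + size))" '.[$a:$b]' "$input" \
      > "$(printf '%s%03d.json' "$prefix" "$part")"
    start=$((start + size))
    part=$((part + 1))
  done
}
```

Calling split_json_array file.json 1000 chunk_ would write chunk_000.json, chunk_001.json and so on, each holding a compact JSON array of at most 1000 items.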
Answer 2:
Invoking jq once per partition plus once to determine the number of partitions would be extremely inefficient. The following solution suffices to achieve the partitioning deemed acceptable in your answer:
jq -c ".[]" file.json | split -l 1000
If, however, it is deemed necessary for each file to be pretty-printed, you could run jq -s . for each file, which would still be more efficient than running .[N:N+S] multiple times.
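Put together, the single-pass split followed by jq -s . on each piece could be sketched as below. The function name split_records, its arguments, and the .json suffix are assumptions for this example; it also assumes no other files already start with the chosen prefix.

```shell
# Sketch under assumptions: jq is installed and the input's top level is an array.
# split_records and its arguments are hypothetical names for this example.
split_records() {
  local input=$1 size=$2 prefix=$3
  local piece
  # One jq pass: print each array element on its own line, then cut by lines.
  jq -c '.[]' "$input" | split -l "$size" - "$prefix"
  # Slurp each piece back into a single pretty-printed JSON array.
  for piece in "$prefix"*; do
    jq -s . "$piece" > "$piece.json" && rm "$piece"
  done
}
```

For example, split_records file.json 1000 part_ would leave part_aa.json, part_ab.json and so on. Since jq -s collects its inputs into one array, each resulting file is itself a valid JSON array.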
If each partition should itself be a single JSON array, then see "Splitting / chunking JSON files with JQ in Bash or Fish shell?"
Answer 3:
After asking elsewhere, it turned out that the file was, in fact, a single line.
Reformatting it with jq (in compact form) would enable the split, though processing the resulting files would at least require deleting the first and last characters (or adding '[' and ']' to the split files).
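If the pieces hold one compact record per line with no separators between them, adding the brackets alone is not enough: the lines also have to be joined with commas. A minimal sketch of that step, where wrap_as_array is a hypothetical helper name and each record is assumed to occupy exactly one line:

```shell
# Sketch only: turn a file of one-JSON-record-per-line into a JSON array.
# wrap_as_array is a hypothetical helper name.
wrap_as_array() {
  # Join the lines with commas, then surround the result with [ and ].
  printf '[%s]\n' "$(paste -sd, "$1")"
}
```

Running wrap_as_array on a split piece and redirecting the output to a new file gives a file that can be parsed as a JSON array again.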
Source: https://stackoverflow.com/questions/62609271/splitting-large-json-data-using-unix-command-split