Given a GeoJSON file as follows:
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "FEATCODE": 15014
      }
    },
    ...
  ]
}
the task is to add "tippecanoe": {"minzoom": 13} to each element of the features array.
An alternative solution could be, for example:
jq '.features |= map_values(.tippecanoe.minzoom = 13)'
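For instance, applied to a minimal one-feature collection (a sketch; the sample document here is only illustrative):

echo '{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"FEATCODE":15014}}]}' |
jq -c '.features |= map_values(.tippecanoe.minzoom = 13)'

this prints

{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"FEATCODE":15014},"tippecanoe":{"minzoom":13}}]}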
To test this, I created a sample JSON as
d = {'features': [{"type":"Feature", "properties":{"FEATCODE": 15014}} for i in range(0,N)]}
and inspected the execution time as a function of N. Interestingly, while the map_values approach seems to have linear complexity in N, .features[].tippecanoe.minzoom = 13 exhibits quadratic behavior (already for N=50000, the former method finishes in about 0.8 seconds, while the latter needs around 47 seconds).
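One way to reproduce this kind of measurement (a sketch; the file name test.json and the choice N=50000 are arbitrary):

jq -n 'def N: 50000; {"features": [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}]}' > test.json
time jq '.features |= map_values(.tippecanoe.minzoom = 13)' test.json > /dev/null
time jq '.features[].tippecanoe.minzoom = 13' test.json > /dev/null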
Alternatively, one might just do it manually with, e.g., Python:
import json
import sys

# Load the input document named by the first command-line argument
with open(sys.argv[1], 'r') as F:
    data = json.load(F)

# The same dict object is shared by all features, which is fine for serialization
extra_item = {"minzoom": 13}

for feature in data['features']:
    feature["tippecanoe"] = extra_item

# Write the augmented document to the file named by the second argument
with open(sys.argv[2], 'w') as F:
    F.write(json.dumps(data))
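Assuming the script is saved as, say, add_minzoom.py (the name is arbitrary), it would be run as:

python add_minzoom.py input.json output.json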
A one-pass jq-only approach may require more RAM than is available. If that is the case, then a simple all-jq approach is shown below, together with a more economical approach based on using jq along with awk.
The two approaches are the same except for the reconstitution of the stream of objects into a single JSON document. This step can be accomplished very economically using awk.
In both cases, the large JSON input file with objects of the required form is assumed to be named input.json.
jq -c '.features[]' input.json |
jq -c '.tippecanoe.minzoom = 13' |
jq -c -s '{type: "FeatureCollection", features: .}'
jq -c '.features[]' input.json |
jq -c '.tippecanoe.minzoom = 13' | awk '
BEGIN {print "{\"type\": \"FeatureCollection\", \"features\": ["; }
NR==1 { print; next }
{print ","; print}
END {print "] }";}'
For comparison, an input file with 10,000,000 objects in .features[] was used. Its size is about 1GB.
u+s (user plus system time):
jq-only: 15m 15s
jq-awk: 7m 40s
jq one-pass using map: 6m 53s
In this case, map rather than map_values is far faster (*):
.features |= map(.tippecanoe.minzoom = 13)
However, using this approach will still require enough RAM.
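For example, with the same input.json as above (the output file name is illustrative):

jq '.features |= map(.tippecanoe.minzoom = 13)' input.json > output.json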
p.s. If you want to use jq to generate a large file for timing, consider:
def N: 1000000;
def data:
{"features": [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}] };
(*) Using map: about 20s for 100MB, and approximately linear in the input size.
Here, based on the work of @nicowilliams on GitHub, is a solution that uses the streaming parser available with jq. The solution is very economical with memory, but is currently quite slow if the input is large.
The solution has two parts: a function for injecting the update into the stream produced using the --stream command-line option; and a function for converting the stream back to JSON in the original form.
The jq program (program.jq, shown below) is invoked as follows:

jq -cnr --stream -f program.jq input.json
# inject the given object into the stream produced from "inputs" with the --stream option
def inject(object):
[object|tostream] as $object
| 2
| truncate_stream(inputs)
| if (.[0]|length == 1) and length == 1
then $object[]
else .
end ;
# Input: the object to be added
# Output: text
def output:
. as $object
| ( "[",
foreach fromstream( inject($object) ) as $o
(0;
if .==0 then 1 else 2 end;
if .==1 then $o else ",", $o end),
"]" ) ;
{}
| .tippecanoe.minzoom = 13
| output
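To get a feel for the mechanics: with a depth of 2, truncate_stream strips the leading "features" and array-index components from each streamed event, and drops the events whose paths are not longer than 2 (here, the events that close the features array and the top-level object), so each feature arrives as a small stream of its own. A toy illustration (the input is chosen only for brevity):

echo '{"features":[{"a":1}]}' |
jq -cn --stream '2|truncate_stream(inputs)'

which prints:

[["a"],1]
[["a"]]

The second event, a path of length 1 with no value, marks the end of a feature; that is exactly the case in which inject splices in the streamed form of the object to be added.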
To verify the program on a small input, a generator along the following lines can be used:

def data(N):
  {"features":
    [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}] };
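For example (a sketch of the invocation):

jq -nc 'def data(N): {"features": [range(0;N) | {"type":"Feature", "properties": {"FEATCODE": 15014}}]}; data(2)' > input.json
jq -cnr --stream -f program.jq input.json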
With N=2, the output is:
[
{"type":"Feature","properties":{"FEATCODE":15014},"tippecanoe":{"minzoom":13}}
,
{"type":"Feature","properties":{"FEATCODE":15014},"tippecanoe":{"minzoom":13}}
]