Process huge GeoJSON file with jq

Asked by 庸人自扰 on 2021-01-24 07:26

Given a GeoJSON file as follows:

{
  "type": "FeatureCollection",
  "features": [
   {
     "type": "Feature",
     "properties": {
       "FEATCODE": ...

How can every feature be updated (for example, by setting .tippecanoe.minzoom = 13, as in the answers below) when the file is far too large to be processed comfortably in memory?

4 Answers

  •  夕颜
     Answered 2021-01-24 07:37

    A one-pass jq-only approach may require more RAM than is available. If
    that is a concern, a simple all-jq approach is shown below, together with
    a more economical approach based on using jq along with awk.

    The two approaches are identical except for how the stream of updated
    objects is reconstituted into a single JSON document. The jq-only version
    slurps the whole stream back into memory for this last step; awk can
    accomplish the same step very economically, since it only ever holds one
    line at a time.

    In both cases, the large JSON input file with objects of the required
    form is assumed to be named input.json.

    jq-only

    jq -c '.features[]' input.json |                        # explode .features into a stream, one feature per line
        jq -c '.tippecanoe.minzoom = 13' |                  # update each feature
        jq -c -s '{type: "FeatureCollection", features: .}' # slurp the stream and re-wrap it
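
    After the first two stages, each updated feature travels as one compact
    JSON object per line, along these lines (geometry omitted and a
    hypothetical FEATCODE value used for illustration):

    {"type":"Feature","properties":{"FEATCODE":1234},"tippecanoe":{"minzoom":13}}

    It is the final slurp (-s) that makes this variant memory-hungry: jq must
    hold every updated feature in a single array before it can print the
    result.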
    

    jq and awk

    jq -c '.features[]' input.json |
       jq -c '.tippecanoe.minzoom = 13' | awk '
         # open the wrapper object and the features array
         BEGIN {print "{\"type\": \"FeatureCollection\", \"features\": ["; }
         # first feature: print it with no leading comma
         NR==1 { print; next }
         # every later feature: print a comma separator, then the feature
               {print ","; print}
         # close the array and the wrapper object
         END   {print "] }";}'
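
    Whichever variant is used, the output can be redirected to a file and
    sanity-checked afterwards. For example, with out.json as an assumed
    output name:

    jq empty out.json                 # exit status 0 iff out.json is valid JSON
    jq '.features | length' out.json  # number of features (loads the whole file)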
    

    Performance comparison

    For comparison, an input file with 10,000,000 objects in .features[] was
    used. Its size is about 1 GB.

    u+s (user + system CPU time):

    jq-only:               15m 15s
    jq-awk:                 7m 40s
    jq one-pass using map:  6m 53s   (see the sketch after this list)
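
    The one-pass variant in the last row is not shown in this answer; it
    presumably amounts to something like the following sketch, which updates
    every feature in a single jq invocation at the cost of holding the whole
    document in memory:

    jq '.features |= map(.tippecanoe.minzoom = 13)' input.json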
    
