Question
Let's say I have: [[1,2], [3,9], [4,2], [], []]
I would like to know the scripts to get:
The number of nested lists which are/are not non-empty, i.e. I want to get:
[3,2]
The number of nested lists which do/do not contain the number 3, i.e. I want to get:
[1,4]
The number of nested lists for which the sum of the elements is/isn't less than 4, i.e. I want to get:
[3,2]
In other words, basic examples of partitioning nested data.
Answer 1:
Since stackoverflow.com is not a coding service, I'll confine this response to the first question, with the hope that it will convince you that learning jq is worth the effort.
Let's begin by refining the question about the counts of the lists "which are/are not empty" to emphasize that the first number in the answer should correspond to the number of empty lists (2), and the second number to the rest (3). That is, the required answer should be [2,3].
Solution using built-in filters
The next step might be to ask whether group_by can be used. If the ordering did not matter, we could simply write:
group_by(length==0) | map(length)
This returns [3,2], which is not quite what we want. It's now worth checking the documentation about what group_by is supposed to do. On checking the details at https://stedolan.github.io/jq/manual/#Builtinoperatorsandfunctions, we see that by design group_by does indeed sort by the grouping value.
Since in jq, false < true, we could fix our first attempt by writing:
group_by(length > 0) | map(length)
That's nice, but since group_by is doing so much work when all we really need is a way to count, it's clear we should be able to come up with a more efficient (and hopefully less opaque) solution.
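For reference, both group_by attempts can be run from a shell against the sample input from the question. This is only a minimal sketch: it assumes jq is installed and on the PATH, and the shell variable names are chosen purely for illustration.

```shell
# Sample input from the question.
input='[[1,2], [3,9], [4,2], [], []]'

# First attempt: groups are sorted by the grouping value, and false sorts
# before true, so the non-empty lists (length==0 is false) come first.
first=$(printf '%s' "$input" | jq -c 'group_by(length==0) | map(length)')

# Second attempt: flipping the test puts the empty lists first.
second=$(printf '%s' "$input" | jq -c 'group_by(length > 0) | map(length)')

echo "$first"    # [3,2]
echo "$second"   # [2,3]
```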
An efficient solution
At its core the problem boils down to counting, so let's define a generic tabulate filter for producing the counts of distinct string values. Here's a def that will suffice for present purposes:
# Produce a JSON object recording the counts of distinct
# values in the given stream, which is assumed to consist
# solely of strings.
def tabulate(stream):
  reduce stream as $s ({}; .[$s] += 1);
An efficient solution can now be written down in just two lines:
tabulate(.[] | length==0 | tostring )
| [.["true", "false"]]
QED
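Putting the definition and the two-line program together, the whole thing might be run from a shell as follows. Again this is a sketch that assumes jq is available on the PATH; the variable names input and counts are illustrative only.

```shell
# Sample input from the question.
input='[[1,2], [3,9], [4,2], [], []]'

counts=$(printf '%s' "$input" | jq -c '
  # Count the distinct string values in the given stream.
  def tabulate(stream):
    reduce stream as $s ({}; .[$s] += 1);

  # "true" is the count of empty lists, "false" the count of the rest.
  tabulate(.[] | length==0 | tostring)
  | [.["true", "false"]]
')

echo "$counts"   # [2,3]
```

Note that if one of the two categories never occurs in the input, its key is absent from the tabulated object, so the corresponding entry comes out as null rather than 0.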
p.s.
The function named tabulate above is sometimes called bow (for "bag of words"). In some ways, that would be a better name, especially as it would make sense to reserve the name tabulate for similar functionality that would work for arbitrary streams.
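As an illustration of what such a generalisation might look like, here is a minimal sketch that keys each item by its JSON text (via tojson), so the stream may contain any JSON values and not just strings. The name tabulate_any is hypothetical, and the -S option is used only to make the key order of the output predictable.

```shell
# Sample input from the question.
input='[[1,2], [3,9], [4,2], [], []]'

lengths=$(printf '%s' "$input" | jq -c -S '
  # tabulate_any is a hypothetical name, not a jq builtin.
  # Each item is converted to its JSON text so it can serve as an object key.
  def tabulate_any(stream):
    reduce stream as $x ({}; .[$x | tojson] += 1);

  # Tabulate the lengths of the nested lists (numbers, not strings).
  tabulate_any(.[] | length)
')

echo "$lengths"   # {"0":2,"2":3}
```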
Source: https://stackoverflow.com/questions/56179292/jq-groupby-and-nested-json-arrays