问题
I'm looking for a way to find non-repeated elements in an array in bash.
Simple example:
joined_arrays=(CVE-2015-4840 CVE-2015-4840 CVE-2015-4860 CVE-2015-4860 CVE-2016-3598)
<magic>
non_repeated=(CVE-2016-3598)
To give context, the goal here is to end up with an array of all package update CVEs that aren't generally available via 'yum update' on a host due to being excluded. The way I came up with doing such a thing is to populate 3 preliminary arrays:
- available_updates=() #just what 'yum update' would provide
- all_updates=() #including excluded ones
- joined_updates=() # contents of both prior arrays Then apply logic to joined_updates=() that would return only elements that are included exactly once. Any element with two occurrences is one that can be updated normally and doesn't need to end up in the 'excluded_updates=()' array.
Hopefully this makes sense. As I was typing it out I'm wondering if it might be simpler to just remove all elements found in available_updates=() from all_updates=(), leaving the remaining ones as the excluded updates.
Thanks!
回答1:
One pure-bash approach is to store a counter in an associative array, and then look for items where the counter is exactly one:
declare -A seen=( ) # create an associative array (requires bash 4)
for item in "${joined_arrays[@]}"; do # iterate over original items
(( seen[$item] += 1 )) # increment value associated with item
done
declare -a non_repeated=( )
for item in "${!seen[@]}"; do # iterate over keys
if (( ${seen[$item]} == 1 )); then # if counter for that key is 1...
non_repeated+=( "$item" ) # ...add that item to the output array.
done
declare -p non_repeated # print result
Another, terser (but buggier -- doesn't work with values containing newline literals) approach is to take advantage of standard text manipulation tools:
non_repeated=( ) # setup
# use uniq -c to count; filter for results with a count of 1
while read -r count value; do
(( count == 1 )) && non_repeated+=( "$value" )
done < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c)
declare -p non_repeated # print result
...or, even terser (and buggier, requiring that the array value split into exactly one field in awk):
readarray -t non_repeated \
< <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c | awk '$1 == 1 { print $2; }'
To crib an answer I really should have come up myself from @Aaron (who deserves an upvote from anyone using this; do note that it retains the doesn't-work-with-values-with-newlines bug), one can also use uniq -u
:
readarray -t non_repeated < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -u)
回答2:
I would rely on uniq
.
It's -u
option is made for this exact case, outputting only the uniques occurrences. It relies on the input to be a sorted linefeed-separated list of tokens, hence the need for IFS
and sort
:
$ my_test_array=( 1 2 3 2 1 0 )
$ printf '%s\n' "${my_test_array[@]}" | sort | uniq -u
0
3
回答3:
Here is a single awk
based solution that doesn't require sort
:
arr=( 1 2 3 2 1 0 )
printf '%s\n' "${arr[@]}" |
awk '{++fq[$0]} END{for(i in fq) if (fq[i]==1) print i}'
0
3
来源:https://stackoverflow.com/questions/39253211/bash-find-non-repeated-elements-in-an-array