Bash: find non-repeated elements in an array

六眼飞鱼酱① submitted on 2020-07-30 03:39:08

Question


I'm looking for a way to find non-repeated elements in an array in bash.

Simple example:

joined_arrays=(CVE-2015-4840 CVE-2015-4840 CVE-2015-4860 CVE-2015-4860 CVE-2016-3598)
<magic>
non_repeated=(CVE-2016-3598)

To give context, the goal here is to end up with an array of all package update CVEs that aren't generally available via 'yum update' on a host due to being excluded. The way I came up with doing such a thing is to populate 3 preliminary arrays:

  • available_updates=() #just what 'yum update' would provide
  • all_updates=() #including excluded ones
  • joined_updates=() # contents of both prior arrays

Then apply logic to joined_updates=() that would return only elements that are included exactly once. Any element with two occurrences is one that can be updated normally and doesn't need to end up in the 'excluded_updates=()' array.

Hopefully this makes sense. As I was typing this out, I started wondering whether it might be simpler to just remove all elements found in available_updates=() from all_updates=(), leaving the remaining ones as the excluded updates.
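The alternative in the last paragraph, removing every entry of available_updates=() from all_updates=(), can be sketched with an associative array used as a lookup set. This is a minimal sketch assuming bash 4 or later; the CVE IDs come from the question, but how they are split between the two input arrays is hypothetical sample data.

```shell
#!/usr/bin/env bash
# Set difference: entries of all_updates that are not in available_updates.
# Requires bash 4+ (associative arrays). The split of CVEs below is hypothetical.

available_updates=(CVE-2015-4840 CVE-2015-4860)               # what 'yum update' would provide
all_updates=(CVE-2015-4840 CVE-2015-4860 CVE-2016-3598)       # including excluded ones

declare -A available=()              # used as a set: key present => update is available
for cve in "${available_updates[@]}"; do
  available["$cve"]=1
done

excluded_updates=()
for cve in "${all_updates[@]}"; do
  # ${var+set} expands to "set" only if the key exists, so unknown CVEs are kept
  [[ -n ${available["$cve"]+set} ]] || excluded_updates+=("$cve")
done

declare -p excluded_updates
```

One possible advantage over counting occurrences in a joined array: a CVE listed twice in all_updates=() would be counted twice by the counting approach and wrongly treated as normally available, whereas the set difference does not depend on each array being duplicate-free.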

Thanks!


Answer 1:


One pure-bash approach is to store a counter in an associative array, and then look for items where the counter is exactly one:

declare -A seen=( )                   # create an associative array (requires bash 4)
for item in "${joined_arrays[@]}"; do # iterate over original items
  (( seen[$item] += 1 ))              # increment value associated with item
done

declare -a non_repeated=( )
for item in "${!seen[@]}"; do         # iterate over keys (order is unspecified)
  if (( ${seen[$item]} == 1 )); then  # if counter for that key is 1...
    non_repeated+=( "$item" )         # ...add that item to the output array.
  fi
done

declare -p non_repeated               # print result
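Because bash does not guarantee any particular order when iterating over "${!seen[@]}", the output array above may not follow the input order. A minimal sketch of one way to keep it, assuming bash 4+, is to make the second loop walk the original array instead of the keys; the sample data is borrowed from the other answers.

```shell
#!/usr/bin/env bash
# Counter approach with a second pass over the input, so results keep input order.
# Requires bash 4+ (associative arrays).

joined_arrays=( 1 2 3 2 1 0 )         # sample data borrowed from the other answers

declare -A seen=( )
for item in "${joined_arrays[@]}"; do
  (( seen[$item] += 1 ))              # count occurrences of each value
done

non_repeated=( )
for item in "${joined_arrays[@]}"; do # walk the original array, not "${!seen[@]}"
  if (( ${seen[$item]} == 1 )); then  # values seen once appear once here, in input order
    non_repeated+=( "$item" )
  fi
done

declare -p non_repeated
```

Here non_repeated ends up as 3 followed by 0, matching the order in which those values appear in the input.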

Another, terser approach (but buggier: it doesn't work with values containing literal newlines) is to take advantage of standard text-manipulation tools:

non_repeated=( )        # setup

# use uniq -c to count; filter for results with a count of 1
while read -r count value; do
  (( count == 1 )) && non_repeated+=( "$value" )
done < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c)

declare -p non_repeated # print result

...or, even terser (and buggier still, since it requires each array value to be a single awk field, i.e. to contain no whitespace):

readarray -t non_repeated \
  < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c | awk '$1 == 1 { print $2; }')
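If the only concern with the awk variant is values that contain spaces, one possible fix is to have awk strip the count prefix with sub() and print the rest of the line instead of $2. This is a sketch that assumes uniq -c writes optional padding, the count, and a single space before the original line (as GNU coreutils does); it still does not handle values containing newlines, and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Like the uniq -c | awk variant, but prints the whole value rather than $2,
# so values containing spaces survive. Still breaks on embedded newlines.
# Assumes uniq -c prints padding, the count, then one space before the line.

joined_arrays=( "a b" "a b" "c d" "e" )   # hypothetical values, some with spaces

readarray -t non_repeated < <(
  printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c |
    awk '$1 == 1 { sub(/^ *[0-9]+ /, ""); print }'   # strip the count prefix, keep the rest
)

declare -p non_repeated
```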

To crib an answer from @Aaron, which I really should have come up with myself (Aaron deserves an upvote from anyone using this; do note that it retains the doesn't-work-with-values-containing-newlines bug), one can also use uniq -u:

readarray -t non_repeated < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -u)
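For the newline limitation that all of the sort/uniq variants share, a NUL-delimited pipeline is one option. This sketch assumes GNU coreutils (for sort -z and uniq -z) and bash 4.4 or later (for readarray -d ''); the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# NUL-delimited sort | uniq -u, so values may contain newlines.
# Assumes GNU coreutils (sort -z, uniq -z) and bash 4.4+ (readarray -d '').

joined_arrays=( $'multi\nline' single single )   # hypothetical values

readarray -d '' -t non_repeated < <(
  printf '%s\0' "${joined_arrays[@]}" | sort -z | uniq -z -u   # NUL ends each record
)

declare -p non_repeated
```

The value containing a newline comes back as a single array element, while the duplicated "single" is dropped.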



Answer 2:


I would rely on uniq.

Its -u option is made for this exact case: it outputs only the lines that occur exactly once. It requires its input to be a sorted, newline-separated list of tokens, hence the need for printf '%s\n' and sort:

$ my_test_array=( 1 2 3 2 1 0 )
$ printf '%s\n' "${my_test_array[@]}" | sort | uniq -u
0
3
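To illustrate why the sort is needed: uniq only compares adjacent lines, so on this unsorted input, where equal values are never next to each other, uniq -u passes everything through. A small demonstration with the same test array (readarray requires bash 4):

```shell
#!/usr/bin/env bash
# uniq compares only adjacent lines, so unsorted input defeats -u.
# readarray requires bash 4+.

my_test_array=( 1 2 3 2 1 0 )

readarray -t without_sort < <(printf '%s\n' "${my_test_array[@]}" | uniq -u)
readarray -t with_sort < <(printf '%s\n' "${my_test_array[@]}" | sort | uniq -u)

echo "without sort: ${without_sort[*]}"   # prints: without sort: 1 2 3 2 1 0
echo "with sort:    ${with_sort[*]}"      # prints: with sort:    0 3
```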



Answer 3:


Here is a single awk-based solution that doesn't require sort:

arr=( 1 2 3 2 1 0 )

printf '%s\n' "${arr[@]}" | 
awk '{++fq[$0]} END{for(i in fq) if (fq[i]==1) print i}'
0
3
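The order in which awk's for (i in fq) loop visits keys is unspecified, so the 0-then-3 output above is not guaranteed. A sketch of one way to keep the input order, still without sort, is to also record each line by its record number and walk those in the END block, at the cost of keeping every input line in memory; readarray (bash 4+) then loads the result back into an array.

```shell
#!/usr/bin/env bash
# One awk pass, no sort, output in input order. readarray requires bash 4+.

arr=( 1 2 3 2 1 0 )

readarray -t non_repeated < <(
  printf '%s\n' "${arr[@]}" |
    awk '{ fq[$0]++; line[NR] = $0 }   # count each value and remember input order
         END { for (i = 1; i <= NR; i++) if (fq[line[i]] == 1) print line[i] }'
)

declare -p non_repeated
```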


Source: https://stackoverflow.com/questions/39253211/bash-find-non-repeated-elements-in-an-array
