How to slice a variable into array indexes?

白昼怎懂夜的黑 submitted on 2019-12-23 13:25:33

Question


There is this typical problem: given a list of values, check if they are present in an array.

In awk, the val in array test works well for this. Hence, the typical approach is to store all the data in an array and then run the check against it. For example, this prints every line whose first column value is present in the array:

awk 'BEGIN {<<initialize the array>>} $1 in array_var' file

However, initializing the array takes some extra work, because val in array checks whether the index val is in array, whereas what we normally have stored in an array is a set of values.
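To make the index-versus-value distinction concrete, here is a minimal sketch (the array name arr is only illustrative, not part of the original question):

```shell
# "in" tests array indexes (keys), never the stored values.
result=$(awk 'BEGIN {
    arr[1] = "hello"                          # index 1 holds the value "hello"
    print (("hello" in arr) ? "yes" : "no")   # is "hello" an index of arr?
    print ((1 in arr) ? "yes" : "no")         # is 1 an index of arr?
}')
printf '%s\n' "$result"
```

The first test fails because "hello" is only a value, and the second succeeds because 1 is an index. This is why the values have to be turned into indexes before the in test is useful.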

This becomes more relevant when providing values from the command line, where those values are the elements we want to use as indexes of an array. Take this basic example (based on a recent answer of mine, which triggered my curiosity):

$ cat file
hello 23
bye 45
adieu 99
$ awk -v values="hello adieu" 'BEGIN {split(values,v); for (i in v) names[v[i]]} $1 in names' file
hello 23
adieu 99
  • split(values,v) slices the variable values into an array v: v[1]="hello"; v[2]="adieu".
  • for (i in v) names[v[i]] initializes another array, names[], with the indexes names["hello"] and names["adieu"], each holding an empty value. This way, we are ready for the final step.
  • $1 in names checks whether the first column is one of the indexes in names[].

As you can see, we slice into a temporary array v only to initialize the final, useful array names[] later on.

Is there any faster way to initialize the indexes of an array, instead of setting one array up and then using its values as the indexes of the definitive one?


Answer 1:


No, that is the fastest (due to hash lookup) and most robust (due to string comparison) way to do what you want.

This:

BEGIN{split(values,v); for (i in v) names[v[i]]}

happens once on startup and will take close to no time while this:

$1 in array_var

which happens once for every line of input (and so is the place that needs to have optimal performance) is a hash lookup and so the fastest way to compare a string value to a set of strings.
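If the temporary array is the main annoyance, one option is to hide the two-step setup inside a function, so the scratch array stays local. The following is only a sketch: the helper name list_to_set and its parameter names are hypothetical, and it assumes the values are separated by whitespace (the default split behaviour).

```shell
# Recreate the sample file from the question.
printf 'hello 23\nbye 45\nadieu 99\n' > file

result=$(awk -v values="hello adieu" '
# Hypothetical helper: turn a whitespace-separated list into the indexes of set[].
# tmp, n and i are extra parameters used as local variables.
function list_to_set(list, set,    tmp, n, i) {
    n = split(list, tmp)      # tmp[1..n] hold the individual words
    for (i = 1; i <= n; i++)
        set[tmp[i]]           # referencing an element creates it with an empty value
    return n
}
BEGIN { list_to_set(values, names) }   # runs once, before any input is read
$1 in names                            # hash lookup, runs once per input line
' file)
printf '%s\n' "$result"
```

This does the same work as the BEGIN block above; it only keeps the scratch array out of the global namespace. The per-line cost is unchanged, since $1 in names is still a single hash lookup.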




Answer 2:


Not an array solution, but one trick is to use pattern matching. To eliminate partial matches, wrap both the list of values and the value being tested in the delimiter. For your example:

$ awk -v values="hello adieu" 'FS values FS ~ FS $1 FS' file
hello 23
adieu 99
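One caveat with the pattern-matching approach: the right-hand side of ~ is interpreted as a regular expression, so a first column containing metacharacters (such as a dot) can match values it is not literally equal to. A possible variant, sketched here as an assumption rather than as part of the original answer, swaps the regex match for a literal substring search with index(). The file name file2 and its contents are made up for the demonstration.

```shell
# Hypothetical input with a field that contains a regex metacharacter.
printf 'hello 23\nh.llo 7\nadieu 99\n' > file2

# Regex match: " h.llo " is used as a pattern, and "." matches the "e" in " hello ".
regex_hits=$(awk -v values="hello adieu" 'FS values FS ~ FS $1 FS' file2)

# Literal match: index() returns 0 unless " h.llo " occurs verbatim in " hello adieu ".
index_hits=$(awk -v values="hello adieu" 'index(FS values FS, FS $1 FS)' file2)

printf 'regex:\n%s\nindex:\n%s\n' "$regex_hits" "$index_hits"
```

With this input the regex version also prints the h.llo line, while the index() version prints only the hello and adieu lines. Either way, each input line is compared against the whole list as a string, whereas the array approach does a single hash lookup per line.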


Source: https://stackoverflow.com/questions/40846595/how-to-slice-a-variable-into-array-indexes
