Dynamic indirect Bash array

挽巷 2021-01-29 00:03

I have logs in this format:

log1,john,time,etc
log2,peter,time,etc
log3,jack,time,etc
log4,peter,time,etc

I want to create a list for every per

3 answers
  • 2021-01-29 00:19

    Be warned: The below uses namerefs (declare -n), a feature introduced in bash 4.3.
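
    For readers who have not used namerefs before, here is a minimal sketch of what declare -n does. The helper append_to and the array demo_list are made-up names for illustration only; they are not part of the answer's code.

```shell
#!/usr/bin/env bash
# Requires bash 4.3 or later for `declare -n`.

# append_to ARRAY_NAME VALUE... (hypothetical helper)
# Appends the values to the array whose *name* is given as the first argument.
append_to() {
  declare -n target_ref="$1"; shift   # target_ref now aliases the named array
  target_ref+=( "$@" )
}

demo_list=()
append_to demo_list "first" "second"
printf '%s\n' "${#demo_list[@]}"      # number of elements now in demo_list
```

    Inside a function, declare -n creates a local nameref, so the alias disappears when the function returns while the array it pointed at keeps the new elements.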


    First: I would strongly suggest namespacing your arrays with a prefix to avoid collisions with unrelated variables. Thus, using content_ as that prefix:

    read_arrays() {
      local line name fields
      while IFS= read -r line && IFS=, read -r -a fields <<<"$line"; do
        name=${fields[1]}
        declare -g -a "content_$name"
        declare -n cur_array="content_$name"
        cur_array+=( "$line" )
        unset -n cur_array
      done
    }
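
    One caveat worth sketching: the second field ends up inside a variable name, so a name containing spaces or punctuation would make declare fail. The check below is only an illustration under that assumption; is_valid_suffix is a hypothetical helper that the original answer does not define.

```shell
#!/usr/bin/env bash
# is_valid_suffix NAME (hypothetical helper, not from the original answer)
# Succeeds only if NAME is non-empty and made of ASCII letters, digits and
# underscores, so that "content_NAME" is a legal bash variable name.
is_valid_suffix() {
  local LC_ALL=C                      # keep the bracket ranges ASCII-only
  [[ $1 =~ ^[A-Za-z0-9_]+$ ]]
}

for candidate in peter john_2 'a b' 'x;y'; do
  if is_valid_suffix "$candidate"; then
    printf 'ok: %s\n' "$candidate"
  else
    printf 'skipped: %s\n' "$candidate" >&2
  fi
done
```

    A loop like the one in read_arrays could call such a check and skip (or report) lines whose name fails it, rather than letting declare error out.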
    

    Then:

    lines_for() {
      declare -n cur_array="content_$1"
      printf '%s\n' "${#cur_array[@]}" ## emit length of array for given person
    }
    

    ...or...

    for_each_line() {
      declare -n cur_array="content_$1"; shift
      for line in "${cur_array[@]}"; do
        "$@" "$line"
      done
    }
    

    Tying all this together:

    $ read_arrays <<'EOF'
    log1,john,time,etc
    log2,peter,time,etc
    log3,jack,time,etc
    log4,peter,time,etc
    EOF
    $ lines_for peter
    2
    $ for_each_line peter echo
    log2,peter,time,etc
    log4,peter,time,etc
    

    ...and, if you really want the format you asked for, with the number of columns as explicit data, and variable names that aren't safely namespaced, it's easy to convert from one to the other:

    # this should probably be run in a subshell to avoid namespace pollution
    # thus, (generate_stupid_format) >output
    generate_stupid_format() {
      for scoped_varname in "${!content_@}"; do
        unscoped_varname="${scoped_varname#content_}"
        declare -n unscoped_var=$unscoped_varname
        declare -n scoped_var=$scoped_varname
        unscoped_var=( "${#scoped_var[@]}" "${scoped_var[@]}" )
        declare -p "$unscoped_varname"
      done
    }
    
  • 2021-01-29 00:19

    You can use awk. As a demo:

    awk -F, '{a1[$2]=a1[$2]" \""$0"\""; sum[$2]++} END{for (e in sum){print e"=("  "\""sum[e]"\""a1[e]")"}}' file
    john=("1" "log1,john,time,etc")
    peter=("2" "log2,peter,time,etc" "log4,peter,time,etc")
    jack=("1" "log3,jack,time,etc")
    
  • 2021-01-29 00:27

    Bash with Coreutils, grep and sed

    If I understand your code correctly, you are trying to build multidimensional arrays, which Bash doesn't support. If I were to solve this problem from scratch, I'd use this mix of command line tools (see security concerns at the end of the answer!):

    #!/bin/bash
    
    while IFS= read -r name; do
        printf "%s=(\"%d\" \"%s\")\n" \
            "$name" \
            "$(grep -c "$name" "$1")" \
            "$(grep "$name" "$1" | tr $'\n' ' ' | sed 's/ /" "/g;s/" "$//')"
    done < <(cut -d ',' -f 2 "$1" | sort -u)
    

    Sample output:

    $ ./SO.sh infile
    jack=("1" "log3,jack,time,etc")
    john=("1" "log1,john,time,etc")
    peter=("2" "log2,peter,time,etc" "log4,peter,time,etc")
    

    This uses process substitution to prepare the log file so we can loop over unique names; the output of the substitution looks like

    $ cut -d ',' -f 2 "$1" | sort -u
    jack
    john
    peter
    

    i.e., a list of unique names.
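
    As a side note on the done < <(...) form: a loop fed by process substitution runs in the current shell, whereas a loop on the right-hand side of a pipe runs in a subshell (with bash's default options). The script above does not depend on this, but the sketch below shows the difference; the counter names are made up for illustration.

```shell
#!/usr/bin/env bash
# Pipe: the while loop runs in a subshell, so its counter is lost afterwards
# (assuming the default setting, i.e. the lastpipe option is not enabled).
count_piped=0
printf 'jack\njohn\npeter\n' | while IFS= read -r _; do
  count_piped=$(( count_piped + 1 ))
done

# Process substitution: the loop runs in the current shell, so the counter survives.
count_procsub=0
while IFS= read -r _; do
  count_procsub=$(( count_procsub + 1 ))
done < <(printf 'jack\njohn\npeter\n')

printf 'piped=%s procsub=%s\n' "$count_piped" "$count_procsub"
```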

    For each name, we then print the summarized log line with

    printf "%s=(\"%d\" \"%s\")\n"
    

    Where

    • The %s string is just the name ("$name").
    • The log line count is the output of a grep command,

      grep -c "$name" "$1"
      

      which counts the number of occurrences of "$name". If the name can occur elsewhere in the log line, we can limit the search to just the second field of the log lines with

      grep -c "$name" <(cut -d ',' -f 2 "$1")
      
    • Finally, to get all log lines on one line with proper quoting and all, we use

      grep "$name" "$1" | tr $'\n' ' ' | sed 's/ /" "/g;s/" "$//'
      

      This gets all lines containing "$name", replaces newlines with spaces, then surrounds the spaces with quotes and removes the extra quotes from the end of the line.
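
    Restricting the search to the second field still matches substrings, so a name like pete would also count peter. A possible tightening, sketched here with a hypothetical helper count_for_name and a made-up fifth log line, is to ask grep for fixed-string, whole-line matches on the extracted field:

```shell
#!/usr/bin/env bash
# count_for_name NAME FILE (hypothetical helper)
# Counts the lines of FILE whose second comma-separated field is exactly NAME.
count_for_name() {
  # -F: treat NAME as a fixed string, -x: match the whole field,
  # -c: print the number of matching lines, -e: allow names starting with "-"
  cut -d ',' -f 2 "$2" | grep -c -x -F -e "$1"
}

demo_file=$(mktemp)
printf '%s\n' \
  'log1,john,time,etc' \
  'log2,peter,time,etc' \
  'log3,jack,time,etc' \
  'log4,peter,time,etc' \
  'log5,pete,time,etc' >"$demo_file"

grep -c "pete" "$demo_file"          # substring match: also counts the "peter" lines
count_for_name "pete" "$demo_file"   # exact match on field 2 only
rm -f "$demo_file"
```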

    Pure Bash

    After initially thinking that pure Bash would be too cumbersome, it turned out to be not all that complicated:

    #!/bin/bash
    
    declare -A count
    declare -A lines
    
    old_ifs=$IFS
    IFS=,
    while read -r -a line; do
        name="${line[1]}"
        (( ++count[$name] ))
        lines[$name]+="\"${line[*]}\" "
    done < "$1"
    
    for name in "${!count[@]}"; do
        printf "%s=(\"%d\" %s)\n" "$name" "${count[$name]}" "${lines[$name]% }"
    done
    
    IFS="$old_ifs"
    

    This updates two associative arrays while looping over the input file: count keeps track of the number of times a certain name occurs, and lines appends the log lines to an entry per name.

    To separate fields by commas, we set the input field separator IFS to a comma (but save it beforehand so it can be reset at the end).

    read -r -a reads the lines into an array line with comma separated fields, so the name is now in ${line[1]}. We increase the count for that name in the arithmetic expression (( ... )), and append (+=) the log line in the next line.

    ${line[*]} prints all fields of the array separated by IFS, which is exactly what we want. We also add a space here; the unwanted space at the end of the line (after the last element) will be removed later.

    The second loop iterates over all the keys of the count array (the names), then prints the properly formatted line for each. ${lines[$name]% } removes the space from the end of the line.
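
    The two expansions this relies on can be tried in isolation. The following is a small sketch; join_with_comma, fields and padded are illustrative names only. It uses local IFS inside a function, which is one way to avoid saving and restoring the global IFS by hand.

```shell
#!/usr/bin/env bash
# join_with_comma WORD... (hypothetical helper)
# "$*" (like "${array[*]}") joins its words with the first character of IFS.
join_with_comma() {
  local IFS=,            # the change is confined to this function
  printf '%s\n' "$*"
}

fields=(log2 peter time etc)
join_with_comma "${fields[@]}"     # the four fields rejoined with commas

# ${var% } removes a single trailing space, if there is one.
padded='"a" "b" '
printf '[%s]\n' "${padded% }"
```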

    Security concerns

    As it seems that the output of these scripts is supposed to be reused by the shell, we might want to prevent malicious code execution if we can't trust the contents of the log file.

    A way to do that for the Bash solution (hat tip: Charles Duffy) would be the following: the for loop would have to be replaced by

    for name in "${!count[@]}"; do
        IFS=' ' read -r -a words <<< "${lines[$name]}"
        printf -v words_str '%q ' "${words[@]}"
        printf "%q=(\"%d\" %s)\n" "$name" "${count[$name]}" "${words_str% }"
    done
    

    That is, we split the combined log lines into an array words, print that with the %q format specifier into a string words_str and then use that string for our output, resulting in escaped output like this:

    peter=("2" \"log2\,peter\,time\,etc\" \"log4\,peter\,time\,etc\")
    jack=("1" \"log3\,jack\,time\,etc\")
    john=("1" \"log1\,john\,time\,etc\")
    

    The first solution could be hardened in the same way.
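
    To see why %q helps, here is a self-contained sketch of a round trip through eval. emit_assignment and demo_lines are hypothetical names, and the hostile-looking log line is made up; unlike the loop above, this variant quotes each raw log line as a single element instead of splitting on spaces, which is a different design choice rather than the original author's method.

```shell
#!/usr/bin/env bash
# emit_assignment VARNAME ELEMENT... (hypothetical helper)
# Prints an array assignment whose elements are quoted with %q, so the shell
# reads them back literally. VARNAME itself is NOT validated here and is
# assumed to be a trusted, fixed identifier.
emit_assignment() {
  local varname=$1; shift
  local quoted
  printf -v quoted '%q ' "$@"
  printf '%s=(%s)\n' "$varname" "${quoted% }"
}

hostile='log9,$(echo injected),time;etc'
assignment=$(emit_assignment demo_lines "1" "$hostile")
printf '%s\n' "$assignment"

# Reading the assignment back: the command substitution is never executed,
# because %q escaped the characters that would have triggered it.
eval "$assignment"
if [[ ${demo_lines[1]} == "$hostile" ]]; then
  echo "round trip preserved the raw text"
fi
```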
