passing bash variable for awk column specifier

無奈伤痛 2021-01-24 06:35

There are loads of threads about passing a shell variable to awk, and I've figured that out easily enough, but the variable I want to pass is the column specifier variable (

2 Answers
  • 2021-01-24 06:55

    I'm going to add this as an answer because it does resolve the question as I posed it, notwithstanding Charles' excellent advice about the (myriad) areas I was going wrong.

    Altering the above code with Charles' point about the separate awk commands, I can now invoke the following (sorry yes, it's still using paste)...

    #!/bin/bash
    
    keyfile="$1"
    filetosort="$2"
    indexfield="$3"
    
    paste "$keyfile" <(awk -v field="$indexfield" 'NR==FNR{o[FNR]=$field; next} {t[$1]=$0} END{for(x=1; x<=FNR; x++){y=o[x]; print t[y]}}' "$keyfile" "$filetosort")
    

    I was missing the $ before the variable I was calling in the awk command, which is part of why my original code wasn't working. The other part was that the -v variable declaration and the code using the variable were not in the same awk call.
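
    To make the effect of the missing $ concrete, here is a minimal sketch. The sample line and the value 3 are made up for illustration; only the -v field="$indexfield" pattern comes from the script above.

```shell
# Hypothetical one-line sample; in the real script the column number comes from "$3".
indexfield=3
line='alpha beta gamma delta'

# Without the $, awk prints the variable's own value (the number 3).
without_dollar=$(printf '%s\n' "$line" | awk -v field="$indexfield" '{print field}')

# With the $, awk uses the value as a field number and prints column 3.
with_dollar=$(printf '%s\n' "$line" | awk -v field="$indexfield" '{print $field}')

printf 'field  -> %s\n' "$without_dollar"
printf '$field -> %s\n' "$with_dollar"
```

    The first line of output shows 3 and the second shows gamma.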

    Thus, bash sortandmatch.sh keyfile filetosort 3 produces the output I want:

    PVCunit2_5  plu1704         PLT_01732   PLT_01732   4etv    4etv_A  39.0    12  0.00032 27.6    >4etv_A Ryanodine receptor 2; phosphorylation, cardiac, metal transport; 1.65A {Mus musculus}
    PVCunit2_4  plu1705         PLT_01733   PLT_01733   3j9q    3j9q_A  99.9    7.2e-30 1.9e-34 219.0   >3j9q_A Sheath; pyocin, bacteriocin, sheath, structural protein; 3.50A {Pseudomonas aeruginosa}
    XVC_pnf15   XBW1_RS06910    XBW1_RS06910    XBW1_RS06910    1fi0    1fi0_A  69.2    1.7 4.4e-05 22.8    >1fi0_A VPR protein, R ORF protein; helix, viral protein; NMR {Synthetic} SCOP: j.11.1.1
    PVCcif7     PAU_01999       PAU_01967   PAU_01967   5a3a    5a3a_A  47.5    7.3 0.00019 30.9    >5a3a_A SIR2 family protein; transferase, P-ribosyltransferase, metalloprotein, NAD-depen lipoylation, regulatory enzyme, rossmann fold; 1.54A {Streptococcus pyogenes} PDB: 5a3b_A* 5a3c_A*
    PVClumT15   PAU_02233       PAU_02192   PAU_02192   1tdp    1tdp_A  22.1    37.0    0.00096 27.2    >1tdp_A Carnobacteriocin B2 immunity protein; four-helix bundle, antimicrobial protein; NMR {Carnobacterium maltaromaticum} SCOP: a.29.8.1
    XVC_pnf3    XBW1_RS06850    XBW1_RS06850    XBW1_RS06850    3eaa    3eaa_A  87.7    0.13    3.4e-06 35.7    >3eaa_A EVPC; T6SS, unknown function; 2.79A {Edwardsiella tarda}
    PVCunit1_4  afp4            PAU_02778   PAU_02778   3j9q    3j9q_A  99.9    3.6e-29 9.5e-34 214.6   >3j9q_A Sheath; pyocin, bacteriocin, sheath, structural protein; 3.50A {Pseudomonas aeruginosa}
    PVCunit2_3  plu1706         PLT_01734   PLT_01734   3j9q    3j9q_A  100.0   1.6e-34 4.3e-39 253.7   >3j9q_A Sheath; pyocin, bacteriocin, sheath, structural protein; 3.50A {Pseudomonas aeruginosa}
    PVClumt17   PAK_2200        PAK_01998   PAK_01998   3k8p    3k8p_C  34.7    16.0    0.00041 34.1    >3k8p_C DSL1, KLLA0C02695P; intracellular trafficking, DSL1 complex, multisubunit tethering complex, snare proteins; 2.60A {Kluyveromyces lactis}
    PVClopT12   PAU_02101       PAU_02063   PAU_02063   4yap    4yap_A  31.1    20  0.00052 29.1    >4yap_A Glutathione S-transferase homolog; GSH-lyase GSH-dependent; 1.11A {Sphingobium SP} PDB: 4g10_A 4yav_A*
    
  • 2021-01-24 07:12

    When you pass awk -v a="$field", the specification of the awk variable a is only good for that single awk command. You can't expect a to be available in a completely different invocation of awk.

    Thus, you need to pass the variable to the same awk command that uses it:

    $ bashvar="2"
    $ echo 'foo bar baz' | awk -v awkvar="$bashvar" '{print $awkvar}'
    bar
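
    The scoping can be checked with a small sketch that reuses the same sample line. The second awk call deliberately omits -v to show what happens when the variable was never passed to that process.

```shell
bashvar=2

# The -v assignment exists only inside this one awk process.
first=$(echo 'foo bar baz' | awk -v awkvar="$bashvar" '{print $awkvar}')

# A second awk process never received awkvar, so it is unset there.
# An unset variable used as a field number evaluates to 0, and $0 is the whole line.
second=$(echo 'foo bar baz' | awk '{print $awkvar}')

printf 'first:  %s\n' "$first"
printf 'second: %s\n' "$second"
```

    The first call prints bar, while the second prints the entire line, which is an easy symptom to misread as "the variable was ignored".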
    

    Or in your case:

    field=1
    awk -v a="$field" '
    NR==FNR {
      o[FNR]=$a;
      next;
    }
    
    { t[$1] = $0 }
    
    END {
      for(x=1; x<=FNR; x++) {
        y=o[x]
        printf("%s\t%s\n", y, t[y])
      }
    }' "$keyfile" "$filetosort"
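
    For anyone who wants to try the idiom without the original data, here is a self-contained sketch. The file names and contents are hypothetical. One deliberate variation, which is my assumption and not part of the code above: the keys are counted in n while the first file is read, so the END loop does not depend on FNR, which at that point reflects the last file read and only matches when both files have the same number of lines.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
keyfile="$tmpdir/keys.tsv"
filetosort="$tmpdir/data.tsv"
printf 'k1\tb\nk2\ta\nk3\tc\n' > "$keyfile"        # lookup key is in column 2
printf 'a\tapple\nb\tbanana\nc\tcherry\n' > "$filetosort"

field=2
sorted=$(awk -v a="$field" '
NR==FNR { o[++n] = $a; next }   # first file: remember keys in order, count them in n
{ t[$1] = $0 }                   # second file: index each row by its first column
END {
  for (x = 1; x <= n; x++) {     # loop over the key count, not FNR
    y = o[x]
    printf("%s\t%s\n", y, t[y])
  }
}' "$keyfile" "$filetosort")

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

    The rows of the second file come out in the order b, a, c, which is the order of column 2 in the key file, each prefixed with its key.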
    

    Points of note:

    • Our printf here is emitting both the key and the value, so there's no need to use paste to put the keyfile values back in.
    • $a applies awk's field operator $ to the awk variable a (assigned from shell variable field): awk takes the value of a as a column number and looks up that column, which works as an indirect reference.
    • Always, always quote your shell variables on expansion. Otherwise, you have no way of knowing how many arguments to awk will be generated by the expansion of $keyfile -- it could be 0 (if there are no characters in the string not found in IFS); it could be 1, but it could also be a completely unbounded number (input file.txt would become two arguments, input and file.txt; * input * .txt would have each * replaced with a list of files).
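
    The quoting point can be verified with a short sketch. count_args is a hypothetical helper defined here only to report how many arguments a command would receive, and the example assumes the default IFS.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

keyfile='input file.txt'   # made-up file name containing a space

unquoted=$(count_args $keyfile)       # word-split into: input, file.txt
quoted=$(count_args "$keyfile")       # stays a single argument

empty=''
unquoted_empty=$(count_args $empty)   # expands to no arguments at all
quoted_empty=$(count_args "$empty")   # one empty argument

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
printf 'unquoted empty: %s, quoted empty: %s\n' "$unquoted_empty" "$quoted_empty"
```

    The unquoted expansions yield 2 and 0 arguments, while the quoted ones always yield exactly 1, which is what awk needs to receive one file name per variable.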