Using awk to pull specific lines from a file

醉话见心 2020-12-03 15:36

I have two files, one file is my data, and the other file is a list of line numbers that I want to extract from my data file. Can I use awk to read in my lines file, and the

6 answers
  • 2020-12-03 16:03

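    Here is an awk example. The line numbers in inputfile are loaded up front in the BEGIN block; the records of datafile whose record number appears in that list are then printed.

    awk \
      -v RS="[\r]*[\n]" \
      -v FILE="inputfile" \
      'BEGIN \
       {
         # Build a comma-delimited list of wanted line numbers, e.g. ",2,4,7,"
         LINES = ","
         # "> 0" stops at end of file and also on a read error (getline returns -1)
         while ((getline Line < FILE) > 0)
         {
           LINES = LINES Line ","
         }
       }
       # Print the record if ",NR," occurs in the list
       LINES ~ "," NR "," \
       {
         print
       }
      ' datafile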
    
  • 2020-12-03 16:05
    awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile
    

    Simply referring to an array subscript creates the entry. While the first file is being read, NR (the overall record number) equals FNR (the record number within the current file), so each line number is stored as an array key and the next statement skips to the next record. Once the second file is being read, NR no longer equals FNR, and a record is printed whenever its FNR is a key in the array (printing is the default action for a pattern that is true).
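
    The idiom above can be tried on a small example. This is only a sketch: the temporary directory and the sample contents of numberfile and datafile are made up for illustration.

```shell
# Hypothetical sample files, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta epsilon > "$tmpdir/datafile"
printf '%s\n' 2 4 > "$tmpdir/numberfile"

# First file: NR == FNR, so each number becomes an array key.
# Second file: print the records whose FNR is one of those keys.
result=$(awk 'NR == FNR {nums[$1]; next} FNR in nums' \
  "$tmpdir/numberfile" "$tmpdir/datafile")

printf '%s\n' "$result"   # prints "beta" and "delta", one per line

rm -rf "$tmpdir"
```

    Note that if numberfile is empty, NR == FNR is also true for datafile, so nothing is printed.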

  • 2020-12-03 16:07

    while read -r line; do sed -n "${line}p" Datafile.txt; done < numbersfile.txt

  • 2020-12-03 16:09

    I had the same problem. This is the solution already posted by Thor:

    cat datafile \
    | awk 'BEGIN{getline n<"numbers"} n==NR{print; getline n<"numbers"}'
    

    If, like me, you don't have a numbers file because the numbers arrive on stdin, and you don't want to generate a temporary numbers file, then this is an alternative solution. It reads datafile once from top to bottom, so the numbers must arrive in ascending order:

    cat numbers \
    | awk '{while((getline line<"datafile")>0) {n++; if(n==$0) {print line;next}}}'
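
    As a rough check of the stdin variant, here is a sketch on made-up data. The temporary directory and file contents are illustrative, and the data file path is passed in with -v (a hypothetical variable named datafile) instead of being hard-coded in the program.

```shell
# Hypothetical sample data file in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta epsilon > "$tmpdir/datafile"

# Line numbers arrive on stdin in ascending order. For each number,
# keep reading the data file from where the last read stopped until
# the running counter n reaches that number.
result=$(printf '%s\n' 2 5 | awk -v datafile="$tmpdir/datafile" '
  {
    while ((getline line < datafile) > 0) {
      n++
      if (n == $0) { print line; next }
    }
  }')

printf '%s\n' "$result"   # prints "beta" and "epsilon", one per line

rm -rf "$tmpdir"
```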
    
  • 2020-12-03 16:24

    This solution...

    awk 'NR == FNR {nums[$1]; next} FNR in nums' numberfile datafile

    ...prints each selected line only once, because the array holds a single entry per number. If numberfile contains repeated entries and every repetition should be printed, sed is a better (but much slower) alternative:

    sed -nf <(sed 's/.*/&p/' numberfile) datafile
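
    If staying with awk is preferred, one possible variation (not part of the answer above, just a sketch) is to count how often each number occurs and print the line that many times. The array name count and the sample files are assumptions made for illustration.

```shell
# Hypothetical sample files; line 2 is requested twice.
tmpdir=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$tmpdir/datafile"
printf '%s\n' 2 2 3 > "$tmpdir/numberfile"

# Original idiom: an array key exists once, so each line prints once.
once=$(awk 'NR == FNR {nums[$1]; next} FNR in nums' \
  "$tmpdir/numberfile" "$tmpdir/datafile")

# Variation: count occurrences, then print the line count[FNR] times.
repeated=$(awk 'NR == FNR {count[$1]++; next}
  FNR in count { for (i = 0; i < count[FNR]; i++) print }' \
  "$tmpdir/numberfile" "$tmpdir/datafile")

printf 'once:\n%s\nrepeated:\n%s\n' "$once" "$repeated"

rm -rf "$tmpdir"
```

    Like the sed version, this prints repeats in data file order, not in the order the numbers were listed.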

  • 2020-12-03 16:26

    One way with sed:

    sed 's/$/p/' linesfile | sed -n -f - datafile
    

    You can use the same trick with awk:

    sed 's/^/NR==/' linesfile | awk -f - datafile
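
    To see what the trick does, it can help to look at the generated scripts. The sketch below uses made-up sample files and, to avoid depending on support for reading a script from stdin with -f -, passes the generated script as a command-line argument instead.

```shell
# Hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta epsilon > "$tmpdir/datafile"
printf '%s\n' 2 4 > "$tmpdir/linesfile"

# Each number N becomes the sed command "Np" ...
sed_script=$(sed 's/$/p/' "$tmpdir/linesfile")
# ... or the awk pattern "NR==N" (default action: print).
awk_script=$(sed 's/^/NR==/' "$tmpdir/linesfile")

printf 'sed script:\n%s\nawk script:\n%s\n' "$sed_script" "$awk_script"

# Run the generated scripts against the data file.
sed_out=$(sed -n "$sed_script" "$tmpdir/datafile")
awk_out=$(awk "$awk_script" "$tmpdir/datafile")

rm -rf "$tmpdir"
```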
    

    Edit - Huge files alternative

    With a huge number of lines it is not prudent to keep whole files in memory. In that case the solution is to sort the numbers file and read it one line at a time. The following has been tested with GNU awk:

    extract.awk

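    BEGIN {
      # Read the first wanted line number; linesfile must be sorted ascending
      if((getline n < linesfile) <= 0) {
        if(length(ERRNO)) {
          print "Unable to read linesfile '" linesfile "': " ERRNO > "/dev/stderr"
          exit 1
        }
        exit
      }
    }
    
    NR == n { 
      print
      # <= 0 covers both end of file (0) and a read error (-1)
      if((getline n < linesfile) <= 0) {
        if(length(ERRNO))
          print "Unable to read linesfile '" linesfile "': " ERRNO > "/dev/stderr"
        exit
      }
    }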
    

    Run it like this:

    awk -v linesfile="$linesfile" -f extract.awk infile
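
    Because this approach only works when the line numbers are in ascending order, an unsorted list has to be sorted first. The sketch below shows one way to do that with sort -n -u; the sample files are made up, and a condensed inline version of the script (without the error reporting) stands in for extract.awk.

```shell
# Hypothetical sample files; the requested numbers are unsorted
# and contain a duplicate.
tmpdir=$(mktemp -d)
printf '%s\n' alpha beta gamma delta epsilon > "$tmpdir/infile"
printf '%s\n' 4 2 4 > "$tmpdir/unsorted"

# Sort numerically and drop duplicates before the single pass.
sort -n -u "$tmpdir/unsorted" > "$tmpdir/linesfile"

# Condensed stand-in for extract.awk: keep only the next wanted
# line number in memory and stop as soon as the list runs out.
result=$(awk -v linesfile="$tmpdir/linesfile" '
  BEGIN   { if ((getline n < linesfile) <= 0) exit }
  NR == n { print; if ((getline n < linesfile) <= 0) exit }
' "$tmpdir/infile")

printf '%s\n' "$result"   # prints "beta" and "delta", one per line

rm -rf "$tmpdir"
```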
    

    Testing:

    echo "2
    4
    7
    8
    10
    13" | awk -v linesfile=/dev/stdin -f extract.awk <(paste <(seq 50e3) <(seq 50e3 | tac))
    

    Output:

    2   49999
    4   49997
    7   49994
    8   49993
    10  49991
    13  49988
    