Is there a way to ignore header lines in a UNIX sort?

余生分开走 2020-11-28 21:02

I have a fixed-width-field file which I'm trying to sort using the UNIX (Cygwin, in my case) sort utility.

The problem is there is a two-line header at the top of t

12 Answers
  • 2020-11-28 21:20
    head -2 <your_file> && nawk 'NR>2' <your_file> | sort
    

    example:

    > cat temp
    10
    8
    1
    2
    3
    4
    5
    > head -2 temp && nawk 'NR>2' temp | sort -r
    10
    8
    5
    4
    3
    2
    1
    
  • 2020-11-28 21:23

    Here's a bash shell function derived from the other answers. It handles both files and pipes. First argument is the file name or '-' for stdin. Remaining arguments are passed to sort. A couple examples:

    $ hsort myfile.txt
    $ head -n 100 myfile.txt | hsort -
    $ hsort myfile.txt -k 2,2 | head -n 20 | hsort - -r
    

    The shell function:

    hsort ()
    {
       if [ "$1" == "-h" ]; then
           echo "Sort a file or standard input, treating the first line as a header.";
           echo "The first argument is the file or '-' for standard input. Additional";
           echo "arguments to sort follow the first argument, including other files.";
           echo "File syntax : $ hsort file [sort-options] [file...]";
           echo "STDIN syntax: $ hsort - [sort-options] [file...]";
           return 0;
       elif [ -f "$1" ]; then
           local file=$1;
           shift;
           (head -n 1 "$file" && tail -n +2 "$file" | sort "$@");
       elif [ "$1" == "-" ]; then
           shift;
           (read -r; printf "%s\n" "$REPLY"; sort "$@");
       else
           >&2 echo "Error. File not found: $1";
           >&2 echo "Use either 'hsort <file> [sort-options]' or 'hsort - [sort-options]'";
           return 1 ;
       fi
    }
    
  • 2020-11-28 21:26

    Here's a bash function that takes exactly the same arguments as sort and supports both files and pipes.

    function skip_header_sort() {
        if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
            local file=${@: -1}
            set -- "${@:1:$(($#-1))}"
        fi
        awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
    }
    

    How it works. This line checks if there is at least one argument and if the last argument is a file.

        if [[ $# -gt 0 ]] && [[ -f ${@: -1} ]]; then
    

    This saves the file name in a separate variable, since we're about to remove the last argument.

            local file=${@: -1}
    

    Here we remove the last argument, since we don't want to pass the file name to sort as an option.

            set -- "${@:1:$(($#-1))}"
    

    Finally, we run awk, passing the remaining arguments (minus the last one if it was the file) through to sort. This was originally suggested by Dave and modified to accept sort arguments. We rely on $file being empty when input is piped, so the unquoted expansion disappears and awk reads standard input.

        awk -vsargs="$*" 'NR<2{print; next}{print | "sort "sargs}' $file
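
    A variant sketch of the same awk idea, reading from standard input only. The name skip_header_awk and the `fflush()` and `close()` calls are additions, not part of the answer above. They assume the header could otherwise sit in awk's output buffer when stdout is a pipe or a file, which may not matter with every awk.

```shell
# Hypothetical helper: sort stdin but leave the first line in place.
# All arguments are handed to sort as one string, so options that
# contain spaces are not supported in this sketch.
skip_header_awk() {
    awk -v sort_cmd="sort $*" '
        NR == 1 { print; fflush(); next }  # write the header out immediately
        { print | sort_cmd }               # stream every other line into sort
        END { close(sort_cmd) }            # wait for sort to finish writing
    '
}

# Same sample data as the /tmp/test example, fed through a pipe.
printf 'A,B,C\n0,1,2\n1,2,0\n2,0,1\n' | skip_header_awk -t, -nk2
```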
    

    Example usage with a comma-separated file.

    $ cat /tmp/test
    A,B,C
    0,1,2
    1,2,0
    2,0,1
    
    # SORT NUMERICALLY SECOND COLUMN
    $ skip_header_sort -t, -nk2 /tmp/test
    A,B,C
    2,0,1
    0,1,2
    1,2,0
    
    # SORT REVERSE NUMERICALLY THIRD COLUMN
    $ cat /tmp/test | skip_header_sort -t, -nrk3
    A,B,C
    0,1,2
    2,0,1
    1,2,0
    
  • 2020-11-28 21:26

    With Python:

    import sys
    HEADER_ROWS=2
    
    for _ in range(HEADER_ROWS):
        sys.stdout.write(next(sys.stdin))
    for row in sorted(sys.stdin):
        sys.stdout.write(row)
    
  • 2020-11-28 21:27
    (head -n 2 <file> && tail -n +3 <file> | sort) > newfile
    

    The parentheses create a subshell that combines the stdout of both commands, so you can pipe or redirect the result as if it had come from a single command.
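
    The same grouping can be wrapped in a small function, sketched here on the assumption of a POSIX-style shell. The name sort_keep_header and its argument order are made up for illustration. Writing the function body in parentheses runs it in a subshell, so its variables do not leak into the caller.

```shell
# Hypothetical helper: sort_keep_header N FILE [sort-options...]
# Prints the first N lines of FILE untouched, then the rest sorted.
sort_keep_header() (
    n=$1
    file=$2
    shift 2
    # Both commands write to the function's stdout, so a caller can
    # redirect or pipe the combined result in one go.
    head -n "$n" "$file" && tail -n "+$((n + 1))" "$file" | sort "$@"
)
```

    For the two-line header in the question, a call might look like `sort_keep_header 2 <file> > newfile`, with any sort options appended after the file name.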

  • 2020-11-28 21:31

    In simple cases, sed can do the job elegantly:

        your_script | (sed -u 1q; sort)
    

    or equivalently,

        cat your_data | (sed -u 1q; sort)
    

    The key is the 1q: print the first line (the header) and quit, leaving the rest of the input for sort.

    For the example given, 2q will do the trick.

    The -u switch (unbuffered) is required for those seds (notably, GNU's) that would otherwise read the input in chunks, thereby consuming data that you want to go through sort instead.
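
    Where `sed -u` is unavailable (the switch is not part of POSIX sed), the shell's own `read` can peel off the header lines instead, much like the `read -r` in the hsort answer. A minimal sketch: the function name body_sort is hypothetical, and it relies on `read` consuming no more than one line at a time from a pipe.

```shell
# Hypothetical helper: body_sort N [sort-options...]
# Copies the first N lines of stdin through unchanged, then sorts the rest.
body_sort() (
    n=$1
    shift
    i=0
    # IFS= and -r keep leading whitespace and backslashes intact,
    # which matters for fixed-width header lines.
    while [ "$i" -lt "$n" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    # Whatever read left on stdin is now consumed by sort.
    sort "$@"
)
```

    For the two-line header in the question this would be used as `your_script | body_sort 2`.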
