Swap two columns - awk, sed, python, perl

孤独总比滥情好 2020-12-01 01:03

I've got data in a large file (280 columns wide, 7 million lines long!) and I need to swap the first two columns. I think I could do this with some kind of awk for loop, to…

9 answers
  • 2020-12-01 01:39

    This might work for you (GNU sed):

    sed -i 's/^\([^\t]*\t\)\([^\t]*\t\)/\2\1/' file
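To see what the sed expression does, here is a minimal Python sketch using the same regular expression: group 1 captures the first field plus its tab, group 2 the second field plus its tab, and the replacement writes them back in reverse order. The function name swap_like_sed is made up for illustration.

```python
import re

# Same pattern as the sed command: group 1 is "col1<TAB>", group 2 is "col2<TAB>".
SWAP_RE = re.compile(r"^([^\t]*\t)([^\t]*\t)")


def swap_like_sed(line: str) -> str:
    """Swap the first two tab-separated fields the way the sed expression does."""
    # count=1 mirrors sed's default of replacing only the first match per line.
    return SWAP_RE.sub(r"\2\1", line, count=1)


if __name__ == "__main__":
    print(repr(swap_like_sed("a\tb\tc\td\n")))  # fields 1 and 2 swapped
    print(repr(swap_like_sed("a\tb\n")))        # no tab after field 2: unchanged
```

A line with only two columns has no tab after the second field, so the pattern does not match and the line is left unchanged; with 280 columns that case does not arise.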
    
  • 2020-12-01 01:39

    Maybe even with "inlined" Python, i.e. a Python script embedded in a shell script, but only if you want to do more Bash scripting before or after it. Otherwise it is unnecessarily complex.

    Content of script file process.sh:

    #!/bin/bash
    
    # inline Python script; quoting 'EOSCR' stops Bash from expanding
    # $, backticks or backslash sequences inside the here-document
    read -r -d '' PYSCR << 'EOSCR'
    import sys
    
    encoding = "utf-8"
    fn_in = sys.argv[1]
    fn_out = sys.argv[2]
    
    # print("Input:", fn_in)
    # print("Output:", fn_out)
    
    with open(fn_in, "r", encoding=encoding) as fp_in, \
            open(fn_out, "w", encoding=encoding) as fp_out:
        for line in fp_in:
            # split into the first two columns and the rest (rest keeps its newline)
            col1, col2, rest = line.split("\t", 2)
            # swap columns in output
            fp_out.write("{}\t{}\t{}".format(col2, col1, rest))
    EOSCR
    
    # ---------------------
    # do setup work?
    # e.g. list files for processing
    
    # call python script with params
    python3 -c "$PYSCR" "$inputfile" "$outputfile"
    
    # do some more processing
    # e.g. rename outputfile to inputfile, ...
    

    If you only need to swap the columns for a single file, then you can also just create a single Python script and statically define the filenames. Or just use an answer above.
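One caveat with the inline script: line.split("\t", 2) must yield exactly three parts, so a line with only one or two columns raises ValueError. A minimal sketch of a more tolerant version, written as a line generator so it can be used as a stdin-to-stdout filter; the names swap_first_two and swap_lines are hypothetical, not part of the answer.

```python
from typing import Iterable, Iterator


def swap_first_two(line: str) -> str:
    """Swap the first two tab-separated fields of one line, keeping its newline."""
    body = line.rstrip("\n")
    ending = line[len(body):]    # "\n", or "" for a last line without newline
    parts = body.split("\t", 2)  # at most 3 pieces: col1, col2, rest
    if len(parts) < 2:
        return line              # a single column: nothing to swap
    parts[0], parts[1] = parts[1], parts[0]
    return "\t".join(parts) + ending


def swap_lines(lines: Iterable[str]) -> Iterator[str]:
    """Lazily swap columns line by line, so a huge file never has to fit in memory."""
    for line in lines:
        yield swap_first_two(line)
```

As a filter, a script could call sys.stdout.writelines(swap_lines(sys.stdin)) and be run as python3 swap_cols.py < input.tsv > output.tsv (the script and file names are placeholders).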

  • 2020-12-01 01:45

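    Have you tried using the cut command? E.g.

    cut -c10-20,1-9,21- myhugefile > myrearrangedhugefile

    Note, however, that cut always prints the selected ranges in their original input order, so this command outputs every line unchanged: cut cannot reorder columns on its own. Also, -c selects character positions, not tab-separated fields (that would be -f).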
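Because cut cannot reorder, the fixed-width rearrangement this answer is aiming for (characters 10-20 first, then 1-9, then the rest) needs another tool. A minimal Python sketch, assuming fixed-width lines and reusing the answer's 1-based, inclusive character positions; the function name rearrange_chars is hypothetical. This only makes sense for fixed-width data, not for tab-separated columns.

```python
def rearrange_chars(line: str) -> str:
    """Emit characters 10-20, then 1-9, then 21 onward (1-based, inclusive)."""
    body = line.rstrip("\n")
    ending = line[len(body):]
    # Python slices are 0-based and end-exclusive: 1-9 -> [0:9], 10-20 -> [9:20].
    return body[9:20] + body[0:9] + body[20:] + ending
```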
    
  • 2020-12-01 01:45

    This is also easy in perl:

    perl -pe 's/^([^\t]*)\t([^\t]*)/$2\t$1/' file > outputfile
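The pattern used for each field matters: \S+ refuses to match a field that contains a space or is empty, so such lines pass through unswapped, whereas [^\t]* accepts anything up to the next tab. A small Python comparison of the two patterns (Python's re treats these simple constructs the same way as Perl); the names STRICT, LOOSE and swap are made up for illustration.

```python
import re

# \S+ : one or more non-whitespace characters
STRICT = re.compile(r"^(\S+)\t(\S+)")
# [^\t]* : any run of characters up to the next tab, including spaces or nothing
LOOSE = re.compile(r"^([^\t]*)\t([^\t]*)")


def swap(pattern: re.Pattern, line: str) -> str:
    """Swap the first two fields using the given pattern; unmatched lines are unchanged."""
    return pattern.sub(r"\2\t\1", line, count=1)


if __name__ == "__main__":
    line = "New York\t42\tx\n"
    print(repr(swap(STRICT, line)))  # unchanged: "New York" contains a space
    print(repr(swap(LOOSE, line)))   # swapped
```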
    
  • 2020-12-01 01:50

    No need to call anything else but your shell:

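    bash> while read -r col1 col2 rest; do
            echo "$col2" "$col1" "$rest"
          done <input_file
    

    Test:

    bash> echo "first second a b c d e f g" |
          while read -r col1 col2 rest; do
            echo "$col2" "$col1" "$rest"
          done
    second first a b c d e f g

    Note that with the default IFS, read splits on both spaces and tabs, and echo re-joins the first fields with single spaces instead of tabs. A shell read loop is also much slower than awk, sed or perl on a 7-million-line file.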
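With the default IFS, read col1 col2 rest behaves roughly like Python's str.split() with maxsplit=2: it splits on runs of spaces and tabs, trims the ends, and leaves everything after the second field in the last variable. A small Python sketch that mimics the quoted loop body, echo "$col2" "$col1" "$rest", for one line (the function name read_style_swap is hypothetical); it makes it easy to see that tabs between the first fields come back as spaces.

```python
def read_style_swap(line: str) -> str:
    """Roughly mimic: read -r col1 col2 rest; echo "$col2" "$col1" "$rest"."""
    # split() with no separator splits on runs of whitespace; maxsplit=2 keeps
    # everything after the second field in one piece, like $rest.
    parts = line.strip().split(maxsplit=2)
    col1, col2 = parts[0], parts[1]
    rest = parts[2] if len(parts) > 2 else ""
    # echo separates its arguments with single spaces (an empty $rest still
    # produces a trailing separator).
    return " ".join([col2, col1, rest])
```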
    
  • 2020-12-01 01:56

    Try this, which is closer to what you asked (it keeps all the other columns):

    awk -F'\t' -v OFS='\t' '{t=$1; $1=$2; $2=t; print}' inputfile
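Assigning to $1 and $2 makes awk rebuild the whole record by joining every field with OFS. A Python sketch of that same field model for one line, assuming a tab as both FS and OFS (the function name awk_style_swap is hypothetical); like awk, it creates an empty second field when the record has only one.

```python
def awk_style_swap(line: str) -> str:
    """Swap fields 1 and 2 the way awk does with FS and OFS set to a tab."""
    # Like awk with FS="\t": split the record into every one of its fields.
    fields = line.rstrip("\n").split("\t")
    # Assigning $2 in awk creates that field if the record lacks it.
    if len(fields) < 2:
        fields.append("")
    fields[0], fields[1] = fields[1], fields[0]
    # awk rebuilds $0 by joining all fields with OFS; print then appends ORS ("\n").
    return "\t".join(fields) + "\n"
```

Splitting all 280 fields is more work than necessary for a two-column swap; passing a maxsplit argument, as in line.split("\t", 2), leaves the remaining fields unsplit.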
    