How to delete the first column (which is in fact row names) from a data file in Linux?

失恋的感觉 2021-01-01 11:10

I have a data file with many thousands of columns and rows. I want to delete the first column, which is in fact the row counter. I used this command in Linux:

cut          


        
5 Answers
  • 2021-01-01 11:31

    @karakfa I had CSV files, so I added the "," separator (you can replace it with yours):

    cut -d"," -f2- input.csv  > output.csv
    

    Then I used a loop to go over all the files inside the directory:

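    # files are in the directory tmp/; the output directory new/ must already exist
    for f in tmp/*
    do
        # quote the variables so file names containing spaces survive
        name=$(basename "$f")
        echo "processing file : $name"
        # keep all columns except the first one of each CSV file
        cut -d"," -f2- "$f" > "new/$name"
        # output files keep the same names and are stored in the directory new/
    done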
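
The loop above can be tried end to end on throwaway data. This is only a minimal sketch: the temporary directory, the file names and the sample contents are made up for illustration, and the output directory is created up front because the redirection fails if it does not exist.

```shell
# Hypothetical sandbox: work inside a temporary directory
workdir=$(mktemp -d)
mkdir -p "$workdir/tmp" "$workdir/new"

# Two made-up CSV files whose first column is a row counter
printf '1,a,b\n2,c,d\n' > "$workdir/tmp/one.csv"
printf '1,x\n2,y\n'     > "$workdir/tmp/two.csv"

for f in "$workdir"/tmp/*
do
    name=$(basename "$f")
    # keep every column from the second one onward
    cut -d"," -f2- "$f" > "$workdir/new/$name"
done

# Collect the results so they can be inspected
first_result=$(cat "$workdir/new/one.csv")
second_result=$(cat "$workdir/new/two.csv")
printf '%s\n' "$first_result" "$second_result"
```

Each file in new/ should then hold the original rows minus the counter column.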
    
  • 2021-01-01 11:34

    As @karakfa notes, it looks like it's the leading whitespace which is causing your issues.

    Here's a sed one-liner to do the job (it accounts for spaces or tabs):

    sed -i.bak "s|^[ \t]\+[0-9]\+[ \t]\+||" input.txt
    

    Explanation:

    -i       edit existing file in place
    .bak     backup original file and add .bak file extension (can use whatever you like)
    
    s        substitute
    |        separator (easiest character to read as sed separator IMO)
    ^        start match at start of the line
    [ \t]    match space or tab
    \+       match one or more times (escape required so sed does not interpret '+' literally)
    [0-9]    match any digit 0 - 9
    

    As noted, the input.txt file will be edited in place. The original content of input.txt will be saved as input.txt.bak. Use just -i instead if you don't want a backup of the original file.

    Also, if you know that they are definitely leading spaces (not tabs), you could shorten it to this:

    sed -i.bak "s|^ \+[0-9]\+[ \t]\+||" input.txt
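
Since -i rewrites the file, it can help to preview the substitution on standard output first. The following is a rough sketch that assumes GNU sed (the `\+` operator is a GNU extension in basic regular expressions); the sample file and its contents are invented for the demonstration.

```shell
# Made-up sample: leading spaces, a row counter, then the data columns
sample=$(mktemp)
printf '   1 10 20\n   2 30 40\n' > "$sample"

# Without -i, sed writes to standard output and leaves the file untouched
preview=$(sed "s|^[ \t]\+[0-9]\+[ \t]\+||" "$sample")
printf '%s\n' "$preview"

# Once the preview looks right, edit in place and keep a .bak copy
sed -i.bak "s|^[ \t]\+[0-9]\+[ \t]\+||" "$sample"
edited=$(cat "$sample")
backup=$(cat "$sample.bak")
```

After the in-place run, the edited file should match the preview, and the .bak copy should still hold the original lines.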
    
  • 2021-01-01 11:43

    You can use the cut command with the --complement option (available in GNU cut):

    cut -f1 -d" " --complement input.file > output.file
    

    This will output all columns except the first one.
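
For comparison, here is a small sketch showing that --complement and an open-ended field range give the same result. It assumes GNU cut for the --complement call; the sample data is made up.

```shell
# Made-up space-separated sample with a row counter in column 1
sample=$(mktemp)
printf '1 a b\n2 c d\n' > "$sample"

# GNU cut: select field 1, then invert the selection
with_complement=$(cut -f1 -d" " --complement "$sample")

# Portable form: ask for field 2 through the end of the line
with_range=$(cut -d" " -f2- "$sample")

printf '%s\n' "$with_complement"
```

The -f2- form is the safer choice where a non-GNU cut might be in use.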

  • 2021-01-01 11:45

    The idiomatic use of cut would be

    cut -f2- input > output
    

    if your delimiter is tab ("\t").

    Or simply with awk magic (this works for both space and tab delimiters):

    awk '{$1=""}1' input | awk '{$1=$1}1' > output
    

    The first awk blanks field 1 but leaves its delimiter behind; the second awk removes that delimiter. The default output delimiter is a space; to change it to a tab, add -vOFS="\t" to the second awk.
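
The two awk stages can also be folded into one process. This is a sketch of one possible variant, not the answer's own command: it blanks field 1 and then strips the leftover leading separator with sub(). The sample data is invented, and note that awk rebuilds each line, so runs of whitespace between the remaining fields collapse to a single output separator.

```shell
# Made-up sample with leading spaces and a mix of spaces and tabs
sample=$(mktemp)
printf '  1 10\t20\n  2 30\t40\n' > "$sample"

# Two-stage form from the answer
two_stage=$(awk '{$1=""}1' "$sample" | awk '{$1=$1}1')

# One-process variant: blank field 1, then strip the leftover leading separator
one_stage=$(awk '{ $1 = ""; sub(/^ +/, ""); print }' "$sample")

# Same idea with tab as the output separator
tabbed=$(awk -v OFS='\t' '{ $1 = ""; sub(/^\t+/, ""); print }' "$sample")

printf '%s\n' "$one_stage"
```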

    UPDATED

    Based on your updated input, the problem is the leading spaces, which cut treats as extra (empty) columns. One way to address this is to remove them before feeding the data to cut:

    sed 's/^ *//' input | cut -d" " -f2- > output
    

    Or use the awk alternative above, which works in this case as well.
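
To see the effect of the leading spaces, here is a small sketch on a single made-up line. With a space delimiter, cut counts every individual space, so three leading spaces produce three empty fields ahead of the row counter.

```shell
# Made-up input line: three leading spaces, a row counter, three data columns
line='   1 10 20 30'

# cut treats every single space as a delimiter, so -f2- only drops
# the first empty field and the row counter survives
naive=$(printf '%s\n' "$line" | cut -d" " -f2-)

# Stripping the leading spaces first makes the counter field 1
fixed=$(printf '%s\n' "$line" | sed 's/^ *//' | cut -d" " -f2-)

# Brackets make any remaining leading whitespace visible
printf '[%s]\n[%s]\n' "$naive" "$fixed"
```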

  • 2021-01-01 11:53

    You can also achieve this with grep:

    grep -E -o '[[:digit:]]([[:space:]][[:digit:]]){3}$' input.txt
    

    This assumes four single-digit columns, each separated by a single whitespace character. To accommodate a variable number of spaces and digits, you can do:

    grep -E -o '[[:digit:]]+([[:space:]]+[[:digit:]]+){3}$' input.txt
    

    If your grep supports the -P flag (--perl-regexp) you can do:

    grep -P -o '\d+(\s+\d+){3}$' input.txt
    

    And here are a few options if you are using GNU sed:

    sed 's/^\s\+\w\+\s\+//' input.txt
    sed 's/^\s\+\S\+\s\+//' input.txt
    sed 's/^\s\+[0-9]\+\s\+//' input.txt
    sed 's/^\s\+[[:digit:]]\+\s\+//' input.txt
    

    Note that the grep regexes are matching the parts that we want to keep while the sed regexes are matching the parts we want to remove.
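
The keep-versus-remove distinction can be checked side by side. This sketch uses made-up data with a row counter followed by exactly four numeric columns, which is what the `{3}` in the grep pattern expects, and it assumes GNU sed for the `\+` operator.

```shell
# Made-up sample: row counter followed by exactly four numeric columns
sample=$(mktemp)
printf '   1 10 20 30 40\n   2 50 60 70 80\n' > "$sample"

# grep -o prints only the matched text: the last four numeric columns
kept=$(grep -E -o '[[:digit:]]+([[:space:]]+[[:digit:]]+){3}$' "$sample")

# sed deletes the matched text: leading whitespace, counter, separator
stripped=$(sed 's/^[[:space:]]\+[[:digit:]]\+[[:space:]]\+//' "$sample")

printf '%s\n' "$kept"
```

The grep form depends on the number of data columns being known in advance, while the sed form does not.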
