How to delete the first column (which is in fact the row names) from a data file in Linux?


I have a data file with many thousands of columns and rows. I want to delete the first column, which is in fact the row counter. I used this command in Linux:

cut


                      
5 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  攒了一身酷        
                
              
                            
                2021-01-01 11:31
              
            
            
                                                                       
@karakfa I had CSV files, so I used "," as the separator (replace it with your own):

cut -d"," -f2- input.csv  > output.csv
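As a rough illustration of what that command does, here is a minimal sketch run against made-up data. The temporary directory and the file contents are assumptions for the demo, not part of the original answer. It also shows one caveat: cut splits on every comma, so it does not understand quoted CSV fields.

```shell
# Hypothetical sample data standing in for input.csv
workdir=$(mktemp -d)
printf '1,a,b\n2,c,d\n' > "$workdir/input.csv"

# Keep field 2 through the last field, i.e. drop the first column
cut -d"," -f2- "$workdir/input.csv" > "$workdir/output.csv"
cat "$workdir/output.csv"

# Caveat: cut splits on every comma, including commas inside quoted fields
quoted=$(printf '"x,y",z\n' | cut -d"," -f2-)
echo "$quoted"
```

If your files contain quoted fields with embedded separators, a CSV-aware tool is probably a safer choice than cut.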


Then, I used a loop to go over all files inside the directory:

# files are in the directory tmp/
# create the output directory if it does not exist yet
mkdir -p new
for f in tmp/*
do
    name=$(basename "$f")
    echo "processing file : $name"
    # keep all columns except the first one of each CSV file
    cut -d"," -f2- "$f" > "new/$name"
    # output files keep the same names and are stored in directory new/
done
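To try the loop without touching real data, it can be run in a throwaway sandbox. This is only a sketch: the temporary directory and the two file names are invented for the example. One of the names contains a space to show why the variables are quoted.

```shell
# Sandbox with two hypothetical CSV files (names are illustrative)
workdir=$(mktemp -d)
mkdir -p "$workdir/tmp" "$workdir/new"
printf '1,a,b\n2,c,d\n' > "$workdir/tmp/first.csv"
printf '1,x\n2,y\n'     > "$workdir/tmp/second file.csv"

for f in "$workdir"/tmp/*
do
    name=$(basename "$f")
    # Quoting "$f" and "$name" keeps file names with spaces intact
    cut -d"," -f2- "$f" > "$workdir/new/$name"
done

ls "$workdir/new"
```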

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清酒与你        
                
              
                            
                2021-01-01 11:34
              
            
            
                                                                       
As @karakfa notes, it looks like it's the leading whitespace which is causing your issues.

Here's a sed one-liner to do the job (it accounts for both spaces and tabs):

sed -i.bak "s|^[ \t]\+[0-9]\+[ \t]\+||" input.txt


Explanation:

-i       edit existing file in place
.bak     backup original file and add .bak file extension (can use whatever you like)

s        substitute
|        separator (easiest character to read as sed separator IMO)
^        start match at start of the line
[ \t]    match space or tab
\+       match one or more times (escape required so sed does not interpret '+' literally)
[0-9]    match any digit 0 - 9
||       empty replacement, so the matched text is deleted


As noted, the input.txt file will be edited in place. The original content of input.txt will be saved as input.txt.bak. With GNU sed, use just -i instead if you don't want a backup of the original file (BSD/macOS sed requires an explicit empty suffix, -i '').

Also, if you know that they are definitely leading spaces (not tabs), you could shorten it to this:

sed -i.bak "s|^ \+[0-9]\+[ \t]\+||" input.txt
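Before editing a file in place, it may be worth previewing the result first. The sketch below is an assumption-laden variant, not the answer's exact command: it drops -i so nothing is modified, uses -E so + needs no backslash, and uses the POSIX class [[:space:]] instead of [ \t]. The sample file is made up.

```shell
# Hypothetical sample: one line indented with spaces, one with tabs
workdir=$(mktemp -d)
printf '   1 10 20\n\t2\t30\t40\n' > "$workdir/input.txt"

# -E enables extended regexes, so + needs no backslash;
# [[:space:]] is the POSIX class that covers spaces and tabs.
# Without -i, the input file is left untouched.
sed -E 's/^[[:space:]]+[0-9]+[[:space:]]+//' "$workdir/input.txt" > "$workdir/preview.txt"
cat "$workdir/preview.txt"
```

Like the original pattern, this only matches lines that start with whitespace; lines whose counter begins in the first character are left as they are.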

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2021-01-01 11:43
              
            
            
                                                                       
You can use the cut command with GNU cut's --complement option:

cut -f1 -d" " --complement input.file > output.file


This will output all columns except the first one.
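A small sketch of this, assuming GNU cut (other implementations may not support --complement) and a made-up input file with single spaces and no leading whitespace. For this case it should give the same result as selecting fields 2 onward.

```shell
# Hypothetical space-separated sample without leading spaces
workdir=$(mktemp -d)
printf '1 10 20\n2 30 40\n' > "$workdir/input.file"

# Select everything except field 1 (GNU cut extension)
with_complement=$(cut -f1 -d" " --complement "$workdir/input.file")

# Equivalent field range
with_range=$(cut -d" " -f2- "$workdir/input.file")

echo "$with_complement"
```

--complement is most useful when the column to drop is in the middle, for example -f3 --complement, where a plain range would need -f1-2,4-.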
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  既然无缘        
                
              
                            
                2021-01-01 11:45
              
            
            
                                                                       
The idiomatic use of cut is

cut -f2- input > output


if your delimiter is a tab ("\t"), which is cut's default delimiter.

Or, simply with awk magic (this works for both space and tab delimiters):

awk '{$1=""}1' input | awk '{$1=$1}1' > output


The first awk empties field 1 but leaves its delimiter behind; the second awk removes that delimiter. The default output delimiter is a space; to change it to a tab, add -vOFS="\t" to the second awk.
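Here is a minimal sketch of both variants on invented data with leading spaces. The temporary directory and file names are assumptions for the demo.

```shell
# Hypothetical sample with leading spaces before the row counter
workdir=$(mktemp -d)
printf '   1 10 20\n   2 30 40\n' > "$workdir/input"

# Space-separated output (default OFS)
awk '{$1=""}1' "$workdir/input" | awk '{$1=$1}1' > "$workdir/output"

# Tab-separated output: set OFS on the second awk
awk '{$1=""}1' "$workdir/input" | awk -v OFS='\t' '{$1=$1}1' > "$workdir/output.tsv"

cat "$workdir/output"
```

Note that awk rebuilds each line, so runs of whitespace between the remaining columns are collapsed to a single output separator.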

UPDATE

Based on your updated input, the problem is the leading spaces, which cut treats as separators between empty columns. One way to address this is to strip them before feeding the data to cut:

sed 's/^ *//' input | cut -d" " -f2- > output


Or use the awk alternative above, which works in this case as well.
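To see why the leading spaces matter, the sketch below runs cut with and without the sed step on made-up input. It assumes exactly one space between the remaining columns; repeated spaces there would still produce empty fields.

```shell
# Hypothetical sample: three leading spaces, single spaces between columns
workdir=$(mktemp -d)
printf '   1 10 20\n   2 30 40\n' > "$workdir/input"

# Without trimming: every leading space delimits an empty field,
# so -f2- only drops the first empty field and the counter survives
untrimmed=$(cut -d" " -f2- "$workdir/input")
echo "$untrimmed"

# With trimming: the counter becomes field 1 and is removed
sed 's/^ *//' "$workdir/input" | cut -d" " -f2- > "$workdir/output"
cat "$workdir/output"
```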
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  栀梦        
                
              
                            
                2021-01-01 11:53
              
            
            
                                                                       
You can also achieve this with grep:

grep -E -o '[[:digit:]]([[:space:]][[:digit:]]){3}$' input.txt


This keeps the last four columns and assumes that each one is a single digit and that they are separated by single whitespace characters. To accommodate a variable number of spaces and digits you can do:

grep -E -o '[[:digit:]]+([[:space:]]+[[:digit:]]+){3}$' input.txt


If your grep supports the -P flag (--perl-regexp) you can do:

grep -P -o '\d+(\s+\d+){3}$' input.txt


And here are a few options if you are using GNU sed:

sed 's/^\s\+\w\+\s\+//' input.txt
sed 's/^\s\+\S\+\s\+//' input.txt
sed 's/^\s\+[0-9]\+\s\+//' input.txt
sed 's/^\s\+[[:digit:]]\+\s\+//' input.txt


Note that the grep regexes are matching the parts that we want to keep while the sed regexes are matching the parts we want to remove.
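The two approaches can be compared side by side. In this sketch the sample file is invented: a row counter followed by four numeric columns, which is what the {3} repetition expects. The sed line is an assumed -E spelling of the same idea as the commands above.

```shell
# Hypothetical sample: leading spaces, a counter, then four numeric columns
workdir=$(mktemp -d)
printf '   1 10 20 30 40\n   2 50 60 70 80\n' > "$workdir/input.txt"

# grep keeps: print only the last four numeric columns of each line
kept=$(grep -E -o '[[:digit:]]+([[:space:]]+[[:digit:]]+){3}$' "$workdir/input.txt")

# sed removes: delete the leading whitespace, the counter, and the gap after it
removed=$(sed -E 's/^[[:space:]]+[[:digit:]]+[[:space:]]+//' "$workdir/input.txt")

echo "$kept"
```

The grep form has to know how many columns to keep, so for a file with thousands of columns the sed form is likely the more practical of the two.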
                                                                        
                                                        
            
            
              
                