I have the following input file:
a 1 o p
b 2 o p p
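c 3 o p p  p
In the last line there is a double space between the last two p's. I want to remove the first columns (two in this example) while keeping the spacing of the rest of the line intact.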
GNU sed (removes the first n fields, here n = 2)
sed -r 's/([^ ]+ +){2}//' file
GNU awk 4.0+
awk '{sub("([^"FS"]+"FS"+){2}","")}1' file
GNU awk <4.0
awk --re-interval '{sub("([^"FS"]+"FS"+){2}","")}1' file
In case the FS version doesn't work (Ed's suggestion):
awk '{sub(/([^ ]+ +){2}/,"")}1' file
Replace 2 with the number of fields you wish to remove.
Another way (doesn't require --re-interval):
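If you need the count as a parameter, here is a minimal sketch that wraps the sed command in a shell function. It assumes a sed that accepts -E (in GNU sed this is the same as -r) and {N} intervals; the function name drop_fields_sed is made up for this example. Anchoring with ^ makes it explicit that the match starts at the beginning of the line.

```shell
#!/bin/sh
# Hypothetical helper: drop_fields_sed N removes the first N space-separated
# fields from each line of stdin, keeping the spacing of what remains.
# Assumes a sed with -E and {N} interval support (e.g. GNU sed).
drop_fields_sed() {
    sed -E "s/^([^ ]+ +){$1}//"
}

# Sample input: two spaces before the last p on line 3.
printf 'a 1 o p\nb 2 o p p\nc 3 o p p  p\n' | drop_fields_sed 2
```

With 2 this prints o p, o p p and o p p  p, with the double space preserved.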
awk '{for(i=0;i<2;i++)sub($1"[[:space:]]*","")}1' file
Further edit
As Ed Morton pointed out, using field values as the regex in sub() is risky because they may contain regex metacharacters, so here is another alternative:
awk '{for(i=0;i<2;i++)sub(/[^[:space:]]+[[:space:]]*/,"")}1' file
o p
o p p
o p p  p
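For comparison, a sketch of the same loop-of-sub() idea with the count passed in via -v. It uses [ \t] instead of [[:space:]] so it also runs on awks without POSIX character classes (an assumption about your awk, not something the answer requires) and anchors each match at the start of the line. The function name drop_fields_loop is hypothetical, and the fields a+ and c.* are there only to show that field contents never reach the regex.

```shell
#!/bin/sh
# Hypothetical helper: drop_fields_loop N removes the first N blank-separated
# fields, one sub() call per field. The regex is a constant, so field contents
# like "a+" or "c.*" are never treated as regex metacharacters.
drop_fields_loop() {
    awk -v n="$1" '{
        for (i = 0; i < n; i++)
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")   # leading blanks + one field + its trailing blanks
        print
    }'
}

printf 'a+ 1 o p\nc.* 3 o p p  p\n' | drop_fields_loop 2
```

This prints o p and o p p  p; the double space survives because only the leading part of the line is ever rewritten.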
Since you want to preserve spaces, let's just use cut:
$ cut -d' ' -f2- file
1 o p
2 o p p
3 o p p  p
Or, for example, to start at column 4:
$ cut -d' ' -f4- file
p
p p
p p  p
This works as long as the columns you are removing are separated by exactly one space, because cut -d' ' treats every single space as a delimiter.
If the columns you are removing are separated by varying amounts of whitespace, you can use Ed Morton's solution from "Print all but the first three columns":
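A quick check of why the one-space condition matters, using a made-up second line with two spaces after the first column: cut -d' ' counts every single space as a delimiter, so a run of two spaces yields an empty field.

```shell
#!/bin/sh
# Line 1 is single-spaced; line 2 has two spaces after "a".
printf 'a 1 o p\na  1 o p\n' | cut -d' ' -f3-
# Line 1 -> "o p"
# Line 2 -> "1 o p": the empty string between the two spaces counts as field 2
```

So on line 2, asking for -f3- removes only one real column instead of two.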
awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1'
The number inside {1} is the number of columns to remove. For example, with 2:
$ cat a
a 1 o p
b 2 o p p
c 3 o p p  p
$ awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' a
o p
o p p
o p p  p
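If your awk does not support interval expressions like {2} (older gawk without --re-interval, or some mawk builds), here is a sketch of an equivalent regex built by repeating the field-plus-blanks piece n times in a string and passing it to sub() as a dynamic regex. The function name drop_fields_re and the variables n and re are illustrative, and [ \t] stands in for [[:space:]] for portability.

```shell
#!/bin/sh
# Hypothetical helper: drop_fields_re N removes the first N blank-separated
# fields without using a {N} interval expression.
drop_fields_re() {
    awk -v n="$1" '
    BEGIN {
        # Build "^[ \t]*" followed by n copies of "[^ \t]+[ \t]+".
        re = "^[ \t]*"
        for (i = 0; i < n; i++)
            re = re "[^ \t]+[ \t]+"
    }
    {
        # Lines with no blank after field n do not match and are printed unchanged.
        sub(re, "")
    }
    1'
}

printf 'a 1 o p\nb 2 o p p\nc 3 o p p  p\n' | drop_fields_re 2
```

Building the regex once in BEGIN keeps the per-line work to a single sub() call, like the original one-liner.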
In Perl, you can use split with a capturing group, which keeps the delimiters in the resulting list:
perl -ne '@f = split /( +)/; print @f[ 1 * 2 .. $#f ]'
# The 1 in "1 * 2" is the column number to start from (counting from 0);
# it is doubled because every column is followed by its captured delimiter.
If you want to preserve all spaces after the start of the second column, this will do the trick:
{
    # Find the first field plus the blanks after it, then print the rest of the line
    match($0, ($1 "[ \t]+"))
    print substr($0, RSTART+RLENGTH)
}
The call to match sets RSTART to the position where the first token starts and RLENGTH to the length of that token plus the whitespace that follows it. Then you just print everything on the line after that with substr($0, RSTART+RLENGTH).
You could generalize it somewhat to ignore the first N tokens this way:
BEGIN {
    N = 2    # number of leading tokens to skip
}
{
    r = ""
    for (i=1; i<=N; i++) {
        r = (r $i "[ \t]+")    # token i followed by one or more blanks
    }
    match($0, r)
    print substr($0, RSTART+RLENGTH)
}
Applying the above script to your example input yields:
o p
o p p
o p p  p
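Because the script builds its regex from $1, $2, ..., a token containing regex metacharacters (for example a+ or c.*) can make the match fail or land in the wrong place. Here is a sketch of the same substr() idea that avoids regexes for the tokens entirely: each token is located with index(), which does a plain substring search. N is passed with -v, and the function name drop_fields_index is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop_fields_index N prints each line without its first
# N blank-separated fields, keeping the original spacing of the rest.
drop_fields_index() {
    awk -v N="$1" '{
        rest = $0
        for (i = 1; i <= N && i <= NF; i++) {
            p = index(rest, $i)                  # literal search: no regex metacharacters
            rest = substr(rest, p + length($i))  # cut off everything up to the end of field i
            sub(/^[ \t]+/, "", rest)             # drop the blanks after that field
        }
        print rest
    }'
}

printf 'a+ 1 o p\nc.* 3 o p p  p\n' | drop_fields_index 2
```

Since a field never contains blanks and only blanks precede it in rest, index() always finds the field at its real position.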