I need to write a script to replace all the numbers greater than a specified number which is in the following position.
1499011200 310961583 142550756 313415036 14
Although it's an old-ish question, it's worth adding that this could also be handled using conditions:
sed -E '/^[0-9]+ +30{8} /! s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'
sed -r '/^[0-9]+ +30{8} /! s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'
We will handle the strict "greater than" sneakily!
We prefix the command with a condition that tells sed to only process lines which do not have 300000000 in the second field. That means we don't have to worry about matching 300000001 or 300010000 but not 300000000. If a line passes this condition, then (and only then!) we go ahead and replace any number, followed by 300000000 or more, followed by anything, with the first number (only) followed by " 250000000 XXXX XXXX XXXX".
In other words:
If the 2nd field is exactly 300000000 the condition means nothing will happen. OTHERWISE if it's less than 300000000 then it won't match the regex "find" part so again nothing will happen, OTHERWISE it will do a replace.
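To see the three cases side by side, here is a small sketch that runs the -E form of the command over three input lines. Only the first line comes from the question; the other two are made up for illustration, and the variable names are arbitrary. It assumes your sed accepts -E.

```shell
# Second field greater than, equal to, and less than 300000000.
# Only the first line is from the question; the other two are invented.
input='1499011200 310961583 142550756 313415036 14
1499011201 300000000 142550756 313415036 14
1499011202 299999999 142550756 313415036 14'

result=$(printf '%s\n' "$input" |
  sed -E '/^[0-9]+ +30{8} /! s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/')

printf '%s\n' "$result"
# 1499011200 250000000 XXXX XXXX XXXX
# 1499011201 300000000 142550756 313415036 14
# 1499011202 299999999 142550756 313415036 14
```

Only the first line is rewritten; the "exactly 300000000" line is skipped by the condition and the smaller value never matches the regex.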
Switches:
-E / -r
tells sed to use extended ("modern") regular expressions. The letter differs between sed implementations, so it could be something else on your system; these are the two most common letters for this option. See man sed to check what you need.
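If a script has to run where you don't know which letter is accepted, one option is to probe for it. This is only a sketch: ere_flag and probe are made-up variable names, and it assumes that a sed which doesn't know an option exits with a non-zero status.

```shell
# Probe which extended-regex switch the local sed accepts.
# ere_flag is a hypothetical variable name used only in this sketch.
if printf 'ab\n' | sed -E 's/(a)(b)/\2\1/' >/dev/null 2>&1; then
  ere_flag=-E
elif printf 'ab\n' | sed -r 's/(a)(b)/\2\1/' >/dev/null 2>&1; then
  ere_flag=-r
else
  echo "this sed accepts neither -E nor -r" >&2
  ere_flag=
fi

# Use the detected switch in place of a hard-coded letter.
probe=$(printf 'ab\n' | sed $ere_flag 's/(a)(b)/\2\1/')
printf '%s\n' "$probe"
```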
Condition:
This is easy. The condition checks for:
^
from the start of the line...
[0-9]+ +
one or more digits followed by one or more spaces (your first field and the column spacing)...
30{8} (with a trailing space)
3 followed by exactly 8 zeros followed by a space. We need the space, otherwise it would match, e.g., 300000000500 as well.
/!
The ! after the end of the condition means "only process the command if this condition isn't met". If a line matches this condition, then we have a line with exactly 300000000 in the second field, and sed will always leave the line unchanged. If not, it will try to find a match and replace it.
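To check the condition on its own, you can ask sed to print just the lines it matches, i.e. the lines the ! would skip. A small sketch with invented sample lines (skipped is an arbitrary variable name):

```shell
# Print only the lines whose second field is exactly 300000000.
# These are the lines that the "!" condition leaves untouched.
skipped=$(printf '%s\n' \
  '1 300000000 a b c' \
  '2 300000000500 a b c' \
  '3 310961583 a b c' |
  sed -n -E '/^[0-9]+ +30{8} /p')

printf '%s\n' "$skipped"
# 1 300000000 a b c
```

The second sample line shows why the trailing space matters: 300000000500 starts with 3 and eight zeros but is not followed by a space, so it is not treated as "exactly 300000000".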
Regex replace command:
This command only gets executed if the second field is not exactly 300000000, because of the condition above. So we can assume that's already checked and look at the replace action:
s
do a find/replace...
^([0-9]+) +
find the start of the line followed by one or more digits, followed by one or more spaces. The digits are the contents of the first field. The (...) is a grouping that tells the regex to remember the part of the matched text it contains (here, the first field) so it can be re-used in the replacement. (We want to include the first field's value in the changed line if the match succeeds.) This must also be followed by...
([3-9][0-9]{8,}|[0-9]{10,}).*
Match a second field that contains EITHER a digit 3-9 followed by 8 or more digits OR any number with 10 or more digits, and then anything else to the end of the line. Remember that * is "greedy" and matches all it can, so we don't have to explicitly say "to the end of the line"; it will do that anyway. We also don't need to match the space after the 2nd field, because the {8,} and {10,} quantifiers are greedy too and will match all the digits they can. So we're telling sed to match any line that contains "(start of line)(number)(spaces)(number >= 300000000)(anything)", and remember the first number. Although the pattern could in theory match and replace the exact value 300000000, it never will, because we excluded that possibility with the condition beforehand. Also note that we need the .* at the end, because sed only replaces what it matches. If we left it out, it would replace only the first and second fields and leave the rest of the line in place, which isn't what we want.
\1 250000000 XXXX XXXX XXXX
The \1 in the replacement string is a "back reference". It means "put the contents of the first matched group here". So this tells sed to replace the entire line (because that's what it matched) with the contents of the first field, followed by a space, followed by "250000000 XXXX XXXX XXXX".
For completeness, if the line could have leading spaces, the command would then be:
sed -E '/^ *[0-9]+ +30{8} /! s/^( *[0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'
(The leading spaces, if any, are inside the grouping, so that we keep them when we do the replacement, for niceness. Otherwise they'd be lost.)
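A quick sketch of that variant in action, using the line from the question indented by three spaces (the indentation and the variable name indented are my own additions for illustration):

```shell
# Leading spaces survive because they sit inside the first group.
indented=$(printf '   1499011200 310961583 142550756 313415036 14\n' |
  sed -E '/^ *[0-9]+ +30{8} /! s/^( *[0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/')

# Brackets make the preserved indentation visible.
printf '[%s]\n' "$indented"
# [   1499011200 250000000 XXXX XXXX XXXX]
```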
Done.
This might work for you (GNU sed):
sed -r '/^\S+\s+(300000000|[1-2][0-9]{8}|[0-9]{1,8})\s/!c change' file
If the second field is 300000000 or less, keep the line; otherwise change it.
Or using substitution:
sed '/^\S\+\s\+\(300000000\|[1-2][0-9]\{8\}\|[0-9]\{1,8\}\)\s/!s/^\(\S\+\s\+\).*/\1250000000 XXXX XXXX XXXX/' file
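As a sanity check, here is the substitution form run on three sample lines. Only the first is from the question; the others are made up, and out is an arbitrary variable name. Like the command itself, this needs GNU sed for \S, \s, \+ and \|.

```shell
# Second field above the limit, exactly at the limit, and well below it.
out=$(printf '%s\n' \
  '1499011200 310961583 142550756 313415036 14' \
  '1499011201 300000000 142550756 313415036 14' \
  '1499011202 4200 142550756 313415036 14' |
  sed '/^\S\+\s\+\(300000000\|[1-2][0-9]\{8\}\|[0-9]\{1,8\}\)\s/!s/^\(\S\+\s\+\).*/\1250000000 XXXX XXXX XXXX/')

printf '%s\n' "$out"
# 1499011200 250000000 XXXX XXXX XXXX
# 1499011201 300000000 142550756 313415036 14
# 1499011202 4200 142550756 313415036 14
```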
This is doable but not simple. (≥ a number ending in 0's is easier than >.)
Let's start with a smaller number.
How could we match numbers greater than 30?
2-digit numbers greater than 30 but less than 40,
\b3[1-9]\b
2-digit numbers 40 or greater,
\b[4-9][0-9]\b
numbers with more digits are greater too.
\b[1-9][0-9]\{2,\}\b
Use alternation to match all the cases.
\b\(3[1-9]\|[4-9][0-9]\|[0-9]\{3,\}\)\b
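You can try the combined pattern by feeding sed one number per line and printing only the matches. A minimal sketch, assuming GNU sed (for \b and \|); the test numbers and the variable name matched are my own:

```shell
# Keep only the numbers that the "greater than 30" pattern matches.
matched=$(printf '%s\n' 7 30 31 39 40 99 100 |
  sed -n '/\b\(3[1-9]\|[4-9][0-9]\|[0-9]\{3,\}\)\b/p')

printf '%s\n' "$matched"
# 31
# 39
# 40
# 99
# 100
```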
300000000 is similar, but more work. Here I've added spaces for readability, but you'll need to remove them in the sed regex.
\b \( 30000000[1-9]
\| 3000000[1-9][0-9]
\| 300000[1-9][0-9]\{2\}
\| 30000[1-9][0-9]\{3\}
\| 3000[1-9][0-9]\{4\}
\| 300[1-9][0-9]\{5\}
\| 30[1-9][0-9]\{6\}
\| 3[1-9][0-9]\{7\}
\| [4-9][0-9]\{8\}
\| [1-9][0-9]\{9,\}
\) \b
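One way to keep the readable layout and still give sed a regex without spaces is to let the shell strip them. This is just a sketch: gt_300m and flagged are made-up names, GNU sed is assumed, and the sample lines are invented. Note that the last branch is written with \{9,\} so that numbers with more than ten digits match as well.

```shell
# Keep the spaced-out pattern in a here-document; tr deletes the spaces
# and newlines to produce the regex sed actually needs.
gt_300m=$(tr -d ' \n' <<'EOF'
\b \( 30000000[1-9]
   \| 3000000[1-9][0-9]
   \| 300000[1-9][0-9]\{2\}
   \| 30000[1-9][0-9]\{3\}
   \| 3000[1-9][0-9]\{4\}
   \| 300[1-9][0-9]\{5\}
   \| 30[1-9][0-9]\{6\}
   \| 3[1-9][0-9]\{7\}
   \| [4-9][0-9]\{8\}
   \| [1-9][0-9]\{9,\}
\) \b
EOF
)

# The first field is a letter here so that only the second field can match;
# the pattern itself matches a large number anywhere on the line.
flagged=$(printf '%s\n' \
  'a 300000000 x' \
  'b 300000001 x' \
  'c 299999999 x' \
  'd 12345678901 x' |
  sed -n "/$gt_300m/p")

printf '%s\n' "$flagged"
# b 300000001 x
# d 12345678901 x
```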
In awk:
$ awk '$2>300000000{for(i=3;i<=NF;i++)$i="XXXX"}1' file
1499011200 310961583 XXXX XXXX XXXX
Explained:
$ awk '                      # using awk
$2>300000000 {               # if the second value is greater than ...
    for(i=3;i<=NF;i++)       # for each value after the second
        $i="XXXX"            # replace it with XXXX
}1' file                     # output
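If you also want the second field replaced by 250000000, as the sed answers do, a variation along these lines might work. It is a sketch, not a drop-in: limit, cap and capped are names I've introduced, and the second sample line is made up.

```shell
# Pass the threshold and the replacement value in with -v
# instead of hard-coding them in the awk program.
capped=$(printf '%s\n' \
  '1499011200 310961583 142550756 313415036 14' \
  '1499011201 300000000 142550756 313415036 14' |
  awk -v limit=300000000 -v cap=250000000 '
    $2 + 0 > limit {                          # numeric comparison with the threshold
      $2 = cap                                # overwrite the second field
      for (i = 3; i <= NF; i++) $i = "XXXX"   # mask every later field
    }
    1                                         # print every line, changed or not
  ')

printf '%s\n' "$capped"
# 1499011200 250000000 XXXX XXXX XXXX
# 1499011201 300000000 142550756 313415036 14
```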