Using sed to replace a number greater than a specified number at a specified position

失恋的感觉 2021-01-25 06:45

I need to write a script to replace all numbers greater than a specified number that appear in the following position.

1499011200 310961583 142550756 313415036 14         


        
4 Answers
  • 2021-01-25 07:18

    Although it's an old-ish question, it's worth adding that this could also be handled using conditions:

    • FreeBSD/MacOS:
      sed -E '/^[0-9]+ +30{8} /! s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'
    • Linux:
      sed -r '/^[0-9]+ +30{8} /! s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'

    Explanation

    We will handle the strict "greater than" sneakily!

    We prefix the command with a condition that tells sed to process only lines that do not have exactly 300000000 in the second field. That means the regex never has to tell 300000001 or 300010000 apart from 300000000. If a line passes this condition, then (and only then!) we replace a first number, followed by a second number of 300000000 or more, followed by anything, with the first number (only), followed by " 250000000 XXXX XXXX XXXX".

    In other words:

    If the 2nd field is exactly 300000000 the condition means nothing will happen. OTHERWISE if it's less than 300000000 then it won't match the regex "find" part so again nothing will happen, OTHERWISE it will do a replace.
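
The three cases can be checked with a quick test. This is only a sketch: the first input line is the one from the question, the other two are made-up lines added to cover the "equal" and "less than" cases, and the variable names are arbitrary.

```shell
# Second field above, exactly at, and below 300000000.
# Only the first line comes from the question; the other two are invented.
input='1499011200 310961583 142550756 313415036 14
1499011200 300000000 142550756 313415036 14
1499011200 299999999 142550756 313415036 14'

# -E is used here because both BSD/macOS sed and current GNU sed accept it.
result=$(printf '%s\n' "$input" |
  sed -E '/^[0-9]+ +30{8} /! s/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/')

printf '%s\n' "$result"
```

Only the first line should come back rewritten; the other two should pass through untouched.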

    Switches:

    -E / -r tells sed to use extended regular expressions. The letter differs between implementations: BSD and macOS sed use -E, while GNU sed uses -r (current GNU versions accept -E as well). See man sed to check what you need on your system.

    Condition:

    This is easy. The line will be processed if:

    • ^ from the start of the line....
    • [0-9]+ + one or more digits followed by one or more spaces (your first field and the column spacing)...
      followed by:
    • 30{8} 3 followed by exactly 8 zeros followed by a space. We need the space otherwise it would match, e.g., 300000000500 as well.
    • /! The ! after the end of the condition means "only process the command if this condition isn't met".

    If a line matches this condition, then we have a line with exactly 300000000 in the second field, and sed will always leave the line unchanged. If not, it will try to find a match and replace it....
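
To see the condition on its own, here is a small sketch that swaps the s command for p, so sed just prints the lines the negated address lets through. The three input lines are invented for the demonstration; 300000000500 is the example value from the bullet above.

```shell
# Made-up lines: exactly 300000000, one more than that, and a longer number
# that merely starts with 300000000.
lines='1 300000000 a
1 300000001 b
1 300000000500 c'

# -n suppresses automatic printing, so only lines that do NOT match
# the address are printed by the p command.
selected=$(printf '%s\n' "$lines" | sed -n -E '/^[0-9]+ +30{8} /!p')

printf '%s\n' "$selected"
```

The first line is held back by the condition. The third one gets through because the trailing space in the address stops 300000000500 from counting as 300000000.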

    Regex replace command:

    This command only gets executed if the second field is not exactly 300000000, because of the condition above. So we can assume that's already checked and look at the replace action if it didn't contain exactly 300000000 in the second field:

    • s do a find/replace....
      match and replace this expression, if it's found in the line (otherwise do nothing):
    • ^([0-9]+) + find start of line followed by one or more digits, followed by one or more spaces. This is the contents of the first field. The (...) is a grouping that tells regex to remember the part of the matched text it contains - which will be the first field - to potentially be re-used in the replacement operation. (We want to include the first field's value in the changed line, if the match succeeds). This must also be followed by...
    • ([3-9][0-9]{8,}|[0-9]{10,}).* Match a second field that contains EITHER a digit 3-9 followed by 8 or more digits OR any number with 10 or more digits, and then anything else to the end of the line. Remember that * is "greedy" and matches all it can, so we don't have to explicitly say "to the end of the line", it will do that anyway. We also don't need to match the space after the 2nd field, because the {8,} and {10,} quantifiers are greedy too and will match all the digits they can. So we're telling sed to match any line that contains "(start of line)(number)(spaces)(number >= 300000000)(anything)", and remember the first number. Although the pattern could in theory match and replace the exact value 300000000, it never will, because we excluded that possibility with a condition beforehand. Also note that we need the .* at the end, because sed only replaces what it matches - if we left it out, sed would replace only the first and second fields and leave the rest of the line in place, which isn't what we want.
      If the line matches that expression, then replace the text that was matched (which will be the whole line), with:
    • \1 250000000 XXXX XXXX XXXX The \1 in the replacement string is a "back reference". It means, "put the contents of the first matched group here". So this tells sed to replace the entire line (because that's what it matched) by the contents of the first field, followed by a space, followed by "250000000 XXXX XXXX XXXX".
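
The effect of the trailing .* is easy to see by running the substitution with and without it. A minimal sketch on the sample line from the question (the condition is left out here because this line is not exactly 300000000 anyway; the variable names are arbitrary):

```shell
line='1499011200 310961583 142550756 313415036 14'

# With .* the match covers the whole line, so the whole line is replaced.
with_tail=$(printf '%s\n' "$line" |
  sed -E 's/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/')

# Without .* only the first two fields are matched; fields 3 onwards survive.
without_tail=$(printf '%s\n' "$line" |
  sed -E 's/^([0-9]+) +([3-9][0-9]{8,}|[0-9]{10,})/\1 250000000 XXXX XXXX XXXX/')

printf '%s\n' "$with_tail" "$without_tail"
```

The second result keeps " 142550756 313415036 14" after the replacement text, which is exactly the leftover the .* is there to swallow.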

    For completeness, if the line could have leading spaces, the command would then be:

    sed -E '/^ *[0-9]+ +30{8} /! s/^( *[0-9]+) +([3-9][0-9]{8,}|[0-9]{10,}).*/\1 250000000 XXXX XXXX XXXX/'

    (The leading spaces, if any, are inside the grouping, so that we keep them when we do the replacement, for niceness. Otherwise they'd be lost)

    Done.

  • 2021-01-25 07:21

    This might work for you (GNU sed):

    sed -r '/^\S+\s+(300000000|[1-2][0-9]{8}|[0-9]{1,8})\s/!c change' file
    

    If it's 300000000 or less keep it, otherwise change it.

    Or using substitution:

    sed '/^\S\+\s\+\(300000000\|[1-2][0-9]\{8\}\|[0-9]\{1,8\}\)\s/!s/^\(\S\+\s\+\).*/\1250000000 XXXX XXXX XXXX/' file
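
For seds that lack the GNU \S and \s shorthands, the same "keep it if it's 300000000 or less" idea could be sketched with POSIX character classes and -E. This is an adaptation, not the command above; the second and third input lines are invented.

```shell
# First line is from the question; the other two are made up.
input='1499011200 310961583 142550756 313415036 14
1499011200 300000000 142550756 313415036 14
1499011200 99999999 142550756 313415036 14'

# The address lists every second-field value that is 300000000 or less:
# the limit itself, 9-digit numbers starting with 1 or 2, and 1 to 8 digits.
# Lines that do not match it have everything after the first field replaced.
result=$(printf '%s\n' "$input" |
  sed -E '/^[^[:space:]]+[[:space:]]+(300000000|[12][0-9]{8}|[0-9]{1,8})[[:space:]]/! s/^([^[:space:]]+[[:space:]]+).*/\1250000000 XXXX XXXX XXXX/')

printf '%s\n' "$result"
```

Note that \1250000000 is read as backreference \1 followed by the literal 250000000, since sed backreferences are a single digit.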
    
  • 2021-01-25 07:27

    This is doable but not simple. (≥ a number ending in 0s is easier than >.)

    Let's start with a smaller number.

    How could we match numbers greater than 30?

    • 2-digit numbers greater than 30 but less than 40,

      \b3[1-9]\b
      
    • 2-digit numbers 40 or greater,

      \b[4-9][0-9]\b
      
    • numbers with more digits are greater too.

      \b[1-9][0-9]\{2,\}\b
      

    Use alternation to match all the cases.

    \b\(3[1-9]\|[4-9][0-9]\|[1-9][0-9]\{2,\}\)\b
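
A quick way to try the alternation is to put one number per line and filter with grep -E. This is a sketch with a few assumptions: the test numbers are made up, ^ and $ stand in for \b (which is not available everywhere), and the long branch starts with [1-9] so that a zero-padded value such as 007 is not counted.

```shell
# One candidate per line; ^...$ anchors replace the \b word boundaries.
greater=$(printf '%s\n' 7 007 30 31 39 40 99 100 2500 |
  grep -E '^(3[1-9]|[4-9][0-9]|[1-9][0-9]{2,})$' |
  paste -sd' ' -)

printf '%s\n' "$greater"
```

This should list 31 39 40 99 100 2500 and nothing else.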
    

    300000000 is similar, but more work. Here I've added spaces for readability, but you'll need to remove them in the sed regex.

    \b \( 30000000[1-9]
       \| 3000000[1-9][0-9]
       \| 300000[1-9][0-9]\{2\}
       \| 30000[1-9][0-9]\{3\}
       \| 3000[1-9][0-9]\{4\}
       \| 300[1-9][0-9]\{5\}
       \| 30[1-9][0-9]\{6\}
       \| 3[1-9][0-9]\{7\}
       \| [4-9][0-9]\{8\}
       \| [1-9][0-9]\{9,\}
    \) \b
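
Removing the spaces by hand is error-prone, so one option is to let the shell join the branches. A sketch under a few assumptions: it uses extended syntax (grep -E), anchors with ^ and $ on one-number-per-line input instead of \b, leaves the last branch open-ended ({9,}) so that numbers with more than 10 digits also count, and the test numbers are invented.

```shell
# Branches of the "greater than 300000000" pattern, one per line.
branches='30000000[1-9]
3000000[1-9][0-9]
300000[1-9][0-9]{2}
30000[1-9][0-9]{3}
3000[1-9][0-9]{4}
300[1-9][0-9]{5}
30[1-9][0-9]{6}
3[1-9][0-9]{7}
[4-9][0-9]{8}
[1-9][0-9]{9,}'

# Join the branches with | and anchor the whole alternation.
pattern="^($(printf '%s\n' "$branches" | paste -sd'|' -))\$"

matched=$(printf '%s\n' 299999999 300000000 300000001 300010000 310961583 12345678901 |
  grep -E "$pattern" |
  paste -sd' ' -)

printf '%s\n' "$matched"
```

299999999 and 300000000 should be filtered out; the other four should be listed.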
    
  • 2021-01-25 07:32

    In awk:

    $ awk '$2>300000000{for(i=3;i<=NF;i++)$i="XXXX"}1' file
    1499011200 310961583 XXXX XXXX XXXX
    

    Explained:

    $ awk '                 # using awk
    $2>300000000 {          # if the second value is greater than ...
        for(i=3;i<=NF;i++)  # for each value after the second
            $i="XXXX"       # replace it with XXXX
    }1' file                # output
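
The command above leaves the second field as it is. If the second field should also be reset to 250000000, as in the sed answers, a variant might look like the sketch below. The variable names limit and replacement are made up for this example, and the second input line is invented to show the "not greater" case.

```shell
# First line is from the question; the second is a made-up boundary case.
input='1499011200 310961583 142550756 313415036 14
1499011200 300000000 142550756 313415036 14'

# Pass the threshold and the replacement value in with -v so that the
# awk program itself contains no hard-coded numbers.
result=$(printf '%s\n' "$input" |
  awk -v limit=300000000 -v replacement=250000000 '
    $2 > limit {
        $2 = replacement                          # reset the second field
        for (i = 3; i <= NF; i++) $i = "XXXX"     # mask every later field
    }
    1')

printf '%s\n' "$result"
```

Because awk compares the field numerically, the strict "greater than" needs no special handling here.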
    