I'm looking for a regular expression to catch all digits in the first 7 characters in a string.
This string has 12 characters:
A12B345CD678
The regex solution is cool, but I'd use something easier to read for maintainability. E.g.
library(stringr)
s <- "A12B345CD678"
# Overwrite the first 7 characters with only the digits they contain
str_sub(s, 1, 7) <- gsub("\\D", "", str_sub(s, 1, 7))
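If you'd rather not depend on stringr, the same split-and-clean idea can be sketched in base R. This is only a rough sketch: the function name strip_prefix_nondigits and its n argument are made up for illustration, and it assumes "non-digit" means anything matched by \\D.

```r
# Hypothetical helper (name and signature are illustrative only):
# remove non-digits from the first n characters, keep the rest untouched.
strip_prefix_nondigits <- function(s, n = 7) {
  head <- substr(s, 1, n)              # first n characters
  tail <- substr(s, n + 1, nchar(s))   # everything after them ("" if none)
  paste0(gsub("\\D", "", head), tail)
}

s <- "A12B345CD678"
strip_prefix_nondigits(s)
## => [1] "12345CD678"
```

Since substr, nchar and paste0 are all vectorised, this should also work on a character vector of strings.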
You can also use a simple negative lookbehind:
s <- "A12B345CD678"
gsub("(?<!.{7})\\D", "", s, perl = TRUE)
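To see why the negative lookbehind works, it can help to look at which characters it actually matches. A quick check along these lines (using base R's gregexpr and regmatches) shows that only non-digits with fewer than 7 characters before them are hit:

```r
s <- "A12B345CD678"

# Find every non-digit that is NOT preceded by 7 characters
m <- gregexpr("(?<!.{7})\\D", s, perl = TRUE)

as.integer(m[[1]])      # start positions of the matches
## => [1] 1 4
regmatches(s, m)[[1]]   # the matched characters themselves
## => [1] "A" "B"
```

"C" and "D" sit at positions 8 and 9, so each has at least 7 characters in front of it and the lookbehind rejects them.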
You can use the well-known SKIP-FAIL regex trick: match and discard the rest of the string starting at the 8th character (located with a lookbehind), so that non-digit characters are only matched within the first 7:
s <- "A12B345CD678"
gsub("(?<=.{7}).*$(*SKIP)(*F)|\\D", "", s, perl = TRUE)
## => [1] "12345CD678"
The perl argument must be TRUE for this regex to work, since lookbehinds and the (*SKIP)(*F) verbs are PCRE features. The regex breakdown:
(?<=.{7}).*$(*SKIP)(*F) - matches any characters but a newline (add (?s) at the beginning if you have newline symbols in the input), as many as possible (.*), up to the end of the string ($; \\z might be required in its place to deal with a final newline), but only if preceded with 7 characters (this is set by the lookbehind (?<=.{7})). The (*SKIP)(*F) verbs make the engine discard the whole matched text and advance the regex index to the position at the end of that text.
| - or...
\\D - a non-digit character.
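If the prefix length is not always 7, the pattern from the breakdown above can be built on the fly. A minimal sketch, assuming a made-up helper name strip_prefix_skipfail and an n argument that are not part of the original answer:

```r
# Hypothetical helper (illustrative only): build the SKIP-FAIL pattern
# for an arbitrary prefix length n and apply it with PCRE.
strip_prefix_skipfail <- function(s, n = 7) {
  # "%d" is filled with n; "\\D" reaches the regex engine as \D
  pattern <- sprintf("(?<=.{%d}).*$(*SKIP)(*F)|\\D", n)
  gsub(pattern, "", s, perl = TRUE)
}

strip_prefix_skipfail("A12B345CD678")
## => [1] "12345CD678"
strip_prefix_skipfail("A12B345CD678", n = 3)
## => [1] "12B345CD678"
```

With n = 3 only "A12" is cleaned, so the "B" at position 4 survives along with the rest of the string.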