I'm looking for a way, in either Ruby or JavaScript, to get all matches, possibly overlapping, of a regexp within a string.
Let's say I have
This JavaScript approach has an advantage over Wiktor's answer: it iterates the substrings of a given string lazily, using a generator function. With a for...of loop you can consume one match at a time, even for very large input strings, instead of building a whole array of matches at once. Building the array could run out of memory, since the number of substrings grows quadratically with the length of the string:
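function * substrings (str) {
  // Yield every substring, shortest first.
  for (let length = 1; length <= str.length; length++) {
    for (let i = 0; i <= str.length - length; i++) {
      yield str.slice(i, i + length);
    }
  }
}

function * matchSubstrings (str, re) {
  // Anchor the pattern so it must match the whole substring. The group keeps
  // top-level alternation intact; dropping g and y keeps test() stateless.
  const subre = new RegExp(`^(?:${re.source})$`, re.flags.replace(/[gy]/g, ''));
  for (const substr of substrings(str)) {
    if (subre.test(substr)) yield substr;
  }
}

for (const match of matchSubstrings('abcabc', /a.*c/)) {
  console.log(match);
}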
▶ str = "abcadc"
▶ from = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'a' }.compact
▶ to = str.split(/(?=\p{L})/).map.with_index { |c, i| i if c == 'c' }.compact
▶ from.product(to).select { |f,t| f < t }.map { |f,t| str[f..t] }
#⇒ [
# [0] "abc",
# [1] "abcadc",
# [2] "adc"
# ]
I believe there is a fancier way to find all indices of a character in a string, but I was unable to find it :( Any ideas?
Splitting on a “Unicode character boundary” makes it work with strings like 'ábĉ' or 'Üve Østergaard'.
For a more generic solution that accepts arbitrary “from” and “to” sequences, only a small modification is needed: find all indices of “from” and “to” in the string.
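A minimal sketch of that generalization, assuming “from” and “to” are plain strings rather than patterns. The helper names `indices_of` and `spans_between` are made up for illustration; the only library call relied on is `String#index` with a start offset, which works on character (not byte) positions.

```ruby
# Hypothetical helper: start indices of every (possibly overlapping)
# occurrence of `needle` in `str`.
def indices_of(str, needle)
  found = []
  pos = str.index(needle)
  while pos
    found << pos
    # Resume one character later so overlapping occurrences are kept.
    pos = str.index(needle, pos + 1)
  end
  found
end

# Hypothetical helper: every substring that begins with an occurrence of
# `from` and ends with an occurrence of `to` that starts after `from` ends.
def spans_between(str, from, to)
  starts = indices_of(str, from)
  stops  = indices_of(str, to)
  starts.product(stops)
        .select { |f, t| f + from.size <= t }
        .map    { |f, t| str[f...(t + to.size)] }
end

p spans_between("abcadc", "a", "c")
```

For single-character arguments the condition `f + from.size <= t` reduces to the `f < t` check used above, so the call on "abcadc" should produce the same three substrings. Like the original, this builds the full list of index pairs in memory, so it may not suit very long inputs.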