Capture string in regex replacement

Posted by a 夏天 on 2020-01-14 07:32:28

Question


From what I can gather from the Pharo documentation on regex, I can define a regular expression object such as:

re := '(foo|re)bar' asRegex

And I can replace the matched regex with a string via this:

re copy: 'foobar blah rebar' replacingMatchesWith: 'meh'

Which will result in: 'meh blah meh'.

So far, so good. But I want to replace the 'bar' and leave the prefix alone. Therefore, I need a variable to handle the captured parenthetical:

re copy: 'foobar blah rebar' replacingMatchesWith: '%1meh'

And I want the result: 'foomeh blah remeh'. However, this just gives me: '%1meh blah %1meh'. I also tried using \1, or \\1, or $1, or {1} and got the literal string replacement, e.g., '\1meh blah \1meh' as a result.

I can do this easily enough in GNU Smalltalk with:

'foobar blah rebar' replacingAllRegex: '(foo|re)bar' with: '%1meh'

But I can't find anywhere in the Pharo regex documentation that tells me how I can do this in Pharo. I've done a bunch of googling for Pharo regex as well, but not turned up anything. Is this capability part of the RxMatcher class or some other Pharo regex class?


Answer 1:


After experimenting a bit with the RxMatcher class, I made the following modification to the RxMatcher#copyStream:to:replacingMatchesWith: selector:

copyStream: aStream to: writeStream replacingMatchesWith: aString
    "Copy the contents of <aStream> on the <writeStream>,
     except for the matches. Replace each match with <aString>."

    | searchStart matchStart matchEnd |
    stream := aStream.
    markerPositions := nil.
    [searchStart := aStream position.
    self proceedSearchingStream: aStream] whileTrue: [
        matchStart := (self subBeginning: 1) first.
        matchEnd := (self subEnd: 1) first.
        aStream position: searchStart.
        searchStart to: matchStart - 1 do:
            [:ignoredPos | writeStream nextPut: aStream next].

        "------- The following lines replaced: writeStream nextPutAll: aString ------"
        "Do the regex replacement, substituting captured subexpressions for {n} placeholders"
        writeStream nextPutAll: (aString format: self subexpressionStrings).
        "-------"

        aStream position: matchEnd.
        "Be extra careful about successful matches which consume no input.
        After those, make sure to advance or finish if already at end."
        matchEnd = searchStart ifTrue: 
            [aStream atEnd
                ifTrue: [^self "rest after end of whileTrue: block is a no-op if atEnd"]
                ifFalse:    [writeStream nextPut: aStream next]]].
    aStream position: searchStart.
    [aStream atEnd] whileFalse: [writeStream nextPut: aStream next]

And then I added the following method to the "accessing" category:

subexpressionStrings
   "Create an array of the captured subexpression strings, skipping subexpression 1 (the whole match)"
   | ws |
   ws := Array new writeStream.
   2 to: (self subexpressionCount) do: [ :n | | se |
      ws nextPut: ((se := self subexpression: n) ifNil: [ '' ] ifNotNil: [ se ]) ].
   ^ws contents.

With this modification, I can refer to captured subexpressions in the replacement string using the Smalltalk String#format: pattern for arguments:

re := '((foo|re)ba(r|m))' asRegex
re copy: 'foobar meh rebam' replacingMatchesWith: '{2}bu{3} (was {1})'

Results in:

'foobur (was foobar) meh rebum (was rebam)'
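To see the substitution step in isolation, here is a minimal sketch of what String#format: does with the captured strings. The array literal is written by hand for illustration; it stands in for what subexpressionStrings would return for the first match ('foobar') in the example above.

```smalltalk
"Illustration only: hand-written stand-in for the captures of the first match."
| captures |
captures := #('foobar' 'foo' 'r').

"Each {n} placeholder is replaced by the n-th element of the array."
'{2}bu{3} (was {1})' format: captures.
"evaluates to 'foobur (was foobar)'"
```

Because subexpressionStrings starts at subexpression 2, {1} refers to the first parenthesized group rather than to the whole match.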



Answer 2:


Did you check the Regex help? There is no #replacingAllRegex:, but the matcher has #subexpression:
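A possible way to use #subexpression: without modifying RxMatcher is sketched below. It assumes that the matcher's state is still valid while the block passed to copy:translatingMatchesUsing: is being evaluated, so that the matcher can be queried for the current capture; this has not been verified against a particular Pharo version, so treat it as a starting point rather than a confirmed solution.

```smalltalk
| re |
re := '(foo|re)bar' asRegex.

"Assumption: the matcher can be queried from inside the translation block."
re
    copy: 'foobar blah rebar'
    translatingMatchesUsing: [ :matched |
        "Subexpression 1 is the whole match; 2 is the first parenthesized group."
        (re subexpression: 2) , 'meh' ].
```

The intent is to keep the captured prefix and replace only the 'bar' part, which is the 'foomeh blah remeh' result the question asks for.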



Source: https://stackoverflow.com/questions/37403092/capture-string-in-regex-replacement
