Question
I'm writing a globalization module for a web application, and I need a regexp that replaces all instances of a word with another word (the translation), except for words found within a URL/URI.
EDIT: I forgot to mention that I'm using Ruby, so I can't use lookbehind.
Answer 1:
- Split on a URI regular expression; include the URIs in the result.
- For each piece:
  - if it is a URI, leave it alone
  - otherwise, do word replacement
- Join the pieces
Code:
# From RFC 3986 Appendix B, with these modifications:
# o Whitespace disallowed (so a URI at the end of a line does not swallow the next line)
# o All groups non-capturing, except for added outermost group
# o Not anchored
# o Scheme required
# o Authority required
URI_REGEX = %r"((?:(?:[^\s:/?#]+):)(?://(?:[^\s/?#]*))(?:[^\s?#]*)(?:\?(?:[^\s#]*))?(?:#(?:[^\s]*))?)"
def replace_except_uris(text, old, new)
  text.split(URI_REGEX).collect do |s|
    if s =~ URI_REGEX
      s
    else
      s.gsub(old, new)
    end
  end.join
end
text = <<END
stack http://www.stackoverflow.com stack
stack http://www.somewhere.come/stack?stack=stack#stack stack
END
puts replace_except_uris(text, /stack/, 'LINKED-LIST')
# => LINKED-LIST http://www.stackoverflow.com LINKED-LIST
# => LINKED-LIST http://www.somewhere.come/stack?stack=stack#stack LINKED-LIST
Answer 2:
You can probably use something like
(?<!://[^ ]*)\bfoo\b
This probably isn't perfect, though: it only checks that the word is not preceded by :// within the same run of non-space characters. It also relies on variable-length lookbehind, which the .NET regex engine (used by PowerShell's -replace) supports.
PS Home:\> "foo foobar http://foo_bar/baz?gak=foobar baz foo" -replace '(?<!://[^ ]*)\bfoo\b', 'FOO'
FOO foobar http://foo_bar/baz?gak=foobar baz FOO
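Because the question rules out lookbehind in Ruby, a similar effect can be approximated without it by letting one alternation consume URL-like tokens before the bare word is tried. The following is only a sketch under stated assumptions: the method name replace_outside_urls is hypothetical, and "any whitespace-delimited token containing ://" is the same rough URL test the pattern above uses, not a full URI parser.

```ruby
# Hypothetical helper: replaces whole-word matches of `word`, skipping any
# whitespace-delimited token that contains "://".
def replace_outside_urls(text, word, replacement)
  # First alternative consumes URL-like tokens; second matches the bare word.
  pattern = /\S*:\/\/\S*|\b#{Regexp.escape(word)}\b/
  text.gsub(pattern) do |match|
    # URL-like tokens are returned unchanged; everything else is the word.
    match.include?('://') ? match : replacement
  end
end

puts replace_outside_urls('foo foobar http://foo_bar/baz?gak=foobar baz foo', 'foo', 'FOO')
# => FOO foobar http://foo_bar/baz?gak=foobar baz FOO
```

Since the URL alternative comes first and consumes the whole token, the word alternative never gets a chance to match inside it.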
Answer 3:
Have you tried splitting your text into words and iterating over the words? Then you can examine each word, determine whether it's a URI, and translate it if it isn't.
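A minimal Ruby sketch of this word-by-word idea might look like the following. The method name translate_words, the translations hash, and the scheme-plus-:// test for "is this a URI" are all assumptions made for illustration; the answer does not specify any of them.

```ruby
# Hypothetical helper: translates each whitespace-separated token unless it
# looks like a URI (a scheme followed by "://").
def translate_words(text, translations)
  # Splitting on a captured group keeps the whitespace, so join restores it.
  text.split(/(\s+)/).map do |token|
    if token =~ %r{\A[a-z][a-z0-9+.-]*://}i
      token # leave URI-like tokens untouched
    else
      # Replace each word with its translation, or keep it if none is known.
      token.gsub(/\w+/) { |w| translations.fetch(w, w) }
    end
  end.join
end

puts translate_words("stack http://www.stackoverflow.com/stack stack.", { 'stack' => 'LINKED-LIST' })
# => LINKED-LIST http://www.stackoverflow.com/stack LINKED-LIST.
```

One limitation of this sketch: a URI wrapped in punctuation, such as (http://example.com), does not start with a scheme and would be treated as ordinary text.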
Source: https://stackoverflow.com/questions/2162860/regular-expression-replace-word-except-within-a-url-uri