What I'm trying to do is make a 'jargon buster'. Basically, I have some HTML and some glossary terms in a database. When the user clicks on jargon buster, it replaces the wor
Use the inverted word character \W in your regex pattern to match any character other than a letter, digit, or underscore. A pattern like \Wterm\W would still fail at the boundaries of the text blob, where there is no character before or after the word, so you also need to test for those conditions. For example, using 'term' as the word you are searching for:
(^term$)|(^term\W)|(\Wterm\W)|(\Wterm$)
The first alternative matches when term is the entire contents of the blob, the second when it is the first word, the third when it appears inside the blob with a non-word character on each side, and the last when it is the last word.
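As a minimal sketch of the pattern above, here it is in Python's re module, built for an arbitrary glossary term. The helper name boundary_pattern is hypothetical, and wrapping the term in re.escape is an added assumption so that terms containing regex metacharacters (such as '.') are matched literally.

```python
import re


def boundary_pattern(term: str) -> re.Pattern:
    # Hypothetical helper: escape the term so metacharacters like "." match literally.
    t = re.escape(term)
    # The four alternatives: whole blob, first word, middle word, last word.
    return re.compile(rf"(^{t}$)|(^{t}\W)|(\W{t}\W)|(\W{t}$)")


pattern = boundary_pattern("term")
for text in ["term", "term, first", "a term here", "last term", "determine", "terms"]:
    print(f"{text!r}: {bool(pattern.search(text))}")
```

This prints True for the first four strings and False for 'determine' and 'terms', since in those the letters of term are next to other word characters.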
If you want to treat any other characters as word characters (say, a hyphen), you would need to replace \W with [^\w\-].
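To illustrate the hyphen variant, this sketch makes the non-word class a parameter of the same four-alternative pattern. The function name and the sample sentence are illustrative assumptions; the point is that with [^\w\-] a term glued to a hyphen, as in "long-term", no longer counts as a match.

```python
import re


def boundary_pattern(term: str, non_word: str = r"\W") -> re.Pattern:
    # Same four alternatives as before, with the non-word class swappable.
    t = re.escape(term)
    nw = non_word
    return re.compile(rf"(^{t}$)|(^{t}{nw})|({nw}{t}{nw})|({nw}{t}$)")


default = boundary_pattern("term")
hyphen_aware = boundary_pattern("term", r"[^\w\-]")  # hyphen now counts as a word character

text = "a long-term plan"
print(bool(default.search(text)))       # True: "-" is matched by \W
print(bool(hyphen_aware.search(text)))  # False: "-" is excluded from the non-word class
```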
Hope this helps. There are probably optimizations that can be performed as well, but this should at least be a good starting point.
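One possible refinement, swapped in here and not part of the approach above: because \W consumes the neighbouring character, replacing every match can skip adjacent occurrences such as "term term", since the space between them is used up by the first match. Zero-width lookarounds check the neighbours without consuming them. The sketch below wraps each occurrence in a hypothetical <abbr class="jargon"> element; the function name and markup are illustrative assumptions. It treats the input as plain text, so on raw HTML it would also match inside tag names and attribute values.

```python
import re

# The consuming pattern from above finds only one match in "term term".
consuming = re.compile(r"(^term$)|(^term\W)|(\Wterm\W)|(\Wterm$)")
print(len(consuming.findall("term term")))  # 1


def wrap_terms(text: str, term: str, word_chars: str = r"\w\-") -> str:
    # Hypothetical helper: a match must not be preceded or followed by a
    # word character (hyphen included). Lookarounds do not consume input,
    # and the negative lookbehind also succeeds at the start of the string.
    t = re.escape(term)
    pattern = re.compile(rf"(?<![{word_chars}]){t}(?![{word_chars}])")
    return pattern.sub(lambda m: f'<abbr class="jargon">{m.group(0)}</abbr>', text)


print(wrap_terms("term term, long-term", "term"))
```

Both standalone occurrences of term are wrapped, while the one in "long-term" is left alone.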