I'm trying to return the contents of any <script> tags in a body of text. I'm currently using the following expression, but it only captures the contents of the first tag and ign
Don't use regular expressions for parsing HTML. HTML is not a regular language. Use the power of the DOM. This is much easier, because it is the right tool.
var scripts = document.getElementsByTagName('script');
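If the goal is the text inside each tag, something along these lines might work on that collection. This is only a sketch: `collectScriptContents` is a hypothetical helper name, and it assumes the markup is already part of the current document rather than sitting in a plain string.

```javascript
// Hypothetical helper: gathers the text inside each element of an
// array-like collection (e.g. the HTMLCollection from getElementsByTagName).
function collectScriptContents(elements) {
  return Array.from(elements, function (el) {
    // textContent holds the raw text between <script> and </script>
    return el.textContent;
  });
}

// In a browser:
// var contents = collectScriptContents(document.getElementsByTagName('script'));
```

It accepts anything array-like, so it can also be tried out on plain objects outside a browser.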
The "problem" here is how exec works. With the g flag set, each call matches only the next occurrence, but it stores the position where that match ended in the lastIndex property of the regex. To get all matches, simply keep applying the regex to the string until it fails to match (this is a pretty common way to do it):
var scripttext = ' <script type="text/javascript">\nalert(\'1\');\n</script>\n\n<div>Test</div>\n\n<script type="text/javascript">\nalert(\'2\');\n</script>';
var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
var match;
while (match = re.exec(scripttext)) {
    // full match is in match[0], whereas captured groups are in ...[1], ...[2], etc.
    console.log(match[1]);
}
Try using the global flag:
document.body.innerHTML.match(/<script.*?>([\s\S]*?)<\/script>/gmi)
Edit: added the multiline and case-insensitive flags (for obvious reasons).
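One caveat: with the global flag, `match` returns only the full matches (tags included), not the captured groups. If just the inner contents are wanted, a sketch using `matchAll` might look like the following. It assumes an ES2020 environment, and the sample `html` string is made up for illustration.

```javascript
// Sample input, invented for this example.
var html = '<script>alert(1);</script><div>Test</div>' +
           '<SCRIPT type="text/javascript">alert(2);</SCRIPT>';

// matchAll requires the g flag; i makes the tag name case-insensitive.
var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;

// Each item is a match array; index 1 is the first captured group.
var contents = Array.from(html.matchAll(re), function (m) {
  return m[1];
});

console.log(contents); // ["alert(1);", "alert(2);"]
```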
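Try this:
var scripts = document.getElementsByTagName('script');
for (var i = 0; i < scripts.length; i++) {
    var x = scripts[i];
    if (x && x.innerHTML) {
        // matches e.g. http://example.com (the original \.* only matched literal dots)
        var yourRegex = /http:\/\/.*?\.com/g;
        var matches = yourRegex.exec(x.innerHTML);
        if (matches) {
            // your code
        }
    }
}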
In .NET there's a submatch method, and in PHP there's preg_match_all, either of which should solve your problem. In JavaScript there isn't such a method, but you can write one yourself.
Test it at http://www.pagecolumn.com/tool/regtest.htm
Selecting the $1elements method will return what you want.
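A hand-rolled version could look roughly like this. It is only a sketch: `matchAllGroups` is a made-up name, and the result shape (one array per capture group, with index 0 holding the full matches) is loosely modeled on the default ordering of PHP's preg_match_all.

```javascript
// Hypothetical helper: returns an array of arrays, one per capture group
// (index 0 = full matches, index 1 = first group, and so on).
function matchAllGroups(pattern, text) {
  // Work on a copy with the g flag so the caller's lastIndex is untouched
  var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
  var re = new RegExp(pattern.source, flags);
  var groups = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    // Guard against an infinite loop on zero-length matches
    if (match[0] === '') re.lastIndex++;
    for (var i = 0; i < match.length; i++) {
      (groups[i] = groups[i] || []).push(match[i]);
    }
  }
  return groups;
}

var result = matchAllGroups(/<b>(.*?)<\/b>/, '<b>one</b> and <b>two</b>');
console.log(result[1]); // ["one", "two"]
```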
The first group contains the content of the tags.
Edit: Don't you have to surround the regex statement with quotes? Like:
re = "/<script\b[^>]*>([\s\S]*?)<\/script>/gm";
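For what it's worth, a quick check suggests the quotes are not needed: a regex literal is already a RegExp object, whereas the quoted version is just a string and has no exec method. If the pattern really has to come from a string, the RegExp constructor can be used with the backslashes doubled. A small sketch (the variable names are only for illustration):

```javascript
var literal = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
var quoted = "/<script\b[^>]*>([\s\S]*?)<\/script>/gm";

console.log(literal instanceof RegExp); // true
console.log(typeof quoted);             // "string"
console.log(typeof quoted.exec);        // "undefined"

// Building the same pattern from a string: no slashes, doubled backslashes,
// and the flags passed as a second argument.
var fromString = new RegExp('<script\\b[^>]*>([\\s\\S]*?)<\\/script>', 'gm');
console.log(fromString.exec('<script>x</script>')[1]); // "x"
```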