Question
I have a problem with the code I'm running right now.
My code takes a URL that I enter, and when I click submit it removes all the tags from the page. I use strip_tags for that.
Then I use preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words);
which creates an array of every word. I then have a foreach loop that counts all the words, and another foreach loop that places the counts in a table.
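To see where the script text comes from, here is a minimal sketch of the pipeline described above. count_words() is a hypothetical name, and array_count_values() is assumed in place of the hand-written foreach counting loop:

```php
<?php
// Minimal sketch of the word-count pipeline described in the question.
// count_words() is a hypothetical name; array_count_values() replaces the
// hand-written foreach counting loop.
function count_words(string $contents): array
{
    // Remove the HTML tags; the text between them (including script code) stays
    $text = strip_tags($contents);
    // Same pattern as the question: runs of word characters, hyphens,
    // and apostrophes that follow a word character
    preg_match_all("/((?:\w'|\w|-)+)/", $text, $words);
    // $words[0] holds every full match; count how often each one occurs
    return array_count_values($words[0]);
}

$html = "<title>titel1</title>\n<p>Testpage-p</p>\n<script>alert('hallo');</script>";
foreach (count_words($html) as $word => $count) {
    echo $word, ' ', $count, PHP_EOL;
}
```

With the <script> element in the input, alert shows up in the counts, because strip_tags() keeps the text that sat between the tags.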
Here is an example of the problem. Say I enter a URL whose page has the following content:
<html>
<head>
<title>titel1</title>
</head>
<body>
<div id="div1">
<h1 class="class2">
Testpage-h1
</h1>
<p>
Testpage-p
</p>
</div>
<script>
alert('hallo');
document.getElementById('class2');
</script>
</body>
</html>
Running my code on it outputs the following:
document 1
getElementById 1
class2' 1
hallo 1
alert 1
Testpage-h1 1
Testpage-p 1
titel1 1
The problem is that the output shouldn't include anything between the <script></script>
tags, because that content is of no use to me. Is there a way to exclude it?
I've tried things like sanitize filtering, but that didn't help.
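Answer 1:
strip_tags() only removes the tags themselves, not the text between them, so the JavaScript inside <script> ends up in your word list. You can remove <script>...</script> blocks from your string before any other processing: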
$text = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);
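Put together with the rest of the question's pipeline, the regex step might look like the minimal sketch below. count_visible_words() is a hypothetical name, and array_count_values() is assumed in place of the question's counting loop:

```php
<?php
// Sketch: drop <script> blocks with the answer's regex, then strip the
// remaining tags and count words. count_visible_words() is hypothetical.
function count_visible_words(string $html): array
{
    // Remove each <script ...>...</script> block, including its content.
    // i = case-insensitive, s = let "." also match newlines inside the script.
    $html = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);
    // Only ordinary tags are left, so strip_tags() keeps just visible text
    $text = strip_tags($html);
    preg_match_all("/((?:\w'|\w|-)+)/", $text, $words);
    return array_count_values($words[0]);
}

$html = <<<HTML
<h1 class="class2">
Testpage-h1
</h1>
<script>
alert('hallo');
</script>
HTML;

print_r(count_visible_words($html));
```

A regex does not understand HTML structure, so unusual markup (such as the text "<script>" inside an attribute value) can make it remove more than intended.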
Another solution (slower, but sometimes more correct), taken from "remove script tag from HTML content", uses DOMDocument:
$doc = new DOMDocument();
// load the HTML string we want to strip
$doc->loadHTML($html);
// get all the script tags; this DOMNodeList is "live" and shrinks as tags are removed
$script_tags = $doc->getElementsByTagName('script');
// for each tag, remove it from the DOM, walking backwards so that removing
// a node does not shift the indexes of the tags still to be visited
for ($i = $script_tags->length - 1; $i >= 0; $i--) {
    $script = $script_tags->item($i);
    $script->parentNode->removeChild($script);
}
// get the HTML string back
$no_script_html_string = $doc->saveHTML();
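The same DOM idea end to end, as a sketch that assumes the dom extension is available; count_words_dom() is a hypothetical name. Instead of saveHTML() followed by strip_tags(), it reads the remaining text through textContent, and it copies the live node list into a plain array before removing anything:

```php
<?php
// Sketch: remove <script> elements with DOMDocument, then count the words in
// the remaining text. count_words_dom() is a hypothetical name.
function count_words_dom(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML often triggers libxml warnings; collect them silently
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live DOMNodeList into an array so removals don't affect iteration
    $scripts = iterator_to_array($doc->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    // textContent is all text left in the document, without any tags
    $text = $doc->documentElement ? $doc->documentElement->textContent : '';
    preg_match_all("/((?:\w'|\w|-)+)/", $text, $words);
    return array_count_values($words[0]);
}

print_r(count_words_dom("<p>Testpage-p</p>\n<script>alert('hallo');</script>"));
```

Like strip_tags(), textContent does not insert spaces between adjacent elements, so <b>a</b><b>b</b> is read as the single word "ab".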
Source: https://stackoverflow.com/questions/22781853/strip-tags-remove-javascript