How to remove text between <script> and </script> using Python?
Are you trying to prevent XSS? Just eliminating the <script> tags will not protect you against all possible attacks! Here's a great list of the many ways (some of them very creative) that you could be vulnerable: http://ha.ckers.org/xss.html. After reading that page you should understand why just eliminating the <script> tags using a regular expression is not robust enough. The Python library lxml has a function that will robustly clean your HTML to make it safe to display.
If you are sure that you just want to eliminate the <script> tags, this code using lxml should work:
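from lxml.html import parse

# filename_or_url is a path, URL, or file-like object pointing at the HTML
root = parse(filename_or_url).getroot()
# Copy the matches into a list first so that removing elements
# does not disturb the iteration over the tree
for element in list(root.iter("script")):
    element.drop_tree()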
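If your HTML is already in a string rather than in a file or at a URL, the same idea can be sketched roughly as follows. The function name strip_script_elements and the sample markup are made up for illustration, and the sketch assumes the input is not itself a bare <script> element (drop_tree() needs a parent to remove the element from).

```python
from lxml import html


def strip_script_elements(markup: str) -> str:
    """Return markup with every <script> element and its contents removed.

    Hypothetical helper for illustration only; it is not part of lxml.
    """
    root = html.fromstring(markup)
    # Materialize the matches first so removal does not disturb iteration.
    for element in list(root.iter("script")):
        # drop_tree() removes the element and its children but keeps any
        # text that follows the closing tag (the element's tail).
        element.drop_tree()
    return html.tostring(root, encoding="unicode")


# Made-up sample input
sample = "<div><p>Hello</p><script>alert('x')</script><p>World</p></div>"
cleaned = strip_script_elements(sample)
print(cleaned)
```

Keep in mind that this only removes <script> elements. It does nothing about inline event handlers, javascript: URLs, and the other vectors on the list linked above, so it is not an XSS defense on its own.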
Note: I downvoted all the solutions using regular expressions. See this question for why you shouldn't parse HTML with regular expressions: "Using regular expressions to parse HTML: why not?"
Note 2: Another SO question showing HTML that is impossible to parse with regular expressions: "Can you provide some examples of why it is hard to parse XML and HTML with a regex?"