How to remove text between <script> and </script> using Python?
ElementTree is the best, simplest, and sweetest package for this. Yes, there are other ways to do it too, but don't use any of them, because they suck! (via Mark Pilgrim)
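A minimal sketch of the ElementTree route, assuming the input is well-formed XML/XHTML (ElementTree is an XML parser and raises a ParseError on typical tag-soup HTML). The function name strip_scripts and the sample document are illustrative, not from the answer above. If the document declares an XHTML namespace, the tag to compare against would be the namespace-qualified name rather than plain 'script'.

```python
import xml.etree.ElementTree as ET


def strip_scripts(xhtml):
    """Return the document with every <script> element removed (hypothetical helper)."""
    root = ET.fromstring(xhtml)
    # Snapshot the elements first so removing children does not disturb iteration.
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag == 'script':
                # Text that follows the script lives in child.tail; reattach it
                # to the previous sibling (or the parent) so it is not lost.
                tail = child.tail or ''
                if previous is None:
                    parent.text = (parent.text or '') + tail
                else:
                    previous.tail = (previous.tail or '') + tail
                parent.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding='unicode')


doc = '<html><body><p>keep</p><script>alert(1)</script> tail<p>also</p></body></html>'
result = strip_scripts(doc)
print(result)
```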
If you don't want to import any modules:
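string = "<script> this is some js. begone! </script>"
kept = []
inside = False  # True while we are between <script> and </script>
# Note: only works when the tags are separate, space-delimited tokens.
for s in string.split(' '):
    if s == '<script>':
        inside = True
    elif s == '</script>':
        inside = False
    elif not inside:
        kept.append(s)
print(' '.join(kept))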
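The token-based snippet above only recognizes the tags when they stand alone between spaces. Here is a rough sketch that still uses no imports but searches for the tags with str.find, so tags with attributes are caught too. It assumes lowercase tag names and no nested or malformed script blocks. The function name remove_script_blocks and its parameters are made up for illustration.

```python
def remove_script_blocks(html, open_tag='<script', close_tag='</script>'):
    """Drop everything from each open_tag through its matching close_tag."""
    pieces = []
    pos = 0
    while True:
        start = html.find(open_tag, pos)
        if start == -1:
            break  # no more script blocks
        end = html.find(close_tag, start)
        if end == -1:
            break  # unclosed tag: leave the rest of the text untouched
        pieces.append(html[pos:start])  # keep the text before the block
        pos = end + len(close_tag)      # resume after the closing tag
    pieces.append(html[pos:])
    return ''.join(pieces)


page = '<p>a</p><script type="text/javascript">x = 1;</script><p>b</p>'
cleaned = remove_script_blocks(page)
print(cleaned)
```

For anything beyond simple, trusted input, a real parser is the safer choice, since plain string searching can be fooled by things like the text "</script>" appearing inside a JavaScript string.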
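You can do this with BeautifulSoup, using this method (among others):
from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 is installed
# source is the HTML string; note that .lower() also lowercases the page text
soup = BeautifulSoup(source.lower(), 'html.parser')
to_extract = soup.find_all('script')
for item in to_extract:
    item.extract()  # detach the <script> node and its contents from the soup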
This actually removes the nodes from the HTML. If you want to keep the empty <script></script> tags, you'll have to work with each item's attributes rather than just extracting it from the soup.
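One way to keep the empty tags, sketched here on the assumption that BeautifulSoup 4 (the bs4 package) is installed: call clear() on each tag to drop its contents while leaving the tag itself in the tree. The sample HTML and variable names are illustrative only.

```python
from bs4 import BeautifulSoup  # third-party: BeautifulSoup 4

html = '<p>hello</p><script type="text/javascript">alert(1);</script>'
soup = BeautifulSoup(html, 'html.parser')

for item in soup.find_all('script'):
    item.clear()  # remove the tag's contents but keep the tag in place

emptied = str(soup)
print(emptied)
```

Attributes such as type are left alone by clear(); if they should go as well, item.attrs can be emptied in the same loop.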