I have a file with a bunch of information. For example, all of the lines follow the same pattern as this:
<school>Nebraska</school>
s = '<school>Nebraska</school>'
in:
s.split('>')
out:
['<school', 'Nebraska</school', '']
in:
s.split('>')[1].split('<')
out:
['Nebraska', '/school']
in:
s.split('>')[1].split('<')[0]
out:
'Nebraska'
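To run the same chain over every line of a file, you could wrap it in a small helper. This is only a sketch: `extract_tag_text` is a name made up for illustration, the second sample line is invented to show the no-tag case, and it assumes each real line holds a single tag pair like the example.

```python
def extract_tag_text(line):
    """Return the text between the first '>' and the next '<', or None."""
    # Same chain as above: drop everything up to the first '>',
    # then keep what comes before the following '<'.
    parts = line.split('>')
    if len(parts) < 2:
        return None  # no '>' on this line, so there is nothing to extract
    return parts[1].split('<')[0]


# Sample lines standing in for the file; the second one is hypothetical.
lines = ['<school>Nebraska</school>\n', 'no tags here\n']
names = [extract_tag_text(line) for line in lines]
print(names)
```

Returning None for lines without a tag lets the caller decide whether to skip them or report them.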
You've cut off part of the string. Keep going in the same fashion:
>>> s = '<school>Nebraska</school>'
>>> s.split('>')[1]
'Nebraska</school'
>>> s.split('>')[1].split('<')[0]
'Nebraska'
That said, you should parse HTML with an HTML parser like BeautifulSoup.
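If installing BeautifulSoup is not an option, the standard library can probably handle lines like this one, since each is a well-formed XML element. A minimal sketch using `xml.etree.ElementTree` (a substitute, not BeautifulSoup itself); `school_name` is a hypothetical helper name, and it assumes every line is one complete element:

```python
import xml.etree.ElementTree as ET


def school_name(line):
    """Parse one line as XML and return the element's text, or None."""
    try:
        element = ET.fromstring(line.strip())
    except ET.ParseError:
        return None  # the line is not a well-formed element
    return element.text


name = school_name('<school>Nebraska</school>\n')
print(name)
```

Unlike the split chain, this fails loudly (here, by returning None) on malformed lines instead of returning a partial string.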
You could use a regular expression:
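import re

# Capture the text between the opening and closing <school> tags.
# A raw string avoids escape warnings; '/' needs no backslash in Python.
regexp = re.compile(r'<school>(.*?)</school>')
with open('Pro.txt') as fo:
    for rec in fo:
        # match() only succeeds if the tag starts at the beginning of the line.
        match = regexp.match(rec)
        if match:
            text = match.group(1)
            print(text)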
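If the file is small enough to read in one go, `re.findall` can collect every captured name in a single call instead of looping line by line. A sketch under that assumption; the `data` string is a stand-in for the contents of the file, which are not shown here:

```python
import re

# Stand-in for open('Pro.txt').read(); the real file contents are not shown.
data = '<school>Nebraska</school>\n<school>Nebraska</school>\n'

# With exactly one capture group, findall returns the captured strings.
names = re.findall(r'<school>(.*?)</school>', data)
print(names)
```

Unlike `match`, `findall` scans the whole string, so it also picks up tags that do not start at the beginning of a line.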