I have an lxml etree HTMLParser object that I'm trying to build XPaths with, to assert on the XPaths, the attributes of the matched tag, and the text of that tag. I ran into a problem when the te
This solution applies if you are using Python's lxml.
It is better to leave the escaping to lxml, which we can do by using lxml's XPath variables.
Suppose we have an XPath like this:
//tagname[text()='some_text']
If some_text contains both single and double quotes, it causes an "Invalid predicate" error.
Neither escaping nor triple quotes worked for me, because XPath string literals do not support triple quotes.
The solution that worked for me is lxml variables.
We rewrite the XPath as below:
//tagname[text() = $var]
Then compile it:
find = etree.XPath(xpath)
Then evaluate it, passing the variable's value as a keyword argument:
elements = find(root, var=text)
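Putting the steps together, here is a minimal sketch, assuming lxml is installed. The sample HTML, the `p` tag, and the variable name `var` are made up for illustration; only `etree.HTML`, `etree.XPath`, and keyword-argument variables come from lxml itself.

```python
from lxml import etree

# Hypothetical sample document; the text contains both quote characters.
html = """<html><body><p>He said "it's fine"</p><p>other</p></body></html>"""
root = etree.HTML(html)

# The value we want to match, with a single and a double quote in it.
text = 'He said "it\'s fine"'

# Compile the XPath once; $var is filled in at evaluation time,
# so no quoting or escaping is needed inside the expression.
find = etree.XPath("//p[text() = $var]")
elements = find(root, var=text)

matched = [el.text for el in elements]
print(matched)
```

Because the value never becomes part of the XPath source, it does not matter which quote characters it contains.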
According to Wikipedia and W3Schools, you should avoid ' and " in node content, even though only < and & are said to be strictly illegal there. They should be replaced by the corresponding "predefined entity references", which are &apos; and &quot;.
By the way, the Python parsers I use will take care of this transparently: when writing, they are replaced; when reading, they are converted.
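To illustrate that round trip, here is a small sketch using the standard library's xml.etree.ElementTree. That choice is an assumption on my side, since I did not say which parsers I use, and the sample text is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical node content with the characters discussed above.
original = 'a < b & "c" \'d\''

el = ET.Element("p")
el.text = original

# Writing: the serializer replaces the strictly illegal characters.
serialized = ET.tostring(el, encoding="unicode")
print(serialized)

# Reading: the parser converts the entity references back.
parsed = ET.fromstring(serialized)
print(parsed.text == original)

# The predefined entities for quotes are also decoded on reading.
entity_text = ET.fromstring("<p>&quot;x&apos;</p>").text
print(entity_text)
```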
After rereading your answer, I tested a few cases with ' and so on in the Python interpreter, and it escapes everything for you!
>>> 'text {0}'.format('blabla "some" bla')
'text blabla "some" bla'
>>> 'ntsnts {0}'.format("ontsi'tns")
"ntsnts ontsi'tns"
>>> 'ntsnts {0}'.format("ontsi'tn' \"ntsis")
'ntsnts ontsi\'tn\' "ntsis'
So we can see that Python escapes things correctly. Could you then copy-paste the error message you get (if any)?
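One caveat worth checking, as a rough sketch reusing the last string from the session above: the backslashes the interpreter echoes belong to the displayed representation (repr), not to the string itself.

```python
s = 'ntsnts {0}'.format("ontsi'tn' \"ntsis")

# print() shows the actual characters stored in the string.
print(s)

# repr() is what the interactive interpreter echoes; it adds
# backslashes so the result is a valid Python literal.
shown = repr(s)
print(shown)

# The string itself contains no backslash at all.
has_backslash = "\\" in s
print(has_backslash)
```

So the string that reaches the XPath engine still contains the raw quote characters.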
There are more options to choose from; the """ and ''' forms in particular might be what you want.
s = "a string with a single ' quote"
s = 'a string with a double " quote'
s = """a string with a single ' and a double " quote"""
s = '''another string with those " quotes '.'''
s = r"raw strings let \ be \ (but they cannot end with a backslash)"
s = r'''and can be added \ to " any ' of """ those things'''
s = """The three-quote-forms
may contain
newlines."""
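As a quick sanity check (a sketch with made-up sample text), the quoting form only affects how the literal is written in the source; the resulting string objects are identical.

```python
# Three ways of writing the same text in Python source code.
a = "it's a \"test\""
b = 'it\'s a "test"'
c = '''it's a "test"'''

# All three literals produce the same string value.
same = (a == b == c)
print(same)
print(a)
```

This is also why triple quotes do not help once the text is pasted into an XPath expression: the quote characters are still in the value.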