Hi, I am using the XML package in R to scrape HTML pages. The page of interest is http://www.ncbi.nlm.nih.gov/protein/225903367?report=fasta
If you go to that URL you will see a sequence of letters starting with "MYS", and it is that sequence that I need.
I think I finally understand what you need. The content you are looking for is in the following span:
MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTS…
You can find it with an XPath expression like:
"//span[@id = 'gi_225903367_1']"
Note: This is the correct expression to retrieve a span element with the id attribute value "gi_225903367_1". I cannot comment on whether you are applying XPath correctly in your R code.
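
For what it's worth, here is a minimal sketch of how that expression could be applied with the XML package. It is not tested against the live page: the `html` string is a hypothetical stand-in I made up, containing only the span and the sequence prefix quoted above (not the full sequence), and the variable names are my own. It uses `htmlParse`, `xpathSApply` and `xmlValue` from the XML package.

```r
library(XML)

# Hypothetical stand-in for the downloaded page: just the span of interest,
# holding only the sequence prefix quoted above (the real sequence is longer).
html <- paste0(
  "<html><body><div>",
  "<span id='gi_225903367_1'>",
  "MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTS",
  "</span>",
  "</div></body></html>"
)

# Parse the string as HTML (asText = TRUE means "this is content, not a file name")
doc <- htmlParse(html, asText = TRUE)

# Apply xmlValue to every node matched by the XPath expression
seq_text <- xpathSApply(doc, "//span[@id = 'gi_225903367_1']", xmlValue)

# Strip any whitespace or line breaks inside the extracted text
seq_text <- gsub("\\s+", "", seq_text)

print(seq_text)
```

If the same call returns an empty result on the real page, it may be that the span is filled in by JavaScript after the page loads, in which case it would not be present in the HTML that `htmlParse` receives.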