问题
Im trying to scrape this website (https://kamadan.gwtoolbox.com/) with google sheets for material costs for a game that I play. There are two tables; "Common Materials" and "Rare Materials" in a drop down in the top right corner. I am trying to pull the values for both as the prices update. I copied the full Xpath and used the function below in an empty cell on a sheet.
=importxml("https://kamadan.gwtoolbox.com/","/html/body/div[2]/div[1]/div/div[2]/table/tbody")
This returns a #N/A error saying it is returning an empty value.
I also tried it with the regular xpath...
=importxml("https://kamadan.gwtoolbox.com/","//*[@id='trader-overlay-items']")
Which just returns a blank cell. I have also tried both methods using the inspect function through chrome on the ancestors and children they return either of the two errors above.
Sorry if this is a really easy one. I am not familiar at all with Xpaths or html. I mostly dabble in VBA in excel.
回答1:
Answer:
IMPORTXML
can not retrieve data which is populated by a script, and so using this formula to retrieve data from this table is not possible to do.
More Information:
As you've already mentioned, you can attempt to get the data directly from the table using:
=IMPORTXML("https://kamadan.gwtoolbox.com/","//table[@id='trader-overlay-items']")
Which just gets a blank cell.
I went a step further and tried to reverse-engineer this by calling IMPORTXML
on the HTML
elements on the page in steps:
=IMPORTXML("https://kamadan.gwtoolbox.com/","html")
=IMPORTXML("https://kamadan.gwtoolbox.com/","html/body")
=IMPORTXML("https://kamadan.gwtoolbox.com/","html/body/div[1]")
=IMPORTXML("https://kamadan.gwtoolbox.com/","html/body/div[1]/div[0]")
...
html/body/div[1]/div[0]
is the first path which gives no imported content, and we can see from importing html/body
that the full body does not contain the imformation and only a template of it - in cell B1
we have references to 'Common materials' and 'Rare materials':
And in D1
we start to see JavaScript and JSON objects which are not called by IMPORTXML
and so the results of which can not be retrieved:
As you can see if you disable JavaScript on the site, almost nothing is actually rendered and so can't be obtained using IMPORTXML
:
I know this is generally bad news, but I hope this is helpful to you!
References:
- IMPORTXML - Docs Editors Help
来源:https://stackoverflow.com/questions/61470783/google-sheets-importxml-returns-empty-value