How to import table with mix of text and image metadata with IMPORTHTML and/or IMPORTXML?

Posted by 社会主义新天地 on 2020-04-18 03:47:23

Question


I'm trying to import tables with a mixture of text and images into Google Sheets with IMPORTHTML and/or IMPORTXML function.

The tables I'm trying to import are the 'Equipment' tables under the 'Advancement' section from multiple sites like this: https://stt.wiki/wiki/Xindi_%27Prisoner%27_Archer.

The number of stars at each item in the table represents a "level" from 1 ("Common") to 5 ("Legendary"), with no stars representing level 0 ("Basic"). The image metadata contains the level description. Example for "Legendary" level:

<img alt="Legendary" src="/w/images/thumb/b/b5/StarItem.png/15px-StarItem.png" title="Legendary" width="15" height="15" style="vertical-align: sub" srcset="/w/images/thumb/b/b5/StarItem.png/23px-StarItem.png 1.5x, /w/images/thumb/b/b5/StarItem.png/30px-StarItem.png 2x">

My problem is how to include the level information in the import, either as images or as image metadata.

My ultimate goal is a table like this (created manually):

desired outcome

(columns E and I with URLs are optional).


IMPORTHTML:

First I tried IMPORTHTML. Cell A1 contains the URL shown above. (Note that I have to use semicolons as argument separators in formulas because of my locale settings.)

=IMPORTHTML(A1; "table"; 4)

This gives me this table:

import result for above code

Unfortunately, the "stars" from the original table are not imported.

1) So the first question is: Is there a way to include the images from a table when using IMPORTHTML, or alternatively the metadata of those images?


IMPORTXML:

I then tried to use IMPORTXML to get just the missing level data:

=IMPORTXML(A1; "//*[@id='mw-content-text']/div/table[3]/tbody/tr/td/span/img[1]/@alt")

IMPORTHTML gave me 40 items in total, but this IMPORTXML call returns only 37 item levels. The XPath matches only cells that contain a star image, so the "Basic" items (the ones without stars) produce no value at all.

So now I have a list of 37 levels and a table with 40 items, but no logical connection between them. The list of levels would need entries (blank cells would do) for the Basic items at the correct positions to make the assignment between items and levels possible.

2) So my second question is: With IMPORTXML, is there any way to get a result with the same number of cells in Google Sheets as in the original table, even when the XPath doesn't match for some cells of the original table? In that case the import could return an empty cell instead. In the example this would give me a list of 40 cells, 3 of which would be empty.
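To make the mismatch concrete outside of Sheets, here is a minimal Python sketch. It assumes a tiny, hypothetical stand-in for the equipment table (the markup and item names are made up, not copied from the wiki) and uses the standard library's limited XPath support. A single global query drops the cells without a star image, while one query per cell keeps the positions aligned by emitting an empty string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the wiki's equipment table.
SAMPLE = """
<table>
  <tr><td><span><img alt="Legendary"/></span><a>Item A</a></td></tr>
  <tr><td><a>Item B</a></td></tr>
  <tr><td><span><img alt="Rare"/><img alt="Rare"/></span><a>Item C</a></td></tr>
</table>
""".strip()


def levels_flat(table):
    """One global query: cells without a star image simply vanish."""
    return [img.get("alt") for img in table.findall("./tr/td/span/img[1]")]


def levels_aligned(table):
    """One query per cell: a missing image becomes an empty string."""
    result = []
    for td in table.findall("./tr/td"):
        img = td.find("./span/img[1]")
        result.append(img.get("alt") if img is not None else "")
    return result


table = ET.fromstring(SAMPLE)
print(levels_flat(table))     # shorter than the number of items
print(levels_aligned(table))  # one entry per item
```

IMPORTXML itself only offers the first behaviour, which is why the approaches below rebuild the alignment with a lookup on a key shared by both imports.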


Other solutions with Google Sheets are welcome, too.


Answer 1:


XPath solution (six XPath expressions are used; see the yellow cells):

Star.Treck.Sheet

First, we get the structure of the table with IMPORTHTML. Then, with XPath, we get the ids, member names, and levels of every item that has a star (i.e., a rank), followed by the ids and member names of all items (with and without a star). We use VLOOKUP to build the levels table (see join.levels); items without a star are assigned "Basic". We then fetch the URLs. Finally, we build the final table, using CONCAT on ids and names as the join key so that the join is reliable.
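The join described above can be sketched in Python as a left join with a default value. This is only an illustration of the logic, not the spreadsheet's actual formulas: the function name `join_levels` and the sample rows are hypothetical, and a tuple of (id, name) plays the role of the concatenated key.

```python
def join_levels(all_items, starred_items, default="Basic"):
    """Left-join levels onto all_items; items without a star get `default`."""
    # Composite key (id, name), mirroring the ids+names key used for the join.
    lookup = {(item_id, name): level for item_id, name, level in starred_items}
    return [
        (item_id, name, lookup.get((item_id, name), default))
        for item_id, name in all_items
    ]


# Hypothetical sample data, not taken from the wiki page.
all_items = [("1", "Item A"), ("2", "Item B"), ("3", "Item C")]
starred = [("1", "Item A", "Legendary"), ("3", "Item C", "Rare")]

for row in join_levels(all_items, starred):
    print(row)
```

Keying on both id and name guards against two items that share a name but sit at different positions in the table.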




Answer 2:


Would this work for you?

=ARRAYFORMULA(IFERROR(VLOOKUP(B4:B13&C4:C13, {
 IMPORTXML($A1, "//table[3]/tbody//span/img[1]/@title/preceding::td[@class='ItemRight'][1]")&
 IMPORTXML($A1, "//table[3]/tbody//span/img[1]/@title/preceding::a[1]"),
 VLOOKUP(IMPORTXML($A1, "//table[3]/tbody//span/img[1]/@title"),
 {"Common",     "★", "", "", "", "";
  "Uncommon",   "★", "★", "", "", "";
  "Rare",       "★", "★", "★", "", "";
  "Super Rare", "★", "★", "★", "★", "";
  "Legendary",  "★", "★", "★", "★", "★"}, 
 {2, 3, 4, 5, 6}, 0)}, {2, 3, 4, 5, 6}, 0)))

or with original stars:

=ARRAYFORMULA(IMAGE(SUBSTITUTE(IFERROR(VLOOKUP(B4:B13&C4:C13, {
 IMPORTXML($A1, "//table[3]/tbody//span/img[1]/@title/preceding::td[@class='ItemRight'][1]")&
 IMPORTXML($A1, "//table[3]/tbody//span/img[1]/@title/preceding::a[1]"), 
 VLOOKUP(IMPORTXML($A1, "//table[3]/tbody//span/img[1]/@title"), 
 {"Common",     "★", "", "", "", "";
  "Uncommon",   "★", "★", "", "", "";
  "Rare",       "★", "★", "★", "", "";
  "Super Rare", "★", "★", "★", "★", "";
  "Legendary",  "★", "★", "★", "★", "★"}, 
 {2, 3, 4, 5, 6}, 0)}, {2, 3, 4, 5, 6}, 0)), "★", 
 "https://stt.wiki/w/images/thumb/b/b5/StarItem.png/15px-StarItem.png"), 3))
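The inline lookup table in both formulas maps a rarity title to five cells, one star per level and blanks for the rest. A rough Python equivalent of that mapping, with the hypothetical names `RARITIES` and `star_cells`, might look like this; unknown titles yield five blank cells, which is the assumed counterpart of the IFERROR fallback.

```python
# Rarity titles in ascending order, as listed in the formulas' lookup table.
RARITIES = ["Common", "Uncommon", "Rare", "Super Rare", "Legendary"]


def star_cells(title, width=5):
    """Return `width` cells: one star per level, padded with empty strings."""
    count = RARITIES.index(title) + 1 if title in RARITIES else 0
    return ["★"] * count + [""] * (width - count)


print(star_cells("Rare"))
print(star_cells("Basic"))
```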


original:


spreadsheet demo



Source: https://stackoverflow.com/questions/60878556/how-to-import-table-with-mix-of-text-and-image-metadata-with-importhtml-and-or-i
