问题
I'm new to LibreOffice Basic. I'm trying to write a macro in LibreOffice Calc that will read the name of a noble House of Westeros from a cell (e.g. Stark), and output the Words of that House by looking it up on the relevant page on A Wiki of Ice and Fire. It should work like this:
Here is the pseudocode:
Read HouseName from column A
Open HtmlFile at "http://www.awoiaf.westeros.org/index.php/House_" & HouseName
Iterate through HtmlFile to find line which begins "<table class="infobox infobox-body"" // Finds the info box for the page.
Read Each Row in the table until Row begins Words
Read the contents of the next <td> tag, and return this as a string.
My problem is with the second line, I don't know how to read a HTML file. How should I do this in LibreOffice Basic?
回答1:
There are two mainly issues with this. 1. Performance Your UDF will need get the HTTP resource in every cell, in which it is stored. 2. HTML Unfortunately there is no HTML parser in OpenOffice or LibreOffice. There is only a XML parser. Thats why we cannot parse HTML directly with the UDF.
This will work, but slow and not very universal:
Public Function FETCHHOUSE(sHouse as String) as String
sURL = "http://awoiaf.westeros.org/index.php/House_" & sHouse
oSimpleFileAccess = createUNOService ("com.sun.star.ucb.SimpleFileAccess")
oInpDataStream = createUNOService ("com.sun.star.io.TextInputStream")
on error goto falseHouseName
oInpDataStream.setInputStream(oSimpleFileAccess.openFileRead(sUrl))
on error goto 0
dim delimiters() as long
sContent = oInpDataStream.readString(delimiters(), false)
lStartPos = instr(1, sContent, "<table class=" & chr(34) & "infobox infobox-body" )
if lStartPos = 0 then
FETCHHOUSE = "no infobox on page"
exit function
end if
lEndPos = instr(lStartPos, sContent, "</table>")
sTable = mid(sContent, lStartPos, lEndPos-lStartPos + 8)
lStartPos = instr(1, sTable, "Words" )
if lStartPos = 0 then
FETCHHOUSE = "no Words on page"
exit function
end if
lEndPos = instr(lStartPos, sTable, "</tr>")
sRow = mid(sTable, lStartPos, lEndPos-lStartPos + 5)
oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
oOptions.searchString = "<td[^<]*>"
oTextSearch.setOptions(oOptions)
oFound = oTextSearch.searchForward(sRow, 0, Len(sRow))
If oFound.subRegExpressions = 0 then
FETCHHOUSE = "Words header but no Words content on page"
exit function
end if
lStartPos = oFound.endOffset(0) + 1
lEndPos = instr(lStartPos, sRow, "</td>")
sWords = mid(sRow, lStartPos, lEndPos-lStartPos)
FETCHHOUSE = sWords
exit function
falseHouseName:
FETCHHOUSE = "House name does not exist"
End Function
The better way would be, if you could get the needed informations from a Web API that would offered from the Wiki. You know the people behind the Wiki? If so, then you could place this there as a suggestion.
Greetings
Axel
来源:https://stackoverflow.com/questions/25330550/reading-html-page-using-libreoffice-basic