cyberneko

When using HtmlUnit, how can I configure the underlying NekoHtml parser?

戏子无情 submitted on 2019-12-25 05:22:46
Question: I'm using HtmlUnit to scrape a webpage because of its JavaScript support. (I'd rather use Jsoup, but it has no JS support.) The issue relates to a feature of the underlying NekoHtml parser: "http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe" See: http://nekohtml.sourceforge.net/settings.html This can apparently be enabled in Neko, but I'm using HtmlUnit. Is there a way to configure the underlying Neko parser that HtmlUnit is using to enable this feature? When attempting

When using HtmlUnit, how can I configure the underlying NekoHtml parser?

旧城冷巷雨未停 submitted on 2019-12-25 05:22:20
Question: I'm using HtmlUnit to scrape a webpage because of its JavaScript support. (I'd rather use Jsoup, but it has no JS support.) The issue relates to a feature of the underlying NekoHtml parser: "http://cyberneko.org/html/features/scanner/allow-selfclosing-iframe" See: http://nekohtml.sourceforge.net/settings.html This can apparently be enabled in Neko, but I'm using HtmlUnit. Is there a way to configure the underlying Neko parser that HtmlUnit is using to enable this feature? When attempting

XmlSlurper/NekoHTML document fragment parsing - No HTML or BODY tags wanted

廉价感情. submitted on 2019-12-10 12:19:34
Question: Dear All, I am trying to parse the following HTML fragment, and I would like to get the same fragment as output (without HTML and BODY tags). Is this possible? If so, how? Thank you, Misha. P.S. I am reading http://nekohtml.sourceforge.net/faq.html#fragments and I believe I have added the correct options below. However, the output is still incorrect :( Thank you, Misha. import groovy.xml.MarkupBuilder import groovy.xml.StreamingMarkupBuilder import groovy.util.XmlNodePrinter import groovy