Regex for finding HTML classes with JSOUP

孤街浪徒 submitted on 2020-04-30 07:20:06

Question

For my project I need to parse HTML and get the price of a product. This is how I am doing it at the moment:

let url = "https://www.adidas.de/adistar-trikot/CV7089.html"
let className = "gl-price__value"

do {
    // getHTMLfromURL is my own helper that downloads the page source
    let html: String = getHTMLfromURL(url: url)
    let doc: Document = try SwiftSoup.parse(html)

    // Only finds elements whose class list contains exactly className
    let price: Elements = try doc.getElementsByClass(className)

    let priceText: String = try price.text()

    result.text = priceText

} catch Exception.Error(_, let message) {
    print(message)
} catch {
    print("error")
}

Question:

How can I change className to a regex so that all 3 examples below match? I've tried several approaches but cannot get it to work. Any help is appreciated!

Example 1:

<div class="price">82 EUR</div>

Example 2:

<span class="gl-price__value">€ 139,95</span>

Example 3:

<span id="priceblock_ourprice" class="a-size-medium a-color-price priceBlockBuyingPriceString">79,99&nbsp;€</span>

Answer 1:

getElementsByClass may not be the best way to go. From the SwiftSoup README, section "Use selector syntax to find elements":

SwiftSoup elements support a CSS (or jQuery) like selector syntax to find matching elements, that allows very powerful and robust queries.

[attr~=regex]: elements with attribute values that match the regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
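
As the README example suggests, the pattern only has to match somewhere inside the attribute value, not the whole value. The sketch below checks that idea against the class values from the question using only Foundation's NSRegularExpression, so it runs without SwiftSoup. The helper name containsMatch and the "product-title" class are assumptions added for illustration, and SwiftSoup's own matching may differ in details.

```swift
import Foundation

// Class attribute values copied from the three examples in the question.
let classValues = [
    "price",
    "gl-price__value",
    "a-size-medium a-color-price priceBlockBuyingPriceString",
    "product-title", // hypothetical non-price class, added for contrast
]

/// Returns true when `pattern` is found anywhere in `value` (a partial match).
/// Hypothetical helper for illustration; it is not part of SwiftSoup.
func containsMatch(_ pattern: String, in value: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return false // an invalid pattern matches nothing
    }
    let range = NSRange(value.startIndex..., in: value)
    return regex.firstMatch(in: value, options: [], range: range) != nil
}

// (?i) turns on case-insensitive matching for the rest of the pattern.
let matching = classValues.filter { containsMatch("(?i)price", in: $0) }
for value in matching {
    print(value)
}
```

Trying a pattern this way before putting it into a selector makes it easier to see which class values it would pick up.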

Your code would become something like:

let doc: Document = try SwiftSoup.parse(html)

// Select every element whose class attribute matches the regex
let priceClasses: Elements = try doc.select("[class~=(?i)price]")

for priceClass: Element in priceClasses.array() {
    let priceText: String = try priceClass.text()
    ...
}
...

I'm using price as the regex here, based on the examples you provided (the (?i) prefix makes it case-insensitive), but you can adapt it as needed.
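
For a quick check, the selector can be run against the three snippets from the question placed in one inline document. This is only a sketch: it assumes SwiftSoup is available as a package dependency, and the function priceTexts and the extra "product-title" element are hypothetical names added for illustration.

```swift
import SwiftSoup

// The three snippets from the question, plus one hypothetical non-price element.
let html = """
<div class="price">82 EUR</div>
<span class="gl-price__value">€ 139,95</span>
<span id="priceblock_ourprice" class="a-size-medium a-color-price priceBlockBuyingPriceString">79,99&nbsp;€</span>
<h1 class="product-title">Hypothetical product name</h1>
"""

/// Collects the text of every element whose class attribute matches `pattern`.
/// Hypothetical helper; the pattern is pasted into the selector as-is,
/// so a pattern containing "]" would need extra care.
func priceTexts(in html: String, matching pattern: String = "(?i)price") throws -> [String] {
    let doc: Document = try SwiftSoup.parse(html)
    let elements: Elements = try doc.select("[class~=\(pattern)]")
    return try elements.array().map { try $0.text() }
}

do {
    for text in try priceTexts(in: html) {
        print(text)
    }
} catch Exception.Error(_, let message) {
    print(message)
} catch {
    print("error: \(error)")
}
```

On a real product page a pattern as broad as price may also match wrapper elements or unrelated classes that merely contain the word, so the result is best treated as a list of candidates to filter further.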



Source: https://stackoverflow.com/questions/61432613/regex-for-finding-html-classes-with-jsoup
