Question
I'm trying to parse some websites with SwiftSoup; one example is an article on Medium. How can I extract the body of the page and load it into another UIViewController, the way Instapaper does?
Here is the code I use to extract the title:
import SwiftSoup

class WebViewController: UIViewController, UIWebViewDelegate {
    ...
    override func viewDidLoad() {
        super.viewDidLoad()
        let url = URL(string: "https://medium.com/@timjwise/stop-lying-to-yourself-when-you-snub-panhandlers-its-not-for-their-own-good-199d0aa7a513")
        let request = URLRequest(url: url!)
        webView.loadRequest(request)
        guard let myURL = url else {
            print("Error: \(String(describing: url)) doesn't seem to be a valid URL")
            return
        }
        let html = try! String(contentsOf: myURL, encoding: .utf8)
        do {
            let doc: Document = try SwiftSoup.parseBodyFragment(html)
            let headerTitle = try doc.title()
            print("Header title: \(headerTitle)")
        } catch Exception.Error(let type, let message) {
            print("Message: \(message)")
        } catch {
            print("error")
        }
    }
}
But I had no luck extracting the body of this page, or of any other website. Is there a way to get this working? Does it require CSS or JavaScript (I know nothing about either)?
Answer 1:
Use the body() function of the parsed Document. See https://github.com/scinfu/SwiftSoup#parsing-a-body-fragment for details. Try this:
let html = try! String(contentsOf: myURL, encoding: .utf8)
do {
    let doc: Document = try SwiftSoup.parseBodyFragment(html)
    let headerTitle = try doc.title()
    // The <body> element of the parsed document (an optional Element)
    let body = doc.body()
    // Elements to remove, in this case images
    let undesiredElements: Elements? = try body?.select("img[src]")
    // Remove them from the document; remove() can throw, so it needs try
    try undesiredElements?.remove()
    print("Header title: \(headerTitle)")
} catch Exception.Error(let type, let message) {
    print("Message: \(message)")
} catch {
    print("error")
}
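The snippet above removes the images but never reads the remaining body content. A minimal sketch of that last step follows, assuming the downloaded string is a complete HTML page (so it uses SwiftSoup.parse rather than parseBodyFragment, which is a swapped-in choice and not part of the answer). The names ExtractedArticle and extractArticle are hypothetical helpers introduced only for illustration; they are not part of SwiftSoup.

```swift
import SwiftSoup

// Hypothetical container (not a SwiftSoup type) for the parts of the page to display.
struct ExtractedArticle {
    let title: String
    let bodyText: String
}

// Hypothetical helper: parses a full HTML page and returns its title and
// the visible text of its body.
func extractArticle(from html: String) throws -> ExtractedArticle {
    // Assumption: `html` is a whole document, so parse(_:) is used here.
    let doc: Document = try SwiftSoup.parse(html)
    let title = try doc.title()

    // body() returns an optional Element; fall back to an empty body text.
    guard let body = doc.body() else {
        return ExtractedArticle(title: title, bodyText: "")
    }

    // Drop elements that should not appear in a reader-style view.
    try body.select("img, script, style").remove()

    // text() returns the combined text of the element and its children.
    let bodyText = try body.text()
    return ExtractedArticle(title: title, bodyText: bodyText)
}
```

The result could then be handed to the second view controller, for example by assigning bodyText to a property that it displays in a UITextView. Since String(contentsOf:) blocks while it downloads, it is probably safer to fetch the HTML off the main thread before calling a helper like this.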
Source: https://stackoverflow.com/questions/48963919/parse-html-with-swiftsoup-swift