Question
I am trying to use FSharp.Data's HTML parser to extract a list of strings (links) from href attributes.
I can print the links to the console, but I'm struggling to get them into a list.
Here is a working snippet that prints the links I want:
let results = HtmlDocument.Load(myUrl)
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.iter (fun x -> x |> Seq.iter (fun y -> y.AttributeValue("href") |> printf "%A"))
How do I store those strings in the variable links instead of printing them out?
Cheers,
Answer 1:
On the very last line, you end up with a sequence of sequences: for each td.pagenav you have a bunch of <a> elements, each of which has an href. That's why you need two nested Seq.iter calls: first you iterate over the outer sequence, and on each iteration you iterate over the inner sequence.
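To make that shape concrete, here is a minimal sketch that uses plain strings in place of parsed nodes. The name hrefsPerCell and the href values are made up for illustration; they are not part of the original code.

```fsharp
// Hypothetical data: each inner sequence stands for the hrefs found in one td.pagenav cell.
let hrefsPerCell : seq<seq<string>> =
    seq [ seq [ "/page/1"; "/page/2" ]; seq [ "/page/3" ] ]

// The value is a seq<seq<string>>, so reaching every href takes two nested iterations:
// the outer one walks the cells, the inner one walks the hrefs of a single cell.
hrefsPerCell
|> Seq.iter (fun cell -> cell |> Seq.iter (fun href -> printfn "%s" href))
```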
To flatten a sequence of sequences, use Seq.collect. Further, to convert a sequence to a list, use Seq.toList or List.ofSeq (they're equivalent):
let a = [ [1;2;3]; [4;5;6] ]
let b = a |> Seq.collect id |> Seq.toList
> val b : int list = [1; 2; 3; 4; 5; 6]
Applying this to your code:
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.map (fun x -> x.Elements("a"))
    |> Seq.collect (fun x -> x |> Seq.map (fun y -> y.AttributeValue("href")))
    |> Seq.toList
Or you could make it a bit cleaner by applying Seq.collect at the point where you first encounter a nested sequence:
let links =
    results.Descendants("td")
    |> Seq.filter (fun x -> x.HasClass("pagenav"))
    |> Seq.collect (fun x -> x.Elements("a"))
    |> Seq.map (fun y -> y.AttributeValue("href"))
    |> Seq.toList
That said, I would rather rewrite this as a list comprehension. Looks even cleaner:
let links =
    [ for td in results.Descendants "td" do
        if td.HasClass "pagenav" then
            for a in td.Elements "a" ->
                a.AttributeValue "href" ]
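To try any of these variants without hitting a live page, a sketch along the following lines may help. It assumes an F# Interactive session that supports `#r "nuget: ..."` references, and it parses a made-up HTML fragment (sampleHtml is hypothetical) with HtmlDocument.Parse instead of loading myUrl.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// Hypothetical markup standing in for the page behind myUrl.
let sampleHtml = """
<html><body><table><tr>
  <td class="pagenav"><a href="/page/1">1</a><a href="/page/2">2</a></td>
  <td class="other"><a href="/ignored">x</a></td>
  <td class="pagenav"><a href="/page/3">3</a></td>
</tr></table></body></html>"""

// Parse the in-memory string rather than downloading a document.
let results = HtmlDocument.Parse(sampleHtml)

let links =
    results.Descendants("td")
    |> Seq.filter (fun td -> td.HasClass("pagenav"))   // keep only td.pagenav cells
    |> Seq.collect (fun td -> td.Elements("a"))        // flatten the <a> children of all cells
    |> Seq.map (fun a -> a.AttributeValue("href"))     // pull out each href string
    |> Seq.toList                                      // materialize as a string list

printfn "%A" links
```

Note that Elements only looks at the direct children of each td; if the anchors on the real page are nested deeper, Descendants("a") on the td may be the better fit.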
Source: https://stackoverflow.com/questions/44294409/f-data-html-parser-extracting-strings-from-nodes