Regex match pull href link and string in between [duplicate]


醉梦人生 2021-01-29 08:26

2 answers

囚心锁ツ (original poster)

2021-01-29 08:49

This shows how to do what you are looking for: C# Scraping HTML Links

Here is the code example from that page:

using System.Collections.Generic;
using System.Text.RegularExpressions;

public struct LinkItem
{
    public string Href;
    public string Text;

    public override string ToString()
    {
        return Href + "\n\t" + Text;
    }
}

static class LinkFinder
{
    public static List<LinkItem> Find(string file)
    {
        List<LinkItem> list = new List<LinkItem>();

        // 1.
        // Find all anchor elements (<a ...>...</a>) in the file.
        MatchCollection m1 = Regex.Matches(file, @"(<a.*?>.*?</a>)",
            RegexOptions.Singleline);

        // 2.
        // Loop over each match.
        foreach (Match m in m1)
        {
            string value = m.Groups[1].Value;
            LinkItem i = new LinkItem();

            // 3.
            // Get href attribute.
            Match m2 = Regex.Match(value, @"href=\""(.*?)\""",
                RegexOptions.Singleline);
            if (m2.Success)
            {
                i.Href = m2.Groups[1].Value;
            }

            // 4.
            // Remove inner tags from text.
            string t = Regex.Replace(value, @"\s*<.*?>\s*", "",
                RegexOptions.Singleline);
            i.Text = t;

            list.Add(i);
        }
        return list;
    }
}
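
To see the same two-step idea run end to end (match each anchor element, then pull the href and strip the tags), here is a minimal self-contained sketch. The names `LinkDemo` and `ExtractLinks` and the sample HTML are made up for illustration and are not from the linked page. It also assumes two small changes to the pattern: `\b` after `<a` so that tags such as `<abbr>` are not matched, and `RegexOptions.IgnoreCase` so that `<A HREF=...>` is matched as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkDemo
{
    // Hypothetical helper: returns (href, text) pairs, one per <a>...</a> element.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var links = new List<KeyValuePair<string, string>>();

        // Match each anchor element non-greedily, including any nested tags.
        foreach (Match anchor in Regex.Matches(html, @"<a\b.*?>.*?</a>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase))
        {
            string value = anchor.Value;

            // Pull the double-quoted href value, if there is one.
            Match href = Regex.Match(value, @"href=""(.*?)""",
                RegexOptions.Singleline | RegexOptions.IgnoreCase);

            // Strip every tag, plus whitespace touching it, to leave the link text.
            string text = Regex.Replace(value, @"\s*<.*?>\s*", "",
                RegexOptions.Singleline);

            links.Add(new KeyValuePair<string, string>(
                href.Success ? href.Groups[1].Value : "", text));
        }
        return links;
    }

    static void Main()
    {
        string html = "<p>See <a href=\"https://example.com/docs\"><b>the docs</b></a> " +
                      "or <a href=\"/faq\">FAQ</a>.</p>";

        foreach (var link in ExtractLinks(html))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
        // Prints:
        // https://example.com/docs -> the docs
        // /faq -> FAQ
    }
}
```

Because the tag-stripping pattern also consumes whitespace next to each tag, link text such as `the <b>docs</b>` comes out as `thedocs`. The sketch only handles double-quoted href values, so for arbitrary real-world HTML a proper HTML parser is likely the safer choice.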
