C# Regular Expression excluding a string

-上瘾入骨i 2021-01-02 20:03

I have a collection of strings, and all I want from the regex is to collect every one that starts with http..

href=\"http://www.test.com/cat/1-one_piece_episodes/\"href

3 Answers
  • 2021-01-02 20:48

    Your input doesn't look like a valid string (unless you escape the quotes in it), but you can also do this without regex:

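    using System;
    using System.Collections.Generic;

    string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";

    List<string> matches = new List<string>();

    // Split on "href"; RemoveEmptyEntries drops the empty piece before the first "href".
    foreach (var match in input.Split(new string[] { "href" }, StringSplitOptions.RemoveEmptyEntries)) {
        // Keep every piece that does not mention the excluded host.
        if (!match.Contains("exclude.com"))
            matches.Add("href" + match);
    }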
    
  • 2021-01-02 20:52

    Will this do the job?

    href="(?!http://[^/"]+exclude\.com)(.*?)[^#]"
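
    To try that pattern from C#, something like the following sketch should work. The class name `LookaheadDemo` and the method `FindLinks` are made up for illustration, and the dot in `exclude.com` is escaped here so that it only matches a literal dot. Note that the lookahead only inspects the host part, and `[^/"]+` needs at least one character before `exclude`, so a bare `http://exclude.com` would not be filtered out.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class LookaheadDemo
    {
        // Hypothetical helper: returns the href attributes whose host
        // does not end in "exclude.com" (and whose value does not end in '#').
        public static MatchCollection FindLinks(string input)
        {
            // Verbatim string: a literal quote is written as "".
            string pattern = @"href=""(?!http://[^/""]+exclude\.com)(.*?)[^#]""";
            return Regex.Matches(input, pattern);
        }

        public static void Main()
        {
            string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\""
                         + "href=\"http://www.test.com/cat/2-movies_english_subbed/\""
                         + "href=\"http://www.test.com/cat/3-english_dubbed/\""
                         + "href=\"http://www.exclude.com\"";

            // Prints the three test.com attributes; the exclude.com one is skipped.
            foreach (Match m in FindLinks(input))
                Console.WriteLine(m.Value);
        }
    }
    ```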
    
  • 2021-01-02 20:55

    @ridgerunner and I would change the regex to:

    href="((?:(?!\bexclude\b)[^"])*)[^#]"
    

    It matches all href attributes as long as they don't end in # and don't contain the word exclude.

    Explanation:

    href="     # Match href="
    (          # Capture...
     (?:       # the following group:
      (?!      # Look ahead to check that the next part of the string isn't...
       \b      # the entire word
       exclude # exclude
       \b      # (\b are word boundary anchors)
      )        # End of lookahead
      [^"]     # If successful, match any character except for a quote
     )*        # Repeat as often as possible
    )          # End of capturing group 1
    [^#]"      # Match a non-# character and the closing quote.
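
    The annotated layout above can be kept almost as-is in .NET by passing `RegexOptions.IgnorePatternWhitespace`, which ignores unescaped whitespace and treats `#` as the start of a comment. This is only a sketch: the class name `VerbosePatternDemo` is invented, and it relies on my understanding that a `#` inside a character class such as `[^#]` is still taken literally in this mode.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class VerbosePatternDemo
    {
        // In a verbatim string (@"..."), a literal quote is written as "".
        public const string Pattern = @"
            href=""         # Match href, an equals sign and the opening quote
            (               # Capture...
             (?:            # the following group:
              (?!           # Look ahead to check that the next part isn't...
               \bexclude\b  # the whole word exclude
              )             # End of lookahead
              [^""]         # If successful, match any character except a quote
             )*             # Repeat as often as possible
            )               # End of capturing group 1
            [^#]""          # Match a non-hash character and the closing quote
        ";

        public static MatchCollection FindLinks(string input)
        {
            return Regex.Matches(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        }

        public static void Main()
        {
            string input = "href=\"http://www.test.com/cat/3-english_dubbed/\""
                         + "href=\"http://www.exclude.com\"";

            // Only the test.com attribute is expected to be printed.
            foreach (Match m in FindLinks(input))
                Console.WriteLine(m.Value);
        }
    }
    ```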
    

    To allow multiple "forbidden words":

    href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
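
    If the list of forbidden words is only known at run time, the same pattern can be assembled in C#. The sketch below uses invented names (`HrefFilter`, `Collect`) and assumes at least one forbidden word is passed. It returns the whole match rather than group 1, because the final `[^#]` consumes the last character of the URL, so group 1 is always one character short.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    public static class HrefFilter
    {
        // Hypothetical helper: collects every href="..." attribute that contains
        // none of the forbidden words and does not end in '#'.
        // Assumes 'forbidden' holds at least one word.
        public static List<string> Collect(string input, params string[] forbidden)
        {
            // Escape each word so that characters such as '.' are matched literally.
            string words = string.Join("|", Array.ConvertAll(forbidden, w => Regex.Escape(w)));
            string pattern = "href=\"((?:(?!\\b(?:" + words + ")\\b)[^\"])*)[^#]\"";

            var results = new List<string>();
            foreach (Match m in Regex.Matches(input, pattern))
                results.Add(m.Value); // whole attribute; Groups[1] lacks the last character
            return results;
        }

        public static void Main()
        {
            string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\""
                         + "href=\"http://www.test.com/cat/2-movies_english_subbed/\""
                         + "href=\"http://www.test.com/cat/3-english_dubbed/\""
                         + "href=\"http://www.exclude.com\"";

            // Prints the three test.com attributes.
            foreach (string href in Collect(input, "exclude", "this", "too"))
                Console.WriteLine(href);
        }
    }
    ```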
    