What's the best way to remove tags from the end of a string?

问题

The .NET web system I'm working on allows the end user to input HTML formatted text in some situations. In some of those places, we want to leave all the tags, but strip off any trailing break tags (but leave any breaks inside the body of the text.)

What's the best way to do this? (I can think of ways to do this, but I'm sure they're not the best.)

回答1:

As @Mitch said,

//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Thu, Sep 25, 2008, 02:01:36 PM
///  Using Expresso Version: 2.1.2150, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  Match expression but don't capture it. [\<br\s*/?\>], any number of repetitions
///      \<br\s*/?\>
///          <
///          br
///          Whitespace, any number of repetitions
///          /, zero or one repetitions
///          >
///  End of line or string
///  
///  
/// </summary>
public static Regex regex = new Regex(
    @"(?:\<br\s*/?\>)*$",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );
regex.Replace(text, string.Empty);

回答2:

Small change to bdukes code, which should be faster as it doesn't backtrack.

public static Regex regex = new Regex(
    @"(?:\<br[^>]*\>)*$",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
);
regex.Replace(text, string.Empty);

回答3:

I'm sure this isn't the best way either, but it should work unless you have trailing spaces or something.

while (myHtmlString.EndsWith("<br>"))
{
    myHtmlString = myHtmlString.SubString(0, myHtmlString.Length - 4);
}

回答4:

I'm trying to ignore the ambiguity in your original question, and read it literally. Here is an extension method that overloads TrimEnd to take a string.

static class StringExtensions
{
    public static string TrimEnd(this string s, string remove)
    {
        if (s.EndsWith(remove))
        {
            return s.Substring(0, s.Length - remove.Length);
        }
        return s;
    }
}

Here are some tests to show that it works:

        Debug.Assert("abc".TrimEnd("<br>") == "abc");
        Debug.Assert("abc<br>".TrimEnd("<br>") == "abc");
        Debug.Assert("<br>abc".TrimEnd("<br>") == "<br>abc");

I want to point out that this solution is easier to read than regex, probably faster than regex (you should use a profiler, not speculation, if you're concerned about performance), and useful for removing other things from the ends of strings.

regex becomes more appropriate if your problem is more general than you stated (e.g., if you want to remove <BR> and </BR> and deal with trailing spaces or whatever.

回答5:

You can use a regex to find and remove the text with the regex match set to anchor at the end of the string.

回答6:

You could also try (if the markup is likely to be a valid tree) something similar to:

string s = "<markup><div>Text</div><br /><br /></markup>";

XmlDocument doc = new XmlDocument();
doc.LoadXml(s);

Console.WriteLine(doc.InnerXml);

XmlElement markup = doc["markup"];
int childCount = markup.ChildNodes.Count;
for (int i = childCount -1; i >= 0; i--)
{
    if (markup.ChildNodes[i].Name.ToLower() == "br")
    {
        markup.RemoveChild(markup.ChildNodes[i]);
    }
    else
    {
        break;
    }
}
Console.WriteLine("---");
Console.WriteLine(markup.InnerXml); 
Console.ReadKey();

The code above is a bit "scratch-pad" but if you cut and paste it into a Console application and run it, it does work :=)

回答7:

you can use RegEx or check if the trailing string is a break and remove it

来源：https://stackoverflow.com/questions/135151/whats-the-best-way-to-remove-br-tags-from-the-end-of-a-string

标签

.net

ASP.NET

xhtml

string

What's the best way to remove <br> tags from the end of a string?

问题