Question
I have a regular expression to validate a string. But now I want to remove all the characters that do not match my regular expression.
E.g.
regExpression = @"^([\w\'\-\+])"
text = "This is a sample text with some invalid characters -+%&()=?";
//Remove characters that do not match regExp.
result = "This is a sample text with some invalid characters -+";
Any ideas on how I can use the regular expression to determine the valid characters and remove all the others?
Many thanks
Answer 1:
I believe you can do this (whitelist characters and replace everything else) in one line:
var result = Regex.Replace(text, @"[^\w\s\-\+]", "");
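For the sample text this produces "This is a sample text with some invalid characters -+", which matches your example. Note that the class also whitelists whitespace (\s), which your original expression does not; without it the spaces between the words would be removed as well.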
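As a minimal sketch of the same whitelist idea wrapped in a reusable method: the names `Sanitizer` and `KeepAllowed` are made up for illustration, and the class here also keeps the apostrophe, which the expression in the question allows but the one-liner above does not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Sanitizer
{
    // Hypothetical helper: removes every character that is not a word
    // character, whitespace, an apostrophe, '-' or '+'.
    public static string KeepAllowed(string text)
    {
        // [^...] is a negated character class, so everything outside
        // the whitelist is replaced with the empty string.
        return Regex.Replace(text, @"[^\w\s'\-+]", "");
    }
}
```

For example, `Sanitizer.KeepAllowed("it's a-ok + 100%!")` returns "it's a-ok + 100". Keep in mind that in .NET `\w` also matches non-ASCII letters and digits unless `RegexOptions.ECMAScript` is used.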
Answer 2:
Simple as that:
var match = Regex.Match(text, regExpression);
string result = "";
if (match.Success)
    result = match.Value;
Removing the non-matched characters is the same as keeping the matched ones. That's what we are doing here.
If it is possible that the expression matches multiple times in your text, you can use this:
var result = Regex.Matches(text, regExpression).Cast<Match>()
    .Aggregate("", (s, e) => s + e.Value, s => s);
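One caveat with the match-based approach: the expression from the question is anchored with ^ and matches a single character, so with it `Regex.Match` returns only the first character of the text. The following is a sketch under the assumption that the anchor is dropped and a `+` quantifier is added; `MatchJoiner` and `KeepMatches` are illustrative names, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchJoiner
{
    // Concatenates every run of allowed characters found in the text.
    // The default pattern is an unanchored variant of the one in the question.
    public static string KeepMatches(string text, string pattern = @"[\w'\-+]+")
    {
        return string.Concat(
            Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value));
    }
}
```

Because this pattern contains no `\s`, spaces are dropped as well: `MatchJoiner.KeepMatches("a b-c%")` returns "ab-c". `string.Concat` is used here instead of `Aggregate` only to avoid building many intermediate strings.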
Answer 3:
Thanks to the answer to "Replace chars if not match", I've created a helper method that strips unaccepted characters.
The allowed pattern should be in regex format and is expected to be wrapped in square brackets. The function inserts a caret (^) after the opening square bracket. I expect that it will not work for every regex that describes a set of valid characters, but it works for the relatively simple sets that we are using.
/// <summary>
/// Replaces every character that is not in the allowed set.
/// </summary>
/// <param name="text">The text.</param>
/// <param name="allowedPattern">The allowed characters as a regex character class, expected to be wrapped in square brackets.</param>
/// <param name="replacement">The replacement for each character that is not allowed.</param>
/// <returns>The text with every character that is not allowed replaced.</returns>
// See https://stackoverflow.com/questions/4460290/replace-chars-if-not-match
// and https://stackoverflow.com/questions/6154426/replace-remove-characters-that-do-not-match-the-regular-expression-net
public static string ReplaceNotExpectedCharacters(this string text, string allowedPattern, string replacement)
{
    // StripBrackets (not shown) is assumed to remove the enclosing "[" and "]".
    allowedPattern = allowedPattern.StripBrackets("[", "]");
    // [^ ] at the start of a character class negates it: it matches characters not in the class.
    var result = Regex.Replace(text, "[^" + allowedPattern + "]", replacement);
    return result;
}
public static string RemoveNonAlphanumericCharacters(this string text)
{
    var result = text.ReplaceNotExpectedCharacters(NonAlphaNumericCharacters, "");
    return result;
}
// Despite its name, this constant holds the characters to keep (the alphanumeric ones).
public const string NonAlphaNumericCharacters = "[a-zA-Z0-9]";
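Since `StripBrackets` is not shown, here is a self-contained sketch under the assumption that it simply removes one leading "[" and one trailing "]". The names `CharFilter`, `StripOuterBrackets` and `ReplaceDisallowed` are hypothetical and only stand in for the helpers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharFilter
{
    // Assumed behaviour of the missing helper: drop one leading '['
    // and one trailing ']' when both are present.
    private static string StripOuterBrackets(string pattern)
    {
        if (pattern.Length >= 2 && pattern[0] == '[' && pattern[pattern.Length - 1] == ']')
            return pattern.Substring(1, pattern.Length - 2);
        return pattern;
    }

    // Replaces every character outside the allowed character class.
    public static string ReplaceDisallowed(this string text, string allowedClass, string replacement)
    {
        string inner = StripOuterBrackets(allowedClass);
        // Re-wrap the class body with a leading caret to negate it.
        return Regex.Replace(text, "[^" + inner + "]", replacement);
    }
}
```

For example, `"abc-123!".ReplaceDisallowed("[a-zA-Z0-9]", "")` returns "abc123". This sketch does not handle a class that is already negated: passing "[^abc]" would build "[^^abc]", which is not the inverse of the input class.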
Source: https://stackoverflow.com/questions/6154426/replace-remove-characters-that-do-not-match-the-regular-expression-net