I want to extract only those words within double quotes. So, if the content is:
Would \"you\" like to have responses to your \"questions\" sent to you
Try this regex:
\"[^\"]*\"
or
\".*?\"
Explanation:
[^ character_group ]
Negation: Matches any single character that is not in character_group.
*?
Matches the previous element zero or more times, but as few times as possible.
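To make the difference between the two quantifiers concrete, here is a minimal sketch that runs a greedy pattern (".*") and the lazy pattern (".*?") over the sample sentence. The class name QuantifierDemo and the helper FindAll are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var found = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            found[i] = matches[i].Value;
        }
        return found;
    }

    public static void Main()
    {
        string input = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";

        // Greedy: .* runs on to the last quote, so there is one long match.
        foreach (string s in FindAll(input, "\".*\""))
            Console.WriteLine("greedy: " + s);

        // Lazy: .*? stops at the first closing quote, so each quoted word matches on its own.
        foreach (string s in FindAll(input, "\".*?\""))
            Console.WriteLine("lazy:   " + s);
    }
}
```

On this input the greedy pattern produces a single match that spans from the first quote to the last one, while the lazy pattern produces "you" and "questions" separately, which is why the lazy form (or the negated character group) is the one to use here.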
And some sample code:
foreach (Match match in Regex.Matches(inputString, "\"([^\"]*)\""))
    Console.WriteLine(match.ToString());

// or in LINQ
var result = from Match match in Regex.Matches(inputString, "\"([^\"]*)\"")
             select match.ToString();
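Note that match.ToString() returns the whole match, quotes included. Since the pattern already wraps the inner text in a capture group, one way to get the words without the quotes is to read Groups[1] instead. The following is only a sketch; the class QuotedWordExtractor and the method Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedWordExtractor
{
    // Hypothetical helper: collects the text inside each pair of double quotes.
    public static List<string> Extract(string input)
    {
        var words = new List<string>();
        foreach (Match match in Regex.Matches(input, "\"([^\"]*)\""))
        {
            // Groups[0] is the whole match (with quotes); Groups[1] is the part in parentheses.
            words.Add(match.Groups[1].Value);
        }
        return words;
    }

    public static void Main()
    {
        string input = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
        foreach (string word in Extract(input))
            Console.WriteLine(word);
    }
}
```

For the sample sentence this prints you and questions, each on its own line and without quotes.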
I like the regex solutions. You could also consider something like this:
string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
var stringArray = str.Split('"');
Then take the odd-indexed elements from the array. If you use LINQ, you can do it like this:
var stringArray = str.Split('"').Where((item, index) => index % 2 != 0);
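One difference from the regex approach is worth seeing in a small sketch: Split does not check that quotes are balanced, so text after an unmatched opening quote also lands at an odd index. The class SplitDemo and the method OddParts are hypothetical names used only here.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: splits on double quotes and keeps the odd-indexed pieces.
    public static string[] OddParts(string input)
    {
        return input.Split('"')
                    .Where((item, index) => index % 2 != 0)
                    .ToArray();
    }

    public static void Main()
    {
        // Balanced quotes: only the quoted words come back.
        foreach (string s in OddParts("Would \"you\" like \"questions\" sent"))
            Console.WriteLine(s);

        // Unbalanced input: the trailing "this text" is returned too,
        // even though its quote is never closed.
        foreach (string s in OddParts("a \"search text\" and not \"this text"))
            Console.WriteLine(s);
    }
}
```

If the input may contain an unclosed quote and that text should be skipped, the regex versions are the safer choice, because they only match a complete opening and closing pair.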
Based on @Ria's answer:
static void Main(string[] args)
{
    string str = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
    var reg = new Regex("\".*?\"");
    var matches = reg.Matches(str);
    foreach (var item in matches)
    {
        Console.WriteLine(item.ToString());
    }
}
The output is:
"you"
"questions"
You can use string.TrimStart() and string.TrimEnd() to remove the double quotes if you don't want them.
I combine Regex and Trim:
const string searchString = "This is a \"search text\" and \"another text\" and not \"this text";
var collection = Regex.Matches(searchString, "\\\"(.*?)\\\"");
foreach (var item in collection)
{
    Console.WriteLine(item.ToString().Trim('"'));
}
Result:
search text
another text
This also steals the Regex from @Ria, but allows you to get them into an array where you then remove the quotes:
string strText = "Would \"you\" like to have responses to your \"questions\" sent to you via email?";
MatchCollection mc = Regex.Matches(strText, "\"([^\"]*)\"");
for (int z = 0; z < mc.Count; z++)
{
    Response.Write(mc[z].ToString().Replace("\"", ""));
}
I needed to do this in C# for parsing CSV and none of these worked for me so I came up with this:
\s*(?:(?:(['"])(?<value>(?:\\\1|[^\1])*?)\1)|(?<value>[^'",]+?))\s*(?:,|$)
This will parse out a field with or without quotes and will exclude the quotes from the value while keeping embedded quotes and commas. The <value> group contains the parsed field value. Without using named groups, either group 2 or 3 contains the value.
There are better and more efficient ways to do CSV parsing and this one will not be effective at identifying bad input. But if you can be sure of your input format and performance is not an issue, this might work for you.
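As a rough illustration of how the pattern above might be applied, here is a minimal sketch that runs it over one line and collects the value group from each match. The class CsvFieldDemo and the method ParseLine are hypothetical names, and the sketch assumes well-formed input, as cautioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvFieldDemo
{
    // The pattern from the answer, written as a verbatim string
    // (inside @"..." a double quote is written as "").
    private const string FieldPattern =
        @"\s*(?:(?:(['""])(?<value>(?:\\\1|[^\1])*?)\1)|(?<value>[^'"",]+?))\s*(?:,|$)";

    // Hypothetical helper: returns the value group of every field matched on the line.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        foreach (Match match in Regex.Matches(line, FieldPattern))
        {
            fields.Add(match.Groups["value"].Value);
        }
        return fields;
    }

    public static void Main()
    {
        // One unquoted field, one double-quoted field with a comma, one single-quoted field.
        foreach (string field in ParseLine("one, \"two, three\", 'four'"))
            Console.WriteLine(field);
    }
}
```

For that sample line the fields come back as one, then two, three (with its embedded comma kept), then four. Empty fields and malformed quoting are not handled by this sketch.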