I am new to C#, but I have a requirement to cut the strings to be <= 80 characters AND they must keep the words integrity (without cutting them)
Examples
string truncatedText = text.Substring(0, 80); // truncate to 80 characters
if (text[80] != ' ') // don't remove last word if a space occurs after it in the original string (in other words, last word is already complete)
truncatedText = truncatedText.Substring(0, truncatedText.LastIndexOf(' ')); // remove any cut-off words
Updated to fix issue from comments where last word could get cut off even if it is complete.
This isn't using regex but this is how I would do it:
Use String.LastIndexOf to get the last space before the 81st char.
If the 81th char is a space then take it until 80.
if it returns a number > -1 cut it off there.
If it's -1 you-have-a-really-long-word-or-someone-messing-with-the-system so you do wathever you like.
Here's an approach without using Regex: just split the string (however you'd like) into whatever you consider "words" to be. Then, just start concatenating them together using a StringBuilder
, checking for your desired length, until you can't add the next "word". Then, just return the string that you have built up so far.
(Untested code ahead)
public string TruncateWithPreservation(string s, int len)
{
string[] parts = s.Split(' ');
StringBuilder sb = new StringBuilder();
foreach (string part in parts)
{
if (sb.Length + part.Length > len)
break;
sb.Append(' ');
sb.Append(part);
}
return sb.ToString();
}
Try
^(.{0,80})(?: |$)
This is a capturing greedy match which must be followed by a space or end of string. You could also use a zero-width lookahead assertion, as in
^.{0,80}(?= |$)
If you use a live test tool like http://regexhero.net/tester/ it's pretty cool, you can actually see it jump back to the word boundary as you type beyond 80 characters.
And here's one which will simply truncate at the 80th character if there are no word boundaries (spaces) to be found:
^(.{1,80}(?: |$)|.{80})