How can I write a function that cuts a string containing HTML tags down to N characters without breaking any of the tags?
The returned string doesn't need to b
Here is a JavaScript solution: trimHtml
In JavaScript, you can use the textContent property of a DOM element to get its text without the tags.
HTML
<p id='mytext'>Hey <a href="#">Visit Croatia</a> today</p>
JavaScript
var el = document.getElementById("mytext");
console.log( el.textContent );
//alert( el.textContent ); // if you don't have firebug.
static string CutIt(string s, int limit)
{
    // Never ask for more characters than the string has.
    s = s.Substring(0, Math.Min(limit, s.Length));
    int openMark = s.LastIndexOf('<');
    if (openMark != -1)
    {
        int closeMark = s.LastIndexOf('>');
        // A '<' after the last '>' means the cut fell inside a tag: drop it.
        if (openMark > closeMark)
        {
            s = s.Substring(0, openMark);
        }
    }
    return s.Trim();
}
public static void Main()
{
    Console.WriteLine(
        CutIt("Visit <a href=\"www.htz.hr\">Croatia</a> this summer.", 9)
    ); // prints "Visit"
}
When I ran into this problem (for an RSS feed), I just called strip_tags before cutting the string.
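The same strip-then-cut idea can be sketched in JavaScript. This is only an approximation: stripTagsAndCut is a hypothetical name, and the regular expression is a rough stand-in for a strip_tags-style function, not a real HTML parser. It does not decode entities and assumes no ">" appears inside attribute values.

```javascript
// Hypothetical helper: remove anything that looks like a tag, then cut.
// Assumption: tags never contain ">" inside attribute values.
function stripTagsAndCut(html, limit) {
  const text = html.replace(/<[^>]*>/g, ""); // drop <...> sequences
  return text.slice(0, limit).trim();        // cut the remaining plain text
}

console.log(stripTagsAndCut('Visit <a href="www.htz.hr">Croatia</a> this summer.', 9));
// prints "Visit Cro"
```

The result contains no markup at all, so this only fits cases such as feed summaries where losing the links and formatting is acceptable.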
I solved the problem, so here is the code in C#:
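static string CutIt(string s, int limit)
{
    if (s.Length <= limit) return s;
    int okIndex = -1;           // last index at which every opened tag is closed
    bool inClosingTag = false;
    int numOpenTags = 0;
    for (int i = 0; i < limit; i++)
    {
        if (s[i] == '<')
        {
            // i + 1 is always in range here because limit < s.Length
            if (s[i + 1] == '/')
            {
                inClosingTag = true;
            }
            else
            {
                numOpenTags++;
            }
        }
        else if (s[i] == '>')
        {
            if (inClosingTag)
            {
                numOpenTags--;
                inClosingTag = false;   // reset so later tags are counted correctly
            }
            else if (i > 0 && s[i - 1] == '/')
            {
                numOpenTags--;          // self-closing tag such as <br/>
            }
        }
        if (numOpenTags == 0) okIndex = i;
    }
    return s.Substring(0, okIndex + 1);
}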
This might be overkill, but try looking up AWK. It can do this kind of thing pretty easily, since it is built around processing text.
You can also write a custom parsing script, for example in Ruby:
s = "Visit <a href=\"www.htz.hr\">Croatia</a> this summer."
result = ""
slice_limit = 9
i = 0  # visible characters copied so far
j = 0  # current position in s
in_tag = false
while i < slice_limit && j < s.size do
  in_tag = true if s[j] == "<"
  i += 1 unless in_tag          # count only characters outside tags
  result += s[j]
  in_tag = false if s[j] == ">" # the tag ends after this character
  j += 1
end
... or something like that (haven't tested, but it gives you the idea).
EDIT: You will also have to add something to detect whether a tag has been closed or not (add a flag like in_tag, combine it with a regular expression, and it should work). Hope that helps.
EDIT2: It would help if you said which language you want to use. JavaScript?
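In case JavaScript is the target, here is a minimal sketch of the idea above that also handles unclosed tags: count only the characters outside tags, copy tags through whole, and keep a stack of open tags so they can be closed at the end. The name truncateHtml and the VOID_TAGS list are made up for this example, and the code assumes well-formed markup with no "<" or ">" inside attribute values.

```javascript
// Hypothetical helper, not part of any library.
// Assumption: well-formed HTML, no "<" or ">" inside attribute values.
const VOID_TAGS = new Set(["br", "hr", "img", "input", "meta", "link"]);

function truncateHtml(html, limit) {
  const openTags = []; // stack of tag names still waiting for a closing tag
  let result = "";
  let visible = 0;     // characters copied that are outside tags
  let i = 0;

  while (i < html.length && visible < limit) {
    if (html[i] === "<") {
      const end = html.indexOf(">", i);
      if (end === -1) break;               // unterminated tag: stop before it
      const tag = html.slice(i, end + 1);  // the whole tag, e.g. <a href="#">
      const name = (tag.match(/^<\/?([a-zA-Z][a-zA-Z0-9]*)/) || [])[1];
      if (name) {
        const lower = name.toLowerCase();
        if (tag[1] === "/") {
          openTags.pop();                  // closing tag: assumed to match the top
        } else if (!tag.endsWith("/>") && !VOID_TAGS.has(lower)) {
          openTags.push(lower);            // opening tag that needs a close
        }
      }
      result += tag;                       // tags never count toward the limit
      i = end + 1;
    } else {
      result += html[i];
      visible++;
      i++;
    }
  }

  // Close whatever is still open, innermost first.
  while (openTags.length > 0) {
    result += "</" + openTags.pop() + ">";
  }
  return result;
}

console.log(truncateHtml('Visit <a href="www.htz.hr">Croatia</a> this summer.', 9));
// prints 'Visit <a href="www.htz.hr">Cro</a>'
```

Entities such as &amp; are counted character by character here, so a cut can still land in the middle of one; handling that would need an extra check.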