问题
Here I've found a grate way to HTML encode/escape special chars. Now I wonder how to unescape HTML encoded text in C++?
So codebase is:
#include <algorithm>
namespace xml {
// Helper for null-terminated ASCII strings (no end of string iterator).
template<typename InIter, typename OutIter>
OutIter copy_asciiz ( InIter begin, OutIter out )
{
while ( *begin != '\0' ) {
*out++ = *begin++;
}
return (out);
}
// XML escaping in it's general form. Note that 'out' is expected
// to an "infinite" sequence.
template<typename InIter, typename OutIter>
OutIter escape ( InIter begin, InIter end, OutIter out )
{
static const char bad[] = "&<>";
static const char* rep[] = {"&", "<", ">"};
static const std::size_t n = sizeof(bad)/sizeof(bad[0]);
for ( ; (begin != end); ++begin )
{
// Find which replacement to use.
const std::size_t i =
std::distance(bad, std::find(bad, bad+n, *begin));
// No need for escaping.
if ( i == n ) {
*out++ = *begin;
}
// Escape the character.
else {
out = copy_asciiz(rep[i], out);
}
}
return (out);
}
}
and
#include <iterator>
#include <string>
namespace xml {
// Get escaped version of "content".
std::string escape ( const std::string& content )
{
std::string result;
result.reserve(content.size());
escape(content.begin(), content.end(), std::back_inserter(result));
return (result);
}
// Escape data on the fly, using "constant" memory.
void escape ( std::istream& in, std::ostream& out )
{
escape(std::istreambuf_iterator<char>(in),
std::istreambuf_iterator<char>(),
std::ostreambuf_iterator<char>(out));
}
}
Its - grate peace of code - it works for:
#include <iostream>
int main ( int, char ** )
{
std::cout << xml::escape("<foo>bar & qux</foo>") << std::endl;
}
So I wonder - how to make HTML unescape in such manner?
回答1:
Take a look at how I've solved a similar problem for '&#(\d+);'
strings i.e., numeric character references (NCRs) using boost::spirit, boost::regex_token_iterator, Flex, Perl.
In your case the regex is &(amp|lt|gt);
if you don't need to convert all html entities.
来源:https://stackoverflow.com/questions/7976445/so-weve-got-our-html-escape-functions-that-really-work-in-a-c-manner-how-to