I've found this very interesting function on the internet:
CREATE OR REPLACE FUNCTION strip_tags(TEXT) RETURNS TEXT AS $$
SELECT regexp_replace(regexp_replace($
This classic quote may apply here: “Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” Regular expressions are useful, but parsing HTML is not a job they are well suited for; Jeff Atwood explains this well. To strip tags from HTML correctly, some kind of parsing is necessary.
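To make the point concrete, here is a small illustration in Python. The regex is a typical naive tag stripper (an assumption for illustration, not the function from the question), and the input is a hypothetical fragment. A single `>` inside a quoted attribute value is enough to break it:

```python
import re

# Hypothetical input: an <img> tag whose quoted alt text contains a ">".
html_fragment = '<img alt="a > b">caption'

# Naive approach: delete everything from "<" up to the first ">".
naive = re.sub(r"<[^>]*>", "", html_fragment)

print(repr(naive))  # ' b">caption' : the tag is only partly removed
```

A parser knows that the `>` belongs to a quoted attribute value; the regex does not, so part of the tag leaks into the output.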
What I would recommend is that you use a more powerful procedural language (PL) such as PL/Perl or PL/Pythonu to invoke mature, well-tested HTML-stripping libraries. For example, you could use Perl's HTML::Strip via a plperl function that accepts text and returns text.
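A minimal sketch of the parser-based approach, written in Python with only the standard library's html.parser module. This is a swapped-in stand-in for HTML::Strip, not that module, and the class and function names are hypothetical. It drops tags, comments, and the bodies of script/style elements, and, because convert_charrefs=True, it also decodes entities in the text it keeps:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores all markup."""

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &#39;, etc. before handle_data sees the text.
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style body.
        if not self._skip_depth:
            self._parts.append(data)

    def get_text(self):
        return "".join(self._parts)


def strip_tags(html_text: str) -> str:
    """Return the text content of html_text with all tags removed."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any trailing text still buffered by the parser
    return parser.get_text()
```

For example, strip_tags('<img alt="a > b">caption') should give 'caption', the case a simple regex gets wrong. Inside PostgreSQL, the same logic could in principle become the body of a function declared with LANGUAGE plpython3u that takes and returns text; that wrapper is an assumption and is not shown or tested here.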
The quick and dirty way to handle this would be to add another layer of regexp_replace expressions to convert entities. That will rapidly lead you down the path Igor alluded to, though, and is best avoided by using tools that already exist. For example, if you use HTML::Strip, it will use HTML::Entities to convert entities for you as part of the process.
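Entity handling shows the same trap. The sketch below, again in Python, compares a hypothetical hand-rolled replacement chain (the "another layer" approach) with html.unescape from the standard library, which decodes all HTML5 named and numeric references in a single pass. html.unescape is a Python analogue used for illustration; it is not Perl's HTML::Entities. The sample string is made up:

```python
import html


def naive_unescape(text: str) -> str:
    """Hypothetical quick-and-dirty decoder: fixed replacements applied one after another."""
    for entity, char in (("&amp;", "&"), ("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"')):
        text = text.replace(entity, char)
    return text


# Sample text: a named entity, a numeric reference, and escaped markup "&amp;lt;b&amp;gt;".
sample = "caf&eacute; costs &#8364;5, written as &amp;lt;b&amp;gt;"

print(naive_unescape(sample))  # caf&eacute; costs &#8364;5, written as <b>
print(html.unescape(sample))   # café costs €5, written as &lt;b&gt;
```

The naive chain leaves the accented and euro characters encoded and over-decodes the escaped markup into a real <b> tag. Each fix means another replacement and another ordering problem, which is why an existing library is the safer choice.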