PostgreSQL replace HTML entities function

不思量自难忘° 2021-02-06 01:46

I've found this very interesting function on the internet:

CREATE OR REPLACE FUNCTION strip_tags(TEXT) RETURNS TEXT AS $$
    SELECT regexp_replace(regexp_replace($         


        
2 Answers
  •  名媛妹妹
    2021-02-06 02:34

    This classic quote may apply here: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." Regexes are useful, but parsing HTML is not a job they are well suited for; Jeff Atwood explains this well. To strip tags from HTML correctly, some kind of parsing is necessary.
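
    Here is a small Python sketch of the failure mode (Python because PL/Python is one of the languages suggested in this answer). `TAG_RE` is a simplified stand-in, not the pattern from the question. It deletes everything from `<` to the next `>`, so a `>` inside a quoted attribute value or a comment cuts the "tag" short. An HTML-aware tokenizer would instead yield just `link` and `beforeafter`.

```python
import re

# Simplified stand-in for a regex tag stripper (NOT the exact pattern from the
# question): delete everything from '<' up to the next '>'.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_regex(html: str) -> str:
    """Remove anything that looks like a tag according to TAG_RE."""
    return TAG_RE.sub("", html)


# Valid HTML that breaks the "a tag ends at the next '>'" assumption.
print(repr(strip_tags_regex('<a title="x > y">link</a>')))  # ' y">link'
print(repr(strip_tags_regex("before<!-- a > b -->after")))  # 'before b -->after'
```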

    My recommendation is to use a more powerful procedural language such as PL/Perl or PL/Python (plpythonu) to call mature, well-tested HTML-stripping libraries. For example, you could use Perl's HTML::Strip from a plperl function that accepts text and returns text.
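
    Below is a minimal sketch of the PL/Python route. It assumes Python's standard-library `html.parser` is an acceptable stand-in for Perl's HTML::Strip; it is a different library, not a port. `strip_html` and `_TextExtractor` are hypothetical names. The parser understands quoted attributes and decodes entities because `convert_charrefs=True`. The subclass discards the contents of `<script>` and `<style>`.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops tags, comments and
    <script>/<style> contents (stdlib parser used in place of HTML::Strip)."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser decode entities such as
        # &amp; and &#233; before text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_html('<p title="a > b">Fish &amp; chips</p><script>if (a < b) f();</script>'))
# Fish & chips
```

    Inside PostgreSQL, the same body could presumably be wrapped in a function declared `LANGUAGE plpython3u` that accepts text and returns text, assuming the plpython3u extension is installed. Because it is an untrusted language, creating such functions requires superuser rights.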

    The quick-and-dirty way to handle this would be to add another layer of regexp_replace expressions to convert entities. That quickly leads you down the path Igor alluded to, though, and is best avoided by using tools that already exist. For example, if you use HTML::Strip, it uses HTML::Entities to convert entities for you as part of the process.
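
    The following short Python sketch shows why the quick-and-dirty route goes wrong. `decode_chained` is a hypothetical stand-in for stacking regexp_replace calls. It knows only four entities, and because the replacements run in sequence, it double-decodes `&amp;lt;b&amp;gt;` (the escaped text `&lt;b&gt;`) into a real `<b>` tag. Python's `html.unescape` plays the role of a mature library such as HTML::Entities: it handles named and numeric references in a single pass.

```python
import html


def decode_chained(text: str) -> str:
    """Hypothetical quick-and-dirty decoder: one replacement per known entity,
    applied in sequence like nested regexp_replace() calls."""
    for entity, char in (("&amp;", "&"), ("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"')):
        text = text.replace(entity, char)
    return text


sample = "caf&eacute; cr&#232;me &#x20AC;5, escaped tag: &amp;lt;b&amp;gt;"

print(decode_chained(sample))
# caf&eacute; cr&#232;me &#x20AC;5, escaped tag: <b>
print(html.unescape(sample))
# café crème €5, escaped tag: &lt;b&gt;
```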
