Comparing strings in PostgreSQL

花落未央 2021-01-25 16:44

Is there any way in PostgreSQL to convert UTF-8 characters to "similar" ASCII characters?

For example, the string glāžšķūņu rūķīši would have to be converted to glazskunu rukisi.

2 answers
  • 2021-01-25 17:47

    Use pg_collkey() for ICU-supported Unicode comparison:

    - http://www.public-software-group.org/pg_collkey
    - http://russ.garrett.co.uk/tag/postgresql/

  • 2021-01-25 17:48

    I found different ways to do this on the PostgreSQL Wiki.

    In plperl:

    CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
    my ($input_string) = @_;
    # Each substitution needs a closing slash; /g replaces every occurrence,
    # not just the first one.
    $input_string =~ s/[âãäåāăą]/a/g;
    $input_string =~ s/[ÁÂÃÄÅĀĂĄ]/A/g;
    $input_string =~ s/[èééêëēĕėęě]/e/g;
    $input_string =~ s/[ĒĔĖĘĚ]/E/g;
    $input_string =~ s/[ìíîïìĩīĭ]/i/g;
    $input_string =~ s/[ÌÍÎÏÌĨĪĬ]/I/g;
    $input_string =~ s/[óôõöōŏő]/o/g;
    $input_string =~ s/[ÒÓÔÕÖŌŎŐ]/O/g;
    $input_string =~ s/[ùúûüũūŭů]/u/g;
    $input_string =~ s/[ÙÚÛÜŨŪŬŮ]/U/g;
    return $input_string;
    $$ LANGUAGE plperl;
    

    In pure SQL:

    CREATE OR REPLACE FUNCTION unaccent_string(text)
    RETURNS text
    IMMUTABLE
    STRICT
    LANGUAGE SQL
    AS $$
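    SELECT translate(
        $1,
        -- Each character in the first string is replaced by the character at
        -- the same position in the second, so the two must stay aligned.
        -- Uppercase letters map to uppercase, matching the plperl version.
        'âãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ',
        'aaaaaaaAAAAAAAAeeeeeeeeeeEEEEEiiiiiiiiIIIIIIIIoooooooOOOOOOOOuuuuuuuuUUUUUUUU'
    );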
    $$;
    

    And in plpython:

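    create or replace function unaccent(text) returns text language plpythonu as $$
    # Written for Python 2 (plpythonu). Under plpython3u the argument is already
    # a str, so the decode() step would not be needed.
    import unicodedata
    # Look up the server encoding so the byte string can be decoded.
    rv = plpy.execute("select setting from pg_settings where name = 'server_encoding'")
    encoding = rv[0]["setting"]
    s = args[0].decode(encoding)
    # NFKD splits each accented letter into a base letter plus combining marks.
    s = unicodedata.normalize("NFKD", s)
    # Keep only ASCII code points (0-127), which drops the combining marks.
    s = ''.join(c for c in s if ord(c) < 128)
    return s
    $$;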
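
The NFKD approach can also be tried outside the database. Below is a minimal standalone Python 3 sketch of the same idea; the function name unaccent_nfkd is hypothetical and not part of any library, and the input is assumed to be an ordinary str.

```python
import unicodedata


def unaccent_nfkd(text: str) -> str:
    """Strip accents by decomposing characters and keeping only ASCII."""
    # NFKD turns e.g. "ā" into "a" followed by a combining macron (U+0304).
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks and any other non-ASCII code points are discarded.
    return "".join(c for c in decomposed if ord(c) < 128)


if __name__ == "__main__":
    # The example string from the question.
    print(unaccent_nfkd("glāžšķūņu rūķīši"))  # glazskunu rukisi
```

Letters that have no Unicode decomposition, such as ø or ł, are dropped entirely rather than mapped to o or l, so this approach may lose characters in some languages.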
    

    In your case, a translate() call that lists all the accented characters you can find in the Unicode table should be enough.
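
To illustrate the positional mapping that translate() relies on, here is a small Python sketch using str.maketrans, which pairs characters by position in much the same way. The names ACCENTED, PLAIN and unaccent_translate are hypothetical, and the table covers only the letters in the question's example, not a complete list.

```python
# Hypothetical mapping: only the accented letters from the question's example.
ACCENTED = "āžšķūņī"
PLAIN = "azskuni"

# str.maketrans pairs the characters of both strings by position,
# so the two strings must have the same length.
TABLE = str.maketrans(ACCENTED, PLAIN)


def unaccent_translate(text: str) -> str:
    """Replace each listed accented letter with its plain counterpart."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(unaccent_translate("glāžšķūņu rūķīši"))  # glazskunu rukisi
```

One difference worth keeping in mind: PostgreSQL's translate() deletes any character of the first string that has no counterpart in a shorter second string, whereas str.maketrans raises an error when the lengths differ. Characters missing from the table are left unchanged in both cases.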
