How to remove diacritics from text?

余生分开走 2020-11-29 04:56

I am making a Swedish website, and the Swedish letters are å, ä, and ö.

I need to make a user-entered string URL-safe with PHP.

Basically, need to

9 answers
  • 2020-11-29 05:12

    If the PHP intl extension is enabled, you can use Transliterator like this:

    protected function removeDiacritics($string)
    {
        $transliterator = \Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC;');
        return $transliterator->transliterate($string);
    }
    

    To also convert other special characters that are not plain diacritics (such as 'æ'):

    protected function removeDiacritics($string)
    {
        $transliterator = \Transliterator::createFromRules(
            ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            \Transliterator::FORWARD
        );
        return $transliterator->transliterate($string);
    }
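
    To connect this to the URL-safe requirement in the question, here is a minimal sketch that transliterates first and then replaces whatever is left. The function name toUrlSafe is hypothetical, and the sketch assumes the intl extension is loaded and the input is UTF-8.

    ```php
    <?php
    // Hypothetical helper (not from the answer above); requires the intl extension.
    function toUrlSafe(string $text): string
    {
        // Any-Latin converts other scripts to Latin; Latin-ASCII folds
        // letters such as å, ä, ö and æ to plain ASCII.
        $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
        if ($ascii === false) {
            throw new RuntimeException('Transliteration failed');
        }
        // Replace each run of characters outside A-Z, a-z, 0-9 with one
        // underscore, then trim underscores from both ends.
        return trim(preg_replace('/[^A-Za-z0-9]+/', '_', $ascii), '_');
    }

    // Only run the demo when intl is available.
    if (extension_loaded('intl')) {
        echo toUrlSafe('Räksmörgås och köttbullar'), PHP_EOL;
    }
    ```

    Collapsing runs into a single underscore is a design choice of this sketch; drop the + in the pattern if every replaced character should get its own underscore.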
    
  • 2020-11-29 05:13

    If you're just interested in making things URL safe, then you want urlencode.

    Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

    If you really want to strip all non A-Z, a-z, 1-9 (what's wrong with 0, by the way?), then you want:

    $mynewstring = preg_replace('/[^A-Za-z1-9]/', '', $str);
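
    To illustrate the difference between urlencode and rawurlencode described in the manual excerpt quoted in this answer, here is a small sketch. It assumes the source file and the string are UTF-8 encoded; the variable name $name is only for illustration.

    ```php
    <?php
    $name = 'räksmörgås och köttbullar'; // assumed to be UTF-8 encoded

    // urlencode() encodes spaces as "+", as in submitted form data.
    echo urlencode($name), PHP_EOL;
    // prints r%C3%A4ksm%C3%B6rg%C3%A5s+och+k%C3%B6ttbullar

    // rawurlencode() encodes spaces as "%20", which suits URL path segments.
    echo rawurlencode($name), PHP_EOL;
    // prints r%C3%A4ksm%C3%B6rg%C3%A5s%20och%20k%C3%B6ttbullar
    ```

    Note that both functions keep the Swedish letters (percent-encoded) rather than removing the diacritics.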
    
  • 2020-11-29 05:16

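    It can be as simple as:

    // map both lowercase and uppercase Swedish letters to ASCII
    $str = str_replace(array('å', 'ä', 'ö', 'Å', 'Ä', 'Ö'), array('a', 'a', 'o', 'A', 'A', 'O'), $str);
    // lowercase, then replace every other run of characters with one underscore
    $str = preg_replace('/[^a-z0-9]+/', '_', strtolower($str));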
    

    assuming you use the same encoding for your data and your code.

  • 2020-11-29 05:23
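    // decompose accented characters into base letter + combining mark
    // (normalizer_normalize() comes from PHP's *intl* extension)
    $data = normalizer_normalize($data, Normalizer::FORM_D);
    // strip the combining marks left by the decomposition
    $data = preg_replace('/\p{Mn}+/u', '', $data);

    // replace everything NOT in the sets you specified with an underscore
    $data = preg_replace("#[^A-Za-z1-9]#", "_", $data);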
    
  • 2020-11-29 05:29

    Use iconv to convert strings from a given encoding to ASCII, then replace non-alphanumeric characters using preg_replace:

    $input = 'räksmörgås och köttbullar'; // UTF-8 encoded
    $input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
    $input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
    echo $input;
    

    Result (the exact transliteration depends on the iconv implementation and the current locale):

    raksmorgas_och_kottbullar
    
  • 2020-11-29 05:32

    One simple solution is to use the str_replace function with arrays of search and replacement letters.
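
    A minimal sketch of that idea, assuming the source file and the input are both UTF-8 encoded. The arrays cover only the Swedish letters from the question, and the variable names are hypothetical.

    ```php
    <?php
    // Search and replacement arrays are matched by position.
    $search  = ['å', 'ä', 'ö', 'Å', 'Ä', 'Ö'];
    $replace = ['a', 'a', 'o', 'A', 'A', 'O'];

    $input = 'Räksmörgås och köttbullar'; // assumed to be UTF-8 encoded

    // Swap the Swedish letters for their ASCII counterparts.
    $ascii = str_replace($search, $replace, $input);

    // Replace every remaining non-alphanumeric character with an underscore.
    $safe = preg_replace('/[^A-Za-z0-9]/', '_', $ascii);

    echo $safe, PHP_EOL; // prints Raksmorgas_och_kottbullar
    ```

    Any accented letter missing from the arrays ends up as underscores, so the lists need extending for other languages.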
