Detect encoding and make everything UTF-8

暗喜 2020-11-22 03:03

I'm reading lots of text from various RSS feeds and inserting it into my database.

Of course, there are several different character encodings used in the fee

24 answers
  •  感情败类
    2020-11-22 03:05

    If you apply utf8_encode() to an already UTF-8 string, it will return garbled UTF-8 output.
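
    To make the double-encoding problem concrete, here is a minimal sketch. The helper latin1ToUtf8() is a hypothetical stand-in written for this illustration; it performs the same ISO 8859-1 to UTF-8 byte mapping that utf8_encode() does, so the example does not depend on that function being available.

```php
<?php
// Hypothetical helper (not a PHP built-in): maps each input byte to the
// UTF-8 encoding of the ISO 8859-1 character with that byte value,
// which is the conversion utf8_encode() performs.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80
            ? chr($b)                                          // ASCII stays as-is
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F)); // two-byte sequence
    }
    return $out;
}

$latin1 = "\xE9";     // "é" in ISO 8859-1 (one byte)
$utf8   = "\xC3\xA9"; // "é" in UTF-8 (two bytes)

// Correct use: Latin-1 in, UTF-8 out.
echo bin2hex(latin1ToUtf8($latin1)), "\n"; // c3a9

// Misuse: the input was already UTF-8, so each of its two bytes is
// re-encoded as if it were a separate Latin-1 character ("Ã" and "©").
echo bin2hex(latin1ToUtf8($utf8)), "\n";   // c383c2a9
```

    The second result is four bytes long and displays as "Ã©" instead of "é", which is the garbled output described above.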

    I made a function that addresses all these issues. It's called Encoding::toUTF8().

    You don't need to know what the encoding of your strings is. It can be Latin1 (ISO 8859-1), Windows-1252 or UTF-8, or the string can have a mix of them. Encoding::toUTF8() will convert everything to UTF-8.

    I did it because a service was giving me a feed of data all messed up, mixing UTF-8 and Latin1 in the same string.

    Usage:

    require_once('Encoding.php');
    use \ForceUTF8\Encoding;  // It's namespaced now.
    
    $utf8_string = Encoding::toUTF8($utf8_or_latin1_or_mixed_string);
    
    $latin1_string = Encoding::toLatin1($utf8_or_latin1_or_mixed_string);
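
    For readers curious how a mixed string can be handled at all, the following is a rough sketch of one possible approach. It is not the library's actual code. The function name mixedToUtf8Sketch() is made up for this example, and it assumes every byte that is not part of a well-formed UTF-8 sequence is ISO 8859-1 (it does not handle the Windows-1252 characters in the 0x80 to 0x9F range).

```php
<?php
// Hypothetical helper, not part of ForceUTF8: keep well-formed UTF-8
// sequences untouched and treat every other high byte as ISO 8859-1.
function mixedToUtf8Sketch(string $input): string
{
    // Matches exactly one well-formed UTF-8 character at the current offset.
    $validChar = '/\G(?:[\x00-\x7F]'
        . '|[\xC2-\xDF][\x80-\xBF]'
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2})/';

    $out = '';
    $len = strlen($input);
    $i = 0;
    while ($i < $len) {
        if (preg_match($validChar, $input, $m, 0, $i)) {
            // Already valid UTF-8 (or plain ASCII): copy it through.
            $out .= $m[0];
            $i += strlen($m[0]);
        } else {
            // Stray high byte: assume ISO 8859-1 and encode it as UTF-8.
            $byte = ord($input[$i]);
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
            $i++;
        }
    }
    return $out;
}

// "Café" in UTF-8 followed by "café" in Latin-1, in the same string.
$mixed = "Caf\xC3\xA9 and caf\xE9";
echo bin2hex(mixedToUtf8Sketch($mixed)) === bin2hex("Caf\xC3\xA9 and caf\xC3\xA9")
    ? "both words are UTF-8 now\n"
    : "unexpected result\n";
```

    The approach is a heuristic: a Latin-1 string that happens to contain a byte pair such as 0xC3 0xA9 ("Ã©") is indistinguishable from UTF-8 "é" and will be left as it is.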
    

    Download:

    https://github.com/neitanod/forceutf8

    I've included another function, Encoding::fixUTF8(), which will fix every UTF-8 string that looks garbled.

    Usage:

    require_once('Encoding.php');
    use \ForceUTF8\Encoding;  // It's namespaced now.
    
    $utf8_string = Encoding::fixUTF8($garbled_utf8_string);
    

    Examples:

    echo Encoding::fixUTF8("Fédération Camerounaise de Football");
    echo Encoding::fixUTF8("FÃ©dÃ©ration Camerounaise de Football");
    echo Encoding::fixUTF8("FÃÂ©dÃÂ©ration Camerounaise de Football");
    echo Encoding::fixUTF8("FÃÂÂÂÂ©dÃÂÂÂÂ©ration Camerounaise de Football");
    

    will output:

    Fédération Camerounaise de Football
    Fédération Camerounaise de Football
    Fédération Camerounaise de Football
    Fédération Camerounaise de Football
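
    The idea of repairing repeatedly encoded text can be sketched as follows. This is an illustration of the general technique only, not how Encoding::fixUTF8() is implemented. The name undoDoubleEncodingSketch() and the $maxRounds parameter are assumptions made for this example, and it only covers text that was re-encoded as ISO 8859-1, not Windows-1252.

```php
<?php
// Hypothetical helper, not part of ForceUTF8: peel off layers of
// "UTF-8 read as Latin-1 and encoded again" for as long as the result
// is still valid UTF-8.
function undoDoubleEncodingSketch(string $s, int $maxRounds = 5): string
{
    for ($round = 0; $round < $maxRounds; $round++) {
        // Only valid UTF-8 made purely of U+0000..U+00FF can be the
        // product of a Latin-1 to UTF-8 conversion.
        if (!preg_match('/^[\x{00}-\x{FF}]*$/Du', $s)) {
            break;
        }
        // Collapse each two-byte sequence back into the single byte it came from.
        $decoded = preg_replace_callback(
            '/[\xC2\xC3][\x80-\xBF]/',
            function (array $m): string {
                return chr(((ord($m[0][0]) & 0x03) << 6) | (ord($m[0][1]) & 0x3F));
            },
            $s
        );
        // Stop when nothing changed or when decoding went one layer too far.
        if ($decoded === $s || !preg_match('//u', $decoded)) {
            break;
        }
        $s = $decoded;
    }
    return $s;
}

$clean  = "F\xC3\xA9d\xC3\xA9ration";                 // "Fédération" in UTF-8
$double = "F\xC3\x83\xC2\xA9d\xC3\x83\xC2\xA9ration"; // encoded twice

var_dump(undoDoubleEncodingSketch($clean) === $clean);  // bool(true)
var_dump(undoDoubleEncodingSketch($double) === $clean); // bool(true)
```

    Like any such repair, this is a guess about intent: a text that legitimately contains the characters "Ã©" would be rewritten to "é".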
    

    I've transformed the function (forceUTF8) into a family of static functions on a class called Encoding. The new function is Encoding::toUTF8().
