Remove BOM (ï»¿) from imported .csv file

温柔的废话 2020-11-27 20:40

I want to remove the BOM from my imported file, but it just doesn't seem to work.

I tried preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $file);

6 Answers
  • 2020-11-27 20:49

    Read the data with file_get_contents(), then use mb_convert_encoding() to convert it to UTF-8.

    UPDATE

    $filepath = get_bloginfo('template_directory')."/testing.csv";
    $fileContent = file_get_contents($filepath);
    $fileContent = mb_convert_encoding($fileContent, "UTF-8");
    $lines = explode("\n", $fileContent);
    foreach($lines as $line) {
        $columns = explode(";", $line);
        // etc...
    }
    
  • 2020-11-27 20:52

    Try this:

    function removeBomUtf8($s){
        // "\xEF\xBB\xBF" is the 3-byte UTF-8 BOM
        if (substr($s, 0, 3) === "\xEF\xBB\xBF") {
            return substr($s, 3);
        }
        return $s;
    }
    
  • 2020-11-27 20:53

    The correct way is to skip the BOM if it is present in the file (https://www.php.net/manual/en/function.fgetcsv.php#122696):

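    // Note: auto_detect_line_endings is deprecated as of PHP 8.1
    ini_set('auto_detect_line_endings', TRUE);
    $file = fopen($filepath, "r") or die("Error opening file");
    // fgets($file, 4) reads at most 3 bytes, the length of the UTF-8 BOM
    if (fgets($file, 4) !== "\xef\xbb\xbf") { // Skip the BOM if present
        rewind($file); // Otherwise rewind the pointer to the start of the file
    }

    $i = 0;
    while(($line = fgetcsv($file, 1000, ";")) !== FALSE) {
        ...
    }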
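
    The same idea can be wrapped in a small function. This is only a sketch: the name read_csv_skipping_bom(), its parameters and the ";" default are made up for illustration, and it uses fread() instead of fgets() for the 3-byte check.

```php
<?php
// Hypothetical helper (name and signature are illustrative, not from any library).
// Returns all rows of a CSV file, ignoring a leading UTF-8 BOM if there is one.
function read_csv_skipping_bom(string $path, string $delimiter = ';'): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Read at most 3 bytes; if they are not the UTF-8 BOM, go back to the start.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }

    $rows = [];
    // fgetcsv() returns false at end of file, which ends the loop.
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $row;
    }

    fclose($handle);
    return $rows;
}
```

    With this, the first header cell no longer carries the three BOM bytes in front of it, whether or not the file was saved with a BOM.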
    
  • 2020-11-27 20:53

    Using @Tomas'z answer as the main inspiration for this, and @Nolwennig's comment:

    // Strip byte order marks from a string
    function strip_bom($string, $type = 'utf8') {
        $length = 0;
    
        switch($type) {
            case 'utf8':
                $length = substr($string, 0, 3) === chr(0xEF) . chr(0xBB) . chr(0xBF) ? 3 : 0;
            break;
    
            case 'utf16_little_endian':
                $length = substr($string, 0, 2) === chr(0xFF) . chr(0xFE) ? 2 : 0;
            break;
        }
    
        return $length ? substr($string, $length) : $string;
    }
    
  • 2020-11-27 20:55

    Isn't the BOM there to give you a clue on how to re-encode the input to something your script/app/database needs? Just deleting it isn't going to help.

    This is how I force a string (drawn from a file with file_get_contents()) to be encoded in UTF-8 and get rid of the BOM as well:

    switch (true) {
        case (substr($string,0,3) == "\xef\xbb\xbf") :
            $string = substr($string, 3);
            break;
        // Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones:
        // "\xff\xfe\x00\x00" also starts with the UTF-16LE BOM "\xff\xfe".
        case (substr($string,0,4) == "\x00\x00\xfe\xff") :
            $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32BE");
            break;
        case (substr($string,0,4) == "\xff\xfe\x00\x00") :
            $string = mb_convert_encoding(substr($string, 4), "UTF-8", "UTF-32LE");
            break;
        case (substr($string,0,2) == "\xfe\xff") :
            $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16BE");
            break;
        case (substr($string,0,2) == "\xff\xfe") :
            $string = mb_convert_encoding(substr($string, 2), "UTF-8", "UTF-16LE");
            break;
        default:
            $string = iconv(mb_detect_encoding($string, mb_detect_order(), true), "UTF-8", $string);
    }
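
    One thing to watch with this kind of BOM sniffing: the UTF-32LE mark begins with the same two bytes as the UTF-16LE mark, so the longer marks have to be tested first. A table-driven sketch of the same idea follows. The names detect_bom() and to_utf8() are made up for illustration, the mbstring extension is assumed, and unlike the switch above it leaves input without a BOM untouched instead of guessing its encoding.

```php
<?php
// Hypothetical helper: returns [encoding name, BOM length in bytes],
// or [null, 0] when the input does not start with a known BOM.
function detect_bom(string $bytes): array
{
    // Longest marks first: "\xFF\xFE\x00\x00" (UTF-32LE) also starts
    // with "\xFF\xFE" (UTF-16LE).
    $marks = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];

    foreach ($marks as $mark => $encoding) {
        if (strncmp($bytes, $mark, strlen($mark)) === 0) {
            return [$encoding, strlen($mark)];
        }
    }

    return [null, 0];
}

// Hypothetical helper: strips the BOM and converts to UTF-8 when a
// non-UTF-8 BOM was found. Input without a BOM is returned unchanged.
function to_utf8(string $bytes): string
{
    [$encoding, $length] = detect_bom($bytes);
    $body = substr($bytes, $length);

    if ($encoding === null || $encoding === 'UTF-8') {
        return $body;
    }

    return mb_convert_encoding($body, 'UTF-8', $encoding);
}
```

    For example, to_utf8("\xFF\xFEa\x00") gives "a", and a string starting with "\xFF\xFE\x00\x00" is treated as UTF-32LE rather than UTF-16LE.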
    
  • 2020-11-27 21:11

    If the character encoding functions don't work for you (as is the case for me in some situations) and you know for a fact that your file always has a BOM, you can simply use an fseek() to skip the first 3 bytes, which is the length of the BOM.

    $fp = fopen("testing.csv", "r");
    fseek($fp, 3);
    

    You should also not use explode() to split your CSV lines and columns because if your column contains the character by which you split, you will get an incorrect result. Use this instead:

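    // Loop on fgetcsv() itself: it returns false at end of file,
    // so no bogus extra iteration happens after the last line.
    while (($arrayLine = fgetcsv($fp, 0, ";", '"')) !== false) {
        ...
    }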
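
    To see the difference on a single line, here is a small sketch using str_getcsv(), the string counterpart of fgetcsv(). The sample line and variable names are made up for illustration.

```php
<?php
// A sample line whose second field contains the delimiter inside quotes.
$line = 'a;"b;c";d';

// explode() knows nothing about quoting, so the quoted field is cut in two:
// $naive is ['a', '"b', 'c"', 'd']
$naive = explode(';', $line);

// str_getcsv() honours the enclosure character and keeps the field whole:
// $parsed is ['a', 'b;c', 'd']
$parsed = str_getcsv($line, ';', '"', '\\');
```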
    