PHP readdir with European characters

Posted by 假装没事ソ on 2019-12-23 13:40:08

Question


I get image files that have Czech characters in the filename (e.g. ěščřžýáíé), and I want to rename them without the accents so that they are more web-friendly. I thought I could use a simple str_replace call, but it doesn't seem to work on the file names returned by readdir the way it does on a string literal.

I read the files with readdir, after checking for extension.

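function readFiles($dir, $ext = false) {
    // Return false when the directory is missing or cannot be opened.
    if (!is_dir($dir) || !($dh = opendir($dir))) {
        return false;
    }

    $f = array(); // initialise so an empty directory yields an empty array
    while (($file = readdir($dh)) !== false) {
        // pathinfo() avoids passing explode()'s result straight to end(),
        // which expects a variable it can take by reference.
        if (!$ext || pathinfo($file, PATHINFO_EXTENSION) == $ext) {
            $f[] = $file;
        }
    }

    closedir($dh);
    return $f;
}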

$files = readFiles(".", "jpg");

$search = array('š','á','ž','í','ě','é','ř','ň','ý','č',' ');
$replace = array('s','a','z','i','e','e','r','n','y','c','-');

$string = "čšěáýísdjksnalci sášěééalskcnkkjy+ěéší";
$safe_string = str_replace($search, $replace, $string);

echo '<pre>';

foreach($files as $fl) {
    $safe_files[] = str_replace($search, $replace, $fl);
}

var_dump($files);
var_dump($safe_files);

var_dump($string);
var_dump($safe_string);

echo '</pre>';

Output

array(6) {
  [0]=>
  string(21) "Hl�vka s listem01.jpg"
  [1]=>
  string(23) "Hl�vky v atelieru02.jpg"
  [2]=>
  string(17) "Jarn� v�hon03.jpg"
  [3]=>
  string(17) "Mlad� chmel04.jpg"
  [4]=>
  string(23) "Stavba chmelnice 05.jpg"
  [5]=>
  string(21) "Zimni chmelnice06.jpg"
}
array(6) {
  [0]=>
  string(21) "Hl�vka-s-listem01.jpg"
  [1]=>
  string(23) "Hl�vky-v-atelieru02.jpg"
  [2]=>
  string(17) "Jarn�-v�hon03.jpg"
  [3]=>
  string(17) "Mlad�-chmel04.jpg"
  [4]=>
  string(23) "Stavba-chmelnice-05.jpg"
  [5]=>
  string(21) "Zimni-chmelnice06.jpg"
}
string(53) "čšěáýísdjksnalci sášěééalskcnkkjy+ěéší"
string(38) "cseayisdjksnalci-saseeealskcnkkjy+eesi"

Right now I'm running on WAMP but answers that work across platforms are even better :)


Answer 1:


Judging by the U+FFFD replacement characters (which Firefox shows as diamonds with a question mark inside), the file names are already not being read with the correct encoding (which would be Unicode / UTF-8). I found a PHP bug report that seems to be related.

Here's another SO topic about that: php readdir problem with japanese language file name

To the point: the advice at the time was to wait for a stable PHP 6, but PHP 6 was never released. PHP 7.1 and later support UTF-8 file names on Windows.

Unrelated to the problem: the Normalizer is a better tool to get rid of diacritical marks.
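
A minimal sketch of the Normalizer approach, assuming the input is valid UTF-8 and the intl extension is loaded. The helper name stripDiacritics() is made up for this example; when intl is missing it returns the input unchanged.

```php
<?php
// stripDiacritics() is a hypothetical helper, not part of PHP itself.
function stripDiacritics(string $utf8): string
{
    if (!class_exists('Normalizer')) {
        return $utf8; // intl extension not loaded: nothing we can do here
    }
    // NFD decomposition splits "č" into "c" + U+030C (combining caron).
    $decomposed = Normalizer::normalize($utf8, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $utf8; // input was not valid UTF-8
    }
    // \p{Mn} matches non-spacing marks, i.e. the detached accents.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $utf8;
}

echo stripDiacritics("Hlávka s listem01.jpg"), "\n";
```

This only helps once the names are UTF-8; it does nothing for the single-byte names that readdir returns in the question.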




Answer 2:


If it works with strings but not with arrays, just apply it to strings :-)

$search = array('š','á','ž','í','ě','é','ř','ň','ý','č',' ');
$replace = array('s','a','z','i','e','e','r','n','y','c','-');

$len = count($safe_files);

for ($i = 0; $i < $len; $i++) {
    $safe_files[$i] = str_replace($search, $replace, $safe_files[$i]);
}

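I thought str_replace accepted arrays only for the first two parameters and not the last. In fact it also accepts an array as the third argument (the subject), so the loop is equivalent to a single call; either form should work.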
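
As a quick check of how str_replace() treats its third argument, the sketch below passes the whole array as the subject. The two file names are sample data and are assumed to be UTF-8 already, like the literals in $search.

```php
<?php
$search  = array('š', 'á', 'ž', 'í', 'ě', 'é', 'ř', 'ň', 'ý', 'č', ' ');
$replace = array('s', 'a', 'z', 'i', 'e', 'e', 'r', 'n', 'y', 'c', '-');

// Assumption: these names are UTF-8, matching the encoding of this file.
$files = array('Hlávka s listem01.jpg', 'Mladý chmel04.jpg');

// With an array subject, str_replace() returns an array of results.
$safe_files = str_replace($search, $replace, $files);

print_r($safe_files);
```

So the array itself is not the issue; the replacement fails in the question because the bytes returned by readdir do not match the UTF-8 literals.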

If you do have a real encoding problem, it could simply be that your OS uses a single-byte encoding while your source file uses another one, probably UTF-8.

In that case, do something like this:

$search = array('š','á','ž','í','ě','é','ř','ň','ý','č',' ');
$replace = array('s','a','z','i','e','e','r','n','y','c','-');

$code_encoding = "UTF-8";  // this is my guess, but put whatever is yours
$os_encoding   = "CP1250"; // this is my guess, but put whatever is yours

$len = count($safe_files);

for ($i = 0; $i < $len; $i++) {
    // Convert to the source file's encoding before replacing.
    $safe_files[$i] = iconv($os_encoding, $code_encoding, $safe_files[$i]);
    /*
     Alternatively:
     $safe_files[$i] = mb_convert_encoding($safe_files[$i], $code_encoding, $os_encoding);
    */
    $safe_files[$i] = str_replace($search, $replace, $safe_files[$i]);
}

mb_convert_encoding() requires the mbstring extension, and iconv() requires the iconv extension.
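
To make the byte mismatch concrete, here is a small sketch on one name from the question's output. It assumes readdir returned Windows-1250 (CP1250) bytes and that the script itself is saved as UTF-8; both are guesses about the asker's setup.

```php
<?php
$search  = array('š', 'á', 'ž', 'í', 'ě', 'é', 'ř', 'ň', 'ý', 'č', ' ');
$replace = array('s', 'a', 'z', 'i', 'e', 'e', 'r', 'n', 'y', 'c', '-');

// "Jarní výhon03.jpg" as CP1250 bytes (17 bytes, as in the question's dump).
$raw = "Jarn\xED v\xFDhon03.jpg";

// Before conversion, only the ASCII space matches.
$unconverted = str_replace($search, $replace, $raw);

// After conversion, the UTF-8 literals in $search match the accented letters.
$utf8 = iconv('CP1250', 'UTF-8', $raw);
$safe = str_replace($search, $replace, $utf8);

echo $safe, "\n"; // prints "Jarni-vyhon03.jpg"
```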




Answer 3:


This may not directly answer your question, but you might want to take a look at PHP's iconv() function, and in particular the //TRANSLIT option that you can append to the second argument. I've used it several times to turn French and Eastern European strings into their a-z, URL-friendly counterparts.

From PHP.net (http://www.php.net/manual/en/function.iconv.php)

If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters.
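
A hedged sketch of that option in use. The helper name toAscii() is made up for this example, and the exact transliteration depends on the iconv implementation and the current locale, so some characters may come back as "?" or the call may fail outright.

```php
<?php
// toAscii() is a hypothetical helper, not part of PHP itself.
function toAscii(string $utf8): ?string
{
    // "@" silences the notice iconv() raises when the conversion fails or
    // //TRANSLIT is not supported; iconv() returns false in that case.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);
    return $ascii === false ? null : $ascii;
}

$name = toAscii('Mladý chmel04.jpg');
echo $name === null ? "conversion not available\n" : $name . "\n";
```

Returning null on failure lets the caller fall back to a manual replacement table instead of renaming a file to a mangled name.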




Answer 4:


Your source code (and the test string) appear to be in UTF-8, while the file names seem to use a single-byte encoding. I'd suggest you use the same encoding for your replacement strings. To avoid source-encoding issues, it'd be better to write the accented characters in your code in hex form (like \xE8 for "č", etc.).
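
The hex-form idea could look roughly like this, assuming the file names are Windows-1250 (CP1250) bytes. The helper name cp1250ToSafeName() is made up for this example, and the byte values cover only the characters from the question's $search array.

```php
<?php
// cp1250ToSafeName() is a hypothetical helper. It assumes CP1250 input;
// running it on UTF-8 names would corrupt them.
function cp1250ToSafeName(string $name): string
{
    // CP1250 bytes for: š á ž í ě é ř ň ý č, plus the space character.
    $from = "\x9A\xE1\x9E\xED\xEC\xE9\xF8\xF2\xFD\xE8 ";
    $to   = "sazieernyc-";
    return strtr($name, $from, $to); // byte-for-byte mapping
}

echo cp1250ToSafeName("Mlad\xFD chmel04.jpg"), "\n"; // prints "Mlady-chmel04.jpg"
```

Because the table is written as escapes, the result no longer depends on how the editor saved the PHP file.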




Answer 5:


So I got it working on my Windows XP system with this:

$search = array('š','á','ž','í','e','é','r','n','ý','c',' ');
$replace = array('s','a','z','i','e','e','r','n','y','c','-');

$files = readFiles(".", "jpg");
$len = count($files);

for($i = 0; $i < $len; $i++){
  if(mb_check_encoding($files[$i], 'ASCII')){
    $safe_files[$i] = $files[$i];
  }else{
    $safe_files[$i] = str_replace(
        $search, $replace, iconv("iso-8859-1", "utf-8//TRANSLIT", $files[$i]));
  }
  if($files[$i] != $safe_files[$i]){
    rename($files[$i], $safe_files[$i]);
  }
}

I don't know if it's a coincidence or not, but calling mb_get_info() shows

[internal_encoding] => ISO-8859-1




Answer 6:


Here is another function I found helpful on the PHP strtr page

<?php
// Windows-1250 to ASCII
// This function replaces all accented Windows-1250 characters with
// their non-accented equivalents. Useful for Czech and Slovak.

function win2ascii($str)
{
    // Lowercase: á č ď ě é í ň
    $str = strtr($str,
        "\xE1\xE8\xEF\xEC\xE9\xED\xF2",
        "\x61\x63\x64\x65\x65\x69\x6E");

    // Lowercase: ó ř š ť ů ú ý ž ô, then Ľ ľ
    $str = strtr($str,
        "\xF3\xF8\x9A\x9D\xF9\xFA\xFD\x9E\xF4\xBC\xBE",
        "\x6F\x72\x73\x74\x75\x75\x79\x7A\x6F\x4C\x6C");

    // Uppercase: Á Č Ď Ě É Í Â Ó Ř
    // (0xC2 is Â in Windows-1250, so it maps to "A" here, not "N")
    $str = strtr($str,
        "\xC1\xC8\xCF\xCC\xC9\xCD\xC2\xD3\xD8",
        "\x41\x43\x44\x45\x45\x49\x41\x4F\x52");

    // Uppercase: Š Ť Ú Ý Ž Ň Ů, plus ď and Ď again
    $str = strtr($str,
        "\x8A\x8D\xDA\xDD\x8E\xD2\xD9\xEF\xCF",
        "\x53\x54\x55\x59\x5A\x4E\x55\x64\x44");

    return $str;
}
?>

Basically, converting the European characters to an ASCII equivalent wasn't much of a problem, but I could find no reliable way to rename the files (i.e., to reference files with non-ASCII characters).




Answer 7:


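To get UTF-8, use the PHP function utf8_encode(), which converts from ISO-8859-1. Windows returns file names in a single-byte code page (close to ISO-8859-1 on Western European systems), so a conversion is necessary. Note that utf8_encode() is deprecated as of PHP 8.2; mb_convert_encoding($file, 'UTF-8', 'ISO-8859-1') performs the same conversion.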

Example - listing the files in a dir:

<?php
$dir_handle = opendir(".");
while (false !== ($file = readdir($dir_handle)))
{
  echo utf8_encode($file)."<br>";
}
?>



Answer 8:


Area5one has it right - it's a problem of different encoding.

When I upgraded my machine from XP to Win7, I also upgraded my version of MySQL and PHP. Somewhere along the way, PHP programs that used to work stopped working. In particular, scandir, readdir and utf-8 had lived happily together, but no longer.

So, I've modified my code. Variables holding data read from the hard disk end in "_iso" to reflect Windows' ISO-8859-1 encoding, while data from the MySQL database goes in variables ending in "_utf". Thus, the code from area5one would look like this:

$dir_handle_iso = opendir(".");
while (false !== ($file_iso = readdir($dir_handle_iso))) {
    $file_utf = utf8_encode($file_iso);
    ...
}




Answer 9:


This works for me 100%:

setlocale(LC_ALL,"cs_CZ");
$new_str = iconv("UTF-8","ASCII//TRANSLIT",$orig_str);



Answer 10:


$file = mb_convert_encoding($file, 'UTF-8', "iso-8859-1");

Worked for me (Windows, Danish characters).



Source: https://stackoverflow.com/questions/1766863/php-readdir-with-european-characters
