Make PHP pathinfo() return the correct filename if the filename is UTF-8

Submitted by 我与影子孤独终老i on 2019-11-29 02:51:36

I have used the following functions in PHP 5.3.3 through 5.3.18 to work around the UTF-8 issue in basename() and pathinfo().


if (!function_exists("mb_basename"))
{
  // Wrap every non-space character in " qq " so that no path component
  // starts with a multibyte character, call basename(), then strip the
  // markers again. A name that itself contains " qq " would be altered.
  function mb_basename($path)
  {
    $separator = " qq ";
    $path = preg_replace("/[^ ]/u", $separator."\$0".$separator, $path);
    $base = basename($path);
    $base = str_replace($separator, "", $base);
    return $base;
  }
}
if (!function_exists("mb_pathinfo"))
{
  // Same trick for pathinfo(); $opt may be any PATHINFO_* constant.
  function mb_pathinfo($path, $opt = "")
  {
    $separator = " qq ";
    $path = preg_replace("/[^ ]/u", $separator."\$0".$separator, $path);
    if ($opt == "") $pathinfo = pathinfo($path);
    else $pathinfo = pathinfo($path, $opt);

    if (is_array($pathinfo))
    {
      // Strip the markers from every element (dirname, basename, ...).
      $pathinfo2 = $pathinfo;
      foreach($pathinfo2 as $key => $val)
      {
        $pathinfo[$key] = str_replace($separator, "", $val);
      }
    }
    else if (is_string($pathinfo)) $pathinfo = str_replace($separator, "", $pathinfo);
    return $pathinfo;
  }
}
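A different approach, not taken by any of the answers here, is to avoid the locale-sensitive functions entirely. In UTF-8 the bytes for '/' (0x2F) and '.' (0x2E) never occur inside a multibyte sequence, so a plain byte-wise strrpos() can split the path safely. This is a minimal sketch under stated assumptions: `utf8_pathinfo()` is a hypothetical helper name, '/' is assumed to be the only directory separator, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): a locale-independent subset of
// pathinfo(). It works on bytes, which is safe for UTF-8 because the
// bytes for '/' (0x2F) and '.' (0x2E) never occur inside a multibyte
// sequence. Assumes '/' is the only directory separator.
function utf8_pathinfo(string $path): array
{
    $info = [];

    // Split off the directory part at the last '/'.
    $slash = strrpos($path, '/');
    if ($slash === false) {
        $info['dirname'] = '.';
        $base = $path;
    } else {
        $info['dirname'] = $slash === 0 ? '/' : substr($path, 0, $slash);
        $base = (string) substr($path, $slash + 1);
    }
    $info['basename'] = $base;

    // Split the basename at its last '.' into extension and filename.
    $dot = strrpos($base, '.');
    if ($dot !== false) {
        $info['extension'] = (string) substr($base, $dot + 1);
    }
    $info['filename'] = $dot === false ? $base : (string) substr($base, 0, $dot);

    return $info;
}

$info = utf8_pathinfo('/tmp/すtest.jpg');
echo $info['filename'], "\n";
```

Unlike pathinfo(), this sketch does not strip trailing slashes or treat '\' as a separator, so it only stands in for pathinfo() on simple '/'-separated paths.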

Alternatively, set a UTF-8 locale before calling pathinfo():

setlocale(LC_ALL,'en_US.UTF-8');
pathinfo($OriginalName, PATHINFO_FILENAME);
pathinfo($OriginalName, PATHINFO_BASENAME);
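setlocale(LC_ALL, ...) changes the locale for the whole process, not just for this call. A narrower variant is sketched below, assuming (as with C's mblen()) that only the LC_CTYPE category affects how pathinfo() handles multibyte names. It tries a few common UTF-8 locale names, calls pathinfo(), and then restores the previous locale. The helper name `pathinfo_utf8_locale()` and the list of locale names are assumptions; which names are installed depends on the system.

```php
<?php
// Hypothetical helper: call pathinfo() with a UTF-8 LC_CTYPE locale and
// restore the previous locale afterwards. Assumes LC_CTYPE is the only
// category that matters; the locale names tried are assumptions too.
function pathinfo_utf8_locale(string $path, ?int $flags = null)
{
    // Passing "0" returns the current setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() tries each name in turn and returns the one it set,
    // or false if none is installed (pathinfo() then runs under the
    // old locale).
    setlocale(LC_CTYPE, ['en_US.UTF-8', 'en_US.utf8', 'C.UTF-8']);

    try {
        return $flags === null ? pathinfo($path) : pathinfo($path, $flags);
    } finally {
        // Put the original locale back even if pathinfo() throws.
        if ($previous !== false) {
            setlocale(LC_CTYPE, $previous);
        }
    }
}

echo pathinfo_utf8_locale('/tmp/すtest.jpg', PATHINFO_FILENAME), "\n";
```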

A temporary workaround for this problem appears to be to make sure there is a 'normal' (ASCII) character in front of the accented characters, like so:

function getFilename($path)
{
    // if there's no '/', we're probably dealing with just a filename
    // so just put an 'a' in front of it
    if (strpos($path, '/') === false)
    {
        $path_parts = pathinfo('a'.$path);
    }
    else
    {
        $path = str_replace('/', '/a', $path);
        $path_parts = pathinfo($path);
    }
    return substr($path_parts["filename"],1);
}

Note that we replace every '/' with '/a', which also alters the directory names, but that is harmless: only the filename is returned, and it is returned from offset 1, which drops the extra 'a'. Interestingly, the dirname part of pathinfo() does seem to work, so no workaround is needed there.

pathinfo() handles plain ASCII characters correctly.

Based on this, we can encode the input into ASCII characters, still let pathinfo() do all of its work, and finally decode the output values back to their original form.

Demo below:

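function _pathinfo($path, $options = null)
{
    // Percent-encode the path so pathinfo() only sees ASCII characters,
    // but keep '/' as-is so directory separators still work.
    $path = str_replace('%2F', '/', urlencode($path));
    $parts = null === $options ? pathinfo($path) : pathinfo($path, $options);
    // With a PATHINFO_* flag, pathinfo() returns a single string.
    if (is_string($parts)) {
        return urldecode($parts);
    }
    foreach ($parts as $field => $value) {
        $parts[$field] = urldecode($value);
    }
    return $parts;
}
// calling
_pathinfo('すtest.jpg');
_pathinfo('すtest.jpg', PATHINFO_EXTENSION);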
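
Another option is to prefix an ASCII character and strip it from the result. This only works for a bare file name (no directory) with PATHINFO_BASENAME or PATHINFO_FILENAME, because the dirname and the extension do not start with the added character:

private function _pathinfo($path, $options = PATHINFO_BASENAME) {
  // Prepend a space so no multibyte character leads the name, then drop it.
  $result = pathinfo(' ' . $path, $options);
  return substr($result, 1);
}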

As the pathinfo() documentation warns:

Caution

pathinfo() is locale aware, so for it to parse a path containing multibyte characters correctly, the matching locale must be set using the setlocale() function.

See also the example in the manual.
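Since the outcome depends on the active locale, one option is to test it at runtime before deciding whether a workaround is needed. A minimal sketch: the function name `pathinfo_keeps_utf8()` is hypothetical, and `"\u{3059}test.jpg"` (す followed by ASCII) is just a sample name. The `\u{...}` string escape requires PHP 7.0 or later.

```php
<?php
// Hypothetical runtime check: under the current LC_CTYPE locale, do
// basename() and pathinfo() keep a name that starts with a multibyte
// UTF-8 character? The sample name is an arbitrary assumption.
function pathinfo_keeps_utf8(): bool
{
    $name = "\u{3059}test.jpg"; // "すtest.jpg"
    return basename($name) === $name
        && pathinfo($name, PATHINFO_BASENAME) === $name
        && pathinfo($name, PATHINFO_FILENAME) === "\u{3059}test";
}

if (!pathinfo_keeps_utf8()) {
    echo "pathinfo() mangles UTF-8 names here; set a UTF-8 locale first.\n";
}
```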
