Python: How to move a file with unicode filename to a unicode folder

随声附和 submitted on 2019-11-29 10:26:34

The basic problem is the unconverted mix of Unicode and byte strings. The solutions either convert everything to a single type or avoid the problem with some trickery. All of my solutions use the glob and shutil modules from the standard library.
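In Python 3 the "convert to a single type" idea can be expressed with os.fsdecode() and os.fsencode(), which convert between bytes and str using the filesystem encoding. A minimal sketch, assuming Python 3; the helper names join_as_text and join_as_bytes are hypothetical:

```python
import os


def join_as_text(*parts):
    """Join path components after converting each one to str.

    os.fsdecode() returns str input unchanged and decodes bytes input with
    the filesystem encoding, so every component has the same type before
    os.path.join() sees it.
    """
    return os.path.join(*(os.fsdecode(part) for part in parts))


def join_as_bytes(*parts):
    """Join path components after converting each one to bytes."""
    return os.path.join(*(os.fsencode(part) for part in parts))
```

Either helper works as long as the whole program sticks to the type it returns.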

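For the sake of example, I have some Unicode file names ending with .ods, and I want to move them to the subdirectory called א (Hebrew Aleph, a Unicode character). The snippets copy the files with shutil.copy2(); remove the originals afterwards (e.g. with os.remove()) to complete the move.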

First solution - express the directory name as a byte string:

>>> import glob
>>> import shutil
>>> files=glob.glob('*.ods')      # List of Byte string file names
>>> for file in files:
...     shutil.copy2(file, 'א')   # Byte string directory name
... 
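In Python 3, glob.glob() returns bytes when it is given a bytes pattern, so the byte-string approach still works if the destination is also bytes. A minimal sketch under that assumption; the function name copy_ods_as_bytes and its parameters are hypothetical:

```python
import glob
import os
import shutil


def copy_ods_as_bytes(src_dir, dst_dir):
    """Copy every *.ods file from src_dir into dst_dir using bytes paths only.

    Both arguments may be str or bytes; they are converted to bytes with
    os.fsencode() so no str/bytes mixing happens inside shutil.
    Returns the list of copied source paths (as bytes).
    """
    src_dir = os.fsencode(src_dir)
    dst_dir = os.fsencode(dst_dir)
    # Escape the directory part so glob metacharacters in it are literal;
    # a bytes pattern makes glob return bytes results.
    pattern = os.path.join(glob.escape(src_dir), b'*.ods')
    copied = []
    for path in glob.glob(pattern):
        shutil.copy2(path, dst_dir)  # dst_dir must be an existing directory
        copied.append(path)
    return copied
```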

Second solution - use Unicode for both the file names and the directory name:

>>> import glob
>>> import shutil
>>> files=glob.glob(u'*.ods')     # List of Unicode file names
>>> for file in files:
...     shutil.copy2(file, u'א')  # Unicode directory name
... 

Credit to Ezio Melotti on the Python bug tracker.
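In Python 3 every str is Unicode, so this approach is the default there. A sketch using pathlib, assuming Python 3.5 or later; the function name copy_ods_as_text is hypothetical:

```python
import shutil
from pathlib import Path


def copy_ods_as_text(src_dir, dst_dir):
    """Copy every *.ods file from src_dir into dst_dir using str-based paths.

    Path objects wrap str (Unicode) paths, so a directory name such as 'א'
    needs no special handling. Returns the list of destination paths.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the destination if needed
    copied = []
    for path in sorted(src.glob('*.ods')):
        if path.is_file():
            target = dst / path.name
            # str() keeps this compatible with shutil versions that predate Path support
            shutil.copy2(str(path), str(target))
            copied.append(target)
    return copied
```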

Third solution - avoid naming the Unicode destination directory in the copy call

Although this isn't the best solution in my opinion, there is a nice trick here that's worth mentioning.

Change to the destination directory using os.chdir(), and then copy the files into it by referring to it as .:

# -*- coding: utf-8 -*-
import os
import shutil
import glob

original_dir = os.getcwd()          # Remember where we started
os.chdir('א')                       # CD to the destination Unicode directory
try:
    print(os.getcwd())              # DEBUG: make sure you're in the right place
    files = glob.glob('../*.ods')   # List of byte string file names
    for file in files:
        shutil.copy2(file, '.')     # Copy each file into the current directory
finally:
    os.chdir(original_dir)          # Go back to the original directory
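The "go back to the original directory" step can be made automatic with a small context manager that restores the previous working directory even if a copy fails. A sketch assuming Python 3; working_directory and copy_ods_from_parent are hypothetical names (Python 3.11 added contextlib.chdir for the same purpose):

```python
import contextlib
import glob
import os
import shutil


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # runs even if the body raised


def copy_ods_from_parent(dst_dir):
    """Copy ../*.ods into dst_dir, referring to the destination only as '.'."""
    with working_directory(dst_dir):
        for path in glob.glob('../*.ods'):
            shutil.copy2(path, '.')
```

Note that changing the working directory affects the whole process, so this trick is unsafe in multithreaded programs.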

Deeper explanation

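The straightforward approach, shutil.copy2(src, dest), fails when src is a byte string containing non-ASCII bytes and dest is a Unicode string: shutil joins the two with os.path.join(), and Python 2 implicitly decodes the byte string with the ASCII codec in order to concatenate it with the Unicode string: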

>>> files=glob.glob('*.ods')
>>> for file in files:
...     shutil.copy2(file, u'א')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/shutil.py", line 98, in copy2
    dst = os.path.join(dst, os.path.basename(src))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: ordinal not in range(128)

As shown in the first solution, this can be avoided by using the byte string 'א' instead of the Unicode string u'א'.
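In Python 3 the implicit ASCII decoding is gone: mixing str and bytes in os.path.join() raises TypeError immediately instead of a UnicodeDecodeError deep inside shutil. A small sketch, assuming Python 3, that triggers the error and then fixes it by converting the str side with os.fsencode(); join_mixed is a hypothetical name:

```python
import os


def join_mixed(directory, name):
    """Try to join a str directory with a bytes file name, then fix it.

    Returns a tuple (error_message_or_None, fixed_path).
    """
    try:
        os.path.join(directory, name)
    except TypeError as exc:
        error = str(exc)  # the message complains about mixing str and bytes
    else:
        error = None
    # Converting the str side to bytes makes both components the same type.
    fixed = os.path.join(os.fsencode(directory), name)
    return error, fixed


error, fixed = join_mixed('dest', b'report.ods')
print(error)
print(fixed)
```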

Is this a bug?

In my opinion, this is a bug, because Python should not assume that directory names are always str rather than unicode. I have reported this as an issue on the Python bug tracker and am waiting for responses.

Further reading

Python's official Unicode HOWTO

Use Unicode strings everywhere:

# -*- coding: utf-8 -*-
# source code ^^ encoding; it might be different from sys.getfilesystemencoding()
import glob
import os

srcdir = u'مصدر الدليل' # <-- unicode string
dstdir = os.path.join('..', u'κατάλογο προορισμού') # relative path
for path in glob.glob(os.path.join(srcdir, u'*.ext')):
    newpath = os.path.join(dstdir, os.path.basename(path))
    os.rename(path, newpath) # move file or directory; assume the same filesystem

There are many subtle details in moving files; see the shutil.copy* functions. You could use whichever one is appropriate in your particular case and remove the source files on success, e.g. via os.remove().
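A minimal sketch of the copy-then-remove approach, assuming Python 3; move_file is a hypothetical name. It tries os.rename() first and falls back to shutil.copy2() plus os.remove() only when the rename fails because source and destination are on different filesystems. The standard library's shutil.move() implements a similar fallback.

```python
import errno
import os
import shutil


def move_file(src, dst):
    """Move the file src to dst, falling back to copy + delete.

    os.rename() fails with EXDEV when src and dst are on different
    filesystems; in that case the file is copied (with metadata) and
    the source is removed only after the copy succeeded.
    """
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # a different problem, e.g. a missing source file
        shutil.copy2(src, dst)
        os.remove(src)  # delete only after a successful copy
    return dst
```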
