问题
I have a list of unicode file paths in which I need to replace all umlauts with an English diacritic. For example, I would ü with ue, ä with ae and so on. I have defined a dictionary of umlauts (keys) and their diacritics (values). So I need to compare each key to each file path and where the key is found, replace it with the value. This seems like it would be simple, but I can't get it to work. Does anyone out there have any ideas? Any feedback is greatly appreciated!
code so far:
# -*- coding: utf-8 -*-
import os
def GetFilepaths(directory):
"""
This function will generate all file names a directory tree using os.walk.
It returns a list of file paths.
"""
file_paths = []
for root, directories, files in os.walk(directory):
for filename in files:
filepath = os.path.join(root, filename)
file_paths.append(filepath)
return file_paths
# dictionary of umlaut unicode representations (keys) and their replacements (values)
umlautDictionary = {u'Ä': 'Ae',
u'Ö': 'Oe',
u'Ü': 'Ue',
u'ä': 'ae',
u'ö': 'oe',
u'ü': 'ue'
}
# get file paths in root directory and subfolders
filePathsList = GetFilepaths(u'C:\\Scripts\\Replace Characters\\Umlauts')
for file in filePathsList:
for key, value in umlautDictionary.iteritems():
if key in file:
file.replace(key, value) # does not work -- umlauts still in file path!
print file
回答1:
The replace
method returns a new string, it does not modify the original string.
So you would need
file = file.replace(key, value)
instead of just file.replace(key, value)
.
Note also that you could use the translate method to do all the replacements at once, instead of using a for-loop
:
In [20]: umap = {ord(key):unicode(val) for key, val in umlautDictionary.items()}
In [21]: umap
Out[21]: {196: u'Ae', 214: u'Oe', 220: u'Ue', 228: u'ae', 246: u'oe', 252: u'ue'}
In [22]: print(u'ÄÖ'.translate(umap))
AeOe
So you could use
umap = {ord(key):unicode(val) for key, val in umlautDictionary.items()}
for filename in filePathsList:
filename = filename.translate(umap)
print(filename)
回答2:
Replace the line
file.replace(key, value)
with:
file = file.replace(key, value)
This is because strings are immutable in Python.
Which means that file.replace(key, value)
returns a copy of file
with replacements made.
来源:https://stackoverflow.com/questions/33281430/python-transliterate-german-umlauts-to-diacritic