问题
I need to find out whether a name starts with any of a list's prefixes and then remove it, like:
if name[:2] in ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]:
name = name[2:]
The above only works for list prefixes with a length of two. I need the same functionality for variable-length prefixes.
How is it done efficiently (little code and good performance)?
A for loop iterating over each prefix and then checking name.startswith(prefix)
to finally slice the name according to the length of the prefix works, but it's a lot of code, probably inefficient, and "non-Pythonic".
Does anybody have a nice solution?
回答1:
A bit hard to read, but this works:
name=name[len(filter(name.startswith,prefixes+[''])[0]):]
回答2:
str.startswith(prefix[, start[, end]])¶
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_")
In [2]: 'test'.startswith(prefixes)
Out[2]: False
In [3]: 'i_'.startswith(prefixes)
Out[3]: True
In [4]: 'd_a'.startswith(prefixes)
Out[4]: True
回答3:
for prefix in prefixes:
if name.startswith(prefix):
name=name[len(prefix):]
break
回答4:
Regexes will likely give you the best speed:
prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
re_prefixes = "|".join(re.escape(p) for p in prefixes)
m = re.match(re_prefixes, my_string)
if m:
my_string = my_string[m.end()-m.start():]
回答5:
If you define prefix to be the characters before an underscore, then you can check for
if name.partition("_")[0] in ["i", "c", "m", "l", "d", "t", "e", "b", "foo"] and name.partition("_")[1] == "_":
name = name.partition("_")[2]
回答6:
What about using filter
?
prefs = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
name = list(filter(lambda item: not any(item.startswith(prefix) for prefix in prefs), name))
Note that the comparison of each list item against the prefixes efficiently halts on the first match. This behaviour is guaranteed by the any
function that returns as soon as it finds a True
value, eg:
def gen():
print("yielding False")
yield False
print("yielding True")
yield True
print("yielding False again")
yield False
>>> any(gen()) # last two lines of gen() are not performed
yielding False
yielding True
True
Or, using re.match
instead of startswith
:
import re
patt = '|'.join(["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"])
name = list(filter(lambda item: not re.match(patt, item), name))
回答7:
Regex, tested:
import re
def make_multi_prefix_matcher(prefixes):
regex_text = "|".join(re.escape(p) for p in prefixes)
print repr(regex_text)
return re.compile(regex_text).match
pfxs = "x ya foobar foo a|b z.".split()
names = "xenon yadda yeti food foob foobarre foo a|b a b z.yx zebra".split()
matcher = make_multi_prefix_matcher(pfxs)
for name in names:
m = matcher(name)
if not m:
print repr(name), "no match"
continue
n = m.end()
print repr(name), n, repr(name[n:])
Output:
'x|ya|foobar|foo|a\\|b|z\\.'
'xenon' 1 'enon'
'yadda' 2 'dda'
'yeti' no match
'food' 3 'd'
'foob' 3 'b'
'foobarre' 6 're'
'foo' 3 ''
'a|b' 3 ''
'a' no match
'b' no match
'z.yx' 2 'yx'
'zebra' no match
回答8:
When it comes to search and efficiency always thinks of indexing techniques to improve your algorithms. If you have a long list of prefixes you can use an in-memory index by simple indexing the prefixes by the first character into a dict
.
This solution is only worth if you had a long list of prefixes and performance becomes an issue.
pref = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
#indexing prefixes in a dict. Do this only once.
d = dict()
for x in pref:
if not x[0] in d:
d[x[0]] = list()
d[x[0]].append(x)
name = "c_abcdf"
#lookup in d to only check elements with the same first character.
result = filter(lambda x: name.startswith(x),\
[] if name[0] not in d else d[name[0]])
print result
回答9:
This edits the list on the fly, removing prefixes. The break
skips the rest of the prefixes once one is found for a particular item.
items = ['this', 'that', 'i_blah', 'joe_cool', 'what_this']
prefixes = ['i_', 'c_', 'a_', 'joe_', 'mark_']
for i,item in enumerate(items):
for p in prefixes:
if item.startswith(p):
items[i] = item[len(p):]
break
print items
Output
['this', 'that', 'blah', 'cool', 'what_this']
回答10:
Could use a simple regex.
import re
prefixes = ("i_", "c_", "longer_")
re.sub(r'^(%s)' % '|'.join(prefixes), '', name)
Or if anything preceding an underscore is a valid prefix:
name.split('_', 1)[-1]
This removes any number of characters before the first underscore.
回答11:
import re
def make_multi_prefix_replacer(prefixes):
if isinstance(prefixes,str):
prefixes = prefixes.split()
prefixes.sort(key = len, reverse=True)
pat = r'\b(%s)' % "|".join(map(re.escape, prefixes))
print 'regex patern :',repr(pat),'\n'
def suber(x, reg = re.compile(pat)):
return reg.sub('',x)
return suber
pfxs = "x ya foobar yaku foo a|b z."
replacer = make_multi_prefix_replacer(pfxs)
names = "xenon yadda yeti yakute food foob foobarre foo a|b a b z.yx zebra".split()
for name in names:
print repr(name),'\n',repr(replacer(name)),'\n'
ss = 'the yakute xenon is a|bcdf in the barfoobaratu foobarii'
print '\n',repr(ss),'\n',repr(replacer(ss)),'\n'
来源:https://stackoverflow.com/questions/7539959/finding-whether-a-string-starts-with-one-of-a-lists-variable-length-prefixes