I need to find out whether a name starts with any of a list\'s prefixes and then remove it, like:
if name[:2] in [\"i_\", \"c_\", \"m_\", \"l_\", \"d_\", \"t
for prefix in prefixes:
if name.startswith(prefix):
name=name[len(prefix):]
break
Regex, tested:
import re
def make_multi_prefix_matcher(prefixes):
regex_text = "|".join(re.escape(p) for p in prefixes)
print repr(regex_text)
return re.compile(regex_text).match
pfxs = "x ya foobar foo a|b z.".split()
names = "xenon yadda yeti food foob foobarre foo a|b a b z.yx zebra".split()
matcher = make_multi_prefix_matcher(pfxs)
for name in names:
m = matcher(name)
if not m:
print repr(name), "no match"
continue
n = m.end()
print repr(name), n, repr(name[n:])
Output:
'x|ya|foobar|foo|a\\|b|z\\.'
'xenon' 1 'enon'
'yadda' 2 'dda'
'yeti' no match
'food' 3 'd'
'foob' 3 'b'
'foobarre' 6 're'
'foo' 3 ''
'a|b' 3 ''
'a' no match
'b' no match
'z.yx' 2 'yx'
'zebra' no match
What about using filter
?
prefs = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
name = list(filter(lambda item: not any(item.startswith(prefix) for prefix in prefs), name))
Note that the comparison of each list item against the prefixes efficiently halts on the first match. This behaviour is guaranteed by the any
function that returns as soon as it finds a True
value, eg:
def gen():
print("yielding False")
yield False
print("yielding True")
yield True
print("yielding False again")
yield False
>>> any(gen()) # last two lines of gen() are not performed
yielding False
yielding True
True
Or, using re.match
instead of startswith
:
import re
patt = '|'.join(["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"])
name = list(filter(lambda item: not re.match(patt, item), name))
This edits the list on the fly, removing prefixes. The break
skips the rest of the prefixes once one is found for a particular item.
items = ['this', 'that', 'i_blah', 'joe_cool', 'what_this']
prefixes = ['i_', 'c_', 'a_', 'joe_', 'mark_']
for i,item in enumerate(items):
for p in prefixes:
if item.startswith(p):
items[i] = item[len(p):]
break
print items
['this', 'that', 'blah', 'cool', 'what_this']
Could use a simple regex.
import re
prefixes = ("i_", "c_", "longer_")
re.sub(r'^(%s)' % '|'.join(prefixes), '', name)
Or if anything preceding an underscore is a valid prefix:
name.split('_', 1)[-1]
This removes any number of characters before the first underscore.
If you define prefix to be the characters before an underscore, then you can check for
if name.partition("_")[0] in ["i", "c", "m", "l", "d", "t", "e", "b", "foo"] and name.partition("_")[1] == "_":
name = name.partition("_")[2]