I am trying to split a string in Python before a specific word. For example, I would like to split the following string before "path:".
in_str = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
in_list = in_str.split('path:')
print(",path:".join(in_list)[1:])
You could do ["path:"+s for s in line.split("path:")[1:]]
instead of using a regex. (Note that we skip the first element of the split, which is the empty string before the first "path:" and so has no prefix to restore.)
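As a minimal sketch of that one-liner on the string from the question (the strip() call is my addition, to drop the space left in front of the second "path:"; it also assumes the string starts with "path:", otherwise the [1:] slice would discard real text):

```python
line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")

# split() yields an empty first element because the string starts with "path:"
pieces = line.split("path:")

# Skip that empty element, put the prefix back, and trim the leftover space
entries = ["path:" + s.strip() for s in pieces[1:]]
print(entries)
```

Each item in entries then begins with "path:" again, one per pathway.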
Using a regular expression to split your string seems like overkill: the string split() method may be just what you need.
Anyway, if you really need to match a regular expression in order to split your string, you should use the re.split() function, which splits a string wherever the regular expression matches.
Also, use a correct regular expression for splitting:
>>> import re
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
The (?=...) group is a lookahead assertion: the expression matches a space (note the space at the start of the expression) that is followed by the string 'path:', without consuming what follows the space.
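If the word to split before is not fixed, the same lookahead idea could be wrapped in a small helper. This is only a sketch: split_before is a hypothetical name, and using \s+ instead of a single space (so that tabs or repeated spaces are handled as well) is my assumption, not part of the answer above.

```python
import re

def split_before(text, word):
    """Hypothetical helper: split text at whitespace that precedes word."""
    # re.escape() guards against regex metacharacters inside word;
    # the lookahead (?=...) keeps word at the start of the next chunk.
    pattern = r"\s+(?=" + re.escape(word) + ")"
    return re.split(pattern, text)

line = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
        "path:bte00330 Arginine and proline metabolism")
print(split_before(line, "path:"))
```

Because the lookahead does not consume anything, "path:" stays attached to each chunk and there is nothing to re-add afterwards.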
This can be done without regular expressions. Given a string:
s = "path:bte00250 Alanine, aspartate ... path:bte00330 Arginine and ..."
We can temporarily replace the desired word with a placeholder. The placeholder is a single character that must not occur anywhere else in the string, and we split on it:
word, placeholder = "path:", "|"
s = s.replace(word, placeholder).split(placeholder)
s
# ['', 'bte00250 Alanine, aspartate ... ', 'bte00330 Arginine and ...']
Now that the string is split, we can rejoin the original word to each sub-string using a list comprehension:
["".join([word, i]) for i in s if i]
# ['path:bte00250 Alanine, aspartate ... ', 'path:bte00330 Arginine and ...']
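One thing to watch with the placeholder trick is a string that already contains the placeholder character. The snippet below is a small illustration with a made-up input (not from the question), compared against splitting on the word directly, which str.split() also supports for multi-character separators:

```python
word, placeholder = "path:", "|"

# Made-up input that happens to contain the placeholder character
s = "path:abc a|b path:def"

# Placeholder approach: the stray "|" causes an extra, unwanted split
via_placeholder = s.replace(word, placeholder).split(placeholder)

# Splitting on the word itself is not affected by the "|"
direct = s.split(word)

print(via_placeholder)
print(direct)
```

So the placeholder should be chosen as a character known to be absent from the data, or the word itself can be used as the separator.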