I'm looking to split a string Series at different points depending on the length of certain substrings:
In [47]: df = pd.DataFrame(['group9class1', 'grou
Use a regular expression to split the string:
import re
regex = re.compile("(class)")
s = "group1class23"  # avoid naming the variable `str`, which shadows the built-in type
# Insert a space before "class", then split on that space to separate the group and class parts.
split_string = regex.sub(r" \1", s).split(" ")
This returns the list:
['group1', 'class23']
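As a possible alternative to the substitute-then-split step, the string can be split directly at the position just before "class" using a zero-width lookahead. This is only a sketch, and it assumes Python 3.7 or later, where re.split splits on empty matches; the variable names are illustrative.

```python
import re

s = "group1class23"
# (?=class) matches the empty position right before "class",
# so nothing is consumed and no placeholder space is needed.
parts = re.split(r"(?=class)", s)
print(parts)
```

This avoids relying on the input never containing a space, which the space-based split assumes.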
So, to append two new columns to your DataFrame, you can do:
new_cols = [re.sub(regex, " \\1", x).split(" ") for x in df.group_class]
df['group'], df['class'] = zip(*new_cols)
Which results in:
      group_class    group    class
0    group9class1   group9   class1
1   group10class2  group10   class2
2  group11class20  group11  class20
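The same result can probably be obtained without a Python-level loop by using the pandas string accessor. The sketch below assumes the column is named group_class and holds the three values shown in the output above; the pattern and the `parts` name are my own choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"group_class": ["group9class1", "group10class2", "group11class20"]}
)

# Capture everything before "class" (non-greedy), then "class" and the rest.
parts = df["group_class"].str.extract(r"^(.*?)(class.*)$")
parts.columns = ["group", "class"]

# Attach the two extracted columns alongside the original one.
df = df.join(parts)
print(df)
```

Rows that do not match the pattern would get NaN in both new columns, so it may be worth checking for missing values if the data is less regular than this sample.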