rsplit on a pandas Series using a regular expression is not working. I want to split the Series on a separator without removing the separator.
df2 = pd.Series(['Ser
Unfortunately, pd.Series.str.rsplit does not work as documented (v0.25, stable/v1+). The project's GitHub issue tracker has an open bug from Nov. 2019 reporting that rsplit does not work with regex patterns (v0.24.2 and v0.25.2). Internally, the method calls Python's built-in str.rsplit, which treats the separator as a literal string and does not support regular expressions.
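To see why delegating to the built-in method breaks regex patterns, here is a minimal sketch in plain Python (no pandas involved; the sample string and variable names are made up for illustration). It compares str.rsplit with re.split on the same pattern:

```python
import re

text = "a1b22c"  # made-up sample value

# Built-in str.rsplit treats the separator as a literal string,
# so the regex-looking pattern r"\d+" is never found in the text.
literal_result = text.rsplit(r"\d+", 1)

# re.split interprets the same pattern as a regular expression.
regex_result = re.split(r"\d+", text)

print(literal_result)  # ['a1b22c']
print(regex_result)    # ['a', 'b', 'c']
```

Note that re.split has no "from the right" variant, which is presumably why a workaround has to locate the split points itself.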
Luckily, the reporter jamespreed added a (homegrown) alternative function:
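    import re

    # Note: _na_map is a private pandas helper (pandas/core/strings.py as of v0.25),
    # not public API; it applies f element-wise and passes missing values through.

    def str_rsplit(arr, pat=None, n=None):
        if pat is None or len(pat) == 1:
            # Literal separator: the built-in str.rsplit is sufficient.
            if n is None or n == 0:
                n = -1
            f = lambda x: x.rsplit(pat, n)
        else:
            # Regex separator: split everywhere, then re-join the left part.
            if n is None or n == -1:
                n = 0
            regex = re.compile(pat)

            def f(x):
                s = regex.split(x)
                # Keep the last n pieces; everything before them forms one piece.
                a, b = s[:-n], s[-n:]
                if not a:
                    return b
                # Find where the left-hand pieces end in the original string.
                ix = 0
                for a_ in a:
                    ix = x.find(a_, ix) + len(a_)
                x_ = [x[:ix]]
                return x_ + b

        res = _na_map(f, arr)
        return res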
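Since the function above relies on a private pandas helper, here is a standalone sketch of the same idea that works on single strings using only the standard library. The name regex_rsplit and the keep_sep flag are hypothetical, not part of pandas or of the linked issue; keep_sep is an assumption about what the question means by "without removing separator" (the separator stays attached to the piece on its right). The sketch also assumes the pattern never matches the empty string.

```python
import re

def regex_rsplit(text, pattern, n=1, keep_sep=False):
    """Split text at the last n regex matches (hypothetical helper).

    n <= 0 means split at every match. With keep_sep=True each
    separator stays attached to the piece on its right.
    """
    matches = list(re.finditer(pattern, text))
    if n > 0:
        matches = matches[-n:]  # only the right-most n matches

    parts = []
    end = len(text)
    # Walk the matches from right to left, cutting pieces off the end.
    for m in reversed(matches):
        start = m.start() if keep_sep else m.end()
        parts.append(text[start:end])
        end = m.start()
    parts.append(text[:end])  # whatever remains on the left
    parts.reverse()
    return parts

print(regex_rsplit("a1b22c", r"\d+", n=1))                 # ['a1b', 'c']
print(regex_rsplit("a1b22c", r"\d+", n=1, keep_sep=True))  # ['a1b', '22c']
```

On a Series of strings this could be applied element-wise, for example with series.map(lambda s: regex_rsplit(s, r"\d+", n=1)); missing values would need to be handled separately.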