rsplit on pandas series with regular expression not working

渐次进展 2021-01-15 19:02

rsplit on a pandas Series using a regular expression is not working. I want to split the Series on a separator without removing the separator.
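The question's own sample data is cut off, so the value below is a hypothetical stand-in. This is a minimal sketch, using Python's `re` module directly, of two common ways to split while keeping the separator. A capturing group returns the separators as extra list items. A zero-width lookahead splits just before each separator, so it stays at the start of the following piece.

```python
import re

# Hypothetical sample value; the question's real data is truncated.
text = "ab-cd-ef"

# 1) A capturing group makes re.split return each separator as its own item.
with_groups = re.split(r"(-)", text)

# 2) A zero-width lookahead splits *before* each "-", so the separator stays
#    attached to the piece that follows it (needs Python 3.7+).
with_lookahead = re.split(r"(?=-)", text)

print(with_groups)     # ['ab', '-', 'cd', '-', 'ef']
print(with_lookahead)  # ['ab', '-cd', '-ef']
```

Passing the same patterns to `Series.str.split` should behave the same way, because pandas treats a multi-character pattern as a regular expression. Newer versions also expose an explicit `regex` parameter, so check the documentation for your version.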

df2= pd.Series(['Ser


        
1 Answer
  • 2021-01-15 19:56

    Unfortunately, pd.Series.str.rsplit does not work as documented (v0.25, stable/v1+). The project's GitHub issue tracker has an open bug report from November 2019 stating that rsplit does not work with regex patterns (v0.24.2 and v0.25.2). Internally, the method calls Python's built-in str.rsplit, which does not support regular expressions.
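To illustrate the point about the built-in method, here is a small plain-Python comparison; the string and pattern are made-up examples. `str.rsplit` searches for the pattern text literally. `re.split` does interpret the regex, but it only counts splits from the left. The right split one would actually want here is `['a1b', 'c']`.

```python
import re

text = "a1b22c"
pattern = r"\d+"  # intended meaning: "one or more digits"

# str.rsplit treats its separator as a literal string, so the characters
# backslash, "d", "+" are searched for verbatim and never found here.
literal = text.rsplit(pattern, 1)

# re.split interprets the regex, but there is no re.rsplit, and
# maxsplit counts from the left.
from_left = re.split(pattern, text, maxsplit=1)

print(literal)    # ['a1b22c']
print(from_left)  # ['a', 'b22c']
```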

    Luckily, the reporter, jamespreed, posted a homegrown alternative function:

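    import re
    
    def str_rsplit(arr, pat=None, n=None):
        """Right-split each string in the Series `arr` on `pat`, making at most `n` splits."""
        if pat is None or len(pat) == 1:
            # Whitespace or a single literal character: built-in str.rsplit works.
            if n is None or n == 0:
                n = -1
            f = lambda x: x.rsplit(pat, n)
        else:
            # Multi-character pattern: treat it as a regular expression.
            if n is None or n == -1:
                n = 0
            regex = re.compile(pat)
            def f(x):
                s = regex.split(x)
                # Keep the last n pieces separate; everything before them is rejoined.
                a, b = s[:-n], s[-n:]
                if not a:
                    return b
                # Walk past each leading piece to find where the left part ends,
                # then slice the original string so its inner separators survive.
                ix = 0
                for a_ in a:
                    ix = x.find(a_, ix) + len(a_)
                x_ = [x[:ix]]
                return x_ + b
        # Apply f to every non-missing value; NaN entries are passed through unchanged.
        res = arr.map(f, na_action="ignore")
        return res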
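If you would rather not depend on pandas internals, one alternative is to locate the regex matches with `re.finditer` and slice the string around the last `n` of them. This is a different technique from jamespreed's function, and `regex_rsplit` is a hypothetical helper name, not a pandas API. Its optional `keep_sep` flag, also an assumption of this sketch, targets the question's requirement by leaving each separator at the start of the piece that follows it.

```python
import re


def regex_rsplit(text, pat, maxsplit=-1, keep_sep=False):
    """Split `text` on the regex `pat`, counting splits from the right.

    Hypothetical helper (not part of pandas). maxsplit <= 0 splits on
    every match; keep_sep=True keeps each separator at the start of the
    piece that follows it instead of dropping it.
    """
    if not isinstance(text, str):
        return text  # pass NaN/None through unchanged
    # Collect non-empty matches left to right; empty matches are skipped.
    matches = [m for m in re.finditer(pat, text) if m.end() > m.start()]
    if maxsplit > 0:
        matches = matches[-maxsplit:]  # only the rightmost `maxsplit` separators
    pieces, start = [], 0
    for m in matches:
        pieces.append(text[start:m.start()])
        # Next piece begins at the separator (kept) or just after it (dropped).
        start = m.start() if keep_sep else m.end()
    pieces.append(text[start:])
    return pieces


print(regex_rsplit("a1b22c", r"\d+", 1))                 # ['a1b', 'c']
print(regex_rsplit("a1b22c", r"\d+", 1, keep_sep=True))  # ['a1b', '22c']
```

On a Series `s`, apply it element-wise, e.g. `s.map(lambda x: regex_rsplit(x, r"\s+", 1, keep_sep=True))`; non-string entries such as NaN come back unchanged.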
    