pandas split by last delimiter

北战南征 提交于 2019-12-08 02:27:32

问题


I have the following column in a dataframe with different outputs"

col1
MLB|NBA|NFL
MLB|NBA
NFL|NHL|NBA|MLB

I would like to use the split function to split the column by the last pipe always so something like this:

col1           col2
MLB|NBA        NFL
MLB            NBA
NFL|NHL|NBA    MLB

回答1:


With Series.str.rsplit, limiting the number of splits.

df.col1.str.rsplit('|', 1, expand=True).rename(lambda x: f'col{x + 1}', axis=1)

If the above throws you a SyntaxError, it means you're on a python version older than 3.6 (shame on you!). Use instead

df.col1.str.rsplit('|', 1, expand=True)\
  .rename(columns=lambda x: 'col{}'.format(x + 1))

          col1 col2
0      MLB|NBA  NFL
1          MLB  NBA
2  NFL|NHL|NBA  MLB

There's also the faster loopy str.rsplit equivalent.

pd.DataFrame(
    [x.rsplit('|', 1) for x in df.col1.tolist()], 
    columns=['col1', 'col2']
) 
          col1 col2
0      MLB|NBA  NFL
1          MLB  NBA
2  NFL|NHL|NBA  MLB

P.S., yes, the second solution is faster:

df = pd.concat([df] * 100000, ignore_index=True)

%timeit df.col1.str.rsplit('|', 1, expand=True)
%timeit pd.DataFrame([x.rsplit('|', 1) for x in df.col1.tolist()])

473 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
128 ms ± 1.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


来源:https://stackoverflow.com/questions/50993263/pandas-split-by-last-delimiter

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!