Remove unwanted parts from strings in a column

鱼传尺愫 2020-11-22 15:48

I am looking for an efficient way to remove unwanted parts from strings in a DataFrame column.

The data looks like this:

    time    result
1    09:00   +52A
         


        
9 Answers
  • 2020-11-22 16:39

    I often use list comprehensions for this kind of task because they are frequently faster.

    Performance can differ considerably between the various ways of modifying every element of a Series within a DataFrame. A list comprehension is often the fastest; see the timing comparison below for this task:

    import pandas as pd

    # Series.map with a lambda applied to each element
    data = pd.DataFrame({'time':['09:00','10:00','11:00','12:00','13:00'], 'result':['+52A','+62B','+44a','+30b','-110a']})
    %timeit data['result'] = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
    10000 loops, best of 3: 187 µs per loop

    # List comprehension over the column values
    data = pd.DataFrame({'time':['09:00','10:00','11:00','12:00','13:00'], 'result':['+52A','+62B','+44a','+30b','-110a']})
    %timeit data['result'] = [x.lstrip('+-').rstrip('aAbBcC') for x in data['result']]
    10000 loops, best of 3: 117 µs per loop

    # Vectorized .str accessor methods
    data = pd.DataFrame({'time':['09:00','10:00','11:00','12:00','13:00'], 'result':['+52A','+62B','+44a','+30b','-110a']})
    %timeit data['result'] = data['result'].str.lstrip('+-').str.rstrip('aAbBcC')
    1000 loops, best of 3: 336 µs per loop
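
    The `%timeit` lines above only run inside IPython, and each one reassigns the column it is timing, so every loop after the first works on strings that are already stripped. As a rough sketch of the same comparison in plain Python, the version below times three pure functions on an unchanged Series and checks that they agree. The function names are my own, and the timings will vary with machine, data size, and pandas version, so no numbers are claimed here.

```python
import timeit

import pandas as pd

# Same sample values as in the comparison above.
series = pd.Series(['+52A', '+62B', '+44a', '+30b', '-110a'])


def with_map(s):
    # Apply the strip calls to each element through Series.map.
    return s.map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))


def with_list_comprehension(s):
    # Build a plain list, then wrap it in a Series that keeps the original index.
    return pd.Series([x.lstrip('+-').rstrip('aAbBcC') for x in s], index=s.index)


def with_str_accessor(s):
    # Use the vectorized string methods.
    return s.str.lstrip('+-').str.rstrip('aAbBcC')


candidates = (with_map, with_list_comprehension, with_str_accessor)

# All three approaches should produce the same cleaned values.
outputs = {func.__name__: func(series).tolist() for func in candidates}

# Time each approach on the same, unmodified input.
timings = {}
for func in candidates:
    timings[func.__name__] = timeit.timeit(lambda: func(series), number=200)
    print(f"{func.__name__}: {timings[func.__name__]:.4f} s for 200 runs")
```

    On a five-row column the differences are tiny, so it is worth repeating the measurement on data of realistic size before choosing a method.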
    
  • 2020-11-22 16:45

    Try this using a regular expression:

    import re
    # Remove every sign character and ASCII letter from each value
    data['result'] = data['result'].map(lambda x: re.sub('[-+A-Za-z]', '', x))
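
    The same character class, `[-+A-Za-z]`, can also be applied without a Python-level lambda by using `Series.str.replace`. This is a minimal sketch, assuming the sample data from the first answer; `regex=True` is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data assumed from the first answer in this thread.
data = pd.DataFrame({
    'time': ['09:00', '10:00', '11:00', '12:00', '13:00'],
    'result': ['+52A', '+62B', '+44a', '+30b', '-110a'],
})

# Delete every '+', '-' and ASCII letter from each string in the column.
cleaned = data['result'].str.replace(r'[-+A-Za-z]', '', regex=True)

print(cleaned.tolist())  # ['52', '62', '44', '30', '110']
```

    Unlike `lstrip`/`rstrip`, this removes the matched characters anywhere in the string, not only at the ends.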
    
  • 2020-11-22 16:47

    There's a bug here: it is currently not possible to pass arguments to str.lstrip and str.rstrip:

    http://github.com/pydata/pandas/issues/2411

    EDIT (2012-12-07): this now works on the dev branch:

    In [8]: df['result'].str.lstrip('+-').str.rstrip('aAbBcC')
    Out[8]: 
    1     52
    2     62
    3     44
    4     30
    5    110
    Name: result
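
    For anyone who wants to reproduce this, here is a small self-contained sketch. The answer does not show how `df` was built, so the frame below is an assumption: it reuses the sample values from the first answer with an index running from 1 to 5 to match the output above. The final `astype(int)` step is my own addition for cases where numbers rather than strings are wanted.

```python
import pandas as pd

# Assumed construction of df; the original answer does not show it.
df = pd.DataFrame(
    {
        'time': ['09:00', '10:00', '11:00', '12:00', '13:00'],
        'result': ['+52A', '+62B', '+44a', '+30b', '-110a'],
    },
    index=[1, 2, 3, 4, 5],
)

# Strip leading signs, then trailing letters, as in the session above.
stripped = df['result'].str.lstrip('+-').str.rstrip('aAbBcC')

# Optional extra step: convert the cleaned strings to integers.
# Note that the sign has been discarded, so '-110a' becomes 110.
as_numbers = stripped.astype(int)

print(as_numbers.tolist())  # [52, 62, 44, 30, 110]
```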
    