Pandas Python Regex : error: nothing to repeat

爱一瞬间的悲伤 2021-01-14 01:10

I have a dataframe with a couple of strange characters, "*" and "-".

import pandas as pd
import numpy as np

data = {'year': [2010, 2011, 2012, 2011,


        
3 answers
  • 2021-01-14 01:21

    * is a special character (a quantifier) in regex, so you have to escape it:

    football.replace([r'\*', '-'], ['0.00', '0.00'], regex=True).astype(np.float64)
    

    or use a character class:

    football.replace('[*-]', '0.00', regex=True).astype(np.float64)
    
  • 2021-01-14 01:22

    Use a plain (non-regex) replacement:

    football.replace(['*','-'], ['0.00','0.00'], regex=False)
    

    That is, there is no need for a regular expression in a simple case like this, where you are only matching one of two single characters.

    If you do want to use a regular expression, note that * is a special character. To match values that are exactly '*' or '-', use

    football.replace('^[*-]$', '0.00', regex=True)
    
  • 2021-01-14 01:40

    You could use a list comprehension within a dict comprehension to do this:

    >>> {key: [i if i not in {'*','-'} else '0.00' for i in values] for key, values in data.items()}
    {'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
     'wins': [11, '0.00', 10, '0.00', 11, 6, 10, 4],
     'losses': [5, 8, 6, 1, 5, 10, 6, 12],
     'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions']}
    

    This cleans up the data before you build the DataFrame.

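For reference, the error in the title comes from Python's re module: a bare * is a quantifier with nothing in front of it to repeat. The sketch below reproduces the message and then applies the anchored character class from the second answer. The `wins` Series is a made-up stand-in, since the `data` dict in the question is cut off.

```python
import re

import pandas as pd

# A bare "*" is a quantifier with no preceding token, so compiling it fails.
try:
    re.compile("*")
    error_message = None
except re.error as exc:
    error_message = str(exc)

# re.escape builds a pattern that matches the literal character instead.
escaped_star = re.escape("*")

# Hypothetical sample column (assumption: the real values are not shown in full).
wins = pd.Series(["11", "*", "10", "-"])

# Anchored character class: only cells that are exactly "*" or "-" are replaced.
cleaned = wins.replace(r"^[*-]$", "0.00", regex=True).astype(float)
```

Because of the ^ and $ anchors, a value such as "10-2" would be left alone; only cells that consist of a single * or - are replaced.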
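
The two non-regex routes from the second and third answers can also be compared side by side. This is only a rough sketch on hypothetical sample values (the original `data` is truncated). It substitutes the float 0.0 instead of the string '0.00' so the column converts directly, and it applies the replacement to the `wins` column only; both are choices made here, not something the answers specify.

```python
import pandas as pd

# Hypothetical sample; these values are placeholders for illustration only.
data = {
    "year": [2010, 2011, 2012],
    "wins": [11, "*", "-"],
    "team": ["Bears", "Bears", "Packers"],
}

# Option 1: clean the raw lists before building the DataFrame.
placeholders = {"*", "-"}
cleaned_data = {
    key: [0.0 if value in placeholders else value for value in values]
    for key, values in data.items()
}
football_pre = pd.DataFrame(cleaned_data)

# Option 2: build the DataFrame first, then do a literal (non-regex) replacement
# on the affected column and convert it to float.
football_post = pd.DataFrame(data)
football_post["wins"] = (
    football_post["wins"].replace(["*", "-"], 0.0, regex=False).astype(float)
)
```

Both options should leave the `team` and `year` columns untouched, so which one to use mostly depends on whether the raw dict or the DataFrame is easier to get at.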