I have a DataFrame with a couple of strange characters, "*" and "-".
import pandas as pd
import numpy as np
data = {'year': [2010, 2011, 2012, 2011,
* is a special character in regex, so you have to escape it:
football.replace([r'\*', '-'], ['0.00', '0.00'], regex=True).astype(np.float64)
or use a character class:
football.replace(r'[*-]', '0.00', regex=True).astype(np.float64)
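Both patterns can be checked with Python's standard re module on plain strings. This is a minimal sketch that assumes pandas' regex=True path interprets patterns with the same re syntax; it shows that a bare "*" does not even compile, that the escaped raw string r'\*' matches a literal asterisk, and that inside a character class neither "*" nor a trailing "-" is special.

```python
import re

# A bare "*" is a quantifier with nothing to repeat, so it does not compile.
try:
    re.compile("*")
    bare_star_compiles = True
except re.error:
    bare_star_compiles = False

# Escaped, written as a raw string so the backslash reaches re unchanged:
# this matches a literal asterisk.
escaped = re.compile(r"\*")

# Character class: "*" is literal inside [...], and a "-" placed last
# cannot form a range, so it is literal too.
char_class = re.compile(r"[*-]")

print(bare_star_compiles)          # False
print(escaped.sub("0.00", "*"))    # 0.00
for value in ["*", "-", "11"]:
    print(repr(value), "->", repr(char_class.sub("0.00", value)))
# '*' -> '0.00'
# '-' -> '0.00'
# '11' -> '11'
```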
Do this:
football.replace(['*','-'], ['0.00','0.00'], regex=False)
That is, there is no need to use a regular expression for the simple case of matching one character or another. If you do want to use a regular expression, note that * is a special character; to match only values that are exactly '*' or '-', use
football.replace('^[*-]$', '0.00', regex=True)
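The difference between exact matching, an anchored pattern, and an unanchored pattern shows up once a frame has text containing "*" or "-" inside longer strings. The sketch below uses a hypothetical stand-in for the question's frame (the original data is truncated, and the 'note' column is invented for illustration). It also casts only the numeric column, because calling .astype(np.float64) on a whole frame fails when it contains text columns such as 'team'.

```python
import pandas as pd

# Hypothetical frame: values are illustrative, not the question's real data.
football = pd.DataFrame({
    "wins": [11, "*", 10, "-", 11],
    "note": ["ok", "a-b", "x*y", "-", "ok"],
})

# Exact-value matching: only cells equal to "*" or "-" are replaced.
exact = football.replace(["*", "-"], "0.00", regex=False)

# Anchored regex: behaves like the exact match above.
anchored = football.replace(r"^[*-]$", "0.00", regex=True)

# Unanchored regex: also rewrites "*" and "-" inside longer strings.
unanchored = football.replace(r"[*-]", "0.00", regex=True)

# Cast just the numeric column; casting the whole frame would fail on "note".
wins = exact["wins"].astype(float)

print(exact["note"].tolist())       # ['ok', 'a-b', 'x*y', '0.00', 'ok']
print(unanchored["note"].tolist())  # ['ok', 'a0.00b', 'x0.00y', '0.00', 'ok']
print(wins.tolist())                # [11.0, 0.0, 10.0, 0.0, 11.0]
```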
You could use a list comprehension within a dict comprehension to do this:
>>> {key: [i if i not in {'*','-'} else '0.00' for i in values] for key, values in data.items()}
{'year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
'wins': [11, '0.00', 10, '0.00', 11, 6, 10, 4],
'losses': [5, 8, 6, 1, 5, 10, 6, 12],
'team': ['Bears', 'Bears', 'Bears', 'Packers', 'Packers', 'Lions', 'Lions', 'Lions']}
This cleans up the data before you make a DataFrame.
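A variation on the comprehension above, sketched under a few assumptions: the helper name clean, the MISSING_MARKERS set, and the sample dict are all hypothetical. Filling with the float 0.0 instead of the string '0.00' lets pandas infer a numeric column when the DataFrame is built, so no separate cast is needed.

```python
import pandas as pd

# Hypothetical set of placeholder strings to treat as zero.
MISSING_MARKERS = {"*", "-"}


def clean(data, fill=0.0):
    """Return a new dict of lists with placeholder markers replaced by `fill`."""
    return {
        key: [fill if v in MISSING_MARKERS else v for v in values]
        for key, values in data.items()
    }


# Hypothetical sample; the question's full data is not shown.
raw = {
    "year": [2010, 2011, 2012],
    "wins": [11, "*", "-"],
    "team": ["Bears", "Bears", "Bears"],
}

df = pd.DataFrame(clean(raw))
print(df["wins"].dtype)     # float64
print(df["wins"].tolist())  # [11.0, 0.0, 0.0]
```

Passing fill='0.00' reproduces the string output shown above instead.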