问题
I have imported a csv file using pandas.
My dataframe has multiple columns titled "Farm", "Total Apples" and "Good Apples".
The numerical data imported for "Total Apples" and "Good Apples" contains commas to indicate thousands e.g. 1,200 etc. I want to remove the comma so the data looks like 1200 etc.
The variable type for the "Total Apples" and "Good Apples" columns comes up as object.
I tried using df.str.replace
and df.strip
but have not been successful.
Also tried to change the variable type from object to string and object to integer but couldn't make it work.
Any help would be greatly appreciated.
****EDIT****
Excerpt of data from csv file imported using pd.read_csv:
Farm_Name Total Apples Good Apples
EM 18,327 14,176
EE 18,785 14,146
IW 635 486
L 33,929 24,586
NE 12,497 9,609
NW 30,756 23,765
SC 8,515 6,438
SE 22,896 17,914
SW 11,972 9,114
WM 27,251 20,931
Y 21,495 16,662
回答1:
I think you can add parameter thousands
to read_csv, then values in columns Total Apples
and Good Apples
are converted to integers
:
Maybe your separator
is different, dont forget change it. If separator is whitespace, change it to sep='\s+'
.
import pandas as pd
import io
temp=u"""Farm_Name;Total Apples;Good Apples
EM;18,327;14,176
EE;18,785;14,146
IW;635;486
L;33,929;24,586
NE;12,497;9,609
NW;30,756;23,765
SC;8,515;6,438
SE;22,896;17,914
SW;11,972;9,114
WM;27,251;20,931
Y;21,495;16,662"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep=";",thousands=',')
print df
Farm_Name Total Apples Good Apples
0 EM 18327 14176
1 EE 18785 14146
2 IW 635 486
3 L 33929 24586
4 NE 12497 9609
5 NW 30756 23765
6 SC 8515 6438
7 SE 22896 17914
8 SW 11972 9114
9 WM 27251 20931
10 Y 21495 16662
print df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 3 columns):
Farm_Name 11 non-null object
Total Apples 11 non-null int64
Good Apples 11 non-null int64
dtypes: int64(2), object(1)
memory usage: 336.0+ bytes
None
回答2:
try this:
locale.setlocale(locale.LC_NUMERIC, '')
df = df[['Farm Name']].join(df[['Total Apples', 'Good Apples']].applymap(locale.atof))
来源:https://stackoverflow.com/questions/36475838/remove-comma-from-objects-in-a-pandas-dataframe-column