Remove comma from objects in a pandas dataframe column

柔情痞子 提交于 2020-01-02 14:03:31

问题


I have imported a csv file using pandas.

My dataframe has multiple columns titled "Farm", "Total Apples" and "Good Apples".

The numerical data imported for "Total Apples" and "Good Apples" contains commas to indicate thousands e.g. 1,200 etc. I want to remove the comma so the data looks like 1200 etc.

The variable type for the "Total Apples" and "Good Apples" columns comes up as object.

I tried using df.str.replace and df.strip but have not been successful.

Also tried to change the variable type from object to string and object to integer but couldn't make it work.

Any help would be greatly appreciated.

****EDIT****

Excerpt of data from csv file imported using pd.read_csv:

Farm_Name   Total Apples    Good Apples
EM  18,327  14,176
EE  18,785  14,146
IW  635 486
L   33,929  24,586
NE  12,497  9,609
NW  30,756  23,765
SC  8,515   6,438
SE  22,896  17,914
SW  11,972  9,114
WM  27,251  20,931
Y   21,495  16,662

回答1:


I think you can add parameter thousands to read_csv, then values in columns Total Apples and Good Apples are converted to integers:

Maybe your separator is different, dont forget change it. If separator is whitespace, change it to sep='\s+'.

import pandas as pd
import io

temp=u"""Farm_Name;Total Apples;Good Apples
EM;18,327;14,176
EE;18,785;14,146
IW;635;486
L;33,929;24,586
NE;12,497;9,609
NW;30,756;23,765
SC;8,515;6,438
SE;22,896;17,914
SW;11,972;9,114
WM;27,251;20,931
Y;21,495;16,662"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), sep=";",thousands=',')
print df
   Farm_Name  Total Apples  Good Apples
0         EM         18327        14176
1         EE         18785        14146
2         IW           635          486
3          L         33929        24586
4         NE         12497         9609
5         NW         30756        23765
6         SC          8515         6438
7         SE         22896        17914
8         SW         11972         9114
9         WM         27251        20931
10         Y         21495        16662
print df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 3 columns):
Farm_Name       11 non-null object
Total Apples    11 non-null int64
Good Apples     11 non-null int64
dtypes: int64(2), object(1)
memory usage: 336.0+ bytes
None



回答2:


try this:

locale.setlocale(locale.LC_NUMERIC, '')
df = df[['Farm Name']].join(df[['Total Apples', 'Good Apples']].applymap(locale.atof))


来源:https://stackoverflow.com/questions/36475838/remove-comma-from-objects-in-a-pandas-dataframe-column

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!