I am looking at the famous Titanic dataset from the Kaggle competition found here: http://www.kaggle.com/c/titanic-gettingStarted/data
I have loaded and processed the da
Here is my solution:
# convert string column to category
df.Sex = df.Sex.astype('category')
# create additional column for its codes
df['Sex_code'] = df.Sex.cat.codes
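For anyone who wants to see which number ended up standing for which label, here is a minimal sketch on a made-up toy frame (the data and the name `code_to_label` are my own, not from the competition file). It assumes the categories are inferred from the strings, in which case pandas orders them alphabetically and gives missing values the code -1.

```python
import pandas as pd

# Toy stand-in for the Titanic frame; the values are made up for illustration.
df = pd.DataFrame({"Sex": ["male", "female", None, "male"]})

# Convert the string column to a categorical dtype.
df["Sex"] = df["Sex"].astype("category")

# One integer code per row; missing values get the code -1.
df["Sex_code"] = df["Sex"].cat.codes

# Lookup from code back to label, in category order (hypothetical helper name).
code_to_label = dict(enumerate(df["Sex"].cat.categories))

print(df)
print(code_to_label)
```

Keeping the lookup around makes it easier to label plot axes or legends with the original strings later.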
You need to transform the categorical variables into numbers to plot them.
Example (assuming the column 'Sex' holds the gender data, with 'M' for males and 'F' for females):
import numpy as np
# start with NaN everywhere, then fill in the known values
df['Sex_int'] = np.nan
df.loc[df['Sex'] == 'M', 'Sex_int'] = 0
df.loc[df['Sex'] == 'F', 'Sex_int'] = 1
Now all males are represented by 0 and females by 1. Rows with any other value (if there are any) keep NaN in 'Sex_int'.
The rest of your code should process the updated dataframe nicely.
After googling and remembering the .map() function, I fixed it in the following way:
from pandas import Series
colors=['red','green'] # color codes for survived : 0=red or 1=green
# create mapping Series for gender so it can be plotted
gender = Series([0,1],index=['male','female'])
df['gender']=df.Sex.map(gender)
# create mapping Series for Embarked so it can be plotted
# number the unique values instead of hard-coding four codes
embarked = Series(range(len(df.Embarked.unique())),index=df.Embarked.unique())
df['embarked']=df.Embarked.map(embarked)
# add survived also back to the df
df['survived']=target
Now I can plot it again and drop the added columns afterwards.
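In case it helps someone else, the map-then-drop round trip can be sketched like this. It is only a rough sketch on made-up data: the frame and the names `encoded`, `ports` and `restored` are my own, and I use plain dicts with .map() instead of a mapping Series, which behaves the same for labels that are present and leaves anything else as NaN.

```python
import pandas as pd

# Toy stand-in for the Titanic frame; the values are made up for illustration.
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", None],
})

encoded = df.copy()

# Map each label to an integer; labels missing from the dict become NaN.
encoded["gender"] = encoded["Sex"].map({"male": 0, "female": 1})

# Number the non-missing ports in order of first appearance.
ports = encoded["Embarked"].dropna().unique()
encoded["embarked"] = encoded["Embarked"].map(
    {port: i for i, port in enumerate(ports)}
)

# ... plotting with the numeric columns would go here ...

# Remove the helper columns once they are no longer needed.
restored = encoded.drop(columns=["gender", "embarked"])
print(restored.columns.tolist())
```

Working on a copy keeps the original frame untouched, so the drop step becomes optional.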
Thanks, everyone, for responding.