I have a dataset that looks like this:
yyyy month tmax tmin
0 1908 January 5.0 -1.4
1 1908 February 7
I think you need get_dummies:
df = pd.get_dummies(df['month'])
To add the new columns to the original DataFrame and remove month, use join with pop:
df2 = df.join(pd.get_dummies(df.pop('month')))
print (df2.head())
yyyy tmax tmin April August December February January July June \
0 1908 5.0 -1.4 0 0 0 0 1 0 0
1 1908 7.3 1.9 0 0 0 1 0 0 0
2 1908 6.2 0.3 0 0 0 0 0 0 0
3 1908 7.4 2.1 1 0 0 0 0 0 0
4 1908 16.5 7.7 0 0 0 0 0 0 0
March May November October September
0 0 0 0 0 0
1 0 0 0 0 0
2 1 0 0 0 0
3 0 0 0 0 0
4 0 1 0 0 0
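For anyone who wants to run the join + pop step on its own, here is a minimal sketch. The three-row frame is only a stand-in I made up from the first rows shown above, and dtype=int is my addition: recent pandas versions (2.0 and later) return boolean dummy columns by default, so it is there to keep the 0/1 display.

```python
import pandas as pd

# Stand-in for the question's data (first three rows only).
df = pd.DataFrame({
    "yyyy": [1908, 1908, 1908],
    "month": ["January", "February", "March"],
    "tmax": [5.0, 7.3, 6.2],
    "tmin": [-1.4, 1.9, 0.3],
})

# pop() removes 'month' from df in place and returns it as a Series.
# dtype=int gives 0/1 columns instead of True/False on newer pandas.
dummies = pd.get_dummies(df.pop("month"), dtype=int)

# join aligns on the index, so each row gets its own indicator columns.
df2 = df.join(dummies)
print(df2)
```

Note that pop mutates df, so after this runs the original frame no longer has a month column.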
If you do NOT need to remove the month column:
df2 = df.join(pd.get_dummies(df['month']))
print (df2.head())
yyyy month tmax tmin April August December February January \
0 1908 January 5.0 -1.4 0 0 0 0 1
1 1908 February 7.3 1.9 0 0 0 1 0
2 1908 March 6.2 0.3 0 0 0 0 0
3 1908 April 7.4 2.1 1 0 0 0 0
4 1908 May 16.5 7.7 0 0 0 0 0
July June March May November October September
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
2 0 0 1 0 0 0 0
3 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0
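If you need to sort the columns, there are several possible solutions. Use reindex, either with axis=1 or with the columns keyword (older pandas also had reindex_axis, but it was deprecated in 0.21 and removed in 1.0):
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
# originally reindex_axis(months, 1); reindex(months, axis=1) is the current equivalent
df1 = pd.get_dummies(df['month']).reindex(months, axis=1)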
print (df1.head())
January February March April May June July August September \
0 1 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0
2 0 0 1 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 0
4 0 0 0 0 1 0 0 0 0
October November December
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
df1 = pd.get_dummies(df['month']).reindex(columns=months)
print (df1.head())
January February March April May June July August September \
0 1 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0
2 0 0 1 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 0
4 0 0 0 0 1 0 0 0 0
October November December
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
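One thing to watch with reindex: if some month never appears in the data, reindex creates that column filled with NaN. A small sketch of one way around it, assuming you want zeros there instead (the three-row frame is made up for illustration, and dtype=int is my addition to keep 0/1 output on newer pandas):

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Made-up sample in which most months are missing.
df = pd.DataFrame({"month": ["January", "February", "May"]})

# reindex puts the columns in calendar order and adds the missing ones;
# fill_value=0 fills those new columns with 0 rather than NaN.
df1 = pd.get_dummies(df["month"], dtype=int).reindex(columns=months, fill_value=0)
print(df1)
```

With the full dataset from the question every month is present, so fill_value changes nothing there.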
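Or convert the month column to an ordered categorical:
# astype('category', categories=..., ordered=...) no longer works in current pandas; pass a CategoricalDtype instead
df1 = pd.get_dummies(df['month'].astype(pd.CategoricalDtype(categories=months, ordered=True)))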
print (df1.head())
January February March April May June July August September \
0 1 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0
2 0 0 1 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 0
4 0 0 0 0 1 0 0 0 0
October November December
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
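A side effect of the categorical approach that may be useful: get_dummies emits one column per category, including categories that never occur in the data, so no reindex is needed. A minimal sketch, assuming a made-up three-row frame and using pd.CategoricalDtype (dtype=int is again my addition for 0/1 output):

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Made-up sample in which most months are missing.
df = pd.DataFrame({"month": ["January", "February", "May"]})

# The dtype fixes both the set of categories and their order.
month_dtype = pd.CategoricalDtype(categories=months, ordered=True)

# One dummy column per category, in category order, even for unseen months.
df1 = pd.get_dummies(df["month"].astype(month_dtype), dtype=int)
print(df1)
```

Values that are not in months (a typo such as 'Janury', say) become NaN during the astype step and end up as an all-zero row, so it is worth checking for them first.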
IIUC, you can use assign, the ** unpacking operator, and pd.get_dummies:
df.assign(**pd.get_dummies(df['month']))
Output:
yyyy month tmax tmin April August December February January \
0 1908 January 5.0 -1.4 0 0 0 0 1
1 1908 February 7.3 1.9 0 0 0 1 0
2 1908 March 6.2 0.3 0 0 0 0 0
3 1908 April 7.4 2.1 1 0 0 0 0
4 1908 May 16.5 7.7 0 0 0 0 0
5 1908 June 17.7 8.7 0 0 0 0 0
6 1908 July 20.1 11.0 0 0 0 0 0
7 1908 August 17.5 9.7 0 1 0 0 0
8 1908 September 16.3 8.4 0 0 0 0 0
9 1908 October 14.6 8.0 0 0 0 0 0
10 1908 November 9.6 3.4 0 0 0 0 0
11 1908 December 5.8 -0.3 0 0 1 0 0
12 1909 January 5.0 0.1 0 0 0 0 1
13 1909 February 5.5 -0.3 0 0 0 1 0
14 1909 March 5.6 -0.3 0 0 0 0 0
15 1909 April 12.2 3.3 1 0 0 0 0
16 1909 May 14.7 4.8 0 0 0 0 0
17 1909 June 15.0 7.5 0 0 0 0 0
18 1909 July 17.3 10.8 0 0 0 0 0
19 1909 August 18.8 10.7 0 1 0 0 0
20 1909 September 14.5 8.1 0 0 0 0 0
21 1909 October 12.9 6.9 0 0 0 0 0
22 1909 November 7.5 1.7 0 0 0 0 0
23 1909 December 5.3 0.4 0 0 1 0 0
24 1910 January 5.2 -0.5 0 0 0 0 1
July June March May November October September
0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
2 0 0 1 0 0 0 0
3 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0
5 0 1 0 0 0 0 0
6 1 0 0 0 0 0 0
7 0 0 0 0 0 0 0
8 0 0 0 0 0 0 1
9 0 0 0 0 0 1 0
10 0 0 0 0 1 0 0
11 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0
14 0 0 1 0 0 0 0
15 0 0 0 0 0 0 0
16 0 0 0 1 0 0 0
17 0 1 0 0 0 0 0
18 1 0 0 0 0 0 0
19 0 0 0 0 0 0 0
20 0 0 0 0 0 0 1
21 0 0 0 0 0 1 0
22 0 0 0 0 1 0 0
23 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0
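Here is a self-contained sketch of the assign version for trying it out. The frame is a made-up three-row stand-in, and dtype=int is my addition to keep 0/1 output on newer pandas. The ** unpacking works because a DataFrame behaves like a mapping from column name to Series, and it relies on the month names being strings, since keyword arguments must have string names.

```python
import pandas as pd

# Stand-in for the question's data (first three rows only).
df = pd.DataFrame({
    "yyyy": [1908, 1908, 1908],
    "month": ["January", "February", "March"],
    "tmax": [5.0, 7.3, 6.2],
    "tmin": [-1.4, 1.9, 0.3],
})

# Each dummy column is passed to assign as a keyword argument,
# e.g. January=<Series>, February=<Series>, ...
dummies = pd.get_dummies(df["month"], dtype=int)
out = df.assign(**dummies)
print(out)
```

Unlike the pop variant, assign returns a new frame and leaves df untouched, so month stays in both.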