I am creating a new pandas DataFrame from a previous DataFrame using the .groupby and .size methods.
[in] results = df.groupby([\"
What you're seeing is your grouped columns being used as the index. Calling reset_index turns them back into ordinary columns, so
results = df.groupby(["X", "Y", "Z", "F"]).size()
results = results.reset_index()  # reset_index returns a new object, so assign it
should work
In [11]:
df.groupby(["X","Y","Z","F"]).size()
Out[11]:
X  Y           Z  F
9  27/02/2016  1  N    1
                  S    1
               2  N    1
                  S    1
               3  N    1
dtype: int64
In [12]:
df.groupby(["X","Y","Z","F"]).size().reset_index()
Out[12]:
   X           Y  Z  F  0
0  9  27/02/2016  1  N  1
1  9  27/02/2016  1  S  1
2  9  27/02/2016  2  N  1
3  9  27/02/2016  2  S  1
4  9  27/02/2016  3  N  1
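Note that the size column comes back labelled 0. As a minimal sketch, assuming a small made-up frame with the same column names (the data is hypothetical, not the asker's), the name argument of Series.reset_index can label that column in the same step:

```python
import pandas as pd

# Hypothetical sample data mirroring the column names used above.
df = pd.DataFrame({
    "X": [9, 9, 9, 9, 9],
    "Y": ["27/02/2016"] * 5,
    "Z": [1, 1, 2, 2, 3],
    "F": ["N", "S", "N", "S", "N"],
})

# size() returns a Series indexed by the group keys;
# reset_index(name=...) turns the keys back into columns
# and names the column that holds the group sizes.
results = df.groupby(["X", "Y", "Z", "F"]).size().reset_index(name="Count")
print(results)
```

This avoids a separate rename step afterwards.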
Additionally, you can achieve what you want by using count:
In [13]:
df.groupby(["X","Y","Z","F"]).count().reset_index()
Out[13]:
   X           Y  Z  F  Count
0  9  27/02/2016  1  N      1
1  9  27/02/2016  1  S      1
2  9  27/02/2016  2  N      1
3  9  27/02/2016  2  S      1
4  9  27/02/2016  3  N      1
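One caveat worth keeping in mind: size and count only agree when there is no missing data. size counts the rows in each group, while count counts the non-null values in each remaining column. A rough sketch with made-up data (the value column is an assumption added purely for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data; "value" has one missing entry in the (1, "N") group.
df = pd.DataFrame({
    "Z": [1, 1, 2],
    "F": ["N", "N", "S"],
    "value": [10.0, np.nan, 30.0],
})

# size() counts rows per group, NaN or not.
sizes = df.groupby(["Z", "F"]).size().reset_index(name="Count")

# count() counts non-null entries per non-key column.
counts = df.groupby(["Z", "F"]).count().reset_index()

print(sizes)   # the (1, "N") group has 2 rows
print(counts)  # but only 1 non-null "value"
```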
You could also pass the parameter as_index=False here:
In [15]:
df.groupby(["X","Y","Z","F"], as_index=False).count()
Out[15]:
   X           Y  Z  F  Count
0  9  27/02/2016  1  N      1
1  9  27/02/2016  1  S      1
2  9  27/02/2016  2  N      1
3  9  27/02/2016  2  S      1
4  9  27/02/2016  3  N      1
This is normally fine, but some aggregation methods will fail on columns whose dtypes cannot be aggregated, for instance if you have str columns and you call mean.
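One way to sidestep that, sketched here with made-up data (the label and value columns are hypothetical), is to select the numeric columns explicitly before aggregating:

```python
import pandas as pd

# Hypothetical data with one str column and one numeric column.
df = pd.DataFrame({
    "Z": [1, 1, 2],
    "label": ["a", "b", "c"],     # str dtype, has no meaningful mean
    "value": [10.0, 20.0, 30.0],
})

# Restrict the aggregation to the numeric column so the str column
# is never passed to mean(); as_index=False keeps "Z" as a column.
means = df.groupby("Z", as_index=False)[["value"]].mean()
print(means)
```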
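You can use the as_index=False parameter of .groupby(). In pandas 1.1 and later, size() then returns a DataFrame whose count column is named size, so rename that column:
results = df.groupby(["X", "Y", "Z", "F"], as_index=False).size().rename(columns={"size": "Count"})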