There is a good number of questions about this error, but after looking around I'm still not able to find/wrap my mind around a solution yet. I'm trying to pivot a data fr
There are several ways.
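# Option 1: groupby with an identity aggregation (assumes each group holds a single row)
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate(lambda x: x).unstack().reset_index()
print(df1)
# Option 2: set_index + unstack (requires unique key/question combinations)
df1 = df.set_index(["id","contact_id","Network_Name","question"])['response_answer'].unstack().reset_index()
print(df1)
# Option 3: groupby + 'first' (keeps the first answer in each group)
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate('first').unstack().reset_index()
print(df1)
# Option 4: pivot_table with aggfunc='first'; values is a list, so the
# resulting columns are a MultiIndex and the top level is dropped afterwards
df1 = df.pivot_table(index=["id","contact_id","Network_Name"], columns='question', values=['response_answer'], aggfunc='first')
df1.columns = df1.columns.droplevel()
df1 = df1.reset_index()
print(df1)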
All four give the same result:
id contact_id Network_Name City State Trip_End_Location
0 16 137519 2206 None Ca None
1 17 137520 2206 None Ca None
2 18 137521 2206 None Ca None
3 19 137522 2206 None Ca None
4 20 137523 2208 Lancaster None None
5 21 137524 2208 Lancaster None None
6 22 137525 2208 Lancaster None None
7 23 137526 2208 Lancaster None None
8 24 137527 2208 None None Home
9 25 137528 2208 None None Home
10 26 137529 2208 None None Home
11 27 137530 2208 None None Home
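The approaches above differ in how they treat duplicate key/question pairs, which a small sketch can show. The frame below is made up for illustration (it is not the asker's data): set_index(...).unstack() raises a ValueError when a pair repeats, while groupby(...).first() keeps the first answer.

```python
import pandas as pd

# Hypothetical long-format data: one row per (id, question) answer.
# The duplicate ("City" twice for id 1) is deliberate.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "question": ["City", "City", "State", "City"],
    "response_answer": ["Lancaster", "Palmdale", "Ca", "Fresno"],
})

# groupby + first tolerates duplicates by keeping the first answer per cell.
wide = (
    df.groupby(["id", "question"])["response_answer"]
      .first()
      .unstack()
      .reset_index()
)
print(wide)

# set_index + unstack needs unique (id, question) pairs and raises otherwise.
try:
    df.set_index(["id", "question"])["response_answer"].unstack()
    unstack_failed = False
except ValueError as exc:
    unstack_failed = True
    print("unstack failed:", exc)
```

If the raw data can contain repeated answers, the groupby or pivot_table variants are probably the safer choice.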
The default aggfunc in pivot_table is the mean (np.mean), which doesn't know what to do with strings, and you haven't properly indicated what the index should be. Try something like:
pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],
                                     columns='question',
                                     values='response_answer',
                                     aggfunc=lambda x: ' '.join(x))
This explicitly sets one row per id, contact_id pair and pivots the set of response_answer values on question. The aggfunc just ensures that if the raw data has multiple answers to the same question, they are concatenated with spaces. The syntax of pivot_table might vary depending on your pandas version.
Here's a quick example:
In [24]: import pandas as pd
In [25]: import random
In [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})
In [27]: df.head()
id question response
0 3100 1 116
1 4500 2 113
2 5000 1 120
3 3900 2 103
4 4300 0 117
In [28]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
id 100 non-null int64
question 100 non-null object
response 100 non-null object
dtypes: int64(1), object(2)
memory usage: 3.1+ KB
In [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()
question 0 1 2 3
1000 110 120 NaN 100 NaN
1100 NaN 106 108 104 NaN
1200 104 113 119 NaN 101
1300 102 NaN 116 108 120
1400 NaN NaN 116 NaN
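Because the session above uses random data, its output changes from run to run. Here is a minimal deterministic sketch of the same idea, using a small made-up frame in which id 1000 answers question "0" twice, so one cell ends up holding two space-separated values:

```python
import pandas as pd

# Made-up data; id 1000 answers question "0" twice.
df = pd.DataFrame({
    "id": [1000, 1000, 1000, 1100],
    "question": ["0", "0", "1", "1"],
    "response": ["110", "120", "100", "106"],
})

# One row per id, one column per question; repeated answers are joined with spaces.
wide = df.pivot_table(
    index="id",
    columns="question",
    values="response",
    aggfunc=lambda x: " ".join(x),
)
print(wide)
```

Questions an id never answered come out as NaN, as in the output above.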