问题
Hello I have the following set of data in csv:
Group Size Some_other_column1 Some_other_column2
Short Small blabla1 blabla6
Moderate Medium babla3 blabla8
Short Small blabla2 blabla7
Moderate Small blabla4 blabla9
Tall Large blabla5 blabla10
Short Medium blabla11 blabla12
I would like to get the following result using python code:
Group Size Count Some_other_column1 Some_other_column2
Short Small 2 blabla1 blabla6
Moderate Medium 1 babla3 blabla8
Short Small 2 blabla2 blabla7
Moderate Small 1 blabla4 blabla9
Tall Large 1 blabla5 blabla10
Short Medium 1 blabla11 blabla12
Basically I need to count the number of group-size pairs and create a new column for that called, let's say, "Count", keeping all the other columns the same. I can use pandas or anything that can help.
For reference, there was another question asked on this topic, but it does not solve my problem since I have multiple columns that I need to keep: Python: get a frequency count based on two columns (variables) in pandas dataframe
There is another topic here: How to assign a name to the a size() column? But this is also not answering my question because I have 2 more columns ("some other column1/2") that I do not want to indirectly drop by applying the method described at the above link. Also, what is equally important, I do not want to merge pairs, I need to keep all of them, because they have different values on Some_other_column1/2.
回答1:
You need insert with GroupBy.transform of size
:
df.insert(2, 'Count', df.groupby(['Group','Size'])['Size'].transform('size'))
print (df)
Group Size Count Some_other_column1 Some_other_column2
0 Short Small 2 blabla1 blabla6
1 Moderate Medium 1 babla3 blabla8
2 Short Small 2 blabla2 blabla7
3 Moderate Small 1 blabla4 blabla9
4 Tall Large 1 blabla5 blabla10
5 Short Medium 1 blabla11 blabla12
来源:https://stackoverflow.com/questions/51320142/get-count-of-occurrences-inside-two-columns-inside-a-csv