group-by

Mysql group by two columns and pick the maximum value of third column

Submitted by 可紊 on 2021-02-05 11:16:06
Question: I have a table that has user_id, item_id and interaction_type as columns. interaction_type can be 0, 1, 2, 3, 4 or 5. However, for some user_id and item_id pairs we might have multiple interaction_types. For example, we might have:

user_id  item_id  interaction_type
2        3        1
2        3        0
2        3        5
4        1        0
5        4        4
5        4        2

What I want is to keep only the maximum interaction_type when there are multiples. So I want this:

user_id  item_id  interaction_type
2        3        5
4        1        0
5        4        4

Here is the query I wrote for this purpose:
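The asker's own query is cut off above, but the standard approach is to group by both key columns and take MAX of the third. A minimal runnable sketch using Python's built-in sqlite3 (table name `interactions` is assumed; column names are taken from the question, and the same GROUP BY works in MySQL):

```python
import sqlite3

# In-memory database seeded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (user_id INT, item_id INT, interaction_type INT)")
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?)",
    [(2, 3, 1), (2, 3, 0), (2, 3, 5), (4, 1, 0), (5, 4, 4), (5, 4, 2)],
)

# Group by the (user_id, item_id) pair and keep only the maximum interaction_type.
rows = conn.execute(
    """SELECT user_id, item_id, MAX(interaction_type)
       FROM interactions
       GROUP BY user_id, item_id
       ORDER BY user_id, item_id"""
).fetchall()
print(rows)  # [(2, 3, 5), (4, 1, 0), (5, 4, 4)]
```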

Mean per group and with count of variables in group [duplicate]

Submitted by ≯℡__Kan透↙ on 2021-02-05 10:27:19
Question: This question already has answers here: "How to use dplyr as alternative to aggregate" (2 answers), "Count number of rows within each group" (15 answers). Closed 1 year ago. I would like to generate a table with groups per range, the mean, and the count of variables in each group. I have a data.frame like the one below:

Variable  Shap
1         0.10
6         0.50
7         0.30
5         0.40
9         0.10
9         0.25
2         0.24
9         0.23
5         0.22
5         0.21
1         0.20
4         0.19
5         0.18
8         0.17
6         0.16

And I would like to get a data.frame like this:

Range  Shap_Avg  Counts
0-5
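The question targets dplyr, but the grouping logic itself is easy to check in plain Python. A sketch of the same computation, assuming the intended bins are 0-5 and 6-10 (the desired output above is truncated after the first range, so the binning is a guess):

```python
# Sample data from the question: (Variable, Shap) pairs.
data = [(1, 0.10), (6, 0.50), (7, 0.30), (5, 0.40), (9, 0.10),
        (9, 0.25), (2, 0.24), (9, 0.23), (5, 0.22), (5, 0.21),
        (1, 0.20), (4, 0.19), (5, 0.18), (8, 0.17), (6, 0.16)]

# Bucket each Variable into a range, then compute mean and count per bucket.
groups = {}
for var, shap in data:
    rng = "0-5" if var <= 5 else "6-10"  # assumed binning
    groups.setdefault(rng, []).append(shap)

summary = {rng: (round(sum(v) / len(v), 4), len(v)) for rng, v in groups.items()}
print(summary)  # {'0-5': (0.2175, 8), '6-10': (0.2443, 7)}
```

In dplyr terms this corresponds to grouping on the binned variable and calling `summarise` with both `mean()` and `n()`.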

R summarise by group sum giving NA

Submitted by 橙三吉。 on 2021-02-05 09:30:50
Question: I have a data frame like this:

Observations: 2,190,835
Variables: 13
$ patientid     <int> 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489, 4489…
$ preparationid <dbl> 1000307, 1000307, 1000307, 1000307, 1000307, 1000307, 1000307, 1000307, 1000307, 1000307, 1000307, 1…
$ doseday       <int> 90, 90, 91, 91, 92, 92, 92, 92, 93, 93, 93, 93, 94, 94, 94, 94, 95, 95, 95, 95, 99, 99, 100, 100, 10…
$ route         <fct> enteral., enteral., enteral., enteral., enteral.,
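The body is truncated, but a group sum coming back NA in R almost always means the group contains an NA and `sum()` was called without `na.rm = TRUE`. The same pitfall, sketched in plain Python with None standing in for NA (the small data set here is hypothetical, not the asker's 2.1M-row frame):

```python
# Hypothetical doses per (patientid, doseday); None plays the role of R's NA.
records = [(4489, 90, 1.5), (4489, 90, None), (4489, 91, 2.0), (4489, 91, 1.0)]

# A naive sum over each group would choke on None, just as sum() in R
# propagates NA; the fix is to drop missing values first (na.rm = TRUE).
totals = {}
for key in {(p, d) for p, d, _ in records}:
    vals = [v for p, d, v in records if (p, d) == key and v is not None]
    totals[key] = sum(vals)

print(totals[(4489, 91)])  # 3.0
```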

return empty rows for not existsting data

Submitted by 陌路散爱 on 2021-02-04 20:43:50
Question: OK, I have a table with a date column and an integer column, and I want to retrieve all the rows grouped by the date's day within a certain date range. Since there are no rows for every day, is it possible to make MySQL return rows for the missing days with a default value?

Example source table:

date        value
2020-01-01  1
2020-01-01  2
2020-01-03  2
2020-01-07  3
2020-01-08  4
2020-01-08  1

Standard behaviour after grouping by date and summing values:

2020-01-01  3
2020-01-03  2
2020-01-07  3
2020-01-08  5
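The database cannot return rows it does not have, so the usual fix is to generate a calendar of every day in the range (a calendar table, or a recursive CTE in MySQL 8+) and LEFT JOIN the data onto it, defaulting missing days to 0. A runnable sketch of that idea using Python's sqlite3, which supports the same WITH RECURSIVE syntax (table and column names are assumed from the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT, value INT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
    [("2020-01-01", 1), ("2020-01-01", 2), ("2020-01-03", 2),
     ("2020-01-07", 3), ("2020-01-08", 4), ("2020-01-08", 1)])

# Generate every day in the range with a recursive CTE, then LEFT JOIN
# the real rows onto it so days with no data fall back to 0.
rows = conn.execute("""
    WITH RECURSIVE calendar(day) AS (
        SELECT '2020-01-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM calendar WHERE day < '2020-01-08'
    )
    SELECT calendar.day, COALESCE(SUM(t.value), 0)
    FROM calendar LEFT JOIN t ON t.d = calendar.day
    GROUP BY calendar.day
    ORDER BY calendar.day
""").fetchall()
print(rows[:3])  # [('2020-01-01', 3), ('2020-01-02', 0), ('2020-01-03', 2)]
```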

Selecting the maximum count from a GROUP BY operation

Submitted by 自作多情 on 2021-02-04 18:50:11
Question: Forgive my SQL knowledge, but I have a Person table with the following data:

Id  Name
1   a
2   b
3   b
4   c

and I want the following result:

Name  Total
b     2

If I use the GROUP BY query:

SELECT Name, Total = COUNT(*) FROM Person GROUP BY Name

it gives me:

Name  Total
a     1
b     2
c     1

But I want only the row with the maximum count. How do I get that?

Answer 1: If you want ties:

SELECT TOP (1) WITH TIES Name, COUNT(*) AS [count]
FROM Person
GROUP BY Name
ORDER BY COUNT(*) DESC

Answer 2:
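Answer 1 uses SQL Server's TOP (1) WITH TIES; the portable version of the same idea is to order the grouped counts descending and keep the first row. A runnable check with Python's sqlite3 (note that LIMIT 1 drops ties, which matches the single-row output the asker shows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INT, Name TEXT)")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "b"), (4, "c")])

# Group, order the counts descending, and keep only the top row.
row = conn.execute("""
    SELECT Name, COUNT(*) AS Total
    FROM Person
    GROUP BY Name
    ORDER BY Total DESC
    LIMIT 1
""").fetchone()
print(row)  # ('b', 2)
```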

Pandas - Duplicate Row based on condition

Submitted by 大城市里の小女人 on 2021-02-04 17:47:37
Question: I'm trying to create a duplicate row if the row meets a condition. In the table below, I created a cumulative count based on a groupby, then another calculation for the MAX of the groupby:

df['PathID'] = df.groupby('DateCompleted').cumcount() + 1
df['MaxPathID'] = df.groupby('DateCompleted')['PathID'].transform(max)

Date Completed  PathID  MaxPathID
1/31/17         1       3
1/31/17         2       3
1/31/17         3       3
2/1/17          1       1
2/2/17          1       2
2/2/17          2       2

In this case, I want to duplicate only the record for 2/1/17, since there is only
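The question is cut off, but the apparent goal is to duplicate any row whose date group contains a single record (here 2/1/17). In pandas a common idiom for this is `df.loc[df.index.repeat(counts)]` with a per-row repeat count; the selection logic itself, sketched in plain Python on just the dates:

```python
from collections import Counter

# DateCompleted column from the question's table.
rows = ["1/31/17", "1/31/17", "1/31/17", "2/1/17", "2/2/17", "2/2/17"]

# Count rows per date, then emit a row twice when its group has exactly
# one member (the duplication condition inferred from the question).
sizes = Counter(rows)
out = []
for d in rows:
    out.append(d)
    if sizes[d] == 1:
        out.append(d)  # duplicate singleton-group rows

print(len(out), out.count("2/1/17"))  # 7 2
```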

Python pandas - how to group close elements

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-04 16:15:22
Question: I have a dataframe where I need to group elements that are within a distance of no more than 1. For example, if this is my df:

    group_number  val
0   1             5
1   1             8
2   1             12
3   1             13
4   1             22
5   1             26
6   1             31
7   2             7
8   2             16
9   2             17
10  2             19
11  2             29
12  2             33
13  2             62

So I need to group both by group_number and by val, where the differences between values of val are smaller than or equal to 1. So, in this example, lines 2 and 3 would group together, and lines 8 and 9 would also group together. I tried using diff or related functions, but I didn't
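The asker was on the right track with diff: the usual pandas recipe is `(df['val'].diff() > 1).cumsum()` within each group_number, which starts a new subgroup label at every gap larger than 1. The same logic in plain Python, so the labels can be checked against the example:

```python
# (group_number, val) pairs from the question, already sorted by val
# within each group_number.
data = [(1, 5), (1, 8), (1, 12), (1, 13), (1, 22), (1, 26), (1, 31),
        (2, 7), (2, 16), (2, 17), (2, 19), (2, 29), (2, 33), (2, 62)]

# Start a new subgroup whenever group_number changes or the gap in val
# exceeds 1 -- the plain-Python version of (diff > 1).cumsum().
subgroup, labels, prev = 0, [], None
for grp, val in data:
    if prev is not None and (grp != prev[0] or val - prev[1] > 1):
        subgroup += 1
    labels.append(subgroup)
    prev = (grp, val)

print(labels)  # [0, 1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9, 10, 11]
```

Lines 2 and 3 share label 2, and lines 8 and 9 share label 7, as the question requires.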

Pandas - Expanding average session time

Submitted by 邮差的信 on 2021-02-04 16:01:12
Question: The following DF represents events received from users, giving the id of the user and the timestamp of the event:

    id  timestamp
0   1   2020-09-01 18:14:35
1   1   2020-09-01 18:14:39
2   1   2020-09-01 18:14:40
3   1   2020-09-01 02:09:22
4   1   2020-09-01 02:09:35
5   1   2020-09-01 02:09:53
6   1   2020-09-01 02:09:57
7   2   2020-09-01 18:14:35
8   2   2020-09-01 18:14:39
9   2   2020-09-01 18:14:40
10  2   2020-09-01 02:09:22
11  2   2020-09-01 02:09:35
12  2   2020-09-01 02:09:53
13  2   2020-09-01 02:09:57
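The question is truncated, but "expanding average session time" usually means: per user, split the events into sessions wherever the gap between consecutive events exceeds some threshold, take each session's duration, then compute a running (expanding) mean of those durations. A plain-Python sketch under that reading, for one user only since both users above are identical; the 30-second session gap is a hypothetical threshold, not from the question:

```python
from datetime import datetime, timedelta

# Events for one user, as in the question.
stamps = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in [
    "2020-09-01 18:14:35", "2020-09-01 18:14:39", "2020-09-01 18:14:40",
    "2020-09-01 02:09:22", "2020-09-01 02:09:35", "2020-09-01 02:09:53",
    "2020-09-01 02:09:57",
])

# Split into sessions at gaps > 30s (hypothetical threshold), recording
# each session's duration in seconds.
sessions, start = [], stamps[0]
for prev, cur in zip(stamps, stamps[1:]):
    if cur - prev > timedelta(seconds=30):
        sessions.append((prev - start).total_seconds())
        start = cur
sessions.append((stamps[-1] - start).total_seconds())

# Expanding (running) average of the session durations.
expanding_avg = [sum(sessions[:i + 1]) / (i + 1) for i in range(len(sessions))]
print(sessions, expanding_avg)  # [35.0, 5.0] [35.0, 20.0]
```

In pandas the last step corresponds to `Series.expanding().mean()` on the per-session durations.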

Mysql join and sum is doubling result

Submitted by 百般思念 on 2021-02-04 15:22:21
Question: I have a revenue table like this:

title_id  revenue  cost
1         10       5
2         10       5
3         10       5
4         10       5
1         20       6
2         20       6
3         20       6
4         20       6

When I execute this query:

SELECT SUM(revenue), SUM(cost) FROM revenue GROUP BY revenue.title_id

it produces this result:

title_id  revenue  cost
1         30       11
2         30       11
3         30       11
4         30       11

which is OK. Now I want to combine the sum result with another table which has a structure like this:

title_id  interest
1         10
2         10
3         10
4         10
1         20
2         20
3         20
4         20

When I execute a join with an aggregate function like this: SELECT SUM
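The query is cut off, but the doubling it describes is the classic join-before-aggregate fan-out: each title's 2 revenue rows match its 2 interest rows, so every row is counted twice in the sums. The standard fix is to aggregate each table in its own derived table first and join the already-summed results. A runnable sketch with Python's sqlite3 (alias names `r` and `i` are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revenue (title_id INT, revenue INT, cost INT);
    CREATE TABLE interest (title_id INT, interest INT);
""")
conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)",
    [(t, r, c) for r, c in [(10, 5), (20, 6)] for t in (1, 2, 3, 4)])
conn.executemany("INSERT INTO interest VALUES (?, ?)",
    [(t, i) for i in (10, 20) for t in (1, 2, 3, 4)])

# Aggregate each table separately, THEN join -- joining the raw rows
# first would pair every revenue row with every matching interest row
# and double both sums.
rows = conn.execute("""
    SELECT r.title_id, r.rev, r.cost, i.intr
    FROM (SELECT title_id, SUM(revenue) AS rev, SUM(cost) AS cost
          FROM revenue GROUP BY title_id) AS r
    JOIN (SELECT title_id, SUM(interest) AS intr
          FROM interest GROUP BY title_id) AS i
      ON i.title_id = r.title_id
    ORDER BY r.title_id
""").fetchall()
print(rows[0])  # (1, 30, 11, 30)
```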