group-by

SQL SELECT: sum values without including duplicates

北城以北 submitted on 2021-02-08 02:13:17

Question: I have a problem in Oracle SQL that I'm trying to get my head around. I'll illustrate with an example. I have three tables that I am querying:

Employees
| EmployeeID | Name             |
| 1          | John Smith       |
| 2          | Douglas Hoppalot |
| 3          | Harry Holiday    |
...

InternalCosts
| IntID | Amount | EmployeeID |
| 1     | 10     | 1          |
| 2     | 20     | 2          |
| 3     | 30     | 1          |
...

ExternalCosts
| ExtID | Amount | EmployeeID |
|
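The classic cause of inflated sums here is joining Employees to both cost tables at once, which multiplies rows before SUM runs. A minimal sketch of the usual fix, aggregating each cost table in its own subquery before joining, run against SQLite via Python as a stand-in for Oracle (the ExternalCosts rows are invented, since the question's table is cut off):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT);
CREATE TABLE InternalCosts (IntID INTEGER, Amount INTEGER, EmployeeID INTEGER);
CREATE TABLE ExternalCosts (ExtID INTEGER, Amount INTEGER, EmployeeID INTEGER);
INSERT INTO Employees VALUES (1,'John Smith'),(2,'Douglas Hoppalot'),(3,'Harry Holiday');
INSERT INTO InternalCosts VALUES (1,10,1),(2,20,2),(3,30,1);
-- hypothetical external costs; the question's table is truncated
INSERT INTO ExternalCosts VALUES (1,5,1),(2,15,2);
""")
# Aggregate each detail table in a subquery, then join once per employee,
# so rows are never multiplied before being summed.
rows = cur.execute("""
SELECT e.Name,
       COALESCE(i.total, 0) AS internal_total,
       COALESCE(x.total, 0) AS external_total
FROM Employees e
LEFT JOIN (SELECT EmployeeID, SUM(Amount) AS total
           FROM InternalCosts GROUP BY EmployeeID) i
       ON i.EmployeeID = e.EmployeeID
LEFT JOIN (SELECT EmployeeID, SUM(Amount) AS total
           FROM ExternalCosts GROUP BY EmployeeID) x
       ON x.EmployeeID = e.EmployeeID
ORDER BY e.EmployeeID
""").fetchall()
print(rows)
```

The same subquery-then-join shape works unchanged in Oracle.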

Pandas calculate length of consecutive equal values from a grouped dataframe

寵の児 submitted on 2021-02-07 20:34:55

Question: I want to do what they've done in the answer here: Calculating the number of specific consecutive equal values in a vectorized way in pandas, but using a grouped dataframe instead of a series. So given a dataframe with several columns

A B C
x x 0
x x 5
x x 2
x x 0
x x 0
x x 3
x x 0
y x 1
y x 10
y x 0
y x 5
y x 0
y x 0

I want to groupby columns A and B, then count the number of consecutive zeros in C. After that I'd like to return counts of the number of times each length of
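One hedged sketch of the grouped version: mark run boundaries per group with a shifted comparison, turn the cumulative sum of those boundaries into a run id, then size the zero runs (data reproduced from the question; this is one way to do it, not necessarily the linked answer's):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x"] * 7 + ["y"] * 6,
    "B": ["x"] * 13,
    "C": [0, 5, 2, 0, 0, 3, 0, 1, 10, 0, 5, 0, 0],
})

# A new run starts wherever C differs from the previous row of its (A, B) group.
starts = df.groupby(["A", "B"])["C"].transform(lambda s: s.ne(s.shift()))
run_id = starts.cumsum().rename("run")

# Length of every zero run, then how often each length occurs per group.
zeros = df["C"].eq(0)
zero_runs = df[zeros].groupby(["A", "B", run_id[zeros]]).size()
length_counts = zero_runs.groupby(level=["A", "B"]).value_counts()
print(length_counts)
```

For group (x, x) the zero runs have lengths 1, 2, 1, so length 1 occurs twice and length 2 once.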

Pandas: groupby with condition

谁说我不能喝 submitted on 2021-02-07 18:43:27

Question: I have a dataframe:

ID,used_at,active_seconds,subdomain,visiting,category
123,2016-02-05 19:39:21,2,yandex.ru,2,Computers
123,2016-02-05 19:43:01,1,mail.yandex.ru,2,Computers
123,2016-02-05 19:43:13,6,mail.yandex.ru,2,Computers
234,2016-02-05 19:46:09,16,avito.ru,2,Automobiles
234,2016-02-05 19:48:36,21,avito.ru,2,Automobiles
345,2016-02-05 19:48:59,58,avito.ru,2,Automobiles
345,2016-02-05 19:51:21,4,avito.ru,2,Automobiles
345,2016-02-05 19:58:55,4,disk.yandex.ru,2,Computers
345,2016-02-05 19
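The question is truncated before the condition is stated, so as a generic illustration only: one common "groupby with condition" pattern is to filter first, then aggregate. Column names come from the sample; the category filter below is a hypothetical stand-in for whatever condition the question actually asked about:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [123, 123, 123, 234, 234],
    "active_seconds": [2, 1, 6, 16, 21],
    "category": ["Computers"] * 3 + ["Automobiles"] * 2,
})

# Keep only the rows satisfying the condition, then aggregate per group;
# groups with no qualifying rows simply drop out of the result.
result = (
    df[df["category"] == "Computers"]
    .groupby("ID")["active_seconds"]
    .sum()
)
print(result)
```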

Pandas GroupBy and select rows with the minimum value in a specific column

梦想的初衷 submitted on 2021-02-07 11:24:26

Question: I am grouping my dataset by column A and then would like to take the minimum value in column B and the corresponding value in column C.

data = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2], 'B': [4, 5, 2, 7, 4, 6], 'C': [3, 4, 10, 2, 4, 6]})
data
   A  B   C
0  1  4   3
1  1  5   4
2  1  2  10
3  2  7   2
4  2  4   4
5  2  6   6

and I would like to get:

   A  B   C
0  1  2  10
1  2  4   4

For the moment I am grouping by A, and creating a value that indicates the rows I will keep in my dataset:

a = data.groupby('A').min()
a['A'] = a.index
to_keep = [str(x[0]) + str(x[1]) for
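A common way to keep whole rows at the per-group minimum is idxmin, which returns the index label of the minimum B in each group so .loc can pull C along with it (a sketch using the frame from the question):

```python
import pandas as pd

data = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                     "B": [4, 5, 2, 7, 4, 6],
                     "C": [3, 4, 10, 2, 4, 6]})

# idxmin gives the row label of the minimum B per group; .loc then pulls
# the whole row, so C stays paired with the B it belongs to.
result = data.loc[data.groupby("A")["B"].idxmin()].reset_index(drop=True)
print(result)
```

Note that groupby('A').min() alone takes the minimum of each column independently, which is why it decouples B from C.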

data.table: group-by, sum, name new column, and slice columns in one step

末鹿安然 submitted on 2021-02-07 09:39:47

Question: This seems like it should be easy, but I've never been able to figure out how to do it. Using data.table I want to sum a column, C, by another column A, and just keep those two columns. At the same time, I want to be able to name the new column. My attempts and desired output:

library(data.table)
dt <- data.table(A = c('a', 'b', 'b', 'c', 'c'),
                 B = c('19', '20', '21', '22', '23'),
                 C = c(150, 250, 20, 220, 130))
# Desired Output - is there a way to do this in one step using data.table?
# new.data <-
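In data.table this is typically the one-liner dt[, .(sum_C = sum(C)), by = A], which groups, sums, and names the column in a single step. For comparison, the equivalent group-sum-and-rename in pandas (the name sum_C is my choice; the question leaves it open):

```python
import pandas as pd

dt = pd.DataFrame({"A": ["a", "b", "b", "c", "c"],
                   "C": [150, 250, 20, 220, 130]})

# Named aggregation: group by A, sum C, and name the result column,
# all in the same step.
new_data = dt.groupby("A", as_index=False).agg(sum_C=("C", "sum"))
print(new_data)
```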

MySQL LEFT JOIN with GROUP BY and WHERE IN (sub query)

人盡茶涼 submitted on 2021-02-06 15:24:13

Question: I have one table with some statistics per date, which I want listed out with MySQL. For some dates there will be no statistics, so the result should look something like this:

2013-03-01: 3
2013-03-02: 2
2013-03-03: 0
2013-03-04: 1

I figured out that filling in the gaps with 0 (zero) could be solved with a separate table with all possible dates and a LEFT JOIN. So far so good. The statistics (impressions) are in the table 'campaigndata':

id - int(11)
date - date
campaignid - int(11)
impressions -
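A sketch of that calendar-table LEFT JOIN, run against SQLite via Python as a stand-in for MySQL. The sample rows are invented to reproduce the counts above; the two key points are counting c.id rather than * so empty dates yield 0, and keeping any filter on campaigndata columns in the ON clause so the outer join is not silently turned into an inner join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE calendar (d DATE);
CREATE TABLE campaigndata (id INTEGER, date DATE, campaignid INTEGER, impressions INTEGER);
INSERT INTO calendar VALUES ('2013-03-01'),('2013-03-02'),('2013-03-03'),('2013-03-04');
-- hypothetical rows chosen to match the desired output in the question
INSERT INTO campaigndata VALUES
  (1,'2013-03-01',7,5),(2,'2013-03-01',7,3),(3,'2013-03-01',8,1),
  (4,'2013-03-02',7,2),(5,'2013-03-02',8,4),
  (6,'2013-03-04',7,9);
""")
# LEFT JOIN keeps every calendar date; COUNT(c.id) is 0 where no rows match.
rows = cur.execute("""
SELECT cal.d, COUNT(c.id) AS n
FROM calendar cal
LEFT JOIN campaigndata c ON c.date = cal.d
GROUP BY cal.d
ORDER BY cal.d
""").fetchall()
print(rows)
```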

Apply different functions to different items in group object: Python pandas

半城伤御伤魂 submitted on 2021-02-05 20:30:31

Question: Suppose I have a dataframe as follows:

In [1]: test_dup_df
Out[1]:
                     exe_price  exe_vol flag
2008-03-13 14:41:07       84.5      200  yes
2008-03-13 14:41:37       85.0    10000  yes
2008-03-13 14:41:38       84.5    69700  yes
2008-03-13 14:41:39       84.5     1200  yes
2008-03-13 14:42:00       84.5     1000  yes
2008-03-13 14:42:08       84.5      300  yes
2008-03-13 14:42:10       84.5    88100  yes
2008-03-13 14:42:10       84.5    11900  yes
2008-03-13 14:42:15       84.5     5000  yes
2008-03-13 14:42:16       84.5     3200  yes

I want to group a duplicate data at time 14:42:10 and apply
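The question is cut off after "apply", but the title asks for different functions on different items of a group. One hedged sketch: collapse the duplicate timestamp with a per-column aggregation dict (mean/sum/first are my guesses at sensible functions, applied to a subset of the question's rows):

```python
import pandas as pd

idx = pd.to_datetime([
    "2008-03-13 14:42:08", "2008-03-13 14:42:10",
    "2008-03-13 14:42:10", "2008-03-13 14:42:15",
])
df = pd.DataFrame({"exe_price": [84.5, 84.5, 84.5, 84.5],
                   "exe_vol": [300, 88100, 11900, 5000],
                   "flag": ["yes"] * 4}, index=idx)

# Group on the index and apply a different function per column:
# mean of the price, sum of the volume, first value of the flag.
dedup = df.groupby(level=0).agg({"exe_price": "mean",
                                 "exe_vol": "sum",
                                 "flag": "first"})
print(dedup)
```

The two 14:42:10 rows collapse into one with their volumes summed.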

How to group by month including all months?

一世执手 submitted on 2021-02-05 11:28:46

Question: I group my table by months:

SELECT TO_CHAR(created, 'YYYY-MM') AS operation, COUNT(id)
FROM user_info
WHERE created IS NOT NULL
GROUP BY ROLLUP (TO_CHAR(created, 'YYYY-MM'))

2015-04    1
2015-06   10
2015-08   22
2015-09    8
2015-10   13
2015-12    5
2016-01   25
2016-02   37
2016-03   24
2016-04    1
2016-05    1
2016-06    2
2016-08    2
2016-09    7
2016-10  103
2016-11    5
2016-12    2
2017-04   14
2017-05    2
         284

But the records don't cover all the months. I would like the output to include all the months, with the missing ones
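On the database side this usually means generating a complete month list (Oracle's CONNECT BY is the common tool) and LEFT JOINing the aggregate to it. The same gap-filling is easy to sketch in pandas by reindexing against a full pd.period_range; the three counts below are a small subset of the question's output, used only to keep the example short:

```python
import pandas as pd

counts = pd.Series({"2015-04": 1, "2015-06": 10, "2015-08": 22})
counts.index = pd.PeriodIndex(counts.index, freq="M")

# Reindex against every month between the first and last; gaps become 0.
full = counts.reindex(
    pd.period_range(counts.index.min(), counts.index.max(), freq="M"),
    fill_value=0,
)
print(full)
```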

MySQL: group by two columns and pick the maximum value of a third column

六月ゝ 毕业季﹏ submitted on 2021-02-05 11:16:29

Question: I have a table that has user_id, item_id and interaction_type as columns. interaction_type could be 0, 1, 2, 3, 4 or 5. However, for some user_id and item_id pairs, we might have multiple interaction_types. For example, we might have:

user_id item_id interaction_type
2       3       1
2       3       0
2       3       5
4       1       0
5       4       4
5       4       2

What I want is to only keep the maximum interaction_type if there are multiples. So I want this:

user_id item_id interaction_type
2       3       5
4       1       0
5       4       4

Here is the query I wrote for this purpose:
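Grouping by both key columns with MAX over the third gives exactly the desired rows. A runnable sketch using SQLite through Python as a stand-in for MySQL, with the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE interactions (user_id INTEGER, item_id INTEGER, interaction_type INTEGER);
INSERT INTO interactions VALUES (2,3,1),(2,3,0),(2,3,5),(4,1,0),(5,4,4),(5,4,2);
""")
# One row per (user_id, item_id) pair; MAX collapses duplicates
# to the largest interaction_type.
rows = cur.execute("""
SELECT user_id, item_id, MAX(interaction_type) AS interaction_type
FROM interactions
GROUP BY user_id, item_id
ORDER BY user_id, item_id
""").fetchall()
print(rows)
```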