group-by

MySQL join and SUM is doubling the result

锁芯ラ · Submitted on 2021-02-04 15:21:27
Question: I have a revenue table:

    title_id  revenue  cost
    1         10       5
    2         10       5
    3         10       5
    4         10       5
    1         20       6
    2         20       6
    3         20       6
    4         20       6

When I execute this query:

    SELECT SUM(revenue), SUM(cost) FROM revenue GROUP BY revenue.title_id

it produces this result:

    title_id  revenue  cost
    1         30       11
    2         30       11
    3         30       11
    4         30       11

which is OK. Now I want to combine the summed result with another table that has this structure:

    title_id  interest
    1         10
    2         10
    3         10
    4         10
    1         20
    2         20
    3         20
    4         20

When I execute a join with an aggregate function like this: SELECT SUM
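The doubling in questions like this usually comes from joining the row-level tables first and aggregating afterwards, so every revenue row is repeated once per matching interest row. One common fix is to aggregate each table in its own subquery and join the one-row-per-title results. A minimal sketch, using in-memory SQLite in place of MySQL (table and column names taken from the question; the data is a reduced sample):

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE revenue (title_id INT, revenue INT, cost INT);
CREATE TABLE interest (title_id INT, interest INT);
INSERT INTO revenue VALUES (1,10,5),(2,10,5),(1,20,6),(2,20,6);
INSERT INTO interest VALUES (1,10),(2,10),(1,20),(2,20);
""")

# Aggregate each table separately, then join the one-row-per-title
# results, so neither SUM ever sees duplicated rows.
rows = con.execute("""
SELECT r.title_id, r.rev, r.cst, i.intr
FROM (SELECT title_id, SUM(revenue) AS rev, SUM(cost) AS cst
      FROM revenue GROUP BY title_id) r
JOIN (SELECT title_id, SUM(interest) AS intr
      FROM interest GROUP BY title_id) i
  ON i.title_id = r.title_id
ORDER BY r.title_id
""").fetchall()
print(rows)  # [(1, 30, 11, 30), (2, 30, 11, 30)]
```

The same derived-table pattern works unchanged in MySQL; joining the raw tables and summing directly would instead report revenue 60 and cost 22 per title here.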

Aggregate and sum by one key and rest

谁说胖子不能爱 · Submitted on 2021-02-04 08:27:36
Question: It's hard for me to explain what I want to get, so I will show an example. I have these objects:

    {name: 'steve', received: 100}
    {name: 'carolina', received: 70}
    {name: 'steve', received: 30}
    {name: 'andrew', received: 10}

I can do:

    { $group: { _id: '$name', sum: { "$sum": '$received' } } }

and I will get:

    Steve received 130 (100 + 30)
    Carolina received 70
    Andrew received 10

But I need something like this:

    Steve received 130 (100 + 30)
    Everyone else received 80 (70 + 10)

How can I get this effect?
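The underlying trick is to group not by the raw name but by a conditional key that maps every name outside the set you care about to a single "everyone else" bucket; in a MongoDB pipeline that key can be computed inside the `$group` stage's `_id` with `$cond`. A plain-Python sketch of the same logic (function name and the `keep` parameter are illustrative, not from the question):

```python
from collections import defaultdict

docs = [
    {"name": "steve", "received": 100},
    {"name": "carolina", "received": 70},
    {"name": "steve", "received": 30},
    {"name": "andrew", "received": 10},
]

def sum_with_rest(docs, keep):
    """Sum `received` per name, collapsing names outside `keep` into one bucket."""
    totals = defaultdict(int)
    for d in docs:
        key = d["name"] if d["name"] in keep else "everyone else"
        totals[key] += d["received"]
    return dict(totals)

print(sum_with_rest(docs, {"steve"}))  # {'steve': 130, 'everyone else': 80}
```

The MongoDB equivalent of `key` would be something like `_id: {$cond: [{$eq: ['$name', 'steve']}, 'steve', 'everyone else']}` inside `$group`.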

How do I improve the performance of pandas GroupBy filter operation?

拈花ヽ惹草 · Submitted on 2021-02-02 08:54:20
Question: This is my first time asking a question. I'm working with a large CSV dataset (over 15 million rows, more than 1.5 GB in size). I'm loading the extracts into pandas dataframes running in Jupyter Notebooks to derive an algorithm based on the dataset. I group the data by MAC address, which results in over 1 million groups. Core to my algorithm development is running this operation: pandas.core.groupby.DataFrameGroupBy.filter. Running this operation takes 3 to 5 minutes, depending on the
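A common reason `GroupBy.filter` is slow at this scale is that it invokes a Python callable once per group (over a million calls here). When the predicate can be expressed as an aggregate, a vectorized `transform` that builds one boolean mask is usually much faster. A sketch on toy data (the column names `mac` and `bytes` are assumptions, not from the question):

```python
import pandas as pd

# Toy frame standing in for the 15M-row dataset.
df = pd.DataFrame({
    "mac": ["a", "a", "b", "c", "c", "c"],
    "bytes": [1, 2, 3, 4, 5, 6],
})

# Slow pattern from the question: a Python-level callable per group,
# here keeping only MAC addresses seen at least twice.
slow = df.groupby("mac").filter(lambda g: len(g) >= 2)

# Faster equivalent: one vectorized transform builds a boolean mask,
# avoiding a Python function call per group.
fast = df[df.groupby("mac")["bytes"].transform("size") >= 2]

assert slow.equals(fast)
```

Any predicate built from aggregates (`size`, `sum`, `mean`, `max`, ...) can be rewritten this way; only genuinely row-wise, non-aggregatable conditions need the callable form.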

ACCESS/SQL Combining multiple rows with one column into one row and creating multiple columns

半城伤御伤魂 · Submitted on 2021-02-02 03:46:30
Question: I've looked at quite a few examples and nothing fits quite like I need it to. I have a table with item numbers in one column and image links in another column. The issue is that I need to combine rows that have the same item number, moving the data in the HTML_LINK column into multiple columns called imagelink1, imagelink2, imagelink3. The maximum number of imagelink columns I will need is 5. I tried a pivot table, which worked to combine the rows, but it creates a column the name of
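The general shape of the fix is to number each link within its item (1st, 2nd, 3rd, ...) and pivot on that sequence number instead of on the link value, so the columns become imagelink1..imagelink5 regardless of the link text. Sketched here in pandas rather than Access SQL, with toy data (the `item` column name is an assumption; HTML_LINK is from the question):

```python
import pandas as pd

# Toy data mirroring the question's item-number / HTML_LINK columns.
df = pd.DataFrame({
    "item": ["A", "A", "B", "A"],
    "HTML_LINK": ["a1.jpg", "a2.jpg", "b1.jpg", "a3.jpg"],
})

# Number each link within its item (1, 2, 3, ...), then pivot so the
# sequence number becomes the column, capped at 5 links per item.
df["n"] = df.groupby("item").cumcount() + 1
wide = (df[df["n"] <= 5]
        .pivot(index="item", columns="n", values="HTML_LINK")
        .add_prefix("imagelink")
        .reset_index())
print(wide)
```

In Access SQL the same sequence number can be produced with a correlated subquery counting earlier rows for the same item, and the pivot done with `TRANSFORM ... PIVOT` on that number.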

Get summary data columns in new pandas dataframe from existing dataframe based on other column-ID

北慕城南 · Submitted on 2021-01-29 16:32:05
Question: I want to summarize the data in a dataframe and add the new columns to another dataframe. My data contains apartments with an ID number, and it has surface and volume values for each room in the apartment. What I want is a dataframe that summarizes this and gives me the total surface and volume per apartment. There are two conditions for the original dataframe:

- the dataframe can contain empty cells
- when the values of surface or volume are equal for all of the
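The per-apartment totals part of this is a plain groupby-sum, and pandas handles the first condition for free because `sum()` skips NaN (empty cells) by default. A sketch on toy data (all column names here are assumptions based on the question's description):

```python
import numpy as np
import pandas as pd

# Toy room-level data; one row per room, empty cells allowed.
rooms = pd.DataFrame({
    "apartment_id": [1, 1, 2, 2],
    "surface": [20.0, 15.0, 30.0, np.nan],
    "volume": [50.0, 40.0, 75.0, 60.0],
})

# Total surface and volume per apartment; NaN cells are skipped by sum().
summary = rooms.groupby("apartment_id", as_index=False)[["surface", "volume"]].sum()
print(summary)
```

The resulting `summary` frame can then be merged onto the other dataframe with `merge(..., on="apartment_id")` to attach the new columns.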

Alternative to partial(nonaggregated column) in group by [Ollivander's Inventory problem on hackerrank]

倖福魔咒の · Submitted on 2021-01-29 14:34:22
Question: I am trying to solve https://www.hackerrank.com/challenges/harry-potter-and-wands/problem. With a proper MySQL setup one can do the following:

    select w.id, wp.age, min(w.coins_needed), w.power
    from wands w
    join wands_property wp on wp.code = w.code and wp.is_evil = 0
    group by w.code
    order by w.power desc, wp.age desc;

But HackerRank's MySQL setup does not allow partial grouping. It throws the error: SELECT list is not in GROUP BY clause and contains nonaggregated column 'run_eootvjd0lna.w.id' which is not
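The standard ONLY_FULL_GROUP_BY-safe rewrite is to compute the per-code minimum in a derived table and join back to pick the matching row, so every selected column is functionally determined. A sketch using in-memory SQLite in place of MySQL (schema reduced to the columns the query uses; the sample rows are invented for illustration):

```python
import sqlite3

# SQLite stand-in for HackerRank's wands schema (only the columns used here).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE wands (id INT, code INT, coins_needed INT, power INT);
CREATE TABLE wands_property (code INT, age INT, is_evil INT);
INSERT INTO wands VALUES (1,1,10,5),(2,1,20,5),(3,2,15,7);
INSERT INTO wands_property VALUES (1,100,0),(2,200,0);
""")

# ONLY_FULL_GROUP_BY-safe pattern: compute the per-code minimum in a
# derived table, then join back to fetch the full row that matches it.
rows = con.execute("""
SELECT w.id, wp.age, w.coins_needed, w.power
FROM wands w
JOIN wands_property wp ON wp.code = w.code AND wp.is_evil = 0
JOIN (SELECT code, MIN(coins_needed) AS min_coins
      FROM wands GROUP BY code) m
  ON m.code = w.code AND m.min_coins = w.coins_needed
ORDER BY w.power DESC, wp.age DESC
""").fetchall()
print(rows)  # [(3, 200, 15, 7), (1, 100, 10, 5)]
```

The outer query no longer needs a GROUP BY at all, so it passes under strict SQL modes; the same SQL runs unmodified on MySQL.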

Count how many fall in each group in R [duplicate]

旧巷老猫 · Submitted on 2021-01-29 14:32:33
Question: This question already has answers here: Count number of rows within each group (15 answers); How to sum a variable by group (15 answers). Closed 1 year ago.

I have data similar to this:

    User id  Ranking  Country
    1        1        USA
    2        3        AUS
    3        1        USA
    4        1        AUS
    5        2        USA

and I would like to have the following results:

    USA Ranking 1 = 2
    USA Ranking 2 = 1
    USA Ranking 3 = 0
    AUS Ranking 1 = 1
    AUS Ranking 2 = 0
    AUS Ranking 3 = 1

How may I do this in R, please?

Answer 1:

    reshape2::melt(table(d$Country, factor(d
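The answer's idea is a contingency table of Country versus Ranking, with the ranking treated as a factor so zero-count levels still appear. For readers outside R, the same cross-tabulation can be sketched in pandas (the dataframe `d` mirrors the question's sample; explicitly reindexing the columns plays the role of R's `factor` levels):

```python
import pandas as pd

d = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "ranking": [1, 3, 1, 1, 2],
    "country": ["USA", "AUS", "USA", "AUS", "USA"],
})

# Cross-tabulate country vs ranking; reindexing the columns guarantees
# every ranking level 1-3 shows up even when its count is zero.
counts = pd.crosstab(d["country"], d["ranking"]).reindex(columns=[1, 2, 3], fill_value=0)
print(counts)
```

Each cell is the number of users from that country with that ranking, e.g. USA/Ranking 1 = 2, matching the desired output.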

Organization Chart - Group by Manager to show all employees under him

喜你入骨 · Submitted on 2021-01-29 13:42:53
Question: I have written the code below to display employee-manager reporting. I group by manager so that all employees under one manager are shown, but it is showing only 1 employee. What am I missing?

LINQ Group By logic:

    var emp = (from m in employee
               group m by m.ManagerId into g
               join e1 in employee on g.FirstOrDefault().ManagerId equals e1.EmpId into temp
               from t1 in temp.DefaultIfEmpty()
               select new {
                   EmpId = g.FirstOrDefault().EmpId,
                   EmployeeName = g.FirstOrDefault().EmployeeName,
                   Gender = g
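The likely culprit is `g.FirstOrDefault()`: it collapses each group to its first element, so only one employee per manager survives the projection; the fix is to enumerate the whole group. A language-neutral sketch of the intended shape in Python (employee names and fields beyond EmpId/ManagerId are invented for illustration):

```python
from collections import defaultdict

# Toy employee list; EmpId / ManagerId mirror the question's fields.
employees = [
    {"EmpId": 1, "name": "alice", "ManagerId": None},
    {"EmpId": 2, "name": "bob", "ManagerId": 1},
    {"EmpId": 3, "name": "carol", "ManagerId": 1},
]

# Keep every member of each group instead of taking only the first one
# (the LINQ code's g.FirstOrDefault() discards the rest of the group).
by_manager = defaultdict(list)
for e in employees:
    by_manager[e["ManagerId"]].append(e["name"])

print(dict(by_manager))  # {None: ['alice'], 1: ['bob', 'carol']}
```

In LINQ terms that means projecting the group itself, e.g. `select new { ManagerId = g.Key, Employees = g.ToList() }`, rather than pulling single fields out of `g.FirstOrDefault()`.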