top-n

Is there a way to get the nlargest items per group in dask?

橙三吉。 submitted on 2019-12-08 18:30:18
Question: I have the following dataset:

    location  category  percent
    A         5         100.0
    B         3         100.0
    C         2          50.0
              4          13.0
    D         2          75.0
              3          59.0
              4          13.0
              5           4.0

I'm trying to get the nlargest items of category in the dataframe, grouped by location. For example, if I want the top 2 largest percentages for each group, the output should be:

    location  category  percent
    A         5         100.0
    B         3         100.0
    C         2          50.0
              4          13.0
    D         2          75.0
              3          59.0

It looks like in pandas this is relatively straightforward using pandas.core.groupby.SeriesGroupBy.nlargest, but dask doesn't
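A minimal pandas sketch of the per-group logic, assuming the data sits in a flat DataFrame with `location`, `category` and `percent` columns (the function name `top_n_per_group` is hypothetical). Sorting once and taking `groupby(...).head(n)` keeps whole rows, which is what the desired output shows, rather than only the `percent` Series that `SeriesGroupBy.nlargest` returns.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "location": ["A", "B", "C", "C", "D", "D", "D", "D"],
    "category": [5, 3, 2, 4, 2, 3, 4, 5],
    "percent": [100.0, 100.0, 50.0, 13.0, 75.0, 59.0, 13.0, 4.0],
})


def top_n_per_group(frame: pd.DataFrame, by: str, col: str, n: int) -> pd.DataFrame:
    """Return the n rows with the largest `col` within each `by` group."""
    return (
        frame.sort_values(col, ascending=False)  # largest values first
        .groupby(by)
        .head(n)                                   # first n rows of each group
        .sort_values([by, col], ascending=[True, False])
        .reset_index(drop=True)
    )


top2 = top_n_per_group(df, by="location", col="percent", n=2)
print(top2)
```

In dask, one option (an assumption worth checking against your dask version) is to hand equivalent per-group logic to `ddf.groupby("location").apply(lambda g: g.nlargest(2, "percent"), meta=...)`. That shuffles each group into a single partition, so it is cheapest when individual groups are small.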

Pandas report top-n in group and pivot

落花浮王杯 submitted on 2019-12-05 18:16:33
I am trying to summarise a dataframe by grouping along a single dimension, d1, and reporting summary statistics for each element of d1. In particular, I am interested in the top n (index and values) for a number of metrics. What I would like to produce is one row for each element of d1. Say I have two dimensions, d1 and d2, and four metrics, m1, m2, m3 and m4. 1) What is the suggested way of grouping by d1 and finding the top n d2 values, with their metric values, for each of the metrics m1 to m4? In Wes's book Python for Data Analysis, he suggests (page 35):

    def get_top1000(group):
        return group.sort_index(by='births', ascending=False
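A minimal sketch of the "one row per d1" summary, assuming a long-format frame with columns `d1`, `d2` and the metric columns; the function name `top_n_summary`, the output column naming scheme and the sample data are all hypothetical. It uses `nlargest`, since in current pandas sorting by a column is done with `sort_values`/`nlargest` rather than `sort_index(by=...)`.

```python
import pandas as pd


def top_n_summary(df, group_col, label_col, metrics, n):
    """One row per group_col value, holding the top-n labels and values for each metric."""
    rows = {}
    for key, grp in df.groupby(group_col):
        row = {}
        for m in metrics:
            top = grp.nlargest(n, m)  # rows of this group ordered by metric m, descending
            for rank, (label, value) in enumerate(zip(top[label_col], top[m]), start=1):
                row[f"{m}_top{rank}_{label_col}"] = label
                row[f"{m}_top{rank}_value"] = value
        rows[key] = row
    # orient="index" turns each group key into a row label (the "pivot" step).
    return pd.DataFrame.from_dict(rows, orient="index")


# Hypothetical sample data with two metrics.
df = pd.DataFrame({
    "d1": ["x", "x", "x", "y", "y"],
    "d2": ["a", "b", "c", "a", "b"],
    "m1": [1, 5, 3, 2, 9],
    "m2": [7, 1, 4, 6, 0],
})
summary = top_n_summary(df, "d1", "d2", ["m1", "m2"], n=2)
print(summary)
```

Groups with fewer than n rows simply get NaN in the missing rank columns.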

Find names of top-n highest-value (non-zero) columns in each pandas dataframe row

青春壹個敷衍的年華 submitted on 2019-12-04 17:36:47
Suppose I have a dataframe like:

    id  p1  p2  p3  p4
    1   0   9   0   4
    2   0   0   0   4
    3   1   3   10  7
    4   1   5   3   1
    5   2   3   7   10

I want to find the column names of the top-n highest-value columns in each pandas dataframe row, and I want to exclude zero values from the top 3:

    id  top1  top2  top3
    1   p2    p4
    2   p4
    3   p3    p4    p2
    4   p2    p3    p4/p1
    5   p4    p3    p2

The present solutions also return the names of columns that hold zero. Is there a way to exclude zero values? I have this solution:

    arank = df.apply(np.argsort, axis=1)
    ranked_cols = df.columns.to_series()[arank.values[:, ::-1][:, :3]]
    new_df = pd.DataFrame(ranked_cols, index=df.index)

There are also other solutions
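A minimal sketch that drops zeros before ranking, assuming `id` is the index and the value columns are `p1` to `p4`; the function name `top_n_nonzero_columns` is hypothetical. Rows with fewer than n non-zero values are padded with empty (None) slots, matching the blanks in the desired output.

```python
import pandas as pd

df = pd.DataFrame(
    {"p1": [0, 0, 1, 1, 2], "p2": [9, 0, 3, 5, 3],
     "p3": [0, 0, 10, 3, 7], "p4": [4, 4, 7, 1, 10]},
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)


def top_n_nonzero_columns(frame: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    labels = [f"top{i}" for i in range(1, n + 1)]

    def row_top(row: pd.Series) -> pd.Series:
        nonzero = row[row != 0]  # drop zero entries before ranking
        # A stable sort keeps the original column order among tied values.
        ranked = nonzero.sort_values(ascending=False, kind="mergesort")
        names = list(ranked.index[:n])
        # Pad with None so every row has exactly n entries.
        return pd.Series(names + [None] * (n - len(names)), index=labels)

    return frame.apply(row_top, axis=1)


result = top_n_nonzero_columns(df)
print(result)
```

For row 4, where p1 and p4 tie at 1, the stable sort reports p1 (the left-most column). The row-wise `apply` favours readability; on very large frames a vectorized NumPy approach is likely faster.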

How to find column-index of top-n values within each row of huge dataframe

跟風遠走 submitted on 2019-12-04 03:05:36
Question: I have a dataframe of the following format (example data):

        Metric1  Metric2  Metric3  Metric4  Metric5
    ID
    1   0.5      0.3      0.2      0.8      0.7
    2   0.1      0.8      0.5      0.2      0.4
    3   0.3      0.1      0.7      0.4      0.2
    4   0.9      0.4      0.8      0.5      0.2

where scores range between [0,1]. I wish to write a function that, for each id (row), calculates the top n metrics, where n is an input of the function along with the original dataframe. My ideal output would be (e.g. for n = 3):

        Top_1    Top_2    Top_3
    ID
    1   Metric4  Metric5  Metric1
    2   Metric2  Metric3  Metric5
    3   Metric3
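A minimal vectorized sketch, assuming all columns are numeric scores and `ID` is the index; the function name `top_n_metrics` is hypothetical. A single `np.argsort` over the whole array avoids a Python-level loop per row, which matters for a huge dataframe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.5, 0.3, 0.2, 0.8, 0.7],
     [0.1, 0.8, 0.5, 0.2, 0.4],
     [0.3, 0.1, 0.7, 0.4, 0.2],
     [0.9, 0.4, 0.8, 0.5, 0.2]],
    index=pd.Index([1, 2, 3, 4], name="ID"),
    columns=[f"Metric{i}" for i in range(1, 6)],
)


def top_n_metrics(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    # Negating the values makes an ascending argsort yield descending order;
    # kind="stable" keeps the left-most column first when scores tie.
    order = np.argsort(-frame.to_numpy(), axis=1, kind="stable")[:, :n]
    # Fancy indexing maps each column position to its column name.
    names = frame.columns.to_numpy()[order]
    return pd.DataFrame(names, index=frame.index,
                        columns=[f"Top_{i}" for i in range(1, n + 1)])


out = top_n_metrics(df, 3)
print(out)
```

If sorting every row in full becomes the bottleneck, `np.argpartition` can select the n largest positions per row without a full sort, at the cost of an extra step to order those n survivors.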

select 2nd row in Plsql

我与影子孤独终老i submitted on 2019-12-02 14:49:47
Question: Let's say I have the following table: SomeTable(id, price). How do I select the second-highest-priced row from this table? Note: this has to be done in PL/SQL, in a database-agnostic way. Is it possible to do this without any loops? I know how this is done using Oracle constructs like rownum or MySQL constructs like limit, so I am not looking for those.

Answer 1:

    CREATE TABLE mytable (id NUMBER PRIMARY KEY, price NUMBER NOT NULL);
    INSERT INTO mytable VALUES (1, 10);
    INSERT INTO mytable VALUES (2,
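One loop-free, dialect-neutral option is the standard SQL window function `DENSE_RANK()`. A minimal sketch, run through Python's `sqlite3` purely so it is executable (SQLite 3.25+ is assumed for window functions); the four sample rows, two of them tied at 20, are illustrative.

```python
import sqlite3

# In-memory SQLite stand-in for SomeTable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, price INTEGER NOT NULL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 20), (4, 30)])

# DENSE_RANK() is standard SQL, so the same statement should also run on
# Oracle, PostgreSQL, SQL Server and MySQL 8+. The subquery alias omits AS
# because Oracle does not accept AS before a table alias.
SECOND_HIGHEST = """
SELECT id, price
FROM (
    SELECT id, price,
           DENSE_RANK() OVER (ORDER BY price DESC) AS rnk
    FROM mytable
) ranked
WHERE rnk = 2
ORDER BY id
"""

rows = conn.execute(SECOND_HIGHEST).fetchall()
print(rows)
```

`DENSE_RANK` returns every row tied at the second-highest distinct price; `ROW_NUMBER` would return exactly one row and pick arbitrarily among ties.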

select 2nd row in Plsql

大憨熊 submitted on 2019-12-02 09:31:59
Let's say I have the following table: SomeTable(id, price). How do I select the second-highest-priced row from this table? Note: this has to be done in PL/SQL, in a database-agnostic way. Is it possible to do this without any loops? I know how this is done using Oracle constructs like rownum or MySQL constructs like limit, so I am not looking for those.

    CREATE TABLE mytable (id NUMBER PRIMARY KEY, price NUMBER NOT NULL);
    INSERT INTO mytable VALUES (1, 10);
    INSERT INTO mytable VALUES (2, 20);
    INSERT INTO mytable VALUES (3, 20);
    INSERT INTO mytable VALUES (4, 30);
    SELECT id, price FROM ( SELECT
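For databases without window functions, nested scalar subqueries with `MAX()` also avoid loops, `rownum` and `limit`. A minimal sketch on the sample rows above, executed through Python's `sqlite3` only to make it runnable; this is an alternative technique, not necessarily the one the truncated answer uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, price INTEGER NOT NULL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 20), (4, 30)])

# Only MAX() and scalar subqueries, which virtually every SQL dialect supports.
# The innermost MAX finds the top price; the middle MAX finds the largest price
# strictly below it; the outer query returns every row at that price.
SECOND_HIGHEST = """
SELECT id, price
FROM mytable
WHERE price = (
    SELECT MAX(price) FROM mytable
    WHERE price < (SELECT MAX(price) FROM mytable)
)
ORDER BY id
"""

rows = conn.execute(SECOND_HIGHEST).fetchall()
print(rows)
```

Generalizing to the k-th highest price this way needs one more nesting level per step, so window functions scale better when they are available.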

Select TOP N and BOTTOM N

怎甘沉沦 submitted on 2019-12-02 09:25:18
I am trying to fetch the top n and bottom n rows. It gives me the result, but it takes a lot of time; I believe it scans the table twice. Code used:

    WITH T1 AS (SELECT * FROM (SELECT Column1, Column2, Colmn3 FROM TABLE ORDER BY DESC) WHERE ROWNUM <= 5),
         T2 AS (SELECT * FROM (SELECT Column1, Column2, Colmn3 FROM TABLE ORDER BY ASC) WHERE ROWNUM <= 5)
    SELECT * FROM T1
    UNION ALL
    SELECT * FROM T2

How can I fetch this faster, considering that the tables are updated regularly?

The best way to solve this problem depends in part on your Oracle version. Here is a very simple (and, I suspect, very efficient)
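One way to read the table only once is to compute both orderings with `ROW_NUMBER()` in a single subquery and filter on either rank. A minimal sketch, executed through Python's `sqlite3` for runnability; the table name `readings` and the columns `column1` (sort key) and `column2` are hypothetical stand-ins for the question's placeholders, and `ROW_NUMBER()` is also available in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the question's TABLE; column1 is the sort key.
conn.execute("CREATE TABLE readings (column1 INTEGER, column2 TEXT)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(v, f"row{v}") for v in range(1, 21)])

# Both row numbers come from one read of the table (the engine may still
# sort twice, once per window); the outer filter keeps either end.
TOP_AND_BOTTOM = """
SELECT column1, column2
FROM (
    SELECT column1, column2,
           ROW_NUMBER() OVER (ORDER BY column1 DESC) AS rn_desc,
           ROW_NUMBER() OVER (ORDER BY column1 ASC)  AS rn_asc
    FROM readings
) numbered
WHERE rn_desc <= :n OR rn_asc <= :n
ORDER BY column1
"""

rows = conn.execute(TOP_AND_BOTTOM, {"n": 5}).fetchall()
print([r[0] for r in rows])
```

Because the filter uses `OR` inside one query, a row that belongs to both ends (when the table has fewer than 2n rows) appears once, whereas `UNION ALL` would return it twice. Whether this beats two index-driven `ROWNUM` queries depends on indexing and the Oracle version, so it is worth comparing execution plans.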

Selecting top N rows for each group based on value in column

有些话、适合烂在心里 submitted on 2019-12-02 05:06:17
Question: I have a dataframe like the one below:

    x <- c(3,2,1,8,7,11,10,9,7,5,4)
    y <- c("a","a","a","b","b","c","c","c","c","c","c")
    z <- c(2,2,2,1,1,3,3,3,3,3,3)
    df <- data.frame(x, y, z)
    df
        x y z
    1   3 a 2
    2   2 a 2
    3   1 a 2
    4   8 b 1
    5   7 b 1
    6  11 c 3
    7  10 c 3
    8   9 c 3
    9   7 c 3
    10  5 c 3
    11  4 c 3

I want to select the top n rows for each group by column y, where n is provided in column z. So the output should be:

       x y z
    1  3 a 2
    2  2 a 2
    3  8 b 1
    4 11 c 3
    5 10 c 3
    6  9 c 3

Answer 1: A solution with base R:

    # df is split according
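For readers doing the same thing in pandas, a minimal vectorized sketch (a Python translation, not the R answer): order by x within each y group, number rows with `cumcount()`, and keep a row while its position is below that group's z.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4],
    "y": ["a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"],
    "z": [2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3],
})

# Sort by group, then by x descending within each group.
ordered = df.sort_values(["y", "x"], ascending=[True, False])
# cumcount() numbers the rows of each y group 0, 1, 2, ... in the sorted order.
position = ordered.groupby("y").cumcount()
# Keep a row while its position is below the group's own z value.
output = ordered[position < ordered["z"]].reset_index(drop=True)
print(output)
```

This avoids a Python-level loop over groups, because the comparison against `z` is a single aligned, vectorized operation.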

Selecting top N rows for each group based on value in column

假如想象 submitted on 2019-12-02 00:30:31
I have a dataframe like the one below:

    x <- c(3,2,1,8,7,11,10,9,7,5,4)
    y <- c("a","a","a","b","b","c","c","c","c","c","c")
    z <- c(2,2,2,1,1,3,3,3,3,3,3)
    df <- data.frame(x, y, z)
    df
        x y z
    1   3 a 2
    2   2 a 2
    3   1 a 2
    4   8 b 1
    5   7 b 1
    6  11 c 3
    7  10 c 3
    8   9 c 3
    9   7 c 3
    10  5 c 3
    11  4 c 3

I want to select the top n rows for each group by column y, where n is provided in column z. So the output should be:

       x y z
    1  3 a 2
    2  2 a 2
    3  8 b 1
    4 11 c 3
    5 10 c 3
    6  9 c 3

A solution with base R:

    # df is split according to y, then we keep only the top "z" value (after ordering x)
    # and rbind everything back together:
    do.call
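The split / keep-top-z / rbind steps described in the base R answer's comments map directly onto pandas. A minimal Python sketch of those same steps (a translation for comparison, not the answer's R code): split with `groupby`, take the first z rows of each piece after ordering x, then stack the pieces with `pd.concat`.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4],
    "y": ["a", "a", "a", "b", "b", "c", "c", "c", "c", "c", "c"],
    "z": [2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3],
})

# Split by y; within each piece order x descending and keep the first z rows
# (z is constant within a group, so the first value is used).
pieces = [
    grp.sort_values("x", ascending=False).head(int(grp["z"].iloc[0]))
    for _, grp in df.groupby("y")
]
# Stack the pieces back together, the counterpart of do.call(rbind, ...).
output = pd.concat(pieces, ignore_index=True)
print(output)
```

This version reads closer to the R answer, but it loops over groups in Python, so the `cumcount()` approach is likely faster when there are many groups.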

How to display the record with the highest value in Oracle?

删除回忆录丶 submitted on 2019-12-01 23:00:08
I have 4 tables with the following structure:

    Table artist: artistID, lastname, firstname, nationality, dateofbirth, datedcease
    Table work: workId, title, copy, medium, description, artist ID
    Table Trans: TransactionID, Date Acquired, Acquistionprice, datesold, askingprice, salesprice, customerID, workID
    Table Customer: customerID, lastname, Firstname, street, city, state, zippostalcode, country, areacode, phonenumber, email

The first question is: which artist has the most works of art sold, and how many of that artist's works have been sold? My SQL query is this:

    SELECT * From dtoohey.artist A1 INNER JOIN ( SELECT COUNT(W1
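A minimal sketch of one way to answer the first question: count sold works per artist in a CTE, then keep the artist(s) whose count equals the maximum. It runs through Python's `sqlite3` for testability; the tables are trimmed to the columns the query needs, the sample rows are invented, and "sold" is assumed to mean a `Trans` row with a non-null `datesold`. `WITH`, `COUNT(DISTINCT ...)` and alias-without-`AS` table aliases are also valid in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down versions of the question's tables with invented sample rows.
conn.executescript("""
CREATE TABLE artist (artistID INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT);
CREATE TABLE work   (workID INTEGER PRIMARY KEY, title TEXT, artistID INTEGER);
CREATE TABLE trans  (transactionID INTEGER PRIMARY KEY, workID INTEGER, datesold TEXT);
INSERT INTO artist VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
INSERT INTO work VALUES (10, 'W10', 1), (11, 'W11', 1), (12, 'W12', 1), (20, 'W20', 2);
INSERT INTO trans VALUES (1, 10, '2019-01-01'), (2, 11, '2019-02-01'),
                         (3, 12, NULL), (4, 20, '2019-03-01'), (5, 20, '2019-04-01');
""")

MOST_SOLD = """
WITH sold_counts AS (
    SELECT a.artistID, a.lastname, a.firstname,
           COUNT(DISTINCT w.workID) AS works_sold  -- a work resold twice counts once
    FROM artist a
    JOIN work w  ON w.artistID = a.artistID
    JOIN trans t ON t.workID = w.workID
    WHERE t.datesold IS NOT NULL                  -- only completed sales
    GROUP BY a.artistID, a.lastname, a.firstname
)
SELECT artistID, lastname, firstname, works_sold
FROM sold_counts
WHERE works_sold = (SELECT MAX(works_sold) FROM sold_counts)
"""

rows = conn.execute(MOST_SOLD).fetchall()
print(rows)
```

Comparing against `MAX(works_sold)` returns every artist tied for first place. On the asker's Oracle schema the table names would be qualified, e.g. `dtoohey.artist`. Use `COUNT(*)` instead of `COUNT(DISTINCT ...)` if each sale, rather than each work, should be counted.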