top-n | 易学教程

Is there a way to get the nlargest items per group in dask?

Question: I have the following dataset:

    location  category  percent
    A         5           100.0
    B         3           100.0
    C         2            50.0
              4            13.0
    D         2            75.0
              3            59.0
              4            13.0
              5             4.0

I'm trying to get the n largest items of category in a dataframe grouped by location. That is, if I want the top 2 largest percentages for each group, the output should be:

    location  category  percent
    A         5           100.0
    B         3           100.0
    C         2            50.0
              4            13.0
    D         2            75.0
              3            59.0

It looks like in pandas this is relatively straightforward using pandas.core.groupby.SeriesGroupBy.nlargest but dask doesn't
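
For reference, the expected output above can be reproduced in plain pandas with a sort followed by a per-group head. This is only a minimal sketch on a flat (non-indexed) copy of the data; it is not taken from the truncated question or its answers, and whether the same chain runs unchanged on a dask dataframe is not confirmed here.

```python
import pandas as pd

# Flat version of the dataset from the question (location is repeated per row).
df = pd.DataFrame({
    "location": ["A", "B", "C", "C", "D", "D", "D", "D"],
    "category": [5, 3, 2, 4, 2, 3, 4, 5],
    "percent": [100.0, 100.0, 50.0, 13.0, 75.0, 59.0, 13.0, 4.0],
})

# Sort so that the largest percent comes first inside each location,
# then keep the first two rows of every group.
top2 = (
    df.sort_values(["location", "percent"], ascending=[True, False])
      .groupby("location")
      .head(2)
      .reset_index(drop=True)
)

print(top2)
```

Groups with fewer than two rows (A and B here) are simply returned whole.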

Pandas report top-n in group and pivot

I am trying to summarise a dataframe by grouping along a single dimension d1 and reporting summary statistics for each element of d1. In particular, I am interested in the top n (index and values) for a number of metrics. What I would like to produce is one row for each element of d1. Say I have two dimensions, d1 and d2, and four metrics, m1, m2, m3 and m4.

1) What is the suggested way of grouping by d1 and finding the top n d2 and metric value for each of the metrics m1 to m4? In Wes's book Python for Data Analysis (page 35) he suggests:

    def get_top1000(group): return group.sort_index(by='births', ascending=False

Find names of top-n highest-value (non-zero) columns in each pandas dataframe row

Suppose I have a dataframe like this:

    id  p1  p2  p3  p4
    1    0   9   0   4
    2    0   0   0   4
    3    1   3  10   7
    4    1   5   3   1
    5    2   3   7  10

I want to find the names of the top-n highest-value columns in each row, excluding zero values from the top 3:

    id  top1  top2  top3
    1   p2    p4
    2   p4
    3   p3    p4    p2
    4   p2    p3    p4/p1
    5   p4    p3    p2

The existing solutions also return the names of columns whose value is zero. Is there a way to exclude zero values? I have this solution:

    arank = df.apply(np.argsort, axis = 1)
    ranked_cols = df.columns.to_series()[arank.values[:,::-1][:,:3]]
    new_df = pd.DataFrame(ranked_cols, index=df.index)

There are also other solutions
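
One way to get the zero-excluding behaviour described above is to filter each row before ranking it. The sketch below is an illustration, not one of the answers to the truncated question; the helper name `top_nonzero_columns` and the choice of an empty string as padding are assumptions made for the example.

```python
import pandas as pd

# Data from the question, with id as the index.
df = pd.DataFrame(
    {
        "p1": [0, 0, 1, 1, 2],
        "p2": [9, 0, 3, 5, 3],
        "p3": [0, 0, 10, 3, 7],
        "p4": [4, 4, 7, 1, 10],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)


def top_nonzero_columns(row, n=3):
    """Return the names of the n largest non-zero columns of one row.

    Hypothetical helper: rows with fewer than n non-zero values are
    padded with empty strings.
    """
    names = row[row != 0].nlargest(n).index.tolist()
    padded = names + [""] * (n - len(names))
    return pd.Series(padded, index=[f"top{i + 1}" for i in range(n)])


result = df.apply(top_nonzero_columns, axis=1)
print(result)
```

Row-wise `apply` is easy to read but slow on large frames, so for a huge dataframe a vectorised NumPy approach would likely be preferable.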

How to find column-index of top-n values within each row of huge dataframe

Question: I have a dataframe of the following format (example data):

    ID  Metric1  Metric2  Metric3  Metric4  Metric5
    1   0.5      0.3      0.2      0.8      0.7
    2   0.1      0.8      0.5      0.2      0.4
    3   0.3      0.1      0.7      0.4      0.2
    4   0.9      0.4      0.8      0.5      0.2

The scores range over [0,1]. I wish to write a function that, for each ID (row), finds the top n metrics, where n is an input of the function along with the original dataframe. My ideal output would be (e.g. for n = 3):

    ID  Top_1    Top_2    Top_3
    1   Metric4  Metric5  Metric1
    2   Metric2  Metric3  Metric5
    3   Metric3

select 2nd row in Plsql

Question: Let's say I have the following table: SomeTable(id, price). How do I select the row with the 2nd highest price from this table? Note: this has to be done in PL/SQL, in a database-agnostic way. Is it possible to do this without any loops? I know how this is done using Oracle constructs like rownum or MySQL constructs like limit, so I am not looking for those.

Answer 1:

    CREATE TABLE mytable (id NUMBER PRIMARY KEY, price NUMBER NOT NULL);
    INSERT INTO mytable VALUES (1, 10);
    INSERT INTO mytable VALUES (2, 20);
    INSERT INTO mytable VALUES (3, 20);
    INSERT INTO mytable VALUES (4, 30);
    SELECT id, price FROM ( SELECT
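
The answer above is cut off, so the following is not its query. As a separate, hedged illustration of a vendor-neutral approach, nested MAX subqueries can pick the second-highest distinct price without rownum, limit, or loops. The sketch runs the SQL through Python's built-in sqlite3 module purely so it can be executed; SQLite standing in for Oracle is an assumption of the example, and ties on the second-highest price are all returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mytable (id NUMBER PRIMARY KEY, price NUMBER NOT NULL);
    INSERT INTO mytable VALUES (1, 10);
    INSERT INTO mytable VALUES (2, 20);
    INSERT INTO mytable VALUES (3, 20);
    INSERT INTO mytable VALUES (4, 30);
    """
)

# Second-highest distinct price: the largest price strictly below the maximum.
query = """
SELECT id, price
FROM mytable
WHERE price = (
    SELECT MAX(price)
    FROM mytable
    WHERE price < (SELECT MAX(price) FROM mytable)
)
ORDER BY id
"""

rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

If the table holds only one distinct price, the inner MAX is NULL and the query returns no rows.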

Select TOP N and BOTTOM N

I am trying to fetch the top n and bottom n rows. The query below gives me the result, but it takes a lot of time; I believe it scans the table twice. Code used:

    WITH T1 AS (SELECT * FROM (SELECT Column1, Column2, Column3 FROM TABLE ORDER BY DESC ) WHERE ROWNUM<=5),
         T2 AS (SELECT * FROM (SELECT Column1, Column2, Column3 FROM TABLE ORDER BY ASC ) WHERE ROWNUM<=5)
    SELECT * FROM T1
    UNION ALL
    SELECT * FROM T2

How can I fetch this faster, considering that the tables are updated regularly?

The best way to solve this problem depends in part on your Oracle version. Here is a very simple (and, I suspect, very efficient)
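
The answer above is truncated, so its actual query is unknown. One common pattern for this kind of problem, shown here only as a sketch, is to number the rows in both directions with ROW_NUMBER() in a single subquery and filter on either counter. The table name `t`, its two columns, and ordering by `column1` are assumptions (the question omits the ORDER BY column), and SQLite 3.25 or later stands in for Oracle so the example can run; whether this avoids a second scan depends on the database's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with 20 rows; column1 is the assumed sort key.
conn.execute("CREATE TABLE t (column1 INTEGER, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 21)],
)

# Number every row from the top and from the bottom in one subquery,
# then keep rows that fall within the first 5 of either ordering.
query = """
SELECT column1, column2
FROM (
    SELECT column1,
           column2,
           ROW_NUMBER() OVER (ORDER BY column1 DESC) AS rn_desc,
           ROW_NUMBER() OVER (ORDER BY column1 ASC)  AS rn_asc
    FROM t
)
WHERE rn_desc <= 5 OR rn_asc <= 5
ORDER BY column1
"""

rows = conn.execute(query).fetchall()
print([r[0] for r in rows])
conn.close()
```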

Selecting top N rows for each group based on value in column

Question: I have a dataframe like the one below:

    x<-c(3,2,1,8,7,11,10,9,7,5,4)
    y<-c("a","a","a", "b","b","c","c","c","c","c","c")
    z<-c(2,2,2,1,1,3,3,3,3,3,3)
    df<-data.frame(x,y,z)
    df
        x y z
    1   3 a 2
    2   2 a 2
    3   1 a 2
    4   8 b 1
    5   7 b 1
    6  11 c 3
    7  10 c 3
    8   9 c 3
    9   7 c 3
    10  5 c 3
    11  4 c 3

I want to select the top n rows for each group defined by column y, where n is given in column z. So the output should look like this:

       x y z
    1  3 a 2
    2  2 a 2
    3  8 b 1
    4 11 c 3
    5 10 c 3
    6  9 c 3

Answer 1: A solution with base R:

    # df is split according to y, then we keep only the top "z" value (after ordering x)
    # and rbind everything back together:
    do.call
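
The base R answer is cut off after `do.call`, so it is not reproduced here. As a cross-check of the expected output, the same selection can be sketched in Python with pandas: rank the rows inside each y group by descending x and keep those whose rank is below the group's z. This is an illustrative translation under those assumptions, not the original answer.

```python
import pandas as pd

# Same data as the R snippet in the question.
df = pd.DataFrame({
    "x": [3, 2, 1, 8, 7, 11, 10, 9, 7, 5, 4],
    "y": list("aaabbcccccc"),
    "z": [2, 2, 2, 1, 1, 3, 3, 3, 3, 3, 3],
})

# Order rows so the largest x comes first within each y group.
ranked = df.sort_values(["y", "x"], ascending=[True, False])

# 0-based position of each row inside its group after sorting.
rank_in_group = ranked.groupby("y").cumcount()

# Keep a row only if its position is below the per-row limit z.
output = ranked[rank_in_group < ranked["z"]].reset_index(drop=True)
print(output)
```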

How to display the record with the highest value in Oracle?

I have 4 tables with the following structure:

    Table artist:   artistID, lastname, firstname, nationality, dateofbirth, datedcease
    Table work:     workId, title, copy, medium, description, artist ID
    Table Trans:    TransactionID, Date Acquired, Acquistionprice, datesold, askingprice, salesprice, customerID, workID
    Table Customer: customerID, lastname, Firstname, street, city, state, zippostalcode, country, areacode, phonenumber, email

The first question is: which artist has the most works of art sold, and how many of that artist's works have been sold? My SQL query is this:

    SELECT * From dtoohey.artist A1 INNER JOIN ( SELECT COUNT(W1