quantile

Find top deciles from dataframe by group

只谈情不闲聊 submitted on 2019-12-02 05:32:41
I am attempting to create new variables using a function and lapply rather than working directly in the data with loops. I used to use Stata and would have solved this problem with a method similar to that discussed here. Since naming variables programmatically is so difficult, or at least awkward, in R (and it seems you can't use indexing with assign), I have left the naming process until after the lapply. I then use one for loop to do the renaming prior to merging and another for the merging itself. Are there more efficient ways of doing this? How would I replace the loops? Should I be doing some

Compute quantiles incorporating Sample Design (Survey package)

本小妞迷上赌 submitted on 2019-12-01 13:49:47
I want to compute a new column using the quantiles of another column (a continuous variable), incorporating the sample design of a complex survey. The idea is to create in the data frame a new variable that indicates which quantile group each observation falls into. Here is how I execute the idea without incorporating the sample design, so you can understand what I'm aiming for. # Load Data data(api) # Convert data to data.table format (mostly to increase speed of the process) apiclus1 <- as.data.table(apiclus1) # Create deciles variable apiclus1[, decile:=cut(api00, breaks=quantile(api00,

Python equivalent of Excel's PERCENTILE.EXC

落花浮王杯 submitted on 2019-12-01 08:39:53
I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value and then compute a specific tail percentile loss. For example, 95% VaR is the 5th percentile figure in that time series. I have my time series in a Pandas dataframe and am currently using the pd.quantile() function to compute the percentile. My question is: typical market convention for VaR is to use an exclusionary percentile (i.e., 95% VaR is interpreted as: there is a 95% chance your portfolio will not
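For the exclusive convention raised in this question, one possible approach is to reimplement the rule Excel documents for PERCENTILE.EXC: take the rank p*(n+1) on the sorted data, interpolate linearly between the two neighbouring values, and reject any p outside 1/(n+1) to n/(n+1). The sketch below is pure Python under that assumption; the name percentile_exc is hypothetical and is not part of pandas or Excel.

```python
import math


def percentile_exc(values, p):
    """Exclusive percentile: rank = p * (n + 1), linear interpolation.

    Hypothetical helper mirroring the documented PERCENTILE.EXC rule.
    Raises ValueError where Excel would return #NUM!.
    """
    data = sorted(values)
    n = len(data)
    rank = p * (n + 1)          # 1-based fractional rank
    if n == 0 or rank < 1 or rank > n:
        raise ValueError("p must lie between 1/(n+1) and n/(n+1)")
    k = math.floor(rank)        # 1-based index of the lower neighbour
    frac = rank - k
    if k == n:                  # rank is exactly n: no upper neighbour
        return data[-1]
    return data[k - 1] + frac * (data[k] - data[k - 1])


changes = [1, 2, 3, 4, 5, 6, 7, 8, 9]
first_quartile = percentile_exc(changes, 0.25)
```

With a pandas Series, the same function could be called as percentile_exc(series.tolist(), 0.05). Note that small samples cannot support extreme tail percentiles under this rule; with nine observations, p = 0.05 is rejected.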

Python equivalent of Excel's PERCENTILE.EXC

旧时模样 submitted on 2019-12-01 06:57:32
Question: I am using Pandas to compute some financial risk analytics, including Value at Risk. In short, to compute Value at Risk (VaR), you take a time series of simulated portfolio changes in value and then compute a specific tail percentile loss. For example, 95% VaR is the 5th percentile figure in that time series. I have my time series in a Pandas dataframe and am currently using the pd.quantile() function to compute the percentile. My question is: typical market convention for VaR is to use an

Quartiles in SQL query

主宰稳场 submitted on 2019-12-01 05:53:34
Question: I have a very simple table like this: CREATE TABLE IF NOT EXISTS LuxLog ( Sensor TINYINT, Lux INT, PRIMARY KEY(Sensor) ) It contains thousands of logs from different sensors. I would like to have Q1 and Q3 for all sensors. I can run one query for each sensor, but it would be better to have a single query for all sensors (getting Q1 and Q3 back from one query). I thought it would be a fairly simple operation, as quartiles are broadly used and one of the main statistical variables in frequency
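If computing the quartiles outside the database is acceptable, the per-sensor grouping can be sketched in Python. This is an illustration only, not a SQL answer: it assumes the (Sensor, Lux) rows have already been fetched as tuples, the name quartiles_by_sensor is hypothetical, and it uses the standard library's statistics.quantiles with the "inclusive" method, which may differ from the quartile definition a particular database uses.

```python
from collections import defaultdict
from statistics import quantiles


def quartiles_by_sensor(rows):
    """Return {sensor: (Q1, Q3)} from an iterable of (sensor, lux) tuples.

    Hypothetical helper. Each sensor needs at least two readings,
    otherwise statistics.quantiles may raise StatisticsError.
    """
    readings = defaultdict(list)
    for sensor, lux in rows:
        readings[sensor].append(lux)

    result = {}
    for sensor, values in readings.items():
        # n=4 yields the three cut points Q1, median, Q3.
        q1, _median, q3 = quantiles(values, n=4, method="inclusive")
        result[sensor] = (q1, q3)
    return result


# Made-up sample rows standing in for the LuxLog table.
sample_rows = [(1, 10), (1, 20), (1, 30), (1, 40), (1, 50),
               (2, 5), (2, 15), (2, 25)]
summary = quartiles_by_sensor(sample_rows)
```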

what's the inverse of the quantile function on a pandas Series?

那年仲夏 submitted on 2019-11-30 04:24:35
The quantile function gives us the quantile of a given pandas Series s. E.g., s.quantile(0.9) is 4.2. Is there an inverse function (i.e., the cumulative distribution) that finds the value x such that s.quantile(x) = 4? Thanks. I had the same question as you did! I found an easy way of getting the inverse of quantile using scipy. #libs required from scipy import stats import pandas as pd import numpy as np #generate random data with same seed (to be reproducible) np.random.seed(seed=1) df = pd.DataFrame(np.random.uniform(0,1,(10)), columns=['a']) #quantile function x = df.quantile(0.5)[0] #inverse of
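As an alternative illustration that does not depend on scipy, the linear interpolation that pandas uses by default for quantile can be inverted directly. The sketch below is pure Python; inverse_quantile is a hypothetical name, and it assumes the default linear interpolation, where the sorted value at 0-based index i sits at quantile i/(n-1). When the data contain ties, several x satisfy s.quantile(x) = v, and this version returns the largest one.

```python
from bisect import bisect_right


def inverse_quantile(values, v):
    """Return x in [0, 1] such that the linear-interpolation quantile at x is v.

    Hypothetical helper; assumes at least two data points and
    min(values) <= v <= max(values).
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    if v < data[0] or v > data[-1]:
        raise ValueError("v lies outside the range of the data")
    if v == data[-1]:
        return 1.0
    # Find i with data[i] <= v < data[i + 1]; the gap is strictly positive.
    i = bisect_right(data, v) - 1
    frac = (v - data[i]) / (data[i + 1] - data[i])
    return (i + frac) / (n - 1)


x = inverse_quantile([1, 2, 3, 4, 5], 4)
```

For a pandas Series s, this could be called as inverse_quantile(s.tolist(), 4).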

Definitions of quantiles in R

邮差的信 submitted on 2019-11-28 11:18:45
Main question: Suppose you have a discrete, finite data set $d$. Then the command summary(d) returns the min, 1st quartile, median, mean, 3rd quartile, and max. My question is: what formula does R use to compute the 1st quartile? Background: My data set was: d=c(1,2,3,3,4,9). summary(d) returns 2.25 as the first quartile. Now, one way to compute the first quartile is to choose a value q1 such that 25% of the data set is less than or equal to q1. Clearly, this is not what R is using. So, I was wondering, what formula does R use to compute the first quartile? Google searches on this topic have
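The 2.25 in this question is consistent with the default documented for R's quantile(), type 7: place the quantile at 0-based position h = (n-1)*p in the sorted data and interpolate linearly between the neighbouring values. For d and p = 0.25, h = 5 * 0.25 = 1.25, which gives 2 + 0.25 * (3 - 2) = 2.25. The sketch below reproduces that arithmetic in Python; quantile_type7 is a hypothetical name, and the code illustrates the formula rather than porting R's implementation.

```python
import math


def quantile_type7(values, p):
    """Type 7 quantile: position (n - 1) * p with linear interpolation.

    Hypothetical helper; assumes a non-empty data set and 0 <= p <= 1.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    data = sorted(values)
    h = (len(data) - 1) * p           # 0-based fractional position
    lower = math.floor(h)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (h - lower) * (data[upper] - data[lower])


d = [1, 2, 3, 3, 4, 9]
first_quartile = quantile_type7(d, 0.25)
third_quartile = quantile_type7(d, 0.75)
```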

Definitions of quantiles in R

旧时模样 submitted on 2019-11-27 06:12:36
Question: Main question: Suppose you have a discrete, finite data set $d$. Then the command summary(d) returns the min, 1st quartile, median, mean, 3rd quartile, and max. My question is: what formula does R use to compute the 1st quartile? Background: My data set was: d=c(1,2,3,3,4,9). summary(d) returns 2.25 as the first quartile. Now, one way to compute the first quartile is to choose a value q1 such that 25% of the data set is less than or equal to q1. Clearly, this is not what R is using. So, I