standard-deviation

Removing outliers easily in R

余生长醉 submitted on 2019-12-07 01:40:48
Question: I have data with discrete x-values, such as

    x = c(3,8,13,8,13,3,3,8,13,8,3,8,8,13,8,13,8,3,3,8,13,8,13,3,3)
    y = c(4,5,4,6,7,20,1,4,6,2,6,8,2,6,7,3,2,5,7,3,2,5,7,3,2)

How can I generate a new dataset of x and y values that drops every pair whose y-value is more than 2 standard deviations above the mean for its bin? For example, in the x=3 bin, 20 is more than 2 SDs above the mean, so that data point should be removed.

Answer 1: I think you want something like: by(dat,dat$x, function(z) z$y

Calculate variation of IP addresses column using MySQL

纵然是瞬间 submitted on 2019-12-06 12:55:31
Question: I'm trying to detect people using proxies to abuse my website. They often change proxies, but there is a definite pattern of them using one proxy address many times, much more than is normal for legitimate visitors. Most visits to my website come from unique IP addresses that have visited only once or a few times, not repeatedly. Let's say I have these IP addresses in a column:

    89.46.74.56
    89.46.74.56
    89.46.74.56
    91.14.37.249
    104.233.103.6

That would mean there are

Detect major events in signal data?

淺唱寂寞╮ submitted on 2019-12-05 06:55:33
Question: Given a signal like the one below, how would I go about finding the beginning and end of the two "major events" (a green arrow marks where each event begins, and a red arrow where it ends)? I've tried the method suggested in this answer, but no matter how much I adjust the lag, threshold and influence variables, it either reacts to the tiny changes at the beginning, middle and end of the graph (where there are no major events), or it doesn't react at all. I

StDev() function returns Null when table contains only one row

我的梦境 submitted on 2019-12-05 05:22:51
I am trying to use the StDev function and am getting blank results. I am using it as

    SELECT StDev(fldMean) FROM myTable

where fldMean contains a value of 2.3. It should evaluate to 0, but instead I simply get an empty result. I can't work out how expressions are meant to be used in the function, and Microsoft's manual really didn't help.

SELECT StDev(fldMean) FROM myTable will return Null if [myTable] has only one row, because the sample standard deviation cannot be computed from a single observation. You will need at least two rows in that table before you can receive a meaningful result. If
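
The distinction behind this answer is between the sample and the population standard deviation. The following is a minimal Python sketch that uses the standard library's `statistics` module as a stand-in for the database function (an assumption made for illustration, not a description of how Access computes it). The sample formula divides by n - 1, so it is undefined for a single value, while the population formula divides by n and gives 0.

```python
import statistics

values = [2.3]  # a single observation, as in the one-row table

# Sample standard deviation divides by (n - 1), which is zero for n = 1,
# so the standard library refuses to compute it.
try:
    sample_sd = statistics.stdev(values)
except statistics.StatisticsError:
    sample_sd = None  # plays the role of the Null result described above

# Population standard deviation divides by n, so one value gives 0.
population_sd = statistics.pstdev(values)

print(sample_sd)      # None
print(population_sd)  # 0.0
```

If a result of 0 for a single row is what you need, the population variant is probably the one to reach for.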

Removing outliers easily in R

こ雲淡風輕ζ submitted on 2019-12-05 05:05:06
I have data with discrete x-values, such as

    x = c(3,8,13,8,13,3,3,8,13,8,3,8,8,13,8,13,8,3,3,8,13,8,13,3,3)
    y = c(4,5,4,6,7,20,1,4,6,2,6,8,2,6,7,3,2,5,7,3,2,5,7,3,2)

How can I generate a new dataset of x and y values that drops every pair whose y-value is more than 2 standard deviations above the mean for its bin? For example, in the x=3 bin, 20 is more than 2 SDs above the mean, so that data point should be removed.

I think you want something like:

    by(dat,dat$x, function(z) z$y[z$y < 2*sd(z$y)])
    dat$x: 3
    [1] 4 1 6 5 7 3 2
    ---------------------------------------------------------------
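
The question's criterion is the bin mean plus two standard deviations, whereas the quoted R snippet compares y against 2*sd alone. For the x=3 bin both happen to drop only the value 20. The following is a minimal Python sketch of the mean-plus-2-SD rule; the function name `drop_high_outliers` and the parameter `k` are hypothetical, and it assumes every bin holds at least two points so that a sample SD exists.

```python
import statistics
from collections import defaultdict

x = [3, 8, 13, 8, 13, 3, 3, 8, 13, 8, 3, 8, 8, 13, 8, 13, 8, 3, 3, 8, 13, 8, 13, 3, 3]
y = [4, 5, 4, 6, 7, 20, 1, 4, 6, 2, 6, 8, 2, 6, 7, 3, 2, 5, 7, 3, 2, 5, 7, 3, 2]


def drop_high_outliers(xs, ys, k=2.0):
    """Keep (x, y) pairs whose y is at most k sample SDs above its bin's mean.

    Hypothetical helper; assumes each bin has at least two values.
    """
    # Group the y-values by their x bin.
    bins = defaultdict(list)
    for xi, yi in zip(xs, ys):
        bins[xi].append(yi)

    # Upper cutoff per bin: mean + k * sample standard deviation.
    cutoff = {
        xi: statistics.mean(vals) + k * statistics.stdev(vals)
        for xi, vals in bins.items()
    }

    # Preserve the original order of the surviving pairs.
    return [(xi, yi) for xi, yi in zip(xs, ys) if yi <= cutoff[xi]]


kept = drop_high_outliers(x, y)
print(len(kept))        # 24
print((3, 20) in kept)  # False
```

With this data the x=3 bin has mean 6 and sample SD 6, so the cutoff is 18 and only the pair (3, 20) is removed.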

Calculate variation of IP addresses column using MySQL

做~自己de王妃 submitted on 2019-12-04 20:14:08
I'm trying to detect people using proxies to abuse my website. They often change proxies, but there is a definite pattern of them using one proxy address many times, much more than is normal for legitimate visitors. Most visits to my website come from unique IP addresses that have visited only once or a few times, not repeatedly. Let's say I have these IP addresses in a column:

    89.46.74.56
    89.46.74.56
    89.46.74.56
    91.14.37.249
    104.233.103.6

That would mean there are 3 uniques out of 5, giving a "uniqueness score" of 60%. How would I calculate this efficiently using
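
The "uniqueness score" described here is the number of distinct addresses divided by the total number of rows. The following is a minimal Python sketch of that arithmetic on the five example addresses; the function name `uniqueness_score` is hypothetical. In SQL the same ratio is generally expressed as a distinct count divided by a total count over the column.

```python
ips = [
    "89.46.74.56",
    "89.46.74.56",
    "89.46.74.56",
    "91.14.37.249",
    "104.233.103.6",
]


def uniqueness_score(addresses):
    """Fraction of rows that are distinct addresses (hypothetical helper)."""
    if not addresses:
        return 0.0  # avoid dividing by zero on an empty column
    return len(set(addresses)) / len(addresses)


score = uniqueness_score(ips)
print(f"{score:.0%}")  # 60%
```

A low score means a few addresses account for most of the traffic, which is the pattern the question is trying to detect.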

Detect major events in signal data?

女生的网名这么多〃 submitted on 2019-12-03 21:45:57
Given a signal like the one below, how would I go about finding the beginning and end of the two "major events" (a green arrow marks where each event begins, and a red arrow where it ends)? I've tried the method suggested in this answer, but no matter how much I adjust the lag, threshold and influence variables, it either reacts to the tiny changes at the beginning, middle and end of the graph (where there are no major events), or it doesn't react at all. I can't simply determine if the signal is above a fixed threshold, as the strength of the signal can

Function that converts a vector of numbers to a vector of standard units

谁说胖子不能爱 submitted on 2019-12-03 19:13:03
Question: Is there a function that, given a vector of numbers, returns another vector with the standard units corresponding to each value? Here a standard unit is the number of SDs a value lies above or below the mean. Example:

    x <- c(1,3,4,5,7) # note: mean = 4, sd = 2
    foo(x)
    [1] -1.5 -0.5 0.0 0.5 1.5

Is this fictitious "foo" function already included in a package?

Answer 1: Yes, scale():

    x <- c(1,3,4,5,7)
    scale(x)

Answer 2: The function you are looking for is scale.

    scale(x)
    [,1]
    [1,] -1.3416408
    [2,] -0.4472136
    [3,] 0
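
The values in the question (-1.5 to 1.5) and in the scale(x) output (-1.34 to 1.34) differ because the question's "sd = 2" is the population standard deviation (divide by n), while the scale(x) output corresponds to the sample standard deviation (divide by n - 1). The following is a minimal Python sketch showing both; the function name `standard_units` and its `sample` flag are hypothetical.

```python
import statistics


def standard_units(values, sample=True):
    """Convert values to standard units (z-scores). Hypothetical helper.

    sample=True divides by the sample SD (n - 1 denominator);
    sample=False divides by the population SD (n denominator).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


x = [1, 3, 4, 5, 7]

z_sample = standard_units(x)
z_population = standard_units(x, sample=False)

print([round(z, 7) for z in z_sample])
# [-1.3416408, -0.4472136, 0.0, 0.4472136, 1.3416408]
print(z_population)
# [-1.5, -0.5, 0.0, 0.5, 1.5]
```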

How to efficiently calculate a moving Standard Deviation

倖福魔咒の submitted on 2019-12-03 18:40:28
Question: Below you can see my C# method to calculate Bollinger Bands for each point (moving average, up band, down band). As you can see, this method uses two for loops to calculate the moving standard deviation using the moving average. It used to contain an additional loop to calculate the moving average over the last n periods. I was able to remove that one by adding the new point value to total_average at the beginning of the loop and removing the i - n point value at the end of the loop. My question now
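
The running-total trick the question describes for the moving average extends to the standard deviation if a running sum of squares is tracked as well. The following is a minimal Python sketch of that idea, not the asker's C# method: the function name `bollinger_bands`, the band width `k = 2`, and the use of the population SD (divide by n) are all assumptions. Note that the sum-of-squares formula can lose precision on long series with large values.

```python
import math
from collections import deque


def bollinger_bands(prices, n, k=2.0):
    """Return (mean, upper, lower) for every full window of n prices.

    Hypothetical helper: one pass, O(1) work per point, using running
    sums of the values and of their squares (population SD assumed).
    """
    window = deque()
    total = 0.0     # running sum of the values in the window
    total_sq = 0.0  # running sum of their squares
    bands = []
    for price in prices:
        window.append(price)
        total += price
        total_sq += price * price
        if len(window) > n:
            # Drop the point that just left the window.
            old = window.popleft()
            total -= old
            total_sq -= old * old
        if len(window) == n:
            mean = total / n
            # Var = E[x^2] - (E[x])^2; clamp tiny negatives from rounding.
            variance = max(total_sq / n - mean * mean, 0.0)
            sd = math.sqrt(variance)
            bands.append((mean, mean + k * sd, mean - k * sd))
    return bands


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
bands = bollinger_bands(prices, n=3)
print(len(bands))  # 4
print(bands[0][0])  # 2.0
```

If precision becomes a concern, Welford-style updates of the mean and squared deviations are a commonly used alternative.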

How can I do standard deviation in Ruby?

北战南征 submitted on 2019-12-03 02:10:02
Question: I have several records with a given attribute, and I want to find the standard deviation. How do I do that?

Answer 1:

    module Enumerable
      def sum
        self.inject(0){|accum, i| accum + i }
      end

      def mean
        self.sum/self.length.to_f
      end

      def sample_variance
        m = self.mean
        sum = self.inject(0){|accum, i| accum +(i-m)**2 }
        sum/(self.length - 1).to_f
      end

      def standard_deviation
        Math.sqrt(self.sample_variance)
      end
    end

Testing it:

    a = [ 20, 23, 23, 24, 25, 22, 12, 21, 29 ]
    a.standard_deviation # => 4.594682917363407