sliding-window

Distinct count on a rolling time window

Submitted by て烟熏妆下的殇ゞ on 2020-06-17 02:19:47

Question: I want to count the number of distinct catalog numbers that have appeared within the last X minutes. This is usually called a rolling time window. For instance, if I have:

row startime         orderNumber catalogNumb
1   2007-09-24-15.50 o1          21
2   2007-09-24-15.51 o2          21
3   2007-09-24-15.52 o2          21
4   2007-09-24-15.53 o3          21
5   2007-09-24-15.54 o4          22
6   2007-09-24-15.55 o4          23
7   2007-09-24-15.56 o4          21
8   2007-09-24-15.57 o4          21

For instance, if I want to get this for the last 5 minutes (5 is just one of the …
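The preview cuts off above; for reference, a minimal sketch of the rolling distinct count it describes, in Python rather than SQL (the original question's query language is not visible in the preview). The tuple layout and the 5-minute window are assumptions taken from the sample rows:

from datetime import datetime, timedelta

def rolling_distinct_counts(rows, window_minutes):
    # rows: (start_time, order_number, catalog_number) tuples sorted by
    # start_time. For each row, count the distinct catalog numbers seen in
    # the preceding window_minutes (inclusive of the current row).
    window = timedelta(minutes=window_minutes)
    counts = []
    for i, (t, _, _) in enumerate(rows):
        in_window = {c for (s, _, c) in rows[: i + 1] if t - s <= window}
        counts.append(len(in_window))
    return counts

rows = [
    (datetime(2007, 9, 24, 15, 50), "o1", 21),
    (datetime(2007, 9, 24, 15, 54), "o4", 22),
    (datetime(2007, 9, 24, 15, 55), "o4", 23),
    (datetime(2007, 9, 24, 15, 56), "o4", 21),
]
print(rolling_distinct_counts(rows, 5))  # [1, 2, 3, 3]

This is the quadratic baseline; a production version would evict old rows with two pointers instead of rescanning the prefix.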

Time complexity of checking whether a set is contained in another set

Submitted by 六月ゝ 毕业季﹏ on 2020-04-30 06:27:18

Question: I am trying to implement the example of finding the shortest substring of a given string s containing the pattern char. My code is working fine, but my goal is to attain a time complexity of O(N), where N is the length of s. Here is my code:

def shortest_subtstring(s,char):
    # smallest substring containing char. use sliding window
    start=0
    d=defaultdict(int)
    minimum=9999
    for i in range(len(s)):
        d[s[i]]+=1
        # check whether all the characters from char have been visited.
        while set(char).issubset(set([j …
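The posted code is truncated above. For reference, a sketch of the standard O(N) two-pointer sliding window for this problem, which keeps running counters instead of rebuilding sets on every step; it assumes char is the string of required characters:

from collections import Counter, defaultdict

def shortest_substring(s, chars):
    # Expand the right end of the window; whenever the window covers every
    # character in chars, shrink from the left while it still covers them.
    need = Counter(chars)
    have = defaultdict(int)
    missing = len(need)          # distinct required chars not yet covered
    best = None
    left = 0
    for right, c in enumerate(s):
        have[c] += 1
        if c in need and have[c] == need[c]:
            missing -= 1
        while missing == 0:      # window is valid; record it and shrink
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            have[s[left]] -= 1
            if s[left] in need and have[s[left]] < need[s[left]]:
                missing += 1
            left += 1
    return s[best[0]:best[1] + 1] if best else ""

print(shortest_substring("ADOBECODEBANC", "ABC"))  # "BANC"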

R - Counting elements in a vector within a certain range, as a sliding window?

Submitted by 纵饮孤独 on 2020-02-27 12:34:04

Question: I am using R and I would like to convert a standard vector of integers into a 2-column data frame showing the number of elements in the vector that fall within a window of a specified size. For instance, take this vector:

1, 75, 79, 90, 91, 92, 109, 120, 167, 198, 203, 204, 206, 224, 230, 236, 240, 245, 263, 344

The result for a window size of 50 should look like this:

50  1
100 5
150 2
200 2
250 8
300 1
350 1
400 0

With the first column as the number …
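A sketch of this windowed count in Python (the question itself is about R); the half-open bin convention (edge - 50, edge] and the 400 upper bound are assumptions chosen so the output matches the sample table above:

def window_counts(values, width, upper):
    # For each window edge (width, 2*width, ..., upper) count the values
    # that fall in the half-open interval (edge - width, edge].
    edges = range(width, upper + 1, width)
    return [(edge, sum(edge - width < v <= edge for v in values)) for edge in edges]

v = [1, 75, 79, 90, 91, 92, 109, 120, 167, 198, 203, 204, 206, 224,
     230, 236, 240, 245, 263, 344]
for edge, n in window_counts(v, 50, 400):
    print(edge, n)   # 50 1, 100 5, 150 2, 200 2, 250 8, 300 1, 350 1, 400 0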

How do I implement a circular buffer in Python?

Submitted by 痞子三分冷 on 2020-01-11 10:48:12

Question: I have a matrix, for instance a=[12,2,4,67,8,9,23], and I would like code that appends a value, say 45, to it and removes the first value 12, so in essence I want to make a = [2,4,67,8,9,23,45]. I want to work with regular matrices, not numpy matrices, so I can't use hstack or vstack. How do I do this in Python? Any help would be appreciated, thanks.

Answer 1: The simplest way:

a = a[1:] + [45]

Answer 2: Use a deque. http://docs.python.org/2/library/collections.html#collections.deque

>>> import collections > …
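Answer 2 is cut off above; a short sketch of where the deque approach presumably leads, since a deque constructed with maxlen behaves as a fixed-size circular buffer:

from collections import deque

# Appending to a full maxlen deque silently drops the element at the
# opposite end, which gives the shift-in/shift-out behaviour asked for.
buf = deque([12, 2, 4, 67, 8, 9, 23], maxlen=7)
buf.append(45)        # 12 falls off the left end
print(list(buf))      # [2, 4, 67, 8, 9, 23, 45]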

R: fast sliding window with given coordinates

Submitted by 天涯浪子 on 2020-01-01 02:47:28

Question: I have a data table with nrow around one or two million and ncol of about 200. Each entry in a row has a coordinate associated with it. A tiny portion of the data:

[1,] -2.80331471 -0.8874522 -2.34401863 -3.811584 -2.1292443
[2,]  0.03177716  0.2588624  0.82877467  1.955099  0.6321881
[3,] -1.32954665 -0.5433407 -2.19211837 -2.342554 -2.2142461
[4,] -0.60771429 -0.9758734  0.01558774  1.651459 -0.8137684

Coordinates for the first 4 rows:

9928202 9928251 9928288 9928319

What I would like is a …
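The preview cuts off before the actual ask. Assuming the goal is a per-row aggregate over all rows whose coordinate lies within a fixed distance of the current row's coordinate, here is a two-pointer sketch in Python rather than R; the mean aggregate and the +/- 50 radius are illustrative assumptions, not taken from the question:

import numpy as np

def windowed_row_means(data, coords, radius):
    # For each row i, average the rows whose coordinate lies within
    # +/- radius of coords[i]. coords is assumed sorted ascending; the two
    # pointers make the sweep linear in the number of rows.
    data = np.asarray(data, dtype=float)
    coords = np.asarray(coords)
    out = np.empty_like(data)
    lo = hi = 0
    for i, c in enumerate(coords):
        while coords[lo] < c - radius:
            lo += 1
        while hi < len(coords) and coords[hi] <= c + radius:
            hi += 1
        out[i] = data[lo:hi].mean(axis=0)
    return out

coords = [9928202, 9928251, 9928288, 9928319]
data = [[-2.80, -0.89], [0.03, 0.26], [-1.33, -0.54], [-0.61, -0.98]]
print(windowed_row_means(data, coords, 50))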

Average of field in half-an-hour window of timestamps

Submitted by 与世无争的帅哥 on 2019-12-25 00:35:46

Question: My dataframe has column names Timestamp, es and looks like:

Timestamp            es
2015-04-01 09:07:42  31
2015-04-01 09:08:01  29.5
2015-04-01 09:15:03  18.5
2015-04-01 09:15:05  8.8
2015-04-01 09:15:09  9.6

The time runs until 15:30:30 (around 12000 es data points a day, one per timestamp) with the corresponding es. Does R have a function in some package, or code, to average the es of all the timestamps within each half hour? Sample output should look like:

2015-04-01 09:30:00 Value(Average of all es from 9 …
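A sketch of the same half-hour averaging in Python/pandas (the question asks about R); label="right" is an assumption so that the 09:00-09:30 bin is stamped 09:30:00 as in the sample output:

import pandas as pd

# Hypothetical recreation of the posted rows; the real frame has ~12000 per day.
df = pd.DataFrame(
    {"Timestamp": pd.to_datetime([
        "2015-04-01 09:07:42", "2015-04-01 09:08:01",
        "2015-04-01 09:15:03", "2015-04-01 09:15:05", "2015-04-01 09:15:09"]),
     "es": [31, 29.5, 18.5, 8.8, 9.6]}
)

# Group es into half-hour bins and average each bin.
half_hourly = df.set_index("Timestamp")["es"].resample("30min", label="right").mean()
print(half_hourly)   # 2015-04-01 09:30:00 -> 19.48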

padding numpy rolling window operations using strides

Submitted by 一世执手 on 2019-12-24 02:23:55

Question: I have a function f that I would like to compute efficiently over a sliding window.

def efficient_f(x):
    # do stuff
    wSize=50
    return another_f(rolling_window_using_strides(x, wSize), -1)

I have seen on SO that it is particularly efficient to do this using strides:

from numpy.lib.stride_tricks import as_strided
def rolling_window_using_strides(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    print np.lib.stride_tricks.as_strided(a, shape …
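The title asks about padding: the strided view above yields len(a) - window + 1 windows rather than len(a). One way to read the question is wanting one window per original element; a sketch, assuming edge padding split across both ends is acceptable:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_window(a, window):
    # Strided view: each output row is a length-`window` slice of `a`.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)

def padded_rolling_window(a, window):
    # Pad before building the view so there is one window per original element.
    left = (window - 1) // 2
    right = window - 1 - left
    padded = np.pad(a, (left, right), mode="edge")
    return rolling_window(padded, window)

x = np.arange(6, dtype=float)
print(padded_rolling_window(x, 3).mean(axis=-1))  # same length as x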

BigQuery : is it possible to execute another query inside an UDF?

Submitted by 泄露秘密 on 2019-12-24 00:42:59

Question: I have a table that records a row for each unique user per day, with some aggregated stats for that user on that day, and I need to produce a report that tells me, for each day, the number of unique users in the last 30 days including that day. E.g.:

for Aug 31st, it'll count the unique users from Aug 2nd to Aug 31st
for Aug 30th, it'll count the unique users from Aug 1st to Aug 30th

and so on... I've looked at some related questions but they aren't quite what I need - if a user logs in on multiple …
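The point the preview raises is that a user active on several days inside the window must still count once. A sketch of the rolling 30-day distinct count in Python rather than BigQuery SQL, over a hypothetical date-to-users mapping:

from datetime import date, timedelta

def rolling_unique_users(daily_users, window_days=30):
    # daily_users: dict mapping each date to the set of user ids seen that
    # day. For every date present, return the number of distinct users over
    # that date and the preceding window_days - 1 days.
    result = {}
    for d in sorted(daily_users):
        seen = set()
        for offset in range(window_days):
            seen |= daily_users.get(d - timedelta(days=offset), set())
        result[d] = len(seen)
    return result

daily = {
    date(2015, 8, 30): {"u1", "u2"},
    date(2015, 8, 31): {"u2", "u3"},
}
print(rolling_unique_users(daily))  # Aug 30 -> 2, Aug 31 -> 3 (u2 counted once)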

Find start and end dates when one field changes

Submitted by 旧街凉风 on 2019-12-22 09:57:48

Question: I have this data in a table:

FIELD_A   FIELD_B    FIELD_D
249052903 10/15/2011 N
249052903 11/15/2011 P   ------------- VALUE CHANGED
249052903 12/15/2011 P
249052903 1/15/2012  N   ------------- VALUE CHANGED
249052903 2/15/2012  N
249052903 3/15/2012  N
249052903 4/15/2012  N
249052903 5/15/2012  N
249052903 6/15/2012  N
249052903 7/15/2012  N
249052903 8/15/2012  N
249052903 9/15/2012  N

Whenever the value in FIELD_D changes it forms a group, and I need the min and max dates in that group. The query should …
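This is the classic "gaps and islands" problem. A sketch of the grouping logic in Python rather than SQL, using itertools.groupby over consecutive runs of the same FIELD_D value:

from itertools import groupby
from datetime import date

def change_groups(rows):
    # rows: (field_a, field_b, field_d) tuples in date order. Each run of
    # consecutive rows with the same field_d becomes one group, reported
    # with its min and max date.
    out = []
    for value, grp in groupby(rows, key=lambda r: r[2]):
        grp = list(grp)
        out.append((grp[0][0], value, min(r[1] for r in grp), max(r[1] for r in grp)))
    return out

rows = [
    (249052903, date(2011, 10, 15), "N"),
    (249052903, date(2011, 11, 15), "P"),
    (249052903, date(2011, 12, 15), "P"),
    (249052903, date(2012, 1, 15), "N"),
    (249052903, date(2012, 2, 15), "N"),
]
for g in change_groups(rows):
    print(g)   # one line per run of identical FIELD_D values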

Compute the product of the next n elements in array

Submitted by 白昼怎懂夜的黑 on 2019-12-22 08:52:06

Question: I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given in the function's input. For example, for this input I should compute the product of every 3 consecutive elements, starting from the first:

[p, ind] = max_product([1 2 2 1 3 1],3);

This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3]. Is there any practical way to do it? Now I do this using:

for ii = 1:(length(v)-2)
    p = prod(v(ii:ii+n-1));
end

where v is the …
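A sketch of the same computation in Python (the question itself is MATLAB), taking math.prod over each length-n slice and returning the maximum product with a 1-based start index to mirror the [p, ind] call above:

from math import prod

def max_product(v, n):
    # Product of every n consecutive elements; return the largest product
    # and the 1-based index where that window starts.
    products = [prod(v[i:i + n]) for i in range(len(v) - n + 1)]
    best = max(range(len(products)), key=products.__getitem__)
    return products[best], best + 1

print([prod([1, 2, 2, 1, 3, 1][i:i + 3]) for i in range(4)])  # [4, 4, 6, 3]
print(max_product([1, 2, 2, 1, 3, 1], 3))                     # (6, 3)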