windowing

Window of full weeks in pandas

Submitted by 旧时模样 on 2021-02-17 06:37:09
Question: I am looking for a special window function in pandas: sort of a combination of rolling and expanding. For calculating (for instance) the mean and standard deviation, I want to regard all past data but ignore the first few records, to make sure I have a multiple of 7 (days, in my case). That's because I know the data has a strong weekly pattern. Example:

    s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
                  pd.date_range('2020-01-01', '2020-01-22'))
    s.rolling(7, 7)
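
A minimal sketch of one way to get such full-week statistics, assuming the intent is to keep, at each position, only the most recent multiple-of-7 observations (the helper full_week_stat is illustrative, not a pandas API):

    import numpy as np
    import pandas as pd

    s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
                  pd.date_range('2020-01-01', '2020-01-22'))

    def full_week_stat(series, stat):
        out = []
        for i in range(1, len(series) + 1):
            n = (i // 7) * 7  # largest whole number of weeks available so far
            # drop the first i - n records so the window spans full weeks only
            out.append(stat(series.iloc[i - n:i]) if n else np.nan)
        return pd.Series(out, index=series.index)

    means = full_week_stat(s, pd.Series.mean)
    stds = full_week_stat(s, pd.Series.std)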

How to use windowing functions efficiently to decide the next N rows based on the previous N values

Submitted by 爷,独闯天下 on 2020-12-09 06:14:17
Question: Hi, I have the following data.

    +----------+----+-------+----------+
    |      date|item|avg_val|conditions|
    +----------+----+-------+----------+
    |01-10-2020|   x|     10|         0|
    |02-10-2020|   x|     10|         0|
    |03-10-2020|   x|     15|         1|
    |04-10-2020|   x|     15|         1|
    |05-10-2020|   x|      5|         0|
    |06-10-2020|   x|     13|         1|
    |07-10-2020|   x|     10|         1|
    |08-10-2020|   x|     10|         0|
    |09-10-2020|   x|     15|         1|
    |01-10-2020|   y|     10|         0|
    |02-10-2020|   y|     18|         0|
    |03-10-2020|   y|      6|         1|
    |04-10-2020|   y|     10|         0|
    |05-10-2020|   y|     20|         0|
    +----------+----+-------+----------+
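
The excerpt cuts off before the full logic, but a hedged PySpark sketch of the general pattern it asks about, looking back over the previous N rows per item with a window function (the DataFrame df and the look-back size of 3 are assumptions):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # per-item window ordered by date, covering the 3 rows before the current one
    w = Window.partitionBy("item").orderBy("date").rowsBetween(-3, -1)

    df = (df
        .withColumn("prev_avg", F.avg("avg_val").over(w))
        # illustrative decision based on the previous values
        .withColumn("flag", (F.col("avg_val") > F.col("prev_avg")).cast("int")))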

Hive query generating identifiers for a sequence of rows matching a condition

Submitted by 你说的曾经没有我的故事 on 2020-01-11 12:06:13
Question: Let's say I have the following Hive table as input; let's call it connections:

    userid  | timestamp
    --------|------------
    1       | 1433258019
    1       | 1433258020
    2       | 1433258080
    2       | 1433258083
    2       | 1433258088
    2       | 1433258170
    [...]   | [...]

With the following query:

    SELECT userid, timestamp,
           timestamp - LAG(timestamp, 1, 0) OVER w AS timediff,
           CASE WHEN timediff > 60 THEN 'new_session' ELSE 'same_session' END AS session_state
    FROM connections
    WINDOW w AS (PARTITION BY userid ORDER BY timestamp ASC);

I'm
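
The question is truncated, but a hedged PySpark restatement of the same session-splitting idea, including one way to turn the new-session flags into running identifiers (assuming a DataFrame connections mirroring the table above):

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    w = Window.partitionBy("userid").orderBy("timestamp")

    sessions = (connections
        .withColumn("timediff",
                    F.col("timestamp") - F.lag("timestamp", 1, 0).over(w))
        .withColumn("is_new", (F.col("timediff") > 60).cast("int"))
        # a running sum of the flags yields a per-user session identifier
        .withColumn("session_id", F.sum("is_new").over(w)))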

vDSP: Do the FFT functions include windowing?

Submitted by 陌路散爱 on 2020-01-07 06:49:41
Question: I am working on implementing an algorithm using vDSP:

1) take FFT
2) take log of square of absolute value (can be done with a lookup table)
3) take another FFT
4) take absolute value

I'm not sure if it is up to me to pass the incoming data through a windowing function before I run the FFT on it.

    vDSP_fft_zrip(setupReal, &A, stride, log2n, direction);

That is my FFT function. Do I need to pass the data through vDSP_hamm_window(...) first?

Answer 1: The iOS Accelerate library function vDSP_fft_zrip()
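
vDSP's FFT routines do not apply a window themselves, so windowing is an explicit, separate step. A minimal NumPy sketch of the pipeline above with the Hamming window applied first (not Accelerate code; the small epsilon is an assumption to avoid log(0)):

    import numpy as np

    def pipeline(x):
        windowed = x * np.hamming(len(x))                   # explicit windowing
        spectrum = np.fft.rfft(windowed)                    # 1) FFT
        log_power = np.log(np.abs(spectrum) ** 2 + 1e-12)   # 2) log of |X|^2
        return np.abs(np.fft.rfft(log_power))               # 3) FFT, 4) magnitude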

How to use a context window to segment a whole log Mel-spectrogram (ensuring the same number of segments for all the audios)?

Submitted by 筅森魡賤 on 2020-01-03 01:55:14
Question: I have several audios with different durations, so I don't know how to ensure the same number N of segments for all of them. I'm trying to implement an existing paper, which says that a log Mel-spectrogram is first computed over the whole audio with 64 Mel filter banks from 20 to 8000 Hz, using a 25 ms Hamming window and a 10 ms overlap. To get that, I have the following code lines:

    y, sr = librosa.load(audio_file, sr=None)
    # sr = 22050
    # len(y) = 237142
    # duration = 5
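
A hedged sketch of one common way to guarantee a fixed number of segments: compute the log Mel-spectrogram, then pad or truncate the frame axis so it splits evenly. Here n_segments and frames_per_segment are illustrative choices, and the question's "10 ms overlapping" is read as a 10 ms hop:

    import librosa
    import numpy as np

    def fixed_segments(audio_file, n_segments=10, frames_per_segment=64):
        y, sr = librosa.load(audio_file, sr=None)
        mel = librosa.feature.melspectrogram(
            y=y, sr=sr, n_mels=64, fmin=20, fmax=8000,
            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
            window='hamming')
        log_mel = librosa.power_to_db(mel)           # shape: (64, n_frames)
        target = n_segments * frames_per_segment
        if log_mel.shape[1] < target:                # zero-pad short audios
            log_mel = np.pad(log_mel, ((0, 0), (0, target - log_mel.shape[1])))
        log_mel = log_mel[:, :target]                # truncate long audios
        # -> (n_segments, 64, frames_per_segment)
        return log_mel.reshape(64, n_segments, frames_per_segment).transpose(1, 0, 2)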

How to find the difference between 1st row and nth row of a dataframe based on a condition using Spark Windowing

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-25 18:56:00
Question: Here is my exact requirement. I have to add a new column named "DAYS_TO_NEXT_PD_ENCOUNTER". As the name indicates, each value in the new column should be the difference between the RANK of the row whose claim_typ is 'PD' and the rank of the current row. For one ID, the 'PD' row can occur in between any of the 'RV's and 'RJ's. For the rows that come after the first occurrence of claim_typ 'PD', the difference should be null, as shown below. The API 'last' works if the clm_typ 'PD' occurs as the last element. It will not
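
The excerpt is cut off, but a hedged PySpark sketch of one way to express "rank distance to the first 'PD' row, null afterwards"; the column names id and date are assumptions, while claim_typ and the new column name come from the question:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    w_order = Window.partitionBy("id").orderBy("date")
    w_all = Window.partitionBy("id")

    result = (df
        .withColumn("rnk", F.row_number().over(w_order))
        # rank of the first 'PD' row within each id
        .withColumn("pd_rnk",
            F.min(F.when(F.col("claim_typ") == "PD", F.col("rnk"))).over(w_all))
        # difference up to the first 'PD'; null for rows after it
        .withColumn("DAYS_TO_NEXT_PD_ENCOUNTER",
            F.when(F.col("rnk") <= F.col("pd_rnk"),
                   F.col("pd_rnk") - F.col("rnk"))))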

An exponentially decaying moving average over a hopping window in Flink SQL: Casting time

Submitted by 大兔子大兔子 on 2019-12-24 19:40:38
Question: Now that we have SQL with fancy windowing in Flink, I'm trying to build the decaying moving average referred to by "what will be possible in future Flink releases for both the Table API and SQL" from their SQL roadmap/preview 2017-03 post:

    table
      .window(Slide over 1.hour every 1.second as 'w)
      .groupBy('productId, 'w)
      .select(
        'w.end, 'productId,
        ('unitPrice * ('rowtime - 'w.start).exp() / 1.hour).sum /
          (('rowtime - 'w.start).exp() / 1.hour).sum)

Here is my attempt (inspired as well by the calcite
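
Not Flink code, but a NumPy restatement of the weighting the snippet expresses, which may make the formula easier to read: each event is weighted by the exponential of its age relative to the window start, so newer events count more (tau stands in for the 1.hour normalization):

    import numpy as np

    def decaying_moving_average(prices, rowtimes, window_start, tau=3600.0):
        # sum(price * exp((t - start)/tau)) / sum(exp((t - start)/tau))
        w = np.exp((np.asarray(rowtimes, dtype=float) - window_start) / tau)
        return float(np.sum(np.asarray(prices, dtype=float) * w) / np.sum(w))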

Apache Flink: Skewed data distribution on KeyedStream

Submitted by 微笑、不失礼 on 2019-12-23 16:28:11
Question: I have this Java code in Flink:

    env.setParallelism(6);

    // Read from Kafka topic with 12 partitions
    DataStream<String> line = env.addSource(myConsumer);

    // Filter half of the records
    DataStream<Tuple2<String, Integer>> line_Num_Odd = line_Num.filter(new FilterOdd());
    DataStream<Tuple3<String, String, Integer>> line_Num_Odd_2 = line_Num_Odd.map(new OddAdder());

    // Filter the other half
    DataStream<Tuple2<String, Integer>> line_Num_Even = line_Num.filter(new FilterEven());
    DataStream<Tuple3<String,
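
The question is cut off, but the title points at skew on a KeyedStream, and a common framework-agnostic remedy is to salt hot keys so their records spread across several parallel subtasks and are re-aggregated per original key afterwards. A minimal Python sketch of the salting idea only (not Flink API; the bucket count of 6 matching the parallelism above is an assumption):

    import random
    from collections import Counter

    def salted_key(key, buckets=6):
        # one hot key becomes several synthetic keys, one per bucket
        return f"{key}-{random.randrange(buckets)}"

    events = ["hot"] * 1000 + ["cold"] * 10
    print(Counter(salted_key(k) for k in events))   # load spread over hot-0..hot-5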