Question
Let's analyse this sample code, where zip() is used to create sliding windows over a dataset and return them in a loop.
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
for x, y in zip(months, months[1:]):
    print(x, y)
# Output of each window will be:
Jan Feb
Feb Mar
Mar Apr
Apr May
Now suppose I want to calculate, for each window, the length of the first month as a percentage of the full window length.
Example in steps:
- When returning the first window (Jan Feb), I want to calculate the length of Jan as a percentage of the full window (Jan + Feb) and store it in a new variable
- When returning the second window (Feb Mar), I want to calculate the length of Feb as a percentage of the full window (Feb + Mar) and store it in a new variable
- Continuing this process until the last window
Any suggestions on how I might implement this idea in the for loop are welcome!
Thank you!
EDIT
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
for x, y, z in zip(months, months[1:], months[2:]):
    print(x, y, z)
# Output of each window will be:
Jan Feb Mar
Feb Mar Apr
Mar Apr May
The goal is to calculate the combined length of the first two months in each window as a fraction of the full window length:
- 1st window: (Jan + Feb) / (Jan + Feb + Mar)
- 2nd window: (Feb + Mar) / (Feb + Mar + Apr)
- continuing to the last window
We can now calculate the share of one month in each window (with start.month). However, how do we adapt this to include more than one month?
Also, instead of using days_in_month, is there a way to use the number of datapoints (rows) in each month?
By number of datapoints (rows) I mean that each month has many datapoints at a fixed time interval (e.g., every 60 minutes). With hourly data, 1 day in a month has 24 datapoints (rows). Example:
rows                 column
01-Jan-2010 T00:00   value
01-Jan-2010 T01:00   value
01-Jan-2010 T02:00   value
...                  ...
01-Jan-2010 T23:00   value
02-Jan-2010 T00:00   value
...                  ...
Thank you!
Answer 1:
Here is one way. (In my case, months is a PeriodIndex created with pd.period_range.)
import pandas as pd
months = pd.period_range(start='2020-01', periods=5, freq='M')
Now iterate over the range with zip(). Each iteration is a two-month window.
# print header labels
print('{:10s} {:10s} {:>10s} {:>10s} {:>10s} {:>10s} '.format(
    'start', 'end', 'month', 'front (d)', 'total (d)', 'frac'))
for start, end in zip(months, months[1:]):
    front_month = start.month
    # number of days in first month (e.g., Jan)
    front_month_days = start.days_in_month
    # number of days in current sliding window (e.g., Jan + Feb);
    # end_time is the last instant of the end month, so .days truncates
    # to one day short and we add 1
    days_in_curr_window = (end.end_time - start.start_time).days + 1
    frac = front_month_days / days_in_curr_window
    print('{:10s} {:10s} {:10d} {:10d} {:10d} {:10.3f}'.format(
        str(start), str(end), front_month,
        front_month_days, days_in_curr_window, frac))
start      end             month  front (d)  total (d)       frac
2020-01    2020-02             1         31         60      0.517
2020-02    2020-03             2         29         60      0.483
2020-03    2020-04             3         31         61      0.508
2020-04    2020-05             4         30         61      0.492
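The same idea might be extended to the three-month windows from the question's edit, and to row counts instead of calendar days. What follows is only a sketch under assumptions: window_fractions is a hypothetical helper (not part of pandas), and the hourly DataFrame df with a 'value' column is made-up data standing in for the real dataset. The helper takes a Series of per-month lengths, so the same loop serves both days and rows.

```python
import pandas as pd

def window_fractions(sizes, window=3, front=2):
    """Share of the first `front` months in each `window`-month sliding window.

    `sizes` is a Series of per-month lengths (days, rows, ...) in month order.
    Hypothetical helper for illustration only.
    """
    rows = []
    for i in range(len(sizes) - window + 1):
        chunk = sizes.iloc[i:i + window]
        rows.append({
            'start': chunk.index[0],
            'end': chunk.index[-1],
            'front': chunk.iloc[:front].sum(),
            'total': chunk.sum(),
        })
    out = pd.DataFrame(rows)
    out['frac'] = out['front'] / out['total']
    return out

months = pd.period_range(start='2020-01', periods=5, freq='M')

# Option 1: month length measured in calendar days.
days = pd.Series([m.days_in_month for m in months], index=months)
by_days = window_fractions(days)

# Option 2: month length measured in rows, using made-up hourly data
# (assumption: the real data has a DatetimeIndex like this one).
idx = pd.date_range('2020-01-01', '2020-05-31 23:00', freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'value': range(len(idx))}, index=idx)
row_counts = df.groupby(df.index.to_period('M')).size()
by_rows = window_fractions(row_counts)

print(by_days)
print(by_rows)
```

With complete hourly data both options give the same fractions (for example 60/91, about 0.659, for the Jan to Mar 2020 window). They would differ only if some months have missing rows, which is the case where counting rows is the more meaningful choice.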
Source: https://stackoverflow.com/questions/63230518/sliding-windows-measuring-length-of-observations-on-each-looped-window