I have some real rainfall data recorded as the date and time, together with the accumulated number of tips on a tipping-bucket rain gauge. Each tip of the bucket represents 0.5 mm of rainfall. I want to cycle through the file and determine the variation in intensity (rainfall/time), so I need a rolling average over multiple fixed time frames. That is, I want to accumulate rainfall over a 5-minute window and express the intensity in mm/hour: if 3 mm is recorded in 5 min, that equals 3/5*60 = 36 mm/hr, and the same rainfall over 10 minutes would be 18 mm/hr.
If I have rainfall over several hours, I may need to review it at several standard intervals, say 5, 10, 15, 20, 25, 30, 45, 60 minutes, etc. Also, the data is recorded in reverse order in the raw file, so the earliest time is at the end of the file and the latest time step appears first, after a header. It looks like the sample below. Here 975 - 961 = 14 tips = 7 mm of rainfall over the whole record (15:30 to 20:33), an average intensity of about 1.4 mm/hr. But between 16:27 and 16:34, 967 - 961 = 6 tips = 3 mm in 7 min = 25.71 mm/hour.
7424 Figtree (O'Briens Rd)
DATE :hh:mm Accum Tips
8/11/2011 20:33 975
8/11/2011 20:14 974
8/11/2011 20:04 973
8/11/2011 20:00 972
8/11/2011 19:35 971
8/11/2011 18:29 969
8/11/2011 16:44 968
8/11/2011 16:34 967
8/11/2011 16:33 966
8/11/2011 16:32 965
8/11/2011 16:28 963
8/11/2011 16:27 962
8/11/2011 15:30 961
Any suggestions?
I am not entirely sure what your question is.
Do you know how to read the file? You can do something like:
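data = []  # List of (minutes, accumulated tips) tuples
# Read the file, dropping blank lines and the two header lines
with open('data.txt') as f:
    lines = [line.strip() for line in f if line.strip()][2:]
for line in lines:
    print(line)
    date, hour, count = line.split()
    h, m = hour.split(':')
    t = int(h) * 60 + int(m)  # Minutes since midnight (the date is ignored)
    data.append((t, int(count)))  # Append as tuple
data.reverse()  # The file is newest-first; put the oldest entry first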
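One limitation of the snippet above is that it keeps only hours and minutes, so a record that runs past midnight would wrap around. Here is a minimal sketch of one way around that. It assumes the dates are day/month/year (the sample alone does not settle this), and the helper `parse_records` and the abbreviated inline `SAMPLE` are made up for illustration:

```python
from datetime import datetime

# Abbreviated copy of the sample file, newest record first.
SAMPLE = """\
7424 Figtree (O'Briens Rd)
DATE :hh:mm Accum Tips
8/11/2011 20:33 975
8/11/2011 20:14 974
8/11/2011 16:27 962
8/11/2011 15:30 961
"""

def parse_records(text):
    """Return [(minutes_since_first_record, accumulated_tips), ...], oldest first."""
    rows = []
    for line in text.splitlines()[2:]:       # skip the two header lines
        if not line.strip():
            continue                          # ignore blank lines
        date, hhmm, count = line.split()
        # Assumption: day/month/year ordering.
        stamp = datetime.strptime(date + " " + hhmm, "%d/%m/%Y %H:%M")
        rows.append((stamp, int(count)))
    rows.reverse()                            # the file is newest-first
    first = rows[0][0]
    return [(int((stamp - first).total_seconds() // 60), tips)
            for stamp, tips in rows]

records = parse_records(SAMPLE)
print(records)
```

Measuring minutes from the first record rather than from midnight keeps the times increasing even when the event spans more than one day.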
Since your data is cumulative, you need to subtract each pair of consecutive entries; this is where Python's list comprehensions are really nice.
# Attribute each increment to the later timestamp, when the tips were logged
data = [(t2, d2 - d1) for ((t1, d1), (t2, d2)) in zip(data, data[1:])]
print(data)
Now we need to loop through and see how many entries are within the last x minutes.
timewindow = 10
for i, (t, count) in enumerate(data):
    # Find the entries that happened within the last [...] minutes
    # (up to and including t, so later entries are not counted)
    withinwindow = [(t2, c) for (t2, c) in data if t - timewindow < t2 <= t]
    # Now you can print any kind of stats about these "within window" entries
    print(sum(c for (t2, c) in withinwindow))
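To connect the window counts above to the intensities asked about in the question, here is a rough sketch that converts the largest count in any trailing window to mm/hr for several durations. The names `peak_intensity` and `MM_PER_TIP` are hypothetical. The `events` list is worked out by hand from the 16:27 to 16:44 rows of the sample, assuming each increment belongs to the later timestamp:

```python
MM_PER_TIP = 0.5  # from the question: one tip = 0.5 mm of rain

def peak_intensity(events, window):
    """Largest intensity (mm/hr) over any trailing `window`-minute span.

    `events` is a list of (minute, tips) pairs, where `tips` is the number
    of tips logged at that minute.
    """
    best = 0
    for t_end, _ in events:
        # Tips logged in the half-open span (t_end - window, t_end]
        tips = sum(n for (t, n) in events if t_end - window < t <= t_end)
        best = max(best, tips)
    return best * MM_PER_TIP * 60.0 / window

# (minutes since midnight, tips) for 16:27, 16:28, 16:32, 16:33, 16:34, 16:44
events = [(987, 1), (988, 1), (992, 2), (993, 1), (994, 1), (1004, 1)]
for window in (5, 10, 15, 30, 60):
    print(window, peak_intensity(events, window))
```

For this burst the 5-minute peak comes out at 24 mm/hr and the 10-minute peak at 18 mm/hr. Because tips are only timestamped to the minute, short windows are quite coarse.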
Since the time stamps do not come at regular intervals, you should use interpolation to get the most accurate results. This will make the rolling average easier too. I'm using the Interpolate class from another answer in the code below.
from time import strptime, mktime

totime = lambda x: int(mktime(strptime(x, "%d/%m/%Y %H:%M")))

with open("my_file.txt", "r") as myfile:
    # Skip header
    for line in myfile:
        if line.startswith("DATE"):
            break
    times = []
    values = []
    for line in myfile:
        date, time, value = line.split()
        times.append(totime(" ".join((date, time))))
        values.append(int(value))

times.reverse()
values.reverse()
i = Interpolate(times, values)
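The Interpolate class itself is not reproduced here. As a stand-in, this is a minimal sketch of what such a class might look like, assuming plain linear interpolation and lookup by index (`obj[x]`), which is how it is used in this answer. It is not necessarily identical to the referenced implementation:

```python
from bisect import bisect_right

class Interpolate(object):
    """Piecewise-linear lookup: obj[x] interpolates y between known points."""

    def __init__(self, x_list, y_list):
        if len(x_list) != len(y_list) or len(x_list) < 2:
            raise ValueError("need at least two (x, y) points")
        if any(b <= a for a, b in zip(x_list, x_list[1:])):
            raise ValueError("x_list must be strictly ascending")
        self.x_list = [float(x) for x in x_list]
        self.y_list = [float(y) for y in y_list]

    def __getitem__(self, x):
        # Pick the segment containing x; clamp the index so the end points
        # (and anything outside the range) fall on the nearest segment.
        i = bisect_right(self.x_list, x) - 1
        i = max(0, min(i, len(self.x_list) - 2))
        x0, x1 = self.x_list[i], self.x_list[i + 1]
        y0, y1 = self.y_list[i], self.y_list[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Toy data: cumulative tips at 0, 60 and 120 seconds
lookup = Interpolate([0, 60, 120], [961, 962, 965])
print(lookup[30])   # halfway between the first two points
print(lookup[120])  # exactly on the last point
```

Linear interpolation spreads each tip evenly over the gap before it. That is a simplification, but it gives every instant a well-defined cumulative count.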
Now it's just a matter of choosing your intervals and computing the difference between the endpoints of each interval. Let's create a generator function for that:
def rolling_avg(cumulative_lookup, start, stop, step_size, window_size):
    for t in range(start + window_size, stop, step_size):
        total = cumulative_lookup[t] - cumulative_lookup[t - window_size]
        yield total / window_size
Below I'm printing the number of tips per hour over the previous hour, at 10-minute intervals:
start = totime("8/11/2011 15:30")
stop = totime("8/11/2011 20:33")
for avg in rolling_avg(i, start, stop, 600, 3600):
    print(avg * 3600)  # tips per second -> tips per hour
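The loop above reports tips per hour. To get the intensity the question asks for, multiply by the 0.5 mm bucket size and repeat for each duration of interest. Here is a sketch of that conversion. `rolling_avg` is repeated so the block runs on its own, and `SteadyRain` and `intensities_mm_per_hr` are hypothetical names, with `SteadyRain` standing in for the interpolated lookup:

```python
def rolling_avg(cumulative_lookup, start, stop, step_size, window_size):
    # Same generator as above: average rate (tips per second) per window.
    for t in range(start + window_size, stop, step_size):
        total = cumulative_lookup[t] - cumulative_lookup[t - window_size]
        yield total / float(window_size)

class SteadyRain(object):
    """Hypothetical stand-in lookup: exactly one tip every 600 seconds."""
    def __getitem__(self, t):
        return t / 600.0

MM_PER_TIP = 0.5  # bucket size given in the question

def intensities_mm_per_hr(lookup, start, stop, step_size, window_size):
    # tips/second -> tips/hour -> mm/hour
    return [rate * 3600 * MM_PER_TIP
            for rate in rolling_avg(lookup, start, stop, step_size, window_size)]

for minutes in (5, 10, 30, 60):
    series = intensities_mm_per_hr(SteadyRain(), 0, 7200, 600, minutes * 60)
    print(minutes, series[:3])
```

With a steady 6 tips per hour, every window reports about 3 mm/hr regardless of its length, which is a quick sanity check on the unit conversion.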
EDIT: Made totime return an int and created the rolling_avg generator.
Source: https://stackoverflow.com/questions/8294602/rolling-average-to-calculate-rainfall-intensity