Sure, you could divide the remaining file size by the current download speed, but if your download speed fluctuates (and it will), this doesn't produce a very nice result.
I found Ben Dolman's answer very helpful, but for someone like me who is not very math-inclined, it still took about an hour to fully implement it in my code. Here's a simpler way of saying the same thing in Python. If there are any inaccuracies, let me know, but in my testing it works very well:
def exponential_moving_average(data, samples=0, smoothing=0.02):
    '''
    data: an array of all values.
    samples: how many previous data samples are averaged. Set to 0 to average all data points.
    smoothing: a value between 0 and 1; the weight given to the latest sample. 0 returns the plain average, 1 returns the latest sample.
    '''
    if len(data) == 1:
        return data[0]
    if samples == 0 or samples > len(data):
        samples = len(data)
    # plain average of the most recent `samples` values
    average = sum(data[-samples:]) / samples
    last_speed = data[-1]
    # blend the latest sample with that average
    return (smoothing * last_speed) + ((1 - smoothing) * average)

# this would be a constant stream of download speeds as you go, pre-defined here for illustration
input_data = [4.5, 8.21, 8.7, 5.8, 3.8, 2.7, 2.5, 7.1, 9.3, 2.1, 3.1, 9.7, 5.1, 6.1, 9.1, 5.0, 1.6, 6.7, 5.5, 3.2]
data = []
ema_data = []
for sample in input_data:
    data.append(sample)
    average_value = exponential_moving_average(data)
    ema_data.append(average_value)

# print it out for visualization
for i in range(len(data)):
    print("REAL: ", data[i])
    print("EMA: ", ema_data[i])
    print("--")
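The function above recomputes its average from the stored history on every call. An exponential moving average is more commonly written recursively, so that only the previous smoothed value has to be kept. Here is a minimal sketch of that form; the name `update_ema` and the sample values are just for illustration and are not taken from the answer above.

```python
def update_ema(previous_average, new_sample, smoothing=0.02):
    """Return the smoothed value after one new speed sample.

    previous_average: the last smoothed value, or None before the first sample.
    smoothing: weight between 0 and 1 given to the new sample.
    """
    if previous_average is None:
        # nothing to blend with yet, so start from the first sample
        return new_sample
    return smoothing * new_sample + (1 - smoothing) * previous_average


# illustrative speeds only; in practice these arrive one at a time
speeds = [4.5, 8.21, 8.7, 5.8, 3.8]
average = None
for speed in speeds:
    average = update_ema(average, speed, smoothing=0.5)
```

A larger `smoothing` makes the estimate follow recent samples more closely, while a smaller one makes it steadier but slower to react.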
I think the best you can do is divide the remaining file size by the average download speed (the amount downloaded so far divided by how long you've been downloading). This will fluctuate a little at the start but will become more and more stable the longer you download.
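As a rough sketch of that calculation, assuming sizes in bytes and time in seconds (the function and parameter names here are made up for illustration):

```python
def eta_seconds(total_bytes, downloaded_bytes, elapsed_seconds):
    """Estimate seconds remaining from the average speed so far.

    Returns None until something has been downloaded, because no
    average speed exists yet.
    """
    if downloaded_bytes <= 0 or elapsed_seconds <= 0:
        return None
    average_speed = downloaded_bytes / elapsed_seconds  # bytes per second
    remaining_bytes = max(total_bytes - downloaded_bytes, 0)
    return remaining_bytes / average_speed
```

For example, with 250 of 1000 bytes downloaded after 10 seconds, the average speed is 25 bytes per second and the estimate is 30 seconds.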
I use this equation I derived myself.
In VB.NET code:
Dim ed As TimeSpan = TimeSpan.FromSeconds((sd - l) / r)
I wrote an algorithm years ago to predict time remaining in a disk imaging and multicasting program that used a moving average with a reset when the current throughput went outside of a predefined range. It would keep things smooth unless something drastic happened, then it would adjust quickly and then return to a moving average again. See example chart here:
The thick blue line in that example chart is the actual throughput over time. Notice the low throughput during the first half of the transfer and then it jumps up dramatically in the second half. The orange line is an overall average. Notice that it never adjusts up far enough to ever give an accurate prediction of how long it will take to finish. The gray line is a moving average (i.e. the average of the last N data points - in this graph N is 5, but in reality, N might need to be larger to smooth enough). It recovers more quickly, but still takes a while to adjust. It will take more time the larger N is. So if your data is pretty noisy, then N will have to be larger and the recovery time will be longer.
The green line is the algorithm I used. It goes along just like a moving average, but when the data moves outside a predefined range (designated by the light thin blue and yellow lines), it resets the moving average and jumps up immediately. The predefined range can also be based on standard deviation so it can adjust to how noisy the data is automatically. I just threw these values into Excel to diagram them for this answer so it's not perfect, but you get the idea.
Data could be contrived to make this algorithm fail to be a good predictor of time remaining though. The bottom line is that you need to have a general idea of how you expect the data to behave and pick an algorithm accordingly. My algorithm worked well for the data sets I was seeing, so we kept using it.
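The original implementation isn't shown, so the following is only a sketch of the idea described above: a moving average over the last N samples that is thrown away and restarted when a new sample lands outside a band around the current average. The class name, the default window of 5, and the relative `tolerance` band are all assumptions; the band could just as well be absolute or based on standard deviation.

```python
from collections import deque


class ResettingMovingAverage:
    """Moving average over the last `window` samples that restarts
    when a sample falls outside a band around the current average."""

    def __init__(self, window=5, tolerance=0.5):
        self.tolerance = tolerance  # allowed relative deviation, 0.5 = +/-50%
        self.samples = deque(maxlen=window)  # old samples drop off automatically

    def average(self):
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

    def add(self, sample):
        current = self.average()
        if current is not None:
            low = current * (1 - self.tolerance)
            high = current * (1 + self.tolerance)
            if not (low <= sample <= high):
                # drastic change: forget the history and start over
                self.samples.clear()
        self.samples.append(sample)
        return self.average()


tracker = ResettingMovingAverage(window=5, tolerance=0.5)
for throughput in [10, 11, 9, 10, 30, 31, 29]:
    smoothed = tracker.add(throughput)
```

In this run the jump from about 10 to 30 falls outside the band, so the average restarts at 30 right away instead of creeping up over several samples.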
One other important tip is that developers usually ignore setup and teardown times in their progress bars and time estimate calculations. This results in the eternal 99% or 100% progress bar that just sits there for a long time (while caches are being flushed or other cleanup work is happening), or in wild early estimates when the scanning of directories or other setup work happens, accruing time but not accruing any percentage progress, which throws everything off. You can run several tests that include the setup and teardown times and come up with an estimate of how long those times are, either on average or based on the size of the job, and add that time to the progress bar. For example, the first 5% of the work is setup, the last 10% is teardown, and the 85% in the middle is the download or whatever repeating process you're tracking. This can help a lot too.
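One way to wire that in, sketched here with the 5% / 85% / 10% split from the example (the phase names and the `overall_progress` helper are hypothetical, and real weights should come from your own measurements):

```python
# Hypothetical phase weights, taken from the 5% / 85% / 10% example.
PHASES = [("setup", 0.05), ("transfer", 0.85), ("teardown", 0.10)]


def overall_progress(phase_name, phase_fraction):
    """Map progress within one phase (0.0 to 1.0) to overall progress."""
    phase_fraction = min(max(phase_fraction, 0.0), 1.0)
    completed = 0.0
    for name, weight in PHASES:
        if name == phase_name:
            # earlier phases count in full, the current one partially
            return completed + weight * phase_fraction
        completed += weight
    raise ValueError(f"unknown phase: {phase_name}")
```

With these weights, being halfway through the transfer reports about 47.5% overall, and the bar only reaches 100% once teardown has finished.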
speed = speedNow * 0.5 + speedLastHalfMinute * 0.3 + speedLastMinute * 0.2
As an extension to Ben Dolman's answer, you could also account for the fluctuation within the algorithm. The result will be smoother, but it will also predict the average speed.
Something like this:
prediction = 50;
depencySpeed = 200;
stableFactor = .5;
smoothFactor = median(0, abs(lastSpeed - averageSpeed), depencySpeed);
smoothFactor /= (depencySpeed - prediction * (smoothFactor / depencySpeed));
smoothFactor = smoothFactor * (1 - stableFactor) + stableFactor;
averageSpeed = smoothFactor * lastSpeed + (1 - smoothFactor) * averageSpeed;
Fluctuation or not, it will be just as stable as the other approach, given the right values for prediction and depencySpeed; you have to play with them a little depending on your internet speed. These settings are perfect for an average speed of 600 kB/s that fluctuates between 0 and 1 MB/s.
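For anyone who wants to experiment with those values, here is a direct Python translation of the snippet above as a sketch. It assumes that `median(0, x, depencySpeed)` is meant as a clamp of `x` to the range 0 to `depencySpeed`, and that speeds are in kB/s; the function name `adaptive_average` is made up, and the original spelling `depency_speed` is kept so the two versions are easy to compare.

```python
def adaptive_average(average_speed, last_speed,
                     prediction=50, depency_speed=200, stable_factor=0.5):
    """One update step: the further last_speed is from the running
    average, the more weight it gets."""
    # median(0, x, depency_speed) in the snippet acts as a clamp (assumption)
    smooth = min(max(abs(last_speed - average_speed), 0), depency_speed)
    smooth /= depency_speed - prediction * (smooth / depency_speed)
    # never drop below stable_factor, so small changes still register
    smooth = smooth * (1 - stable_factor) + stable_factor
    return smooth * last_speed + (1 - smooth) * average_speed
```

Note that with these defaults the weight can exceed 1 when the speed changes sharply. For example, going from an average of 600 to a sample of 800 gives a weight of 7/6 and a new average of about 833, so the estimate overshoots the latest sample. That seems to be what the `prediction` value is for, but treat that reading as a guess.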