I wrote a script to measure seek times on an HDD, and a small change in how the measurement is done results in dramatically different times.
The first cycle makes jumps within an area at the beginning of the disk. The second cycle selects random areas (of the same size) on the disk and performs the seeks there. The two approaches clearly differ, but I don't understand why that would change the results. Notice that for large areas the measurements converge for both methods.
The Bytes* functions just format numbers nicely (1024 <-> "1KB"). The script must be run as root. The disk is sdb by default.
import sys, os, time, random
#--------------------------------------------------------------------------------------------------
def BytesString(n):
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    suffix = 0
    while n % 1024 == 0 and suffix+1 < len(suffixes):
        suffix += 1
        n /= 1024
    return '{0}{1}'.format(n, suffixes[suffix])
def BytesInt(s):
    if all(c in '0123456789' for c in s):
        return int(s)
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    for power,suffix in reversed(list(enumerate(suffixes))):
        if s.endswith(suffix):
            return int(s.rstrip(suffix))*1024**power
    raise ValueError('BytesInt requires proper suffix ('+' '.join(suffixes)+').')
def BytesStringFloat(n):
    x = float(n)
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    suffix = 0
    while x > 1024.0 and suffix+1 < len(suffixes):
        suffix += 1
        x /= 1024.0
    return '{0:0.2f}{1}'.format(x, suffixes[suffix])
#--------------------------------------------------------------------------------------------------
# Open the raw block device and find its size by seeking to the end.
disk = open('/dev/sdb', 'r')
disk.seek(0,2)
disksize = disk.tell()
# Switch the I/O scheduler to noop so requests are not reordered by the kernel.
os.system('echo noop | sudo tee /sys/block/sdb/queue/scheduler > /dev/null')
print 'Syntax: program [-s -sr -t -tr] [-v]: to run specific modes; for verbose mode.'
print 'Disk name: {0} Disk size: {1} Scheduler disabled.'.format(
    disk.name, BytesStringFloat(disksize))
displaytimes = '-v' in sys.argv
#--------------------------------------------------------------------------------------------------
bufsize = 512
bufcount = 100
displaysamplecount = 24
for randomareas in [False,True]:
    print
    print 'Measuring: Random seek time {0}'.format(
        'using random areas of disk.' if randomareas else 'using beginning of disk.')
    print 'Samples: {0}{1} Sample size: {2}'.format(
        bufcount, ' (displayed {0})'.format(displaysamplecount) if displaytimes else '', bufsize)
    for area in [BytesInt('1MB')*2**i for i in range(0,64)]+[disksize]:
        if area > disksize:
            continue
        # Drop the kernel page cache so reads actually reach the drive.
        os.system('echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null')
        times = []
        disk.seek(0)
        disk.read(bufsize)
        for _ in range(bufcount):
            # A new area start is chosen for every sample when randomareas is set.
            left = random.randint(0, disksize-area) if randomareas else 0
            right = left + random.randint(0, area)
            disk.seek(left)
            disk.read(bufsize)
            # Only the jump from left to right is timed.
            start = time.time()
            disk.seek(right)
            disk.read(bufsize)
            finish = time.time()
            times.append(finish-start)
        # Discard the slowest 5% of samples as outliers.
        times = sorted(times)[:bufcount*95/100]
        print 'Area tested: {0:6} Average: {1:5.2f} ms Max: {2:5.2f} ms Total: {3:0.2f} sec'.format(
            BytesString(area) if area < disksize else BytesStringFloat(area),
            sum(times)/len(times)*1000, max(times)*1000, sum(times))
        if displaytimes:
            print 'Read times: {0} ... {1} ms'.format(
                ' '.join(['{0:0.2f}'.format(x*1000) for x in times[:displaysamplecount/2]]),
                ' '.join(['{0:0.2f}'.format(x*1000) for x in times[-displaysamplecount/2:]]))
Measuring: Random seek time using beginning of disk.
Samples: 100 Sample size: 512
Area tested: 1MB Average: 0.14 ms Max: 0.35 ms Total: 0.01 sec
Area tested: 2MB Average: 0.16 ms Max: 0.31 ms Total: 0.02 sec
Area tested: 4MB Average: 0.20 ms Max: 0.75 ms Total: 0.02 sec
Area tested: 8MB Average: 0.19 ms Max: 0.97 ms Total: 0.02 sec
Area tested: 16MB Average: 0.64 ms Max: 7.97 ms Total: 0.06 sec
Area tested: 32MB Average: 2.29 ms Max: 10.56 ms Total: 0.22 sec
Area tested: 64MB Average: 3.89 ms Max: 12.25 ms Total: 0.37 sec
Area tested: 128MB Average: 6.32 ms Max: 13.18 ms Total: 0.60 sec
Area tested: 256MB Average: 6.73 ms Max: 13.04 ms Total: 0.64 sec
Area tested: 512MB Average: 7.43 ms Max: 13.72 ms Total: 0.71 sec
Area tested: 1GB Average: 8.38 ms Max: 13.59 ms Total: 0.80 sec
Area tested: 2GB Average: 8.51 ms Max: 13.81 ms Total: 0.81 sec
Area tested: 4GB Average: 8.87 ms Max: 13.86 ms Total: 0.84 sec
Area tested: 8GB Average: 9.82 ms Max: 14.66 ms Total: 0.93 sec
Area tested: 16GB Average: 9.73 ms Max: 15.95 ms Total: 0.92 sec
Area tested: 32GB Average: 9.89 ms Max: 15.18 ms Total: 0.94 sec
Area tested: 64GB Average: 10.60 ms Max: 15.85 ms Total: 1.01 sec
Area tested: 128GB Average: 11.18 ms Max: 18.68 ms Total: 1.06 sec
Area tested: 256GB Average: 13.31 ms Max: 30.94 ms Total: 1.26 sec
Area tested: 512GB Average: 14.14 ms Max: 31.70 ms Total: 1.34 sec
Area tested: 1TB Average: 15.20 ms Max: 33.35 ms Total: 1.44 sec
Area tested: 1.36TB Average: 15.47 ms Max: 25.30 ms Total: 1.47 sec
Measuring: Random seek time using random areas of disk.
Samples: 100 Sample size: 512
Area tested: 1MB Average: 7.21 ms Max: 35.94 ms Total: 0.69 sec
Area tested: 2MB Average: 5.40 ms Max: 12.92 ms Total: 0.51 sec
Area tested: 4MB Average: 6.97 ms Max: 36.60 ms Total: 0.66 sec
Area tested: 8MB Average: 7.24 ms Max: 15.05 ms Total: 0.69 sec
Area tested: 16MB Average: 7.36 ms Max: 13.03 ms Total: 0.70 sec
Area tested: 32MB Average: 7.34 ms Max: 12.30 ms Total: 0.70 sec
Area tested: 64MB Average: 7.35 ms Max: 13.47 ms Total: 0.70 sec
Area tested: 128MB Average: 7.66 ms Max: 13.37 ms Total: 0.73 sec
Area tested: 256MB Average: 7.93 ms Max: 13.34 ms Total: 0.75 sec
Area tested: 512MB Average: 10.16 ms Max: 39.67 ms Total: 0.97 sec
Area tested: 1GB Average: 8.76 ms Max: 14.38 ms Total: 0.83 sec
Area tested: 2GB Average: 9.42 ms Max: 17.74 ms Total: 0.89 sec
Area tested: 4GB Average: 11.00 ms Max: 23.22 ms Total: 1.05 sec
Area tested: 8GB Average: 10.59 ms Max: 19.60 ms Total: 1.01 sec
Area tested: 16GB Average: 10.91 ms Max: 19.15 ms Total: 1.04 sec
Area tested: 32GB Average: 11.19 ms Max: 26.02 ms Total: 1.06 sec
Area tested: 64GB Average: 12.59 ms Max: 26.49 ms Total: 1.20 sec
Area tested: 128GB Average: 11.97 ms Max: 19.30 ms Total: 1.14 sec
Area tested: 256GB Average: 12.61 ms Max: 22.84 ms Total: 1.20 sec
Area tested: 512GB Average: 13.62 ms Max: 20.48 ms Total: 1.29 sec
Area tested: 1TB Average: 16.72 ms Max: 29.20 ms Total: 1.59 sec
Area tested: 1.36TB Average: 15.96 ms Max: 26.21 ms Total: 1.52 sec
Modern HDDs have built-in caching: when you read a position, the drive's internal logic also caches the area around it, and if you next read something nearby, the drive serves it from the cache if present and otherwise reads it from the platters.
Reading from the start of your disk
Measuring: Random seek time using beginning of disk.
Samples: 100 Sample size: 512
Area tested: 1MB Average: 0.14 ms Max: 0.35 ms Total: 0.01 sec
will cache that region, so successive reads are served from the (faster) cache.
Reading random locations:
Measuring: Random seek time using random areas of disk.
Samples: 100 Sample size: 512
Area tested: 1MB Average: 7.21 ms Max: 35.94 ms Total: 0.69 sec
will not be able to read from the cache, unless you read the same random location several times in a row.
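To make the caching argument concrete, here is a toy simulation. It is not a model of any real drive: it assumes (hypothetically) that every cache miss makes the drive keep one forward read-ahead segment of `readahead` bytes, that at most `segments` such segments are kept in FIFO order, and it only counts hits instead of measuring time. The name `hit_rate` and all parameter values are made up for illustration.

```python
import random

def hit_rate(disksize, area, samples, random_areas,
             readahead=256 * 1024, segments=16, seed=0):
    """Fraction of the timed reads that a toy read-ahead cache would serve."""
    rng = random.Random(seed)
    cache = []  # start offsets of cached segments, oldest first

    def read(offset):
        # Hit if the offset lies inside any cached segment.
        for start in cache:
            if start <= offset < start + readahead:
                return True
        # Miss: assume the drive caches a new segment starting here.
        cache.append(offset)
        if len(cache) > segments:
            cache.pop(0)  # evict the oldest segment (FIFO)
        return False

    hits = 0
    for _ in range(samples):
        # Same offset choice as in the question's measurement loop.
        left = rng.randint(0, disksize - area) if random_areas else 0
        right = left + rng.randint(0, area)
        read(left)
        if read(right):  # only the second read is timed in the script
            hits += 1
    return hits / float(samples)

if __name__ == "__main__":
    disksize = 1500 * 10**9  # roughly the 1.36TB disk from the output
    area = 1024 * 1024       # the 1MB row
    print("beginning of disk: %.2f" % hit_rate(disksize, area, 100, False))
    print("random areas:      %.2f" % hit_rate(disksize, area, 100, True))
```

Under these assumptions the fixed-start mode should report a clearly higher hit rate than the random-area mode, because its reads keep landing in segments cached by earlier samples. The exact numbers depend entirely on the made-up parameters.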
Your code does not use the same random area 100 times:
for _ in range(bufcount):
    left = random.randint(0, disksize-area) if randomareas else 0
    right = left + random.randint(0, area)
    disk.seek(left)
    disk.read(bufsize)
    start = time.time()
    disk.seek(right)
    disk.read(bufsize)
    finish = time.time()
    times.append(finish-start)
It creates a new left and right for each of the 100 samples (bufcount), so when you seek randomly you do not benefit from the HDD's cache (most of the time, unless the random offsets happen to land close together by sheer chance).
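One way to check this explanation would be to keep the random area fixed for all samples of a run, so that it is cached the same way as the beginning of the disk is. The sketch below only generates the offsets; `seek_pairs` and its `reuse_area` flag are hypothetical names, not part of the original script, and the code is written to run on Python 2 and 3.

```python
import random

def seek_pairs(disksize, area, count, reuse_area, rng=random):
    """Return (left, right) byte offsets for `count` timed seeks.

    reuse_area=True  picks ONE random area and reuses it for every sample.
    reuse_area=False picks a new random area per sample, as the question does.
    """
    base = rng.randint(0, disksize - area)
    pairs = []
    for _ in range(count):
        left = base if reuse_area else rng.randint(0, disksize - area)
        right = left + rng.randint(0, area)
        pairs.append((left, right))
    return pairs
```

Feeding these pairs into the timing loop in place of the inline `left`/`right` computation would compare like with like. If the cache explanation holds, the small-area rows with `reuse_area=True` should come out much closer to the "beginning of disk" numbers.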
Source: https://stackoverflow.com/questions/38292071/disk-seek-time-measurement-method