Disk seek time measurement method

I wrote a script to measure seek times on an HDD, and a small change in how the measurement is done results in dramatically different times.

The first cycle makes jumps within an area at the beginning of the disk. The second cycle picks random areas (of the same size) across the disk and performs the seeks there. The two approaches are clearly different, but I don't understand why that would change the results. Notice that for large areas the measurements from both methods converge.

The Bytes* functions just convert between numbers and nicely formatted strings (1024 <-> "1KB"). The script must be run as root. The disk is sdb by default. The script and its output follow.

import sys, os, time, random


#--------------------------------------------------------------------------------------------------

def BytesString(n):
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    suffix = 0
    while n % 1024 == 0 and suffix+1 < len(suffixes):
        suffix += 1
        n /= 1024
    return '{0}{1}'.format(n, suffixes[suffix])

def BytesInt(s):
    if all(c in '0123456789' for c in s):
        return int(s)
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    for power,suffix in reversed(list(enumerate(suffixes))):
        if s.endswith(suffix):
            return int(s.rstrip(suffix))*1024**power
    raise ValueError('BytesInt requires proper suffix ('+' '.join(suffixes)+').')

def BytesStringFloat(n):
    x = float(n)
    suffixes = ['B','KB','MB','GB','TB','PB','EB','ZB','YB']
    suffix = 0
    while x > 1024.0 and suffix+1 < len(suffixes):
        suffix += 1
        x /= 1024.0
    return '{0:0.2f}{1}'.format(x, suffixes[suffix])


#--------------------------------------------------------------------------------------------------

disk = open('/dev/sdb', 'r')
disk.seek(0,2)
disksize = disk.tell()
os.system('echo noop | sudo tee /sys/block/sdb/queue/scheduler > /dev/null')

print 'Syntax: program [-s -sr -t -tr] [-v]:  to run specific modes; for verbose mode.'
print 'Disk name: {0}  Disk size: {1}  Scheduler disabled.'.format(
    disk.name, BytesStringFloat(disksize))

displaytimes = '-v' in sys.argv


#--------------------------------------------------------------------------------------------------

bufsize = 512
bufcount = 100
displaysamplecount = 24

for randomareas in [False,True]:
    print
    print 'Measuring: Random seek time {0}'.format(
        'using random areas of disk.' if randomareas else 'using beginning of disk.')
    print 'Samples: {0}{1}   Sample size: {2}'.format(
        bufcount, ' (displayed {0})'.format(displaysamplecount) if displaytimes else '', bufsize)

    for area in [BytesInt('1MB')*2**i for i in range(0,64)]+[disksize]:
        if area > disksize:
            continue

        os.system('echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null')

        times = []
        disk.seek(0)
        disk.read(bufsize)
        for _ in range(bufcount):
            left = random.randint(0, disksize-area) if randomareas else 0
            right = left + random.randint(0, area)
            disk.seek(left)
            disk.read(bufsize)
            start = time.time()
            disk.seek(right)
            disk.read(bufsize)
            finish = time.time()
            times.append(finish-start)

        times = sorted(times)[:bufcount*95/100]
        print 'Area tested: {0:6}   Average: {1:5.2f} ms   Max: {2:5.2f} ms   Total: {3:0.2f} sec'.format(
            BytesString(area) if area < disksize else BytesStringFloat(area), 
            sum(times)/len(times)*1000, max(times)*1000, sum(times))
        if displaytimes:
            print 'Read times: {0} ... {1} ms'.format(
                ' '.join(['{0:0.2f}'.format(x*1000) for x in times[:displaysamplecount/2]]), 
                ' '.join(['{0:0.2f}'.format(x*1000) for x in times[-displaysamplecount/2:]]))

Measuring: Random seek time using beginning of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  0.14 ms   Max:  0.35 ms   Total: 0.01 sec
Area tested: 2MB      Average:  0.16 ms   Max:  0.31 ms   Total: 0.02 sec
Area tested: 4MB      Average:  0.20 ms   Max:  0.75 ms   Total: 0.02 sec
Area tested: 8MB      Average:  0.19 ms   Max:  0.97 ms   Total: 0.02 sec
Area tested: 16MB     Average:  0.64 ms   Max:  7.97 ms   Total: 0.06 sec
Area tested: 32MB     Average:  2.29 ms   Max: 10.56 ms   Total: 0.22 sec
Area tested: 64MB     Average:  3.89 ms   Max: 12.25 ms   Total: 0.37 sec
Area tested: 128MB    Average:  6.32 ms   Max: 13.18 ms   Total: 0.60 sec
Area tested: 256MB    Average:  6.73 ms   Max: 13.04 ms   Total: 0.64 sec
Area tested: 512MB    Average:  7.43 ms   Max: 13.72 ms   Total: 0.71 sec
Area tested: 1GB      Average:  8.38 ms   Max: 13.59 ms   Total: 0.80 sec
Area tested: 2GB      Average:  8.51 ms   Max: 13.81 ms   Total: 0.81 sec
Area tested: 4GB      Average:  8.87 ms   Max: 13.86 ms   Total: 0.84 sec
Area tested: 8GB      Average:  9.82 ms   Max: 14.66 ms   Total: 0.93 sec
Area tested: 16GB     Average:  9.73 ms   Max: 15.95 ms   Total: 0.92 sec
Area tested: 32GB     Average:  9.89 ms   Max: 15.18 ms   Total: 0.94 sec
Area tested: 64GB     Average: 10.60 ms   Max: 15.85 ms   Total: 1.01 sec
Area tested: 128GB    Average: 11.18 ms   Max: 18.68 ms   Total: 1.06 sec
Area tested: 256GB    Average: 13.31 ms   Max: 30.94 ms   Total: 1.26 sec
Area tested: 512GB    Average: 14.14 ms   Max: 31.70 ms   Total: 1.34 sec
Area tested: 1TB      Average: 15.20 ms   Max: 33.35 ms   Total: 1.44 sec
Area tested: 1.36TB   Average: 15.47 ms   Max: 25.30 ms   Total: 1.47 sec

Measuring: Random seek time using random areas of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  7.21 ms   Max: 35.94 ms   Total: 0.69 sec
Area tested: 2MB      Average:  5.40 ms   Max: 12.92 ms   Total: 0.51 sec
Area tested: 4MB      Average:  6.97 ms   Max: 36.60 ms   Total: 0.66 sec
Area tested: 8MB      Average:  7.24 ms   Max: 15.05 ms   Total: 0.69 sec
Area tested: 16MB     Average:  7.36 ms   Max: 13.03 ms   Total: 0.70 sec
Area tested: 32MB     Average:  7.34 ms   Max: 12.30 ms   Total: 0.70 sec
Area tested: 64MB     Average:  7.35 ms   Max: 13.47 ms   Total: 0.70 sec
Area tested: 128MB    Average:  7.66 ms   Max: 13.37 ms   Total: 0.73 sec
Area tested: 256MB    Average:  7.93 ms   Max: 13.34 ms   Total: 0.75 sec
Area tested: 512MB    Average: 10.16 ms   Max: 39.67 ms   Total: 0.97 sec
Area tested: 1GB      Average:  8.76 ms   Max: 14.38 ms   Total: 0.83 sec
Area tested: 2GB      Average:  9.42 ms   Max: 17.74 ms   Total: 0.89 sec
Area tested: 4GB      Average: 11.00 ms   Max: 23.22 ms   Total: 1.05 sec
Area tested: 8GB      Average: 10.59 ms   Max: 19.60 ms   Total: 1.01 sec
Area tested: 16GB     Average: 10.91 ms   Max: 19.15 ms   Total: 1.04 sec
Area tested: 32GB     Average: 11.19 ms   Max: 26.02 ms   Total: 1.06 sec
Area tested: 64GB     Average: 12.59 ms   Max: 26.49 ms   Total: 1.20 sec
Area tested: 128GB    Average: 11.97 ms   Max: 19.30 ms   Total: 1.14 sec
Area tested: 256GB    Average: 12.61 ms   Max: 22.84 ms   Total: 1.20 sec
Area tested: 512GB    Average: 13.62 ms   Max: 20.48 ms   Total: 1.29 sec
Area tested: 1TB      Average: 16.72 ms   Max: 29.20 ms   Total: 1.59 sec
Area tested: 1.36TB   Average: 15.96 ms   Max: 26.21 ms   Total: 1.52 sec

Modern HDDs have built-in caching: when you read a position, the drive's firmware also caches the area around it internally (read-ahead). If your next read lands nearby, the data is served from that cache if it is present there; otherwise it is read from the platters.

Reading from the start of your disk:

Measuring: Random seek time using beginning of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  0.14 ms   Max:  0.35 ms   Total: 0.01 sec

caches that region, so successive reads are served from the (faster) cache.

Reading random locations:

Measuring: Random seek time using random areas of disk.
Samples: 100   Sample size: 512
Area tested: 1MB      Average:  7.21 ms   Max: 35.94 ms   Total: 0.69 sec

cannot be served from the cache, unless you happen to read the same random location several times in a row.
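To make the difference between the two access patterns concrete, here is a minimal simulation sketch in Python 3 (the question's script is Python 2.7). It does not touch a real disk and does not model any particular drive: the cache size (32 MB), the read-ahead window (64 KB), the oldest-first eviction, and the name simulated_hit_rate are all assumptions chosen for illustration. It only counts how often the timed read would land in a region that an earlier read had already pulled into a toy cache.

```python
import random
from collections import deque

KB = 1024
MB = 1024 * KB
TB = 1024 * 1024 * MB


def simulated_hit_rate(disksize, area, random_areas, samples=100,
                       cache_bytes=32 * MB, readahead=64 * KB, seed=0):
    """Fraction of timed reads that hit a toy read-ahead cache.

    cache_bytes and readahead are illustrative guesses, not drive specs.
    """
    rng = random.Random(seed)
    # Start offsets of cached windows; the deque drops the oldest when full.
    segments = deque(maxlen=cache_bytes // readahead)

    def read(pos):
        # Hit if pos falls inside any cached window [start, start + readahead).
        if any(start <= pos < start + readahead for start in segments):
            return True
        segments.append(pos)  # miss: cache the window starting at pos
        return False

    hits = 0
    for _ in range(samples):
        # Same offset generation as the loop in the question's script.
        left = rng.randint(0, disksize - area) if random_areas else 0
        right = left + rng.randint(0, area)
        read(left)            # untimed positioning read
        if read(right):       # the read the script actually times
            hits += 1
    return hits / samples


disk = int(1.36 * TB)  # roughly the disk size shown in the output above
start_rate = simulated_hit_rate(disk, 1 * MB, random_areas=False)
random_rate = simulated_hit_rate(disk, 1 * MB, random_areas=True)
print('beginning of disk: {0:.0%} hits'.format(start_rate))
print('random areas:      {0:.0%} hits'.format(random_rate))
```

In this toy model the beginning-of-disk run keeps revisiting the same 1 MB, so the cached windows accumulate and cover more of it with every miss, while the random-area run almost always lands somewhere that was never read before. Real firmware is more elaborate, so treat the numbers as qualitative only.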

Your code does not use the same random area 100 times:

for _ in range(bufcount):
    left = random.randint(0, disksize-area) if randomareas else 0
    right = left + random.randint(0, area)
    disk.seek(left)
    disk.read(bufsize)
    start = time.time()
    disk.seek(right)
    disk.read(bufsize)
    finish = time.time()
    times.append(finish-start)

It generates a new left and right for each of the 100 samples (bufcount). Because you are seeking randomly, you do not benefit from the HDD's cache most of the time, unless the random offsets happen to land close together by sheer chance.
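One way to check this explanation would be a third mode that picks a single random area per test and reuses it for all 100 samples. The sketch below (Python 3) only generates the offsets, so it can be tried without a disk; the function name seek_pairs and the mode names are hypothetical and not part of the original script. If the caching explanation is right, the 'per_test' mode would be expected to behave more like the beginning-of-disk run, but that is a prediction, not a measured result.

```python
import random

MODES = ('start', 'per_sample', 'per_test')


def seek_pairs(disksize, area, count, mode, rng=random):
    """Yield (left, right) byte offsets for `count` timed seeks.

    'start'      - area always begins at offset 0 (first loop in the script)
    'per_sample' - a new random area for every sample (second loop)
    'per_test'   - one random area, reused for all samples (hypothetical)
    """
    if mode not in MODES:
        raise ValueError('mode must be one of ' + ', '.join(MODES))
    # For 'per_test' the area is chosen once, before the sampling loop.
    base = rng.randint(0, disksize - area) if mode == 'per_test' else 0
    for _ in range(count):
        left = rng.randint(0, disksize - area) if mode == 'per_sample' else base
        right = left + rng.randint(0, area)
        yield left, right


# Example: every sample in 'per_test' mode shares the same left offset.
pairs = list(seek_pairs(10**12, 2**20, 100, 'per_test', random.Random(1)))
print(len(set(left for left, _ in pairs)))
```

In the original loop, the two lines computing left and right would be replaced by iterating over these pairs; the seek, read and timing calls stay as they are.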

Source: https://stackoverflow.com/questions/38292071/disk-seek-time-measurement-method

Tags: python, performance, python-2.7, disk, hard-drive