Linux AIO: Poor Scaling

前端 未结 3 1331
灰色年华
灰色年华 2021-02-04 07:25

I am writing a library that uses the Linux asynchronous I/O system calls, and would like to know why the io_submit function is exhibiting poor scaling on the ext4 f

3条回答
  •  庸人自扰
    2021-02-04 08:06

    Why isn't the execution time of io_submit constant?

    Because you are submitting I/Os that are so big, the block layer has to split them up and then queue the resulting requests. This can then cause you to hit resource limitations that in turn cause io_submit() to behave as if it's blocking...

    What is causing this poor scaling behavior?

    The bigger the I/O is over the splitting threshold (see below) the more likely it becomes that the number of splits done to turn it into appropriately sized requests will also increase (presumably actually doing the splits will cost a small amount of time too). With direct I/O io_submit() does not return until all its requests have been allocated and queued at the block layer level. Further, the amount of requests that can be queued by the block layer for a given disk is limited to /sys/block/[disk_device]/queue/nr_requests. Exceeding this limit leads to io_submit() blocking until enough request slots have been freed up such that all its allocations have been satisfied (this is related to Arvid was recommending).

    Do I need to split up all read requests on ext4 file systems into multiple requests, each of size less than 20,000 pages?

    Ideally you should split your requests into far smaller amounts than that - 20000 pages (assuming a 4096 byte page which is what is used on x86 platforms) is roughly 78 megabytes! This doesn't just apply to when you're using ext4 - doing such large io_submit() I/O sizes to other filesystems or even directly to block devices will be unlikely perform well.

    If you work out which disk device your filesystem is on and look at /sys/block/[disk_device]/queue/max_sectors_kb that will give you an upper bound but the bound at which splitting starts may be even smaller so you may want to limit the size of each I/O to /sys/block/[disk_device]/queue/max_segments * PAGE_SIZE instead.

    Where does this "magic" value of 20,000 come from?

    This is likely down to some combination of:

    • The maximum size each I/O can be before the block layer splits it (at most this will be /sys/block/[disk_device]/queue/max_sectors_kb but the observed split limit may be even lower)
    • The maximum number of I/Os that can be queued before blocking occurs (/sys/block/[disk_device]/queue/nr_requests)
    • Your hardware's command queue depth (/sys/block/[disk_device]/device/queue_depth)
    • How fast your disk is at completing requests. When the kernel can't queue any more I/Os to the real device (due to the hardware queue_depth being full and the kernel's additional queues being full) it becomes blocking on new requests until in-flight ones sent to the hardware have completed.

    If I run my program on another Linux system, how can I determine the largest IO request size to use without experiencing poor scaling behavior?

    Limit each request I/O to the lower of /sys/block/[disk_device]/queue/max_sectors_kb or /sys/block/[disk_device]/queue/max_segments * PAGE_SIZE. I would imagine I/Os no bigger than 524288 bytes should be safe but your hardware may be able to cope with a larger size and thus get a higher throughput but possibly at the expense of completion (as opposed to submission) latency.

    If possible, what can I do to get io_submit not to block for large IO request sizes?

    There's going to be an upper "good" limit and if you surpass it there are going to be consequences which you can't escape.

    Related questions

    asynchronous IO io_submit latency in Ubuntu Linux

提交回复
热议问题