Blocking factor in a DBMS

前端 未结 1 1073
广开言路
广开言路 2021-02-05 16:57

What is the blocking factor in a DBMS,

The bit I looked at said it was the floored value of blocks per record (so B/R floor), where B is block size and R is records. I w

1条回答
  •  北海茫月
    2021-02-05 17:31

    Yes, it means how many whole records fit into a block.

    (A block is the smallest unit of data that the underlying storage system (hdd, san fs, etc) is willing to deal in. It's size is traditionally 512 bytes for hard drives.)

    It is floored because if 100 and a half record would fit, one only stores 100 records per block.

    Blocking factor is pretty heavily used in many dbms related calculations.

    For example:

    The problem

    We have 10 000 000 records. Each record is 80 bytes long. Each record contains an unique key (Lets say social security numbers). We want looking up someone by their social security number to be fast.

    But what is fast?

    We need something to measure performance by. The thing that takes the most time is asking a block from the harddisk. You know, it is a mechanical device. It has to reposition its head, and blabla, so it really a slow operation when compared to how fast the CPU is, or compared to how fast operative memory(RAM) access is. Okay, lets say that we measure the performance of an operation by how many disk accesses it takes. We want to minimize the number of disk accesses. Okay, now we know how to tell if something is slow or fast.

    Many disk accesses -> bad

    Very few disk accesses -> good

    Calculating how many blocks our data needs

    Lets say that on our imaginary hw, each block is 5000 byte. We want calculate how many blocks we need. First, we need to know how many records fit into a single block:

    Blocking factor = floored((Block size)/(Record size)) = floored(5000/80) = floored(62.5) = 62 record/block

    And we have 10000000 records, so we need ceiled(10000000/62)=ceiled(161290.32)=161291 blocks to store all that data.

    Whoa, that's a lot of data. How do I look up someone fast?

    If one were to read all the blocks to find a single record by the key (social security number), then that would take 161291 disk accesses. Not good.

    We can do better. Lets build an index file. We will build a sparse index.

    A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.

    Okay, so we will have a pointer and a key in our index file for each block. Lets say that on our imaginary hw, a pointer is 4 bytes long, and in our imaginary world a social security number (key) takes up 6 bytes.

    So we are going to store one 10 byte long key-pointer pair for each block in our index. How many of these pairs fit into a single block?

    Blocking factor of the index file = floored(5000/10) = 500
    

    ... so this means that 500 key-pointer pairs fit into a single block. And we need to store 161291 of these, so the index file will take up ceiled(161291/500)=323 blocks

    The index file is ordered by key, so we can do binary search in it to find the pointer to the block which contains the record. Doing binary search in the index file costs at most ceiled(log2(323))=9 disk acceses. We also need +1 disk access to actually read the data block which the index record points to.

    Wow, we got our lookup to work in 10 disk accesses. That's pretty awesome. We could even do better. :)

    Okay, so you can see that blocking factor is heavy used for example in this calculation.

    0 讨论(0)
提交回复
热议问题