Why is a hash table lookup only O(1) time when searching for a key is O(n)?

前端 未结 4 1866
予麋鹿
予麋鹿 2021-01-12 09:16

Technically speaking, based on posts I\'ve read here, Hash table is indeed O(n) time lookup in the worst case. But I don\'t get how the internal mechanics guarantee it to be

4条回答
  •  暖寄归人
    2021-01-12 09:31

    Sounds like you are in search of more elaborated explanation!

    I assume that you already understand that array element lookup takes O(1) i.e. if I already knew that I wanted to lookup 100th element in array then it would only take me O(1) because this is a simple memory address lookup (by adding 100 to the first element's address).

    The method of hashing makes use of this memory address lookup to achieve O(1) average time. Now obviously this means that you need to be able to convert lookup key to memory address. Let me give you a very simplified example of how this works in a hashtable (just to be clear, dictionaries implement hashtable under the hood so when I mention hashtable , the exact same principles are applicable to dictionaries too).

    The simplified example scenario; we need to lookup mailing addresses of customers by their first name. For simplicity assume that names are going to be unique and they've normal a to z letter. Let's say initially we are designing this for only 10 customers (i.e. their names and their addresses).

    Now lets say that we must solve this by storing name-addresses pairs in hashtable and we must create our own hashfunction!!! A hashfunction which would take name as a parameter and convert it into a memory lookup!!.

    Now take a moment and think how many arrays are required here? What would be their type and what would be their size?
    We definitely need one array to store mailing addresses. What should be the size? Well we need to store 10 mailing addresses so size has to be 10! We would also need second array for storing element indexes of first array!! Or in other words we need a second array to store references to mailing addresses (from first array) for our customer names. What should be the size of this array? Definitely larger than 10! But it really comes down to the hashfunction we design. For simplicity let's create a hashfunction which simply takes first letter of the name parameter and converts it to index. i.e. If name starts from A then it hashvalue is 1, for b it is 2, for c it is 3... for z it is 26. So at the least our lookup array size has to be 26 (you must be thinking that that's the wastage of lot of space for storing 10 addresses!! but it may worth it since it is going to give us performance) Let’s try to understand this with an example. Let’s say our first customer name is Bob. To store address for Bob first step is to find the first empty element in mailing address array. This is the first name so whole mailing address array is empty. We can store Bob’s address at index zero in mailing address array. When we store this address we’ll also mark it as Bob’s address at index 0. (I’m using this ‘marking’ terminology to later explain lookup vs searching) Then we find out hashvalue for name Bob. In this case it would be 2! So in the lookup array at location 2 we store 0. (i.e. the index to mailing address for Bob). Now lets say our second customer is Hamish; we store mailing address for Hamish at index 1 (i.e. the second element) in mailing address array; mark it as Hamish’s address and then we find out hashvalue for Hamish. Since Hamish starts from ‘H’, value would be 8. So in our lookup array at location 8 we store value 1 (i.e. the index for Hamish’s address). We can repeat this procedure for all 10 customers and store their addresses. Now later when you want to lookup Bob’s address you could look it up very fast simply by following a simple two step procedure. Step 1- convert name Bob to hashvalue ; answer is 2; go ahead and check location 2 in mailing address array; if it is marked as Bob’s address then return location 2 !! Same for Hamish; H-> gives 8. Go ahead and look up address from location 8; if it is marked as Hamish’s address then return address from location 8. This mechanism is called ‘looking up’. If you hadn’t created the second array (lookup array) then you’d only have mailing address array and you’d have to go through each address one by one and check if it is marked with the customer name you are looking for or not!. Now, what if there are two customer names starting with same letter? That’s called hash collision and that can be dealt with different approaches. What if we need to store 10000 names? That means we are have to use a better hashfunction which would give us less hash collisions. I’m not covering those two terminology here since I believe that question only demanded explaining lookup vs searching.

提交回复
热议问题