Why is a hash table lookup only O(1) time when searching for a key is O(n)?

前端 未结 4 1864
予麋鹿
予麋鹿 2021-01-12 09:16

Technically speaking, based on posts I\'ve read here, Hash table is indeed O(n) time lookup in the worst case. But I don\'t get how the internal mechanics guarantee it to be

相关标签:
4条回答
  • 2021-01-12 09:17

    i think the word "hash" is scaring people. Behind the scene, hash tables are data structures that stores the key/value pairs in an array.

    The only difference here is, we do not care about the position of the key value pair. There is no INDEX here. Look up for an array item O(1). it is independent of size of the array and independent of position. you just enter index number, and item is retrieved.

    so how much time did look up need to complete. it is O(1).

    In hash tables, when you store key/value pair, key value gets hashed and stored in the corresponding memory slot.

    {name:"bob"} //name will be hashed
    
    hash(name) = ab1234wq //this is the memory address
    [["name","bob"]] // will be store at memory adress ab1234wq
    

    when you look up for "name", it will get hashed and as the main feature of hashing functions, it will return same the result "ab1234wq". So the programming engine will look at this address, will see the array and will return the value. As you can see, this operation is same as array look up.

    0 讨论(0)
  • 2021-01-12 09:28

    I think you are confusing terminology and also complicating matters by thinking about buckets.

    Let's imagine a hash table that is implemented in as an array a of length n. Let's also imagine we have n possible keys and a perfect hash function H that maps each key k to a unique index i in a.

    Let's initialize our hash table by setting every value in a to nil.

    We can insert a key, value pair (k1, v1) into our hash table by placing the value at the appropriate position in the array:

    a[H(k1)] = v1
    

    Now let's say that later we forgot if k1 is in the hash table and we want to check if it's there. To do this we simply look up a[H(k1)] and see if any value is there, i.e. a[H(k1)] != nil. This is clearly a constant time lookup.

    But what if we want to see if v1, or even some other v2 is anywhere in our hashtable? This is not so easy because we have no function that maps a vi to a position in our array. It could be associated with any key. So the only way to see if it exists in the table is to scan the entire array, checking every value:

    for i in 0..n-1:
      if a[i] == v2:
        return true
    return false
    

    To make this a bit more concrete, imagine your keys are names, and your values are cities of residence. Now compare asking "Is Bob Jones in the hash table?" to "Is there anyone from New York in the hash table?". We can hash "Bob Jones" and see if there's anything in the corresponding array position (because that's how "Bob Jones" would have been inserted), but we have no similarly quick way to look up "New York".

    I am assuming this is what you are asking, and you have confused the terminology a bit. Please comment if this is not what you wanted.

    0 讨论(0)
  • 2021-01-12 09:29

    Great question!

    Assume

    1. We want to map strings to values
    2. hashFunction(string) => hashedIndex : int in O(1)
    3. valueArray : [any] stores values
    4. valueIndex : int is the first empty index in valueArray
    5. lookupArray : [int] stores each valueIndex at hashedIndex
    6. array lookups are O(1).
    // Setting a value
    
    valueArray[valueIndex] = value 
    
    hashedIndex = hashFunction(string)
    
    lookupArray[hashedIndex] = valueIndex
    
    
    // Looking up a value
    
    hashedIndex = hashFunction(string) // O(1)
    
    valueIndex = lookupArray[hashedIndex]; // O(1) array lookup
    
    value = valueArray[valueIndex]; // O(1) array lookup
    

    Lots of details omitted to answer your question clearly.

    Hope that helps!

    0 讨论(0)
  • 2021-01-12 09:31

    Sounds like you are in search of more elaborated explanation!

    I assume that you already understand that array element lookup takes O(1) i.e. if I already knew that I wanted to lookup 100th element in array then it would only take me O(1) because this is a simple memory address lookup (by adding 100 to the first element's address).

    The method of hashing makes use of this memory address lookup to achieve O(1) average time. Now obviously this means that you need to be able to convert lookup key to memory address. Let me give you a very simplified example of how this works in a hashtable (just to be clear, dictionaries implement hashtable under the hood so when I mention hashtable , the exact same principles are applicable to dictionaries too).

    The simplified example scenario; we need to lookup mailing addresses of customers by their first name. For simplicity assume that names are going to be unique and they've normal a to z letter. Let's say initially we are designing this for only 10 customers (i.e. their names and their addresses).

    Now lets say that we must solve this by storing name-addresses pairs in hashtable and we must create our own hashfunction!!! A hashfunction which would take name as a parameter and convert it into a memory lookup!!.

    Now take a moment and think how many arrays are required here? What would be their type and what would be their size?
    We definitely need one array to store mailing addresses. What should be the size? Well we need to store 10 mailing addresses so size has to be 10! We would also need second array for storing element indexes of first array!! Or in other words we need a second array to store references to mailing addresses (from first array) for our customer names. What should be the size of this array? Definitely larger than 10! But it really comes down to the hashfunction we design. For simplicity let's create a hashfunction which simply takes first letter of the name parameter and converts it to index. i.e. If name starts from A then it hashvalue is 1, for b it is 2, for c it is 3... for z it is 26. So at the least our lookup array size has to be 26 (you must be thinking that that's the wastage of lot of space for storing 10 addresses!! but it may worth it since it is going to give us performance) Let’s try to understand this with an example. Let’s say our first customer name is Bob. To store address for Bob first step is to find the first empty element in mailing address array. This is the first name so whole mailing address array is empty. We can store Bob’s address at index zero in mailing address array. When we store this address we’ll also mark it as Bob’s address at index 0. (I’m using this ‘marking’ terminology to later explain lookup vs searching) Then we find out hashvalue for name Bob. In this case it would be 2! So in the lookup array at location 2 we store 0. (i.e. the index to mailing address for Bob). Now lets say our second customer is Hamish; we store mailing address for Hamish at index 1 (i.e. the second element) in mailing address array; mark it as Hamish’s address and then we find out hashvalue for Hamish. Since Hamish starts from ‘H’, value would be 8. So in our lookup array at location 8 we store value 1 (i.e. the index for Hamish’s address). We can repeat this procedure for all 10 customers and store their addresses. Now later when you want to lookup Bob’s address you could look it up very fast simply by following a simple two step procedure. Step 1- convert name Bob to hashvalue ; answer is 2; go ahead and check location 2 in mailing address array; if it is marked as Bob’s address then return location 2 !! Same for Hamish; H-> gives 8. Go ahead and look up address from location 8; if it is marked as Hamish’s address then return address from location 8. This mechanism is called ‘looking up’. If you hadn’t created the second array (lookup array) then you’d only have mailing address array and you’d have to go through each address one by one and check if it is marked with the customer name you are looking for or not!. Now, what if there are two customer names starting with same letter? That’s called hash collision and that can be dealt with different approaches. What if we need to store 10000 names? That means we are have to use a better hashfunction which would give us less hash collisions. I’m not covering those two terminology here since I believe that question only demanded explaining lookup vs searching.

    0 讨论(0)
提交回复
热议问题