What is the cost of deleting a value from a hashtable?

半城伤御伤魂 提交于 2019-12-11 10:52:55

问题


Now I have this question where I was asked the cost of deleting a value from a hash table when we used linear probing while the insertion process.

What I could figure out from reading various stuff on the internet is that it has to do something with the load factor. Though I am not sure, but I read a relation between the load factor and no of probes required and it is No of probes = 1 / (1-LF).

So I believe the cost has to be dependent on the probe sequence. But then another thought ruins everything.

What if the element was inserted in p probes and now I am trying to delete this element. But before this I had already deleted few elements having the same hash code and were a part of insertion in probes less than p.

In this case I reach to a stage where I see a slot empty in the hash table but I am not sure if the element I am trying to delete is already deleted or is at some other location as a result of probing.

I also found that once I delete an element I must mark this slot with some special indicator to inform that it is available, but this doesn't solve my problem of being uncertain about the element which I am willing to delete.

Could anyone please suggest how to find the cost in such cases? Is the approach going to vary if it is non-linear probing?


回答1:


The standard approach is "lookup the element, mark as deleted". Marking obviously has O(1) cost, so the total operation cost is the same as just lookup: O(1) expected. It can be as high as O(n) in degenerate cases (e.g. all elements have the same hash). O(1) expected is all we can say theoretically.

About the load factor. The higher the load factor (ratio of number of occupied buckets to the total number), the larger is the expected factor (but this doesn't change the theoretical O cost). Note that in this case load factor includes number of both present in the table elements plus the number of buckets that got marked as deleted previously.

Other probing kinds (e.g. quadratic) don't change the theoretical cost, but may alter the expected constant factor or its variance. If you look at "fallback" sequences, in linear ordering the sequences of different buckets overlap. This means that if for some bucket the sequence is long, for adjacent buckets it will also be long. E.g.: if buckets 4 to 10 are occupied, sequence for bucket #4 is 7 bucket long (4, 5, 6, ..., 10), for #5 it's 6 and so on. For quadratic probing this is not the case.

However, linear probing has the benefit of better memory-cache behavior, since you check memory cells close to each other. In practice, though, for quadratic probing fallback sequences are rarely long enough for this to matter.

Finally, in linear probing case, it is possible to work without deleted mark, but for this you'd have to complicate deleting procedure considerably (still O(1) expected, though, but with much higher constant factor). Whether it is worth it has to be decided with actual profiling; for example, this simplifies inserting somewhat and lookup a bit. For a C++ implementation this would have the downside that erase() would invalidate iterators, though.



来源:https://stackoverflow.com/questions/10855989/what-is-the-cost-of-deleting-a-value-from-a-hashtable

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!