Sorted hash table (map, dictionary) data structure design

两盒软妹~` 提交于 2019-11-30 04:24:13

Why exactly do you need a sort() function ?
What do you perhaps want and need is a Red-Black Tree.

http://en.wikipedia.org/wiki/Red-black_tree

These trees are automatically sorting your input by a comparator you give. They are complex, but have excellent O(n) characteristics. Couple your tree entries as key with a hash map as dictionary and you get your datastructure.

In Java it is implemented as TreeMap as instance of SortedMap.

For "O(1) get, put, and remove operations" you essentially need O(1) lookup, which implies a hash function (as you know), but the requirements of a good hash function often break the requirement to be easily sorted. (If you had a hash table where adjacent values mapped to the same bucket, it would degenerate to O(N) on lots of common data, which is a worse case you typically want a hash function to avoid.)

I can think of how to get you 90% of the way there. Set up a hashtable alongside a parallel index that is sorted. The index has a clean part (ordered) and a dirty part (unordered). The index would map keys to the values (or references to the values stored in the hashtable - whichever suits you in terms of performance or memory use). When you add to the hashtable, the new entry is pushed onto the back of the dirty list. When you remove from the hashtable, the entry is nulled/removed from the clean and dirty parts of the index. You can sort the index, which sorts the dirty entries only, then merges them into the already sorted 'clean' part of the index. And obviously you can iterate over the index.

As far as I can see, this gives you the O(1) everywhere except on the remove operation and is still fairly simple to implement with standard containers (at least as provided by C++, Java, or Python). It also gives you the "second sort is cheaper" condition by only needing to sort the dirty index entries and then letting you do an O(N) merge. The cost of all this is obviously extra memory for the index and extra indirection when using it.

I don't know if there's a name, but you could store the current index of each item on the hash.

That is, you have a HashMap< Object, Pair( Integer, Object ) > and a List<Object> objects

When you put, add to the tail or head of the list and insert into the hashmap with your data and the index of insertion. This is O(1).

When you get, pull from the hashmap and ignore the index. This is O(1).

When you remove, you pull from the map. Take the index and remove from the list as well. This is O(1)

When you sort, just sort the list. Either update the indexes in the map during the sort, or update after the sort is complete. This does not affect the O(nlgn) sort, as it's a linear step. O(nlgn + n) == O(nlgn)

Ordered Dictionary

Recent versions of Python (2.7, 3.1) have "ordered dictionaries" which sound like what you're describing.

The official Python "ordered dictionary" implementation is inspired by previous 3rd-party implementations, as described in the PEP 372.

References:

What you're looking at is a hashtable with pointers in the entries to the next entry in sorted order. It's a lot like the LinkedHashMap in java except that the links are tracking a sort order rather than the insertion order. You can actually implement this totally by wrapping a LinkedHashMap and having the implementation of sort transfer the entries from the LinkedHashMap into a TreeMap and then back into a LinkedHashMap.

Here's an implementation that sorts the entries in an array list rather than transferring to a tree map. I think the sort algorithm used by Collection.sort will do a good job of merging the new entries into the already sorted portion.

public class SortaSortedMap<K extends Comparable<K>,V> implements Map<K,V> {

    private LinkedHashMap<K,V> innerMap;

    public SortaSortedMap() {
        this.innerMap = new LinkedHashMap<K,V>();
    }

    public SortaSortedMap(Map<K,V> map) {
        this.innerMap = new LinkedHashMap<K,V>(map);
    }

    public Collection<V> values() {
        return innerMap.values();
    }

    public int size() {
        return innerMap.size();
    }

    public V remove(Object key) {
        return innerMap.remove(key);
    }

    public V put(K key, V value) {
        return innerMap.put(key, value);
    }

    public Set<K> keySet() {
        return innerMap.keySet();
    }

    public boolean isEmpty() {
        return innerMap.isEmpty();
    }

    public Set<Entry<K, V>> entrySet() {
        return innerMap.entrySet();
    }

    public boolean containsKey(Object key) {
        return innerMap.containsKey(key);
    }

    public V get(Object key) {
        return innerMap.get(key);
    }

    public boolean containsValue(Object value) {
        return innerMap.containsValue(value);
    }

    public void clear() {
        innerMap.clear();
    }

    public void putAll(Map<? extends K, ? extends V> m) {
        innerMap.putAll(m);
    }

    public void sort() {
        List<Map.Entry<K,V>> entries = new ArrayList<Map.Entry<K,V>>(innerMap.entrySet());
        Collections.sort(entries, new KeyComparator());
        LinkedHashMap<K,V> newMap = new LinkedHashMap<K,V>();
        for (Map.Entry<K,V> e: entries) {
            newMap.put(e.getKey(), e.getValue());
        }
        innerMap = newMap;
    }

    private class KeyComparator implements Comparator<Map.Entry<K,V>> {

        public int compare(Entry<K, V> o1, Entry<K, V> o2) {
            return o1.getKey().compareTo(o2.getKey());
        }

    }

}

I'm not aware of a data structure classification with that exact behavior, at least not in Java Collections (or from nonlinear data structures class). Perhaps you can implement it, and it will henceforth be known as the RudigerMap.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!