.NET collection for fast insert/delete

梦毁少年i 2021-01-30 11:43

I need to maintain a roster of connected clients that are very short-lived and frequently go up and down. Due to the potential number of clients I need a collection that supports

6 answers
  •  爱一瞬间的悲伤
    2021-01-30 12:40

    C5 Generic Collection Library

    The best implementations I have found in C# and C++ are these -- for C#/CLI:

    • http://www.itu.dk/research/c5/Release1.1/ITU-TR-2006-76.pdf
    • http://www.itu.dk/research/c5/

    It's well researched, has extensive unit tests, and since February it also implements the common .NET interfaces, which makes the collections much easier to work with. The library was featured on Channel 9, and its authors have done extensive performance testing on the collections.

    If you are using data structures anyway, these researchers have a red-black tree implementation in their library, similar to what you find if you fire up Lutz Roeder's Reflector and look at System.Data's internal structures :p. Insert complexity: O(log n).
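    To make the O(log n) behaviour concrete, here is a minimal sketch in C++ (chosen to match the code later in this answer) of a client roster kept in an ordered tree. It is not C5 code: `Roster`, `ClientId` and `rosterDemo` are hypothetical names, and `std::set` is only typically a red-black tree, since the standard mandates the logarithmic complexity rather than the structure.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical client identifier; a real roster might key on a socket handle or a GUID.
typedef unsigned long ClientId;

class Roster
{
public:
    // O(log n). Returns false if the client was already present.
    bool connect(ClientId id) { return m_clients.insert(id).second; }

    // O(log n). Returns false if the client was not present.
    bool disconnect(ClientId id) { return m_clients.erase(id) == 1; }

    // O(log n) lookup.
    bool isConnected(ClientId id) const { return m_clients.count(id) == 1; }

    std::size_t size() const { return m_clients.size(); }

private:
    std::set<ClientId> m_clients; // ordered set with logarithmic insert/erase/find
};

// Usage sketch: two clients connect, one leaves, one remains.
inline std::size_t rosterDemo()
{
    Roster roster;
    roster.connect(1);
    roster.connect(2);
    roster.disconnect(1);
    assert(!roster.isConnected(1) && roster.isConnected(2));
    return roster.size();
}
```

    The same three operations map onto the tree-based sets in C5 or onto SortedSet in .NET; only the container changes.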

    Lock-free C++ collections

    Then, if you can allow for some C++ interop and you absolutely need the speed and want as little overhead as possible, then these lock-free ADTs from Dmitriy V'jukov are probably the best you can get in this world, outperforming Intel's concurrent library of ADTs.
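    For readers who have not seen a lock-free structure, the following is a minimal sketch of the idea, assuming C++11 atomics. It is not V'jukov's code and is far simpler than his structures: a bounded queue that is only safe with exactly one producer thread and one consumer thread. `SpscQueue` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Bounded single-producer/single-consumer queue.
// Exactly one thread may call push() and exactly one other thread may call pop().
template <typename T, std::size_t Capacity>
class SpscQueue
{
public:
    SpscQueue() : m_head(0), m_tail(0) {}

    // Returns false if the queue is full.
    bool push(const T& value)
    {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % (Capacity + 1);
        if (next == m_head.load(std::memory_order_acquire)) return false; // full
        m_items[tail] = value;
        m_tail.store(next, std::memory_order_release); // publish the item to the consumer
        return true;
    }

    // Returns false if the queue is empty.
    bool pop(T& out)
    {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        if (head == m_tail.load(std::memory_order_acquire)) return false; // empty
        out = m_items[head];
        m_head.store((head + 1) % (Capacity + 1), std::memory_order_release); // free the slot
        return true;
    }

private:
    T m_items[Capacity + 1];         // one slot stays empty to tell "full" from "empty"
    std::atomic<std::size_t> m_head; // next slot to read; written only by the consumer
    std::atomic<std::size_t> m_tail; // next slot to write; written only by the producer
};
```

    Neither side ever takes a lock; each index is written by one thread only, and the acquire/release pairs make the stored item visible before the index that announces it.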

    • http://groups.google.com/group/lock-free

    I've read some of the code, and it's clearly the work of someone well versed in how these things are put together. VC++ (C++/CLI) can do native C++ interop without annoying boundaries. http://www.swig.org/ can otherwise help you wrap C++ interfaces for consumption in .NET, or you can do it yourself through P/Invoke.
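    P/Invoke binds to plain C functions, so the C++ side usually exposes a flat `extern "C"` facade over the container. The sketch below shows that shape under stated assumptions: the `roster_*` names are hypothetical, the container is a plain `std::set<int>` standing in for whatever native structure you pick, and the export decoration a Windows DLL needs (such as `__declspec(dllexport)`) is left out.

```cpp
#include <cassert>
#include <new>
#include <set>

typedef std::set<int> RosterImpl; // stand-in for the native container

extern "C" {

// Returns an opaque handle, or a null pointer if allocation fails.
void* roster_create()
{
    return new (std::nothrow) RosterImpl();
}

// Returns 1 if id was newly added, 0 if it was already present, -1 on error.
int roster_add(void* handle, int id)
{
    assert(handle != 0);
    try {
        return static_cast<RosterImpl*>(handle)->insert(id).second ? 1 : 0;
    } catch (...) {
        return -1; // never let a C++ exception cross the C boundary
    }
}

// Returns 1 if id was removed, 0 if it was not present.
int roster_remove(void* handle, int id)
{
    assert(handle != 0);
    return static_cast<int>(static_cast<RosterImpl*>(handle)->erase(id));
}

int roster_count(void* handle)
{
    assert(handle != 0);
    return static_cast<int>(static_cast<RosterImpl*>(handle)->size());
}

void roster_destroy(void* handle)
{
    delete static_cast<RosterImpl*>(handle);
}

} // extern "C"
```

    On the managed side each function would get a matching DllImport declaration, with the handle passed as an IntPtr.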

    Microsoft's Take

    They have written tutorials, including one that implements a rather unpolished skip list in C# and discusses other types of data structures. (There's a better SkipList at CodeProject, which is very polished and implements the interfaces in a well-behaved manner.) They also have a few data structures bundled with .NET, namely Hashtable/Dictionary<,> and HashSet. Of course there's the "ResizeArray"/List type as well, together with a stack and a queue, but they are all linear on search.
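    The difference between the hashed and the linear containers is easiest to see side by side. The sketch below uses the C++11 counterparts as stand-ins (`std::vector` for List, `std::unordered_set` for HashSet); the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_set>
#include <vector>

// Linear search: O(n) per lookup, comparable to List<T>.Contains.
inline bool containsLinear(const std::vector<int>& ids, int id)
{
    return std::find(ids.begin(), ids.end(), id) != ids.end();
}

// Hashed lookup: O(1) on average, comparable to HashSet<T>.Contains.
inline bool containsHashed(const std::unordered_set<int>& ids, int id)
{
    return ids.find(id) != ids.end();
}

// Removal shows the same gap: erasing by value from a vector scans and
// shifts elements (O(n)), erasing by key from a hash set is O(1) on average.
inline void removeLinear(std::vector<int>& ids, int id)
{
    ids.erase(std::remove(ids.begin(), ids.end(), id), ids.end());
}

inline void removeHashed(std::unordered_set<int>& ids, int id)
{
    ids.erase(id);
}
```

    For a roster with heavy churn and no ordering requirement, the hashed variant is usually the first thing to try.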

    Google's perf-tools

    If you wish to speed up memory allocation, you can use Google's perf-tools. They are available at Google Code and contain a very interesting multi-threaded malloc implementation (TCMalloc), which shows much more consistent timing than the normal malloc does. You could use this together with the lock-free structures above to really go crazy with performance.

    Improving response times with memoization

    You can also use memoization on functions to improve performance through caching, something interesting if you're using e.g. F#. F# also allows C++ interop, so you're OK there.
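    As a small illustration of memoization, here is a sketch in C++ that caches the results of a recursive function in a map. The Fibonacci function is only a stand-in for an expensive computation, and `MemoFib` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Memoized n-th Fibonacci number: each value is computed once and then
// served from the cache, turning exponential recursion into linear work.
class MemoFib
{
public:
    unsigned long long operator()(unsigned int n)
    {
        assert(n <= 93); // F(94) no longer fits in 64 unsigned bits
        if (n < 2) return n;

        std::map<unsigned int, unsigned long long>::const_iterator hit = m_cache.find(n);
        if (hit != m_cache.end()) return hit->second; // cache hit

        const unsigned long long value = (*this)(n - 1) + (*this)(n - 2);
        m_cache[n] = value; // remember the result for later calls
        return value;
    }

    std::size_t cachedEntries() const { return m_cache.size(); }

private:
    std::map<unsigned int, unsigned long long> m_cache;
};
```

    The trade-off is memory for time, and it only pays off when the same arguments recur; for a client roster that would mean caching derived data per client, not the roster itself.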

    O(k)

    There's also the possibility of doing something on your own using the research on Bloom filters, which allow O(k) lookup complexity, where k is a constant equal to the number of hash functions you have implemented. Google's BigTable uses them. A Bloom filter answers membership queries rather than returning the element itself: it always says yes for an element that is in the set, and with low probability it also says yes for an element that is not (a false positive). See the graph at Wikipedia: it approaches P(wrong_key) -> 0.01 when the size is around 10000 elements, but you can work around this by adding further hash functions or reducing the set.
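    A minimal sketch of the O(k) idea, assuming C++11: each key sets k bit positions, and a lookup checks the same k positions. `SimpleBloom` is a hypothetical name, and deriving the k positions from two base hashes (double hashing) is one common choice rather than the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class SimpleBloom
{
public:
    SimpleBloom(std::size_t bits, std::size_t hashes) : m_bits(bits, false), m_k(hashes)
    {
        assert(bits > 0 && hashes > 0);
    }

    // Sets the k bit positions for key: O(k).
    void add(const std::string& key)
    {
        for (std::size_t i = 0; i < m_k; ++i) m_bits[index(key, i)] = true;
    }

    // false: the key is definitely not in the set.
    // true:  the key is probably in the set (false positives are possible).
    bool mightContain(const std::string& key) const
    {
        for (std::size_t i = 0; i < m_k; ++i)
            if (!m_bits[index(key, i)]) return false;
        return true;
    }

private:
    // Double hashing: position_i = (h1 + i * h2) mod m.
    std::size_t index(const std::string& key, std::size_t i) const
    {
        const std::size_t h1 = std::hash<std::string>()(key);
        const std::size_t h2 = djb2(key) | 1; // forced odd, so the step is never zero
        return (h1 + i * h2) % m_bits.size();
    }

    static std::size_t djb2(const std::string& key)
    {
        std::size_t h = 5381;
        for (std::size_t i = 0; i < key.size(); ++i)
            h = h * 33 + static_cast<unsigned char>(key[i]);
        return h;
    }

    std::vector<bool> m_bits;
    std::size_t m_k;
};
```

    A filter like this never stores the keys, which is why it cannot return an element and cannot delete one without extra bookkeeping.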

    I haven't searched for .Net implementations of this, but since the hashing calculations are independent you can use MS's performance team's implementation of Tasks to speed that up.

    "My" take -- randomize to reach average O(log n)

    As it happens I just did some coursework involving data structures. In this case we used C++, but it's very easy to translate to C#. We built three different data structures: a Bloom filter, a skip list and a randomized binary search tree.

    See the code and analysis after the last paragraph.

    Hardware-based "collections"

    Finally, to make my answer "complete": if you truly need speed, you can use something like routing tables or content-addressable memory. This gives you, in principle, an O(1) "hash"-to-value lookup of your data.

    Random Binary Search Tree/Bloom Filter C++ code

    I would really appreciate feedback if you find mistakes in the code, or just pointers on how I can do it better (or with better usage of templates). Note that the Bloom filter isn't like it would be in real life; normally you don't have to be able to delete from it, and it is then much more space-efficient than the hack I did to allow the delete to be tested.
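    The usual way to make a Bloom filter deletable is to replace each bit with a small counter (a counting Bloom filter). The sketch below is a standalone illustration of that technique, not the class listed further down: `CountingBloom` is a hypothetical name, and it borrows only the two string hashes (djb2 and sdbm) that the listing also uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class CountingBloom
{
public:
    explicit CountingBloom(std::size_t slots) : m_counts(slots, 0) { assert(slots > 0); }

    void add(const std::string& key)
    {
        ++m_counts[slot1(key)];
        ++m_counts[slot2(key)];
    }

    // Only valid for keys that were added before; removing a key that was
    // never added would corrupt the counters of other keys.
    void remove(const std::string& key)
    {
        const std::size_t a = slot1(key);
        const std::size_t b = slot2(key);
        assert(m_counts[a] > 0 && m_counts[b] > 0);
        --m_counts[a];
        --m_counts[b];
    }

    // false: definitely absent. true: probably present.
    bool mightContain(const std::string& key) const
    {
        return m_counts[slot1(key)] > 0 && m_counts[slot2(key)] > 0;
    }

private:
    std::size_t slot1(const std::string& key) const // djb2
    {
        std::size_t h = 5381;
        for (std::size_t i = 0; i < key.size(); ++i)
            h = ((h << 5) + h) + static_cast<unsigned char>(key[i]);
        return h % m_counts.size();
    }

    std::size_t slot2(const std::string& key) const // sdbm
    {
        std::size_t h = 0;
        for (std::size_t i = 0; i < key.size(); ++i)
            h = static_cast<unsigned char>(key[i]) + (h << 6) + (h << 16) - h;
        return h % m_counts.size();
    }

    std::vector<unsigned int> m_counts; // one counter per position instead of one bit
};
```

    The price is space: a full counter per position instead of a single bit, which is the inefficiency the note above refers to.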

    DataStructure.h

    #ifndef DATASTRUCTURE_H_
    #define DATASTRUCTURE_H_
    
    class DataStructure
    {
    public:
        DataStructure() {countAdd=0; countDelete=0;countFind=0;}
        virtual ~DataStructure() {}
    
        void resetCountAdd() {countAdd=0;}
        void resetCountFind() {countFind=0;}
        void resetCountDelete() {countDelete=0;}
    
        unsigned int getCountAdd(){return countAdd;}
        unsigned int getCountDelete(){return countDelete;}
        unsigned int getCountFind(){return countFind;}
    
    protected:
        unsigned int countAdd;
        unsigned int countDelete;
        unsigned int countFind;
    };
    
    #endif /*DATASTRUCTURE_H_*/
    

    Key.h

    #ifndef KEY_H_
    #define KEY_H_
    
    #include <string>
    using namespace std;
    
    const int keyLength = 128;
    
    class Key : public string
    {
    public:
        Key():string(keyLength, ' ') {}
        Key(const char in[]): string(in){}
        Key(const string& in): string(in){}
    
        bool operator<(const string& other);
        bool operator>(const string& other);
        bool operator==(const string& other);
    
        virtual ~Key() {}
    };
    
    #endif /*KEY_H_*/
    

    Key.cpp

    #include "Key.h"
    
    bool Key::operator<(const string& other)
    {
        return compare(other) < 0;
    };
    
    bool Key::operator>(const string& other)
    {
        return compare(other) > 0;
    };
    
    bool Key::operator==(const string& other)
    {
        return compare(other) == 0;
    }
    

    BloomFilter.h

    #ifndef BLOOMFILTER_H_
    #define BLOOMFILTER_H_
    
    #include <cmath>
    #include <iostream>
    #include <string>
    #include <vector>
    #include "Key.h"
    #include "DataStructure.h"
    
    #define LONG_BIT 32
    #define bitmask(val) (1UL << (LONG_BIT - ((val) % LONG_BIT) - 1))
    
    // TODO: Implement RW-locking on the reads/writes to the bitmap.
    
    class BloomFilter : public DataStructure
    {
    public:
        BloomFilter(){}
        BloomFilter(unsigned long length){init(length);}
        virtual ~BloomFilter(){}
    
        void init(unsigned long length);
        void dump();
    
        void add(const Key& key);
        void del(const Key& key);
    
        /**
         * Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't.
         */
        bool testExist(const Key& key, bool v = false);
    
    private:
        unsigned long hash1(const Key& key);
        unsigned long hash2(const Key& key);
        bool exist(const Key& key);
        void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key);
        void getCountIndicies(const int i1, const unsigned long h1,
            const int i2, const unsigned long h2, int& i1_c, int& i2_c);
    
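        vector<unsigned long> m_tickBook;  // the bitmap: one pocket of bits per element
        vector<unsigned int> m_useCounts;  // per-bit use counts (element type assumed)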
        unsigned long m_length; // number of bits in the bloom filter
        unsigned long m_pockets; //the number of pockets
    
        static const unsigned long m_pocketSize; //bits in each pocket
    };
    
    #endif /*BLOOMFILTER_H_*/
    

    BloomFilter.cpp

    #include "BloomFilter.h"
    
    const unsigned long BloomFilter::m_pocketSize = LONG_BIT;
    
    void BloomFilter::init(unsigned long length)
    {
        //m_length = length;
        m_length = (unsigned long)((2.0*length)/log(2.0))+1;
        m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize));
        m_tickBook.resize(m_pockets);
    
        // my own (allocate nr bits possible to store in the other vector)
        m_useCounts.resize(m_pockets * m_pocketSize);
    
        unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0;
        for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own
    }
    
    unsigned long BloomFilter::hash1(const Key& key)
    {
        unsigned long hash = 5381;
        unsigned int i=0; for (i=0; i< key.length(); i++){
            hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */
        }
    
        double d_hash = (double) hash;
    
        d_hash *= (0.5*(sqrt(5.0)-1));
        d_hash -= floor(d_hash);
        d_hash *= (double)m_length;
    
        return (unsigned long)floor(d_hash);
    }
    
    unsigned long BloomFilter::hash2(const Key& key)
    {
        unsigned long hash = 0;
        unsigned int i=0; for (i=0; i< key.length(); i++){
            hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash;
        }
        double d_hash = (double) hash;
    
        d_hash *= (0.5*(sqrt(5.0)-1));
        d_hash -= floor(d_hash);
        d_hash *= (double)m_length;
    
        return (unsigned long)floor(d_hash);
    }
    
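    bool BloomFilter::testExist(const Key& key, bool v){
        if(exist(key)) {
            if(v) cout<<"Key "<< key<<" is in the set"<<endl;
            return true;
        }
        return false;
    }

    // NOTE: the definitions of add(), del() and dump() are missing from this listing.

    bool BloomFilter::exist(const Key& key)
    {
        // Body rebuilt around the surviving return expression; assumes the hashes
        // and pocket indices come from getHashAndIndicies().
        unsigned long h1, h2;
        int i1, i2;
        getHashAndIndicies(h1, h2, i1, i2, key);
        return ((m_tickBook[i1] & bitmask(h1)) > 0) &&
            ((m_tickBook[i2] & bitmask(h2)) > 0);
    }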
    
    /*
     * Gets the values of the indicies for two hashes and places them in
     * the passed parameters. The index is into m_tickBook.
     */
    void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1,
        int& i2, const Key& key)
    {
        h1 = hash1(key);
        h2 = hash2(key);
        i1 = (int)(h1/m_pocketSize);
        i2 = (int)(h2/m_pocketSize);
    }
    
    /*
     * Gets the values of the indicies into the count vector, which keeps
     * track of how many times a specific bit-position has been used.
     */
    void BloomFilter::getCountIndicies(const int i1, const unsigned long h1,
        const int i2, const unsigned long h2, int& i1_c, int& i2_c)
    {
        i1_c = i1*m_pocketSize + h1%m_pocketSize;
        i2_c = i2*m_pocketSize + h2%m_pocketSize;
    }
    

    RBST.h

    #ifndef RBST_H_
    #define RBST_H_
    
    #include <cstdio>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include "Key.h"
    #include "DataStructure.h"
    
    #define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str);
    
    using namespace std;
    
    class RBSTNode;
    class RBSTNode: public Key
    {
    public:
        RBSTNode(const Key& key):Key(key)
        {
            m_left =NULL;
            m_right = NULL;
            m_size = 1U; // the size of one node is 1.
        }
        virtual ~RBSTNode(){}
    
        string setKey(const Key& key){return Key(key);}
    
        RBSTNode* left(){return m_left; }
        RBSTNode* right(){return m_right;}
    
        RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; }
        RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; }
    
    #ifdef DEBUG
        ostream& print(ostream& out)
        {
            out << "Key(" << *this << ", m_size: " << m_size << ")";
            return out;
        }
    #endif
    
        unsigned int size() { return m_size; }
    
        void setSize(unsigned int val)
        {
    #ifdef DEBUG
            this->print(cout);
            cout << "::setSize(" << val << ") called." << endl;
    #endif
    
            if (val == 0) throw "Cannot set the size below 1, then just delete this node.";
            m_size = val;
        }
    
        void incSize() {
    #ifdef DEBUG
            this->print(cout);
            cout << "::incSize() called" << endl;
    #endif
    
            m_size++;
        }
    
        void decrSize()
        {
    #ifdef DEBUG
            this->print(cout);
            cout << "::decrSize() called" << endl;
    #endif
    
            if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node.";
            m_size--;
        }
    
    #ifdef DEBUG
        unsigned int size(RBSTNode* x);
    #endif
    
    private:
        RBSTNode(){}
        RBSTNode* m_left;
        RBSTNode* m_right;
        unsigned int m_size;
    };
    
    class RBST : public DataStructure
    {
    public:
        RBST() {
            m_size = 0;
            m_head = NULL;
            srand(time(0));
        };
    
        virtual ~RBST() {};
    
        /**
         * Tries to add key into the tree and will return
         *      true  for a new item added
         *      false if the key already is in the tree.
         *
         * Will also have the side-effect of printing to the console if v=true.
         */
        bool add(const Key& key, bool v=false);
    
        /**
         * Same semantics as other add function, but takes a string,
         * but diff name, because that'll cause an ambiguity because of inheritance.
         */
        bool addString(const string& key);
    
        /**
         * Deletes a key from the tree if that key is in the tree.
         * Will return
         *      true  for success and
         *      false for failure.
         *
         * Will also have the side-effect of printing to the console if v=true.
         */
        bool del(const Key& key, bool v=false);
    
        /**
         * Tries to find the key in the tree and will return
         *      true if the key is in the tree and
         *      false if the key is not.
         *
         * Will also have the side-effect of printing to the console if v=true.
         */
        bool find(const Key& key, bool v = false);
    
        unsigned int count() { return m_size; }
    
    #ifdef DEBUG
        int dump(char sep = ' ');
        int dump(RBSTNode* target, char sep);
        unsigned int size(RBSTNode* x);
    #endif
    
    private:
        RBSTNode* randomAdd(RBSTNode* target, const Key& key);
        RBSTNode* addRoot(RBSTNode* target, const Key& key);
        RBSTNode* rightRotate(RBSTNode* target);
        RBSTNode* leftRotate(RBSTNode* target);
    
        RBSTNode* del(RBSTNode* target, const Key& key);
        RBSTNode* join(RBSTNode* left, RBSTNode* right);
    
        RBSTNode* find(RBSTNode* target, const Key& key);
    
        RBSTNode* m_head;
        unsigned int m_size;
    };
    
    #endif /*RBST_H_*/
    

    RBST.cpp

    #include "RBST.h"
    
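    bool RBST::add(const Key& key, bool v){
        unsigned int oldSize = m_size;
        m_head = randomAdd(m_head, key);
        if (m_size > oldSize){
            if(v) cout<<"Node "<<key<<" added."<<endl; // wording after "Node " is assumed
            return true;
        }
        return false;
    }

    // NOTE: the definitions of addString(), the public del() and find(), and
    // dump(char) are missing from this listing.

    #ifdef DEBUG
    // In-order dump of the subtree under target; returns the number of nodes printed.
    // Rebuilt from the surviving fragments, so treat the details as assumed.
    int RBST::dump(RBSTNode* target, char sep)
    {
        if (target == NULL) return 0;
        int ret = dump(target->left(), sep);
        cout<< *target<<sep;
        ret++;
        ret += dump(target->right(), sep);
        return ret;
    };
    #endif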
    
    /**
     * Rotates the tree around target, so that target's left
     * is the new root of the tree/subtree and updates the subtree sizes.
     *
     *(target)  b               (l) a
     *         / \      right      / \
     *        a   ?     ---->     ?   b
     *       / \                     / \
     *      ?   x                   x   ?
     *
     */
    RBSTNode* RBST::rightRotate(RBSTNode* target) // private
    {
        if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested.
        if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!";
    
    #ifdef DEBUG
        cout    <<"Right-rotating b-node ";
        target->print(cout);
        cout    << " for a-node ";
        target->left()->print(cout);
        cout    << "." << endl;
    #endif
    
        RBSTNode* l = target->left();
        int as0 = l->size();
    
        // re-order the sizes
        l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null.
        target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null.
    
        // swap b's left (for a)
        target->setLeft(l->right());
    
        // and a's right (for b's left)
        l->setRight(target);
    
    #ifdef DEBUG
        cout    << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl;
    #endif
    
        // return the new root, a.
        return l;
    };
    
    /**
     * Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*)
     */
    RBSTNode* RBST::leftRotate(RBSTNode* target)
    {
        if (target == NULL) throw "Invariant failure, target is null";
        if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!";
    
    #ifdef DEBUG
        cout    <<"Left-rotating a-node ";
        target->print(cout);
        cout    << " for b-node ";
        target->right()->print(cout);
        cout    << "." << endl;
    #endif
    
        RBSTNode* r = target->right();
        int bs0 = r->size();
    
        // re-order the sizes
        r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1);
        target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size()));
    
        // swap a's right (for b's left)
        target->setRight(r->left());
    
        // swap b's left (for a)
        r->setLeft(target);
    
    #ifdef DEBUG
        cout    << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl;
    #endif
    
        return r;
    };
    
    /**
     * Inserts key at the root of the subtree under target: the key is added
     * at a leaf and rotated up, and the new subtree root is returned.
     * Does not check for duplicates and does not touch m_size; the caller
     * (randomAdd) increments m_size.
     *
     * This function is not called from public methods, it's a helper function.
     */
    RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key)
    {
        countAdd++;
    
        if (target == NULL) return new RBSTNode(key);
    
    #ifdef DEBUG
        cout << "addRoot(";
        cout.flush();
        target->print(cout) << "," << key << ") called." << endl;
    #endif
    
        if (*target < key)
        {
            target->setRight( addRoot(target->right(), key) );
            target->incSize(); // Should I?
            RBSTNode* res = leftRotate(target);
    #ifdef DEBUG
            if (target->size() != size(target))
                BUG("in addRoot 1");
    #endif
            return res;
        }
    
        target->setLeft( addRoot(target->left(), key) );
        target->incSize(); // Should I?
        RBSTNode* res = rightRotate(target);
    #ifdef DEBUG
        if (target->size() != size(target))
            BUG("in addRoot 2");
    #endif
        return res;
    };
    
    /**
     * This function is called from the public add(key) function,
     * and returns the new root node.
     */
    RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key)
    {
        countAdd++;
    
        if (target == NULL)
        {
            m_size++;
            return new RBSTNode(key);
        }
    
    #ifdef DEBUG
        cout << "randomAdd(";
        target->print(cout) << ", \"" << key << "\") called." << endl;
    #endif
    
        int r = (rand() % target->size()) + 1;
    
        // here is where we add the target as root!
        if (r == 1)
        {
            m_size++;   // TODO: Need to lock.
            return addRoot(target, key);
        }
    
    #ifdef DEBUG
        printf("randomAdd recursion part, ");
    #endif
    
        // otherwise, continue recursing!
        if (*target <= key)
        {
    #ifdef DEBUG
        printf("target <= key\n");
    #endif
            target->setRight( randomAdd(target->right(), key) );
            target->incSize(); // TODO: Need to lock.
    #ifdef DEBUG
            if (target->right()->size() != size(target->right()))
                BUG("in randomAdd 1");
    #endif
        }
        else
        {
    #ifdef DEBUG
        printf("target > key\n");
    #endif
            target->setLeft( randomAdd(target->left(), key) );
            target->incSize(); // TODO: Need to lock.
    #ifdef DEBUG
            if (target->left()->size() != size(target->left()))
                BUG("in randomAdd 2");
    #endif
        }
    
    #ifdef DEBUG
        printf("randomAdd return part\n");
    #endif
    
        // m_size is not incremented here: the recursive call above already counted the new node.
        return target;
    };
    
    /////////////////////////////////////////////////////////////
    /////////////////////  DEL FUNCTIONS ////////////////////////
    /////////////////////////////////////////////////////////////
    
    /**
     * Deletes a node with the passed key.
     * Returns the root node.
     * Decrements m_size if something was deleted.
     */
    RBSTNode* RBST::del(RBSTNode* target, const Key& key)
    {
        countDelete++;
    
        if (target == NULL) return NULL;
    
    #ifdef DEBUG
        cout << "del(";
        target->print(cout) << ", \"" << key << "\") called." << endl;
    #endif
    
        RBSTNode* ret = NULL;
    
        // found the node to delete
        if (*target == key)
        {
            ret = join(target->left(), target->right());
    
            m_size--;
            delete target;
    
            return ret; // return the newly built joined subtree!
        }
    
        // store a temporary size before recursive deletion.
        unsigned int size = m_size;
    
        if (*target < key)  target->setRight( del(target->right(), key) );
        else                target->setLeft( del(target->left(), key) );
    
        // if the previous recursion changed the size, we need to decrement the size of this target too.
        if (m_size < size) target->decrSize();
    
    #ifdef DEBUG
        if (RBST::size(target) != target->size())
            BUG("in del");
    #endif
    
        return target;
    };
    
    /**
     * Joins the two subtrees represented by left and right
     * by randomly choosing which to make the root, weighted on the
     * size of the sub-tree.
     */
    RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right)
    {
        if (left == NULL) return right;
        if (right == NULL) return left;
    
    #ifdef DEBUG
        cout << "join(";
        left->print(cout);
        cout << ",";
        right->print(cout) << ") called." << endl;
    #endif
    
        // Find the chance that we use the left tree, based on its size over the total tree size.
        // 3 s.d. randomness :-p e.g. 60.3% chance.
        bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0));
    
        RBSTNode* subtree = NULL;
    
        if (useLeft)
        {
            subtree = join(left->right(), right);
    
            left->setRight(subtree)
                ->setSize((left->left() == NULL ? 0 : left->left()->size())
                            + subtree->size() + 1 );
    
    #ifdef DEBUG
            if (size(left) != left->size())
                BUG("in join 1");
    #endif
    
            return left;
        }
    
        subtree = join(left, right->left()); // left's keys must stay to the left of right's
    
        right->setLeft(subtree)
             ->setSize((right->right() == NULL ? 0 : right->right()->size())
                        + subtree->size() + 1);
    
    #ifdef DEBUG
        if (size(right) != right->size())
            BUG("in join 2");
    #endif
    
        return right;
    };
    
    /////////////////////////////////////////////////////////////
    /////////////////////  FIND FUNCTIONS ///////////////////////
    /////////////////////////////////////////////////////////////
    
    /**
     * Tries to find the key in the tree starting
     * search from target.
     *
     * Returns NULL if it was not found.
     */
    RBSTNode* RBST::find(RBSTNode* target, const Key& key)
    {
        countFind++; // Could use private method only counting the first call.
        if (target == NULL) return NULL; // not found.
        if (*target == key) return target; // found (does string override ==?)
        if (*target < key) return find(target->right(), key); // search for gt to the right.
        return find(target->left(), key); // search for lt to the left.
    };
    
    #ifdef DEBUG
    
    unsigned int RBST::size(RBSTNode* x)
    {
        if (x == NULL) return 0;
        return 1 + size(x->left()) + size(x->right());
    }
    
    #endif
    

    I'll save the SkipList for another time since it's already possible to find good implementations of a SkipList from the links and my version wasn't much different.

    The graphs generated from the test-file are as follows:

    Graph showing time taken to add new items for BloomFilter, RBST and SkipList: http://haf.se/content/dl/addtimer.png

    Graph showing time taken to find items for BloomFilter, RBST and SkipList: http://haf.se/content/dl/findtimer.png

    Graph showing time taken to delete items for BloomFilter, RBST and SkipList: http://haf.se/content/dl/deltimer.png

    So as you can see, the randomized binary search tree performed considerably better than the SkipList. The Bloom filter lives up to its O(k).
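    The test file behind the graphs is not shown, so if you want to run a comparable measurement yourself, a rough sketch might look like the following. It assumes C++11 and is not the original harness: `timeChurn` is a hypothetical helper that times n inserts followed by n erases on any set-like container.

```cpp
#include <cassert>
#include <chrono>
#include <set>
#include <unordered_set>

// Inserts the keys 0..n-1 and then erases them again, returning the
// elapsed wall-clock time in microseconds.
template <typename Set>
long long timeChurn(Set& container, int n)
{
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    for (int i = 0; i < n; ++i) container.insert(i);
    for (int i = 0; i < n; ++i) container.erase(i);
    const Clock::time_point stop = Clock::now();

    assert(container.empty()); // every inserted key was erased again
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

    Calling it once with a `std::set<int>` and once with a `std::unordered_set<int>` gives a first impression of tree versus hash churn cost; sequential integer keys are a friendly case, so real client identifiers may behave differently.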
