C++ design: How to cache the most recently used items

走了就别回头了 2021-02-09 12:21

We have a C++ application for which we try to improve performance. We identified that data retrieval takes a lot of time, and want to cache data. We can't store all data in mem

10 answers
  • 2021-02-09 12:59

    Scanning a map of 1000 elements will take very little time, and the scan will only be performed when the item is not in the cache which, if your locality of reference ideas are correct, should be a small proportion of the time. Of course, if your ideas are wrong, the cache is probably a waste of time anyway.
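
    A minimal sketch of that scan-on-miss idea, assuming each entry carries a "last used" counter. The names ScanCache and TimedEntry and the std::string payload are made up for illustration, and the capacity is assumed to be at least 1. Hits only touch one entry; the linear scan runs only when a new item has to push an old one out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical entry: payload plus a logical "last used" tick.
struct TimedEntry {
    std::string data;
    std::uint64_t last_used;
};

class ScanCache {
public:
    explicit ScanCache(std::size_t capacity) : capacity_(capacity), tick_(0) {}

    // Returns the cached data, or nullptr on a miss. A hit refreshes the tick.
    const std::string* find(long key) {
        Map::iterator it = entries_.find(key);
        if (it == entries_.end()) return nullptr;
        it->second.last_used = ++tick_;
        return &it->second.data;
    }

    // Stores data under key, evicting the least recently used entry if full.
    void insert(long key, std::string data) {
        Map::iterator it = entries_.find(key);
        if (it != entries_.end()) {
            it->second.data = std::move(data);
            it->second.last_used = ++tick_;
            return;
        }
        if (entries_.size() >= capacity_) evict_oldest();
        entries_.emplace(key, TimedEntry{std::move(data), ++tick_});
    }

    std::size_t size() const { return entries_.size(); }

private:
    typedef std::map<long, TimedEntry> Map;

    // Linear scan for the smallest tick; only runs on a miss with a full cache.
    void evict_oldest() {
        Map::iterator oldest = std::min_element(
            entries_.begin(), entries_.end(),
            [](const Map::value_type& a, const Map::value_type& b) {
                return a.second.last_used < b.second.last_used;
            });
        if (oldest != entries_.end()) entries_.erase(oldest);
    }

    std::size_t capacity_;
    std::uint64_t tick_;
    Map entries_;
};
```

    A counter is used here instead of a wall-clock timestamp so that two accesses can never tie; a real timestamp would work the same way.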

  • 2021-02-09 12:59

    My approach needs a hash table to look up stored objects quickly and a linked list to maintain the order in which they were last used.

    When an object is requested:

    1) Try to find the object in the hash table.

    2a) If it is found (the value holds a pointer to the object's node in the linked list), move that node to the head of the linked list.

    2b) If it is not found and the cache is full, remove the last object from the linked list and remove its entry from the hash table as well. Then put the new object into the hash table and at the head of the linked list.

    For example, let's say the cache has room for only 3 objects.

    The request sequence is 1 3 2 1 4.

    1) Hash-table : [1] Linked-list : [1]

    2) Hash-table : [1, 3] Linked-list : [3, 1]

    3) Hash-table : [1,2,3] Linked-list : [2,3,1]

    4) Hash-table : [1,2,3] Linked-list : [1,2,3]

    5) Hash-table : [1,2,4] Linked-list : [4,1,2] => 3 out
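
    The steps above could be sketched with std::unordered_map and std::list as follows. This is only one way to write it; LruCache, get, put and keys are names picked for the example, not anything from the question.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

template <typename Key, typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached value, or nullptr on a miss.
    // A hit moves the entry to the head of the list.
    Value* get(const Key& key) {
        typename Index::iterator it = index_.find(key);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);  // O(1), iterators stay valid
        return &it->second->second;
    }

    // Inserts or updates an entry, evicting the list tail when the cache is full.
    void put(const Key& key, Value value) {
        typename Index::iterator it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (order_.size() >= capacity_) {
            index_.erase(order_.back().first);  // drop the least recently used key
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    // Keys from most to least recently used (handy for checking the example).
    std::vector<Key> keys() const {
        std::vector<Key> out;
        for (typename Order::const_iterator n = order_.begin(); n != order_.end(); ++n)
            out.push_back(n->first);
        return out;
    }

private:
    typedef std::list<std::pair<Key, Value> > Order;
    typedef std::unordered_map<Key, typename Order::iterator> Index;

    std::size_t capacity_;
    Order order_;   // head = most recently used, tail = least recently used
    Index index_;   // key -> node in order_
};
```

    With a capacity of 3 and the request sequence 1 3 2 1 4 (calling put after every get that misses), keys() ends up as 4, 1, 2 and object 3 is gone, matching the trace above.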

  • 2021-02-09 13:00

    As a simpler alternative, you could create a map that grows indefinitely and clears itself out every 10 minutes or so (adjust time for expected traffic).

    You could also log some very interesting stats this way.
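
    A rough sketch of what that might look like, assuming a caller-supplied fetch function and std::chrono::steady_clock for timing. PeriodicClearCache and the hit/miss counters are invented for the example; the counters are just one guess at what "stats" could mean here. The current time is passed in as a parameter so the behaviour can be checked without waiting.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <unordered_map>

template <typename Key, typename Value>
class PeriodicClearCache {
public:
    typedef std::chrono::steady_clock Clock;

    explicit PeriodicClearCache(Clock::duration lifetime,
                                Clock::time_point start = Clock::now())
        : lifetime_(lifetime), last_clear_(start), hits_(0), misses_(0) {}

    // Returns the cached value, calling fetch(key) on a miss.
    // The whole map is dropped once `lifetime` has passed since the last clear.
    Value get(const Key& key,
              const std::function<Value(const Key&)>& fetch,
              Clock::time_point now = Clock::now()) {
        if (now - last_clear_ >= lifetime_) {
            entries_.clear();
            last_clear_ = now;
        }
        typename Map::iterator it = entries_.find(key);
        if (it != entries_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        return entries_.emplace(key, fetch(key)).first->second;
    }

    // Simple counters that could be logged before each clear.
    std::size_t hits() const { return hits_; }
    std::size_t misses() const { return misses_; }
    std::size_t size() const { return entries_.size(); }

private:
    typedef std::unordered_map<Key, Value> Map;

    Clock::duration lifetime_;
    Clock::time_point last_clear_;
    std::size_t hits_;
    std::size_t misses_;
    Map entries_;
};
```

    Note that memory use is unbounded between clears, so the interval has to be chosen with the expected traffic in mind.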

  • 2021-02-09 13:04

    Have your map<long,CacheEntry> but instead of having an access timestamp in CacheEntry, put in two links to other CacheEntry objects to make the entries form a doubly-linked list. Whenever an entry is accessed, move it to the head of the list (this is a constant-time operation). This way you will both find the cache entry easily, since it's accessed from the map, and are able to remove the least-recently used entry, since it's at the end of the list (my preference is to make doubly-linked lists circular, so a pointer to the head suffices to get fast access to the tail as well). Also remember to put in the key that you used in the map into the CacheEntry so that you can delete the entry from the map when it gets evicted from the cache.
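
    Roughly, and only as a sketch: the CacheEntry below has a made-up std::string payload, and LinkedMapCache is a name invented for the example. It relies on std::map keeping element addresses stable when other elements are inserted or erased, so raw pointers between entries stay valid.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

struct CacheEntry {
    long key = 0;                // copy of the map key, needed on eviction
    std::string data;            // hypothetical payload
    CacheEntry* prev = nullptr;  // links of the circular doubly-linked list
    CacheEntry* next = nullptr;
};

class LinkedMapCache {
public:
    explicit LinkedMapCache(std::size_t capacity) : capacity_(capacity), head_(nullptr) {}
    LinkedMapCache(const LinkedMapCache&) = delete;  // entries point at each other
    LinkedMapCache& operator=(const LinkedMapCache&) = delete;

    // Returns the cached data or nullptr; a hit moves the entry to the head.
    const std::string* find(long key) {
        std::map<long, CacheEntry>::iterator it = entries_.find(key);
        if (it == entries_.end()) return nullptr;
        move_to_head(&it->second);
        return &it->second.data;
    }

    void insert(long key, std::string data) {
        std::map<long, CacheEntry>::iterator it = entries_.find(key);
        if (it != entries_.end()) {
            it->second.data = std::move(data);
            move_to_head(&it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (entries_.size() >= capacity_) {
            CacheEntry* tail = head_->prev;  // circular list: tail sits before head
            const long victim = tail->key;   // copy before the entry is destroyed
            unlink(tail);
            entries_.erase(victim);
        }
        CacheEntry& e = entries_[key];
        e.key = key;
        e.data = std::move(data);
        link_at_head(&e);
    }

    std::size_t size() const { return entries_.size(); }

private:
    void unlink(CacheEntry* e) {
        if (e->next == e) { head_ = nullptr; return; }  // e was the only entry
        e->prev->next = e->next;
        e->next->prev = e->prev;
        if (head_ == e) head_ = e->next;
    }
    void link_at_head(CacheEntry* e) {
        if (head_ == nullptr) {
            e->prev = e->next = e;
        } else {
            e->next = head_;
            e->prev = head_->prev;
            head_->prev->next = e;
            head_->prev = e;
        }
        head_ = e;
    }
    void move_to_head(CacheEntry* e) {
        if (e == head_) return;
        unlink(e);
        link_at_head(e);
    }

    std::size_t capacity_;
    CacheEntry* head_;  // most recently used; head_->prev is least recently used
    std::map<long, CacheEntry> entries_;
};
```

    Lookup is the usual O(log n) map search, while moving an entry to the head and evicting the tail are both constant-time pointer updates.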

  • 2021-02-09 13:07

    Update: I got it now...

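    This should be reasonably fast. Warning: this is a sketch; CachedItem and noncached_fetch are assumed to be defined elsewhere.

    #include <algorithm>
    #include <map>
    #include <vector>

    struct CachedItem;                     // defined elsewhere
    CachedItem* noncached_fetch(long id);  // the slow retrieval, defined elsewhere

    // accesses contains the list of cached ids. The most recently used id is
    // in front(), the least recently used id is in back().
    std::vector<long> accesses;
    std::map<long, CachedItem*> cache;

    CachedItem* get(long id) {
        std::map<long, CachedItem*>::iterator hit = cache.find(id);
        if (hit != cache.end()) {
            // In cache.
            // Move id to front of accesses.
            std::vector<long>::iterator pos = std::find(accesses.begin(), accesses.end(), id);
            if (pos != accesses.begin()) {
                accesses.erase(pos);
                accesses.insert(accesses.begin(), id);
            }
            return hit->second;
        }

        // Not in cache, fetch and add it.
        CachedItem* item = noncached_fetch(id);
        accesses.insert(accesses.begin(), id);
        cache[id] = item;
        if (accesses.size() > 1000)
        {
            // Remove the least recently used item from both containers.
            // Freeing the evicted CachedItem depends on who owns it.
            cache.erase(accesses.back());
            accesses.pop_back();
        }
        return item;
    }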
    

    The vector inserts and erases shift up to 1000 ids each time, so they may be a little expensive, but they may also not be too bad given the locality (few cache misses). Anyway, if they become a big problem, one could change to std::list.

  • 2021-02-09 13:09

    Another option might be to use boost::multi_index. It is designed to separate the indexes from the data, which allows multiple indexes on the same data.

    I am not sure this would really be faster than scanning through 1000 items. It might use more memory than it is worth, or slow down search and/or insert/remove.
