Locking on an interned string?

栀梦 2021-01-01 16:43

Update: It is acceptable if this method is not thread safe, but I'm interested in learning how I would make it thread safe. Also, I do not want to lock on

6 Answers
  • 2021-01-01 17:15

    Problems with @wsanville's own solution, partly mentioned before:

    1. other parts of your code base might lock on the same interned string instances for different purposes, causing only performance issues if you're lucky, and deadlocks if you're unlucky (potentially only in the future, as the code base grows and is extended by coders unaware of your String.Intern locking pattern; see the sketch after this list) - note that this includes locks on the same interned string even when they are taken in different AppDomains, potentially leading to cross-AppDomain deadlocks
    2. it's impossible for you to reclaim the interned memory in case you decide to do so
    3. String.Intern() is slow
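
    To illustrate problem 1, here is a minimal sketch of how two otherwise unrelated components end up sharing a lock simply because they lock on interned strings (the class names and the "cache-key" literal are made up for illustration):

    // Both literals below refer to the same interned string instance,
    // so the two components unknowingly share one lock object.
    class ComponentA
    {
        public void DoWork()
        {
            lock (string.Intern("cache-key"))   // same instance as in ComponentB
            {
                // long-running work here blocks ComponentB.DoOtherWork(),
                // even though the two components are unrelated
            }
        }
    }

    class ComponentB
    {
        public void DoOtherWork()
        {
            lock (string.Intern("cache-key"))
            {
                // ...
            }
        }
    }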

    To address all these 3 issues, you could implement your own Intern() that you tie to your specific locking purpose, i.e. do not use it as a global, general-purpose string interner:

    private static readonly ConcurrentDictionary<string, string> concSafe = 
        new ConcurrentDictionary<string, string>();
    static string InternConcurrentSafe(string s)
    {
        return concSafe.GetOrAdd(s, String.Copy);
    }
    

    I called this method ...Safe() because it does not store the passed-in String instance when interning: that instance might, for example, already be an interned String, which would make it subject to the problems mentioned in 1. above.

    To compare the performance of various ways of interning strings, I also tried the following 2 methods, as well as String.Intern.

    private static readonly ConcurrentDictionary<string, string> conc = 
        new ConcurrentDictionary<string, string>();
    static string InternConcurrent(string s)
    {
        return conc.GetOrAdd(s, s);
    }
    
    private static readonly Dictionary<string, string> locked = 
        new Dictionary<string, string>(5000);
    static string InternLocked(string s)
    {
        string interned;
        lock (locked)
            if (!locked.TryGetValue(s, out interned))
                interned = locked[s] = s;
        return interned;
    }
    

    Benchmark

    100 threads, each randomly selecting one of 5000 different strings (each consisting of 8 digits) 50,000 times and then calling the respective intern method on it. All values were taken after sufficient warm-up. This is Windows 7, 64-bit, on a 4-core i5.

    N.B. Warming up the above setup implies that after warming up, there won't be any writes to the respective interning dictionaries, but only reads. It's what I was interested in for the use case at hand, but different write/read ratios will probably affect the results.
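
    The original harness code isn't shown, so the following is only a rough sketch of how such a benchmark could be set up based on the description above (the Measure helper and the way the string pool is generated are assumptions):

    // requires: using System; using System.Diagnostics; using System.Linq; using System.Threading;
    static long Measure(Func<string, string> intern)
    {
        // 5000 distinct 8-digit strings
        string[] pool = Enumerable.Range(10000000, 5000)
                                  .Select(i => i.ToString())
                                  .ToArray();

        // warm-up: after this, the interning dictionary only sees reads
        foreach (var s in pool)
            intern(s);

        var threads = Enumerable.Range(0, 100).Select(t => new Thread(() =>
        {
            var rnd = new Random(t);
            for (int i = 0; i < 50000; i++)
                intern(pool[rnd.Next(pool.Length)]);
        })).ToArray();

        var sw = Stopwatch.StartNew();
        foreach (var thread in threads) thread.Start();
        foreach (var thread in threads) thread.Join();
        return sw.ElapsedMilliseconds;   // e.g. Measure(InternConcurrentSafe)
    }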

    Results

    • String.Intern(): 2032 ms
    • InternLocked(): 1245 ms
    • InternConcurrent(): 458 ms
    • InternConcurrentSafe(): 453 ms

    That InternConcurrentSafe is as fast as InternConcurrent makes sense given that these figures were taken after warm-up (see the N.B. above), so there are in fact no, or only very few, invocations of String.Copy during the test.


    In order to properly encapsulate this, create a class like this:

    public class StringLocker
    {
        private readonly ConcurrentDictionary<string, string> _locks =
            new ConcurrentDictionary<string, string>();
    
        public string GetLockObject(string s)
        {
            return _locks.GetOrAdd(s, String.Copy);
        }
    }
    

    and after instantiating one StringLocker for every use case you might have, it is as easy as calling

    lock (myStringLocker.GetLockObject(s))
    {
        ...
    }

    N.B.

    Thinking about it again, there's no need to return an object of type string if all you want to do is lock on it, so copying the characters is completely unnecessary, and the following would perform better than the class above.

    public class StringLocker
    {
        private readonly ConcurrentDictionary<string, object> _locks =
            new ConcurrentDictionary<string, object>();
    
        public object GetLockObject(string s)
        {
            return _locks.GetOrAdd(s, k => new object());
        }
    }
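
    Applied to the CheckCache/HttpContext.Current.Cache caching scenario that appears in the other answers, usage could look roughly like the sketch below; the re-check of the cache inside the lock is my addition and not spelled out in this answer:

    private static readonly StringLocker _keyLocks = new StringLocker();

    public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
    {
        object cached = HttpContext.Current.Cache[key];
        if (cached != null)
            return (T)cached;

        lock (_keyLocks.GetLockObject(key))
        {
            // re-check inside the lock: another thread holding this key's
            // lock object may already have produced and cached the value
            cached = HttpContext.Current.Cache[key];
            if (cached != null)
                return (T)cached;

            T result = fn();
            HttpContext.Current.Cache.Insert(key, result, null, expires,
                                             Cache.NoSlidingExpiration);
            return result;
        }
    }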
    
  • 2021-01-01 17:15

    Never lock on strings, and in particular not on interned ones. See this blog entry on the dangers of locking on interned strings.

    Just create a new object and lock on that:

    object myLock = new object();
    
  • 2021-01-01 17:29

    A variant of Daniel's answer...

    Rather than creating a new lock object for every single string you could share a small-ish set of locks, choosing which lock to use depending on the string's hashcode. This will mean less GC pressure if you potentially have thousands, or millions, of keys, and should allow enough granularity to avoid any serious blocking (perhaps after a few tweaks, if necessary).

    public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
    {
        object cached = HttpContext.Current.Cache[key];
        if (cached != null)
            return (T)cached;

        // map the key's hash code onto one of the stripes (mask off the sign
        // bit so the index is never negative)
        int stripeIndex = (key.GetHashCode() & 0x7FFFFFFF) % _stripes.Length;

        lock (_stripes[stripeIndex])
        {
            // re-check inside the lock: another thread holding the same stripe
            // may already have produced and cached the value
            cached = HttpContext.Current.Cache[key];
            if (cached != null)
                return (T)cached;

            T result = fn();
            HttpContext.Current.Cache.Insert(key, result, null, expires,
                                             Cache.NoSlidingExpiration);
            return result;
        }
    }

    // share a set of 32 locks
    private static readonly object[] _stripes = Enumerable.Range(0, 32)
                                                          .Select(x => new object())
                                                          .ToArray();
    

    This will allow you to tweak the locking granularity to suit your particular needs just by changing the number of elements in the _stripes array. (However, if you need close to one-lock-per-string granularity then you're better off going with Daniel's answer.)

  • 2021-01-01 17:29

    I added a solution to the Bardock.Utils package, inspired by @eugene-beresovsky's answer.

    Usage:

    private static LockeableObjectFactory<string> _lockeableStringFactory = 
        new LockeableObjectFactory<string>();
    
    string key = ...;
    
    lock (_lockeableStringFactory.Get(key))
    {
        ...
    }
    

    Solution code:

    using System.Collections.Concurrent;

    namespace Bardock.Utils.Sync
    {
        /// <summary>
        /// Creates objects based on instances of TSeed that can be used to acquire an exclusive lock.
        /// Instantiate one factory for every use case you might have.
        /// Inspired by Eugene Beresovsky's solution: https://stackoverflow.com/a/19375402
        /// </summary>
        /// <typeparam name="TSeed">Type of the object you want to lock on</typeparam>
        public class LockeableObjectFactory<TSeed>
        {
            private readonly ConcurrentDictionary<TSeed, object> _lockeableObjects = new ConcurrentDictionary<TSeed, object>();
    
            /// <summary>
            /// Creates or uses an existing object instance by specified seed
            /// </summary>
            /// <param name="seed">
            /// The object used to generate a new lockeable object.
            /// The default EqualityComparer<TSeed> is used to determine if two seeds are equal. 
            /// The same object instance is returned for equal seeds, otherwise a new object is created.
            /// </param>
            public object Get(TSeed seed)
            {
                return _lockeableObjects.GetOrAdd(seed, valueFactory: x => new object());
            }
        }
    }
    
  • 2021-01-01 17:32

    I would go with the pragmatic approach and use the dummy variable.
    If this is not possible for whatever reason, I would use a Dictionary<string, object> with the cache key as the dictionary key and a dummy object as the value, and lock on that value, because strings are not suitable for locking:

    private static readonly object _syncRoot = new object();
    private static readonly Dictionary<string, object> _syncRoots = new Dictionary<string, object>();

    public static T CheckCache<T>(string key, Func<T> fn, DateTime expires)
    {
        object keySyncRoot;
        lock (_syncRoot)
        {
            // get or create the lock object for this key under the global lock
            if (!_syncRoots.TryGetValue(key, out keySyncRoot))
            {
                keySyncRoot = new object();
                _syncRoots[key] = keySyncRoot;
            }
        }

        lock (keySyncRoot)
        {
            object cache = HttpContext.Current.Cache.Get(key);
            if (cache == null)
            {
                T result = fn();
                HttpContext.Current.Cache.Insert(key, result, null, expires,
                                                 Cache.NoSlidingExpiration);
                return result;
            }
            else
                return (T)cache;
        }
    }
    

    However, in most cases this is overkill and unnecessary micro optimization.

  • 2021-01-01 17:36

    According to the documentation, the Cache type is thread safe. So the downside of not synchronizing yourself is that while an item is being created, it may be created a few times before the other threads realize they don't need to create it.

    If the situation is simply to cache common static / read-only things, then don't bother synchronizing just to save the odd few collisions that might occur. (Assuming the collisions are benign.)

    The lock object shouldn't be specific to the strings; it should be specific to the granularity of locking you require. In this case you are trying to lock access to the cache, so a single object can serve as the lock for the cache. Locking on the specific key that comes in isn't what locking is usually concerned with.

    If you want to stop expensive calls from occurring multiple times, then you can rip the loading logic out into a new class LoadMillionsOfRecords, call .Load and lock once on an internal locking object as per Oded's answer.
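
    A minimal sketch of that idea follows; the class name LoadMillionsOfRecords and the .Load method come from this answer, while the Record placeholder type, the FetchRecords helper and the double-checked locking details are my assumptions:

    // requires: using System.Collections.Generic;

    // Placeholder record type, for illustration only.
    public class Record { }

    public class LoadMillionsOfRecords
    {
        // internal locking object, as suggested above
        private readonly object _lock = new object();
        private volatile IList<Record> _records;

        public IList<Record> Load()
        {
            // double-checked locking: only the first caller performs the
            // expensive load; later callers return the already-loaded data
            if (_records == null)
            {
                lock (_lock)
                {
                    if (_records == null)
                        _records = FetchRecords();   // the expensive call
                }
            }
            return _records;
        }

        private IList<Record> FetchRecords()
        {
            // ... expensive work goes here ...
            return new List<Record>();
        }
    }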
