问题
Ok I was a little unsure on how best name this problem :) But assume this scenarion, you're going out and fetching some webpage (with various urls) and caching it locally. The cache part is pretty easy to solve even with multiple threads.
However, imagine that one thread starts fetching an url, and a couple of milliseconds later another want to get the same url. Is there any good pattern for making the seconds thread's method wait on the first one to fetch the page , insert it into the cache and return it so you don't have to do multiple requests. With little enough overhead that it's worth doing even for requests that take about 300-700 ms? And without locking requests for other urls
Basically when requests for identical urls comes in tightly after each other I want the second request to "piggyback" the first request
I had some loose idea of having a dictionary where you insert an object with the key as url when you start fetching a page and lock on it. If there's any matching the key already it get's the object, locks on it and then tries to fetch the url for the actual cache.
I'm a little unsure of the particulars however to make it really thread-safe, using ConcurrentDictionary might be one part of it...
Is there any common pattern and solutions for scenarios like this?
Breakdown wrong behavior:
Thread 1: Checks the cache, it doesnt exists so starts fetching the url
Thread 2: Starts fetching the same url since it still doesn't exist in Cache
Thread 1: finished and inserts into the cache, returns the page
Thread 2: Finishes and also inserts into cache (or discards it), returns the page
Breakdown correct behavior:
Thread 1: Checks the cache, it doesnt exists so starts fetching the url
Thread 2: Wants the same url, but sees it's currently being fetched so waits on thread 1
Thread 1: finished and inserts into the cache, returns the page
Thread 2: Notices that thread 1 is finished and returns the page thread 1 it fetched
EDIT
Most solutions sofar seem to misunderstand the problem and only addressing the caching, as I said that isnt the problem, the problem is when doing an external web fetch to make the second fetch that is done before the first one has cached it to use the result from the first rather then doing a second
回答1:
You could use a ConcurrentDictionary<K,V> and a variant of double-checked locking:
public static string GetUrlContent(string url)
{
object value1 = _cache.GetOrAdd(url, new object());
if (value1 == null) // null check only required if content
return null; // could legitimately be a null string
var urlContent = value1 as string;
if (urlContent != null)
return urlContent; // got the content
// value1 isn't a string which means that it's an object to lock against
lock (value1)
{
object value2 = _cache[url];
// at this point value2 will *either* be the url content
// *or* the object that we already hold a lock against
if (value2 != value1)
return (string)value2; // got the content
urlContent = FetchContentFromTheWeb(url); // todo
_cache[url] = urlContent;
return urlContent;
}
}
private static readonly ConcurrentDictionary<string, object> _cache =
new ConcurrentDictionary<string, object>();
回答2:
EDIT: My code is quite a bit uglier now, but uses a separate lock per URL. This allows different URLs to be fetched asynchronously, however each URL will only be fetched once.
public class UrlFetcher
{
static Hashtable cache = Hashtable.Synchronized(new Hashtable());
public static String GetCachedUrl(String url)
{
// exactly 1 fetcher is created per URL
InternalFetcher fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
lock( cache.SyncRoot )
{
fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
fetcher = new InternalFetcher(url);
cache[url] = fetcher;
}
}
}
// blocks all threads requesting the same URL
return fetcher.Contents;
}
/// <summary>Each fetcher locks on itself and is initilized with null contents.
/// The first thread to call fetcher.Contents will cause the fetch to occur, and
/// block until completion.</summary>
private class InternalFetcher
{
private String url;
private String contents;
public InternalFetcher(String url)
{
this.url = url;
this.contents = null;
}
public String Contents
{
get
{
if( contents == null )
{
lock( this ) // "this" is an instance of InternalFetcher...
{
if( contents == null )
{
contents = FetchFromWeb(url);
}
}
}
return contents;
}
}
}
}
回答3:
Will the Semaphore
please stand up! stand up! stand up!
use Semaphore
you can easily synchronize your threads with it.
on both cases where
- you are trying to load a page that is currently being cached
- you are saving cache to a file where a page is loading from it.
in both scenarios you will face troubles.
it is just like writers and readers problem that is a common problem in Operating System Racing Issues. just when a thread wants to rebuild a cache or start caching a page no thread should read from it. if a thread is reading it it should wait until reading finished and replace the cache, no 2 threads should cache same page in to a same file. hence it is possible for all readers to read from a cache at anytime since no writer is writing on it.
you should read some semaphore using samples on msdn, it is very easy to use. just the thread that wants to do something is call the semaphore and if the resource can granted it do the works otherwise sleeps and wait to be woken up when the resource is ready.
回答4:
Disclaimer: This might be a n00bish answer. Please pardon me, if it is.
I'd recommend using some shared dictionary object with locks to keep a track of the url being currently fetched or have already been fetched.
At every request, check the url against this object.
If an entry for the url is present, check the cache. (this means one of the threads has either fetched it or is currently fetching it)
If its available in the cache, use it, else put the current thread to sleep for a while and check back again. (if not in cache, some thread is still fetching it, so wait while its done)
If the entry is not found in the dictionary object, add the url to it and send the request. Once it obtains a response, add it to cache.
This logic should work, however, you would need to take care of cache expiration and removal of the entry from the dictionary object.
回答5:
This is not exactly for concurrent caches but for all caches:
"A cache with a bad policy is another name for a memory leak" (Raymond Chen)
来源:https://stackoverflow.com/questions/4392956/pattern-for-concurrent-cache-sharing