Question
I'm working on software which allows the user to extend a system by implementing a set of interfaces.
In order to test the viability of what we're doing, my company "eats its own dog food" by implementing all of our business logic in these classes in the exact same way a user would.
We have some utility classes / methods that tie everything together and use the logic defined in the extendable classes.
I want to cache the results of the user-defined functions. Where should I do this?
Is it the classes themselves? This seems like it can lead to a lot of code duplication.
Is it the utilities/engine which uses these classes? If so, an uninformed user may call the class function directly and not receive any caching benefit.
Example code
public interface ILetter
{
    string[] GetAnimalsThatStartWithMe();
}

public class A : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Aardvark", "Ant" };
    }
}

public class B : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Baboon", "Banshee" };
    }
}

/* ...left to the user to define... */

public class Z : ILetter
{
    public string[] GetAnimalsThatStartWithMe()
    {
        return new[] { "Zebra" };
    }
}

public static class LetterUtility
{
    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        if (letter == 'A') return (new A()).GetAnimalsThatStartWithMe();
        if (letter == 'B') return (new B()).GetAnimalsThatStartWithMe();
        /* ... */
        if (letter == 'Z') return (new Z()).GetAnimalsThatStartWithMe();
        throw new ApplicationException("Letter " + letter + " not found");
    }
}
Should LetterUtility be responsible for caching? Should each individual instance of ILetter? Is there something else entirely that can be done?
I'm trying to keep this example short, so these example functions don't need caching. But suppose I add this class, which makes (new C()).GetAnimalsThatStartWithMe() take 10 seconds every time it's run:
public class C : ILetter
{
public string[] GetAnimalsThatStartWithMe()
{
Thread.Sleep(10000);
return new [] { "Cat", "Capybara", "Clam" };
}
}
I find myself torn between making our software as fast as possible with less code to maintain (in this example: caching the result in LetterUtility) and doing the exact same work over and over (in this example: waiting 10 seconds every time C is used).
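To make the utility-layer option concrete, here is a minimal sketch of what caching inside the utility could look like, using a dictionary keyed by letter. CachingLetterUtility is an illustrative name, not part of the original code, and the sketch only wires up the A implementation; note the drawback from the question still applies, since calling (new C()).GetAnimalsThatStartWithMe() directly bypasses the cache entirely.

```csharp
// A sketch of the utility-layer option: results are memoized per letter,
// so an expensive implementation like C only runs once.
using System;
using System.Collections.Generic;

public interface ILetter { string[] GetAnimalsThatStartWithMe(); }

public class A : ILetter
{
    public string[] GetAnimalsThatStartWithMe() => new[] { "Aardvark", "Ant" };
}

public static class CachingLetterUtility
{
    private static readonly Dictionary<char, string[]> cache =
        new Dictionary<char, string[]>();

    public static string[] GetAnimalsThatStartWithLetter(char letter)
    {
        string[] cached;
        if (cache.TryGetValue(letter, out cached)) return cached;  // cache hit

        ILetter impl;
        if (letter == 'A') impl = new A();
        /* ... other letters ... */
        else throw new ArgumentException("Letter " + letter + " not found");

        var result = impl.GetAnimalsThatStartWithMe();
        cache[letter] = result;  // callers through the utility now benefit,
        return result;           // but direct calls on the ILetter still do not
    }
}
```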
Answer 1:
Which layer is best responsible for caching of the results of these user-definable functions?
The answer is pretty obvious: the layer that can correctly implement the desired cache policy is the right layer.
A correct cache policy needs to have two characteristics:
It must never serve up stale data; it must know whether the method being cached is going to produce a different result, and invalidate the cache at some point before the caller would get stale data
It must manage cached resources efficiently on the user's behalf. A cache without an expiration policy that grows without bounds has another name: we usually call them "memory leaks".
What's the layer in your system that knows the answers to the questions "is the cache stale?" and "is the cache too big?" That's the layer that should implement the cache.
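As a sketch of what "correct policy" means mechanically, here is a toy cache combining the two characteristics above: entries expire after a time-to-live (so stale data is eventually discarded) and a size bound with crude FIFO eviction (so it cannot grow into a memory leak). All names and the specific policies are illustrative assumptions, not from the original answer; a real system might use `System.Runtime.Caching.MemoryCache` instead.

```csharp
// Toy cache with the two properties the answer demands:
// (1) a TTL so entries go stale and get recomputed,
// (2) a max size so memory stays bounded (FIFO eviction).
using System;
using System.Collections.Generic;

public class BoundedTtlCache<TKey, TValue>
{
    private readonly Dictionary<TKey, (TValue Value, DateTime Expires)> entries =
        new Dictionary<TKey, (TValue Value, DateTime Expires)>();
    private readonly Queue<TKey> insertionOrder = new Queue<TKey>(); // crude FIFO eviction
    private readonly TimeSpan ttl;
    private readonly int maxEntries;

    public BoundedTtlCache(TimeSpan ttl, int maxEntries)
    {
        this.ttl = ttl;
        this.maxEntries = maxEntries;
    }

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> compute)
    {
        if (entries.TryGetValue(key, out var entry) && entry.Expires > DateTime.UtcNow)
            return entry.Value;                           // fresh hit

        var value = compute(key);                         // miss or stale: recompute
        if (!entries.ContainsKey(key)) insertionOrder.Enqueue(key);
        entries[key] = (value, DateTime.UtcNow + ttl);

        while (entries.Count > maxEntries)                // keep memory bounded
            entries.Remove(insertionOrder.Dequeue());
        return value;
    }
}
```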
Answer 2:
Something like caching can be considered a "cross-cutting" concern (http://en.wikipedia.org/wiki/Cross-cutting_concern):
In computer science, cross-cutting concerns are aspects of a program which affect other concerns. These concerns often cannot be cleanly decomposed from the rest of the system in both the design and implementation, and can result in either scattering (code duplication), tangling (significant dependencies between systems), or both. For instance, if writing an application for handling medical records, the bookkeeping and indexing of such records is a core concern, while logging a history of changes to the record database or user database, or an authentication system, would be cross-cutting concerns since they touch more parts of the program.
Cross cutting concerns can often be implemented via Aspect Oriented Programming (http://en.wikipedia.org/wiki/Aspect-oriented_programming).
In computing, aspect-oriented programming (AOP) is a programming paradigm which aims to increase modularity by allowing the separation of cross-cutting concerns. AOP forms a basis for aspect-oriented software development.
There are many tools in .NET to facilitate Aspect Oriented Programming. I'm most fond of those that provide completely transparent implementation. In the example of caching:
public class Foo
{
[Cache(10)] // cache for 10 minutes
public virtual void Bar() { ... }
}
That's all you need to do: everything else happens automatically once you define a behavior like so:
public class CachingBehavior
{
public void Intercept(IInvocation invocation) { ... }
// this method intercepts any method invocations on methods attributed with the [Cache] attribute.
// In the case of caching, this method would check if some cache store contains the data, and if it does return it...else perform the normal method operation and store the result
}
There are two general schools for how this happens:
Post build IL weaving. Tools like PostSharp, Microsoft CCI, and Mono Cecil can be configured to automatically rewrite these attributed methods to automatically delegate to your behaviors.
Runtime proxies. Tools like Castle DynamicProxy and Microsoft Unity can automatically generate proxy types (a type derived from Foo that overrides Bar in the example above) that delegates to your behavior.
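To show what a runtime proxy actually does without pulling in a library, here is a hand-written equivalent of the type such a tool would generate: a class derived from Foo that overrides the virtual method, consults a cache, and only then delegates to the base implementation. Castle DynamicProxy or an IL weaver generates this plumbing for you; all names below are illustrative.

```csharp
// Hand-written stand-in for a generated caching proxy.
using System;
using System.Collections.Generic;

public class Foo
{
    public virtual string Bar(int x)          // virtual so a derived proxy can intercept it
    {
        return "computed:" + x;
    }
}

public class CachingFoo : Foo                 // plays the role of the generated proxy type
{
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();
    public int RealCalls;                     // exposed only to observe the interception

    public override string Bar(int x)
    {
        string hit;
        if (cache.TryGetValue(x, out hit)) return hit;   // intercept: serve from cache
        RealCalls++;
        var result = base.Bar(x);                        // "proceed" to the real method
        cache[x] = result;
        return result;
    }
}
```

Client code holds a Foo reference and never knows it is talking to the proxy, which is what makes the implementation "completely transparent".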
Answer 3:
Although I do not know C#, this seems like a case for using AOP (Aspect-Oriented Programming). The idea is that you can 'inject' code to be executed at certain points in the execution stack.
You can add the caching code as follows:
IF InCache(object, method, method_arguments) THEN
    RETURN Cache(object, method, method_arguments)
ELSE
    result := ExecuteMethod()
    StoreResultInCache(object, method, method_arguments, result)
    RETURN result
You then define that this code should be executed before every call of your interface functions (and all subclasses implementing these functions as well).
Can some .NET expert enlighten us on how you would do this in .NET?
Answer 4:
In general, caching and memoisation make sense when:
- Obtaining the result is (or at least can be) high-latency or otherwise more expensive than the overhead of the caching itself.
- The results have a look-up pattern with frequent calls using the same inputs to the function (that is, not just the arguments but any instance, static and other data that affects the result).
- There isn't an already existing caching mechanism within the code that the code in question calls into, which would make this unnecessary.
- There won't be another caching mechanism within the code that calls the code in question, which would make this unnecessary (this is why it almost never makes sense to memoise GetHashCode() within that method, despite people often being tempted to when the implementation is relatively expensive).
- The result is impossible to become stale, unlikely to become stale while the cache is loaded, unimportant if it becomes stale, or stale in a way that is easy to detect.
There are cases where every use-case for a component will match all of these. There are many more where they will not. For example, if a component caches results but is never called twice with the same inputs by a particular client component, then that caching is just a waste that has had a negative impact upon performance (maybe negligible, maybe severe).
More often it makes much more sense for the client code to decide upon the caching policy that would suit it. It will also often be easier to tweak for a particular use at this point in the face of real-world data than in the component (since the real-world data it'll face could vary considerably from use to use).
It's even harder to know what degree of staleness could be acceptable. Generally, a component has to assume that 100% freshness is required from it, while the client component can know that a certain amount of staleness will be fine.
On the other hand, it can be easier for a component to obtain information that is of use to the cache. Components can work hand-in-hand in these cases, though it is much more involved (an example would be the If-Modified-Since mechanism used by RESTful webservices, where a server can indicate that a client can safely use information it has cached).
Also, a component can have a configurable caching policy. Connection pooling is a caching policy of sorts, consider how that's configurable.
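The "component helps the client cache" idea from the preceding paragraphs can be sketched in the style of If-Modified-Since: the component exposes a cheap version token, and the client revalidates its cached copy instead of refetching. All names below are illustrative assumptions, not from the original answer.

```csharp
// Component and client cooperating on staleness detection:
// the client caches data plus a version, and only refetches
// when the component says its version has moved on.
using System;

public class AnimalSource
{
    public int Version { get; private set; }          // bumped whenever the data changes
    private string[] animals = { "Cat", "Capybara" };

    public (string[] Data, int Version) Fetch() => (animals, Version);

    public bool IsStillFresh(int cachedVersion) => cachedVersion == Version; // cheap check

    public void Update(string[] newAnimals) { animals = newAnimals; Version++; }
}

public class CachingClient
{
    private readonly AnimalSource source;
    private string[] cached;
    private int cachedVersion = -1;
    public int Fetches;                               // observe how often we do a full fetch

    public CachingClient(AnimalSource source) { this.source = source; }

    public string[] GetAnimals()
    {
        if (cached != null && source.IsStillFresh(cachedVersion))
            return cached;                            // revalidated: no full fetch needed
        (cached, cachedVersion) = source.Fetch();
        Fetches++;
        return cached;
    }
}
```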
So in summary, caching belongs in whichever component can work out what caching is both possible and useful:
- That is most often the client code, though having likely latency and staleness documented by the component's authors will help here.
- It can, less often, be the client code with help from the component, though you have to expose details of the caching to allow that.
- It can sometimes be the component itself, with the caching policy configurable by the calling code.
- It can only rarely be the component alone, because it's rare for all possible use-cases to be served well by the same caching policy. One important exception is where the same instance of that component serves multiple clients, because then the factors that affect the above are spread over those multiple clients.
Answer 5:
All of the previous posts brought up some good points; here is a very rough outline of one way you might do it. I wrote this up on the fly, so it might need some tweaking:
interface IMemoizer<T, R>
{
    bool IsValid(T args); // Is the cached entry valid, or stale, etc.
    bool TryLookup(T args, out R result);
    void StoreResult(T args, R result);
}

static class IMemoizerExtensions
{
    public static Func<T, R> Memoizing<T, R>(this IMemoizer<T, R> src, Func<T, R> method)
    {
        return args =>
        {
            R result;
            if (src.TryLookup(args, out result) && src.IsValid(args))
            {
                return result;
            }
            else
            {
                result = method(args);
                src.StoreResult(args, result);
                return result;
            }
        };
    }
}
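For completeness, here is one possible dictionary-backed implementation of that interface plus a usage example. The interface is repeated so the snippet compiles standalone, and DictionaryMemoizer is an illustrative name, not part of the original answer; its IsValid always returns true, i.e. entries never expire.

```csharp
// A simple never-expiring implementation of the memoizer sketch, plus usage.
using System;
using System.Collections.Generic;

interface IMemoizer<T, R>
{
    bool IsValid(T args);
    bool TryLookup(T args, out R result);
    void StoreResult(T args, R result);
}

class DictionaryMemoizer<T, R> : IMemoizer<T, R>
{
    private readonly Dictionary<T, R> cache = new Dictionary<T, R>();
    public bool IsValid(T args) => true;   // this simple cache never goes stale
    public bool TryLookup(T args, out R result) => cache.TryGetValue(args, out result);
    public void StoreResult(T args, R result) => cache[args] = result;
}

static class IMemoizerExtensions
{
    public static Func<T, R> Memoizing<T, R>(this IMemoizer<T, R> src, Func<T, R> method)
    {
        return args =>
        {
            R result;
            if (src.TryLookup(args, out result) && src.IsValid(args))
                return result;
            result = method(args);
            src.StoreResult(args, result);
            return result;
        };
    }
}
```

Usage then looks like wrapping the slow function once and calling the wrapper everywhere, so only the first call per input pays the cost:

```csharp
var memoizer = new DictionaryMemoizer<char, string[]>();
Func<char, string[]> fast = memoizer.Memoizing(LetterUtility.GetAnimalsThatStartWithLetter);
fast('C'); // slow the first time
fast('C'); // served from the cache
```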
Source: https://stackoverflow.com/questions/8328895/whose-responsibility-is-it-to-cache-memoize-function-results