问题
Good day,
I have some very slooooow and complicated function, say f[x,y]
. And I need to construct detailed ContourPlot
of it. Moreover the function f[x,y]
sometimes fails due to lack of physical memory. In such cases I have to stop evaluation and investigate the problem case of the point {x,y} by myself. Then I should can add the element {x,y,f[x,y]} to a list of computed values of f[x,y]
(say "cache") and restart evaluation of ContourPlot
. ContourPlot
must take all already computed values of f
from the cache. I would prefer to store such list in some file for having ability to reuse it later. And it is probably simpler to add problematic points to this file by hand.
What is the fastest way to implement this if the list of computed values of f
may contain 10000-50000 points?
回答1:
Let's assume our slow function has the signature f[x, y]
.
Pure In-memory Approach
If you are satisfied with an in-memory cache, the simplest thing to do would be to use memoization:
Clear@fmem
fmem[x_, y_] := fmem[x, y] = f[x, y]
This adds a definition to itself every time it is called with a combination of arguments that it has not seen before.
File-backed In-memory Approach
However, if you are running out of memory or suffering kernel crashes during the long computation, you will want to back this cache with some kind of persistence. The simplest thing would be to keep a running log file:
$runningLogFile = "/some/directory/runningLog.txt";
Clear@flog
flog[x_, y_] := flog[x, y] = f[x, y] /.
v_ :> (PutAppend[Unevaluated[flog[x, y] = v;], $runningLogFile]; v)
If[FileExistsQ[$runningLogFile]
, Get[$runningLogFile]
, Export[$runningLogFile, "", "Text"];
]
flog
is the same as fmem
, except that it also writes an entry into the running log that can be used to restore the cached definition in a later session. The last expression reloads those definitions when it finds an existing log file (or creates the file if it does not exist).
The textual nature of the log file is convenient when manual intervention is required. Be aware that the textual representation of floating-point numbers introduces unavoidable round-off errors, so you may get slightly different results after reloading the values from the log file. If this is of great concern, you might consider using the binary DumpSave
feature although I will leave the details of that approach to the reader as it is not quite as convenient for keeping an incremental log.
SQL Approach
If memory is really tight, and you want to avoid having a large in-memory cache to make room for the other computations, the previous strategy might not be appropriate. In that case, you might consider using Mathematica's built-in SQL database to store the cache completely externally:
fsql[x_, y_] :=
loadCachedValue[x, y] /. $Failed :> saveCachedValue[x, y, f[x, y]]
I define loadCachedValue
and saveCachedValue
below. The basic idea is to create an SQL table where each row holds an x
, y
, f
triple. The SQL table is queried every time a value is needed. Note that this approach is substantially slower than the in-memory cache, so it makes the most sense when the computation of f
takes much longer than the SQL access time. The SQL approach does not suffer from the round-off errors that afflicted the text log file approach.
The definitions of loadCachedValue
and saveCachedValue
now follow, along with some other useful helper functions:
Needs["DatabaseLink`"]
$cacheFile = "/some/directory/cache.hsqldb";
openCacheConnection[] :=
$cache = OpenSQLConnection[JDBC["HSQL(Standalone)", $cacheFile]]
closeCacheConnection[] :=
CloseSQLConnection[$cache]
createCache[] :=
SQLExecute[$cache,
"CREATE TABLE cached_values (x float, y float, f float)
ALTER TABLE cached_values ADD CONSTRAINT pk_cached_values PRIMARY KEY (x, y)"
]
saveCachedValue[x_, y_, value_] :=
( SQLExecute[$cache,
"INSERT INTO cached_values (x, y, f) VALUES (?, ?, ?)", {x, y, value}
]
; value
)
loadCachedValue[x_, y_] :=
SQLExecute[$cache,
"SELECT f FROM cached_values WHERE x = ? AND y = ?", {x, y}
] /. {{{v_}} :> v, {} :> $Failed}
replaceCachedValue[x_, y_, value_] :=
SQLExecute[$cache,
"UPDATE cached_values SET f = ? WHERE x = ? AND y = ?", {value, x, y}
]
clearCache[] :=
SQLExecute[$cache,
"DELETE FROM cached_values"
]
showCache[minX_, maxX_, minY_, maxY_] :=
SQLExecute[$cache,
"SELECT *
FROM cached_values
WHERE x BETWEEN ? AND ?
AND y BETWEEN ? AND ?
ORDER BY x, y"
, {minX, maxX, minY, maxY}
, "ShowColumnHeadings" -> True
] // TableForm
This SQL code uses floating point values as primary keys. This is normally a questionable practice in SQL but works fine in the present context.
You must call openCacheConnection[]
before attempting to use any of these functions. You should call closeCacheConnection[]
after you have finished. One time only, you must call createCache[]
to initialize the SQL database. replaceCachedValue
, clearCache
and showCache
are provided for manual interventions.
回答2:
The simplest and possibly most efficient way to do this is just to set up the cached values as special case definitions for your function. The lookup is fairly fast due to hashing.
A function:
In[1]:= f[x_, y_] := Cos[x] + Cos[y]
Which points are used during a ContourPlot?
In[2]:= points = Last[
Last[Reap[
ContourPlot[f[x, y], {x, 0, 4 Pi}, {y, 0, 4 Pi},
EvaluationMonitor :> Sow[{x, y}]]]]];
In[3]:= Length[points]
Out[3]= 10417
Set up a version of f with precomputed values for 10000 of the evaluations:
In[4]:= Do[With[{x = First[p], y = Last[p]}, precomputedf[x, y] = f[x, y];], {p,
Take[points, 10000]}];
In the above, you would use something like precomputedf[x, y] = z
instead of precomputed[x, y] = f[x, y]
, where z is your precomputed value that you have stored in your external file.
Here is the "else" case which just evaluates f:
In[5]:= precomputedf[x_, y_] := f[x, y]
Compare timings:
In[6]:= ContourPlot[f[x, y], {x, 0, 4 Pi}, {y, 0, 4 Pi}]; // Timing
Out[6]= {0.453539, Null}
In[7]:= ContourPlot[precomputedf[x, y], {x, 0, 4 Pi}, {y, 0, 4 Pi}]; // Timing
Out[7]= {0.440996, Null}
Not much difference in timing because in this example f is not an expensive function.
A separate remark for your particular application: Perhaps you could use ListContourPlot instead. Then you can choose exactly which points are evaluated.
来源:https://stackoverflow.com/questions/5287817/the-best-way-to-construct-a-function-with-memory