I am developing some engineering simulations. This involves implementing some long equations such as this equation to calculate stress in a rubber like material:
pow(x, 0.1e1/0.3e1)
is the same as cbrt(x)
.l1
, l2
, and l3
are positive real numbers and if a
is a non-zero real number. (We have yet to hear from the OP regarding the specific nature of these coefficients. Given the nature of the problem, these are reasonable assumptions.)I use Maple to generate the C++ code to avoid mistakes.
Maple and Mathematica sometimes miss the obvious. Even more importantly, the users of Maple and Mathematica sometimes make mistakes. Substituting "oftentimes", or maybe even "almost always", in lieu of "sometimes is probably closer to the mark.
You could have helped Maple simplify that expression by telling it about the parameters in question. In the example at hand, I suspect that l1
, l2
, and l3
are positive real numbers and that a
is a non-zero real number. If that's the case, tell it that. Those symbolic math programs typically assume the quantities at hand are complex. Restricting the domain lets the program make assumptions that are not valid in the complex numbers.
Symbolic math programs typically provide the ability to provide information about the various parameters. Use that ability, particularly if your problem involves division or exponentiation. In the example at hand, you could have helped Maple simplify that expression by telling it that l1
, l2
, and l3
are positive real numbers and that a
is a non-zero real number. If that's the case, tell it that. Those symbolic math programs typically assume the quantities at hand are complex. Restricting the domain lets the program make assumptions such as axbx=(ab)x. This is only if a
and b
are positive real numbers and if x
is real. It is not valid in the complex numbers.
Ultimately, those symbolic math programs follow algorithms. Help it along. Try playing with expanding, collecting, and simplifying before you generate code. In this case, you could have collected those terms involving a factor of mu
and those involving a factor of K
. Reducing an expression to its "simplest form" remains a bit of an art.
When you get an ugly mess of generated code, don't accept it as a truth that you must not touch. Try to simplify it yourself. Look at what the symbolic math program had before it generated code. Look at how I reduced your expression to something much simpler and much faster, and how Walter's answer took mine several steps further. There is no magic recipe. If there was a magical recipe, Maple would have applied it and given the answer that Walter gave.
You are doing a lot of addition and subtraction in that calculation. You can get in deep trouble if you have terms that nearly cancel one another. You are wasting a lot of CPU if you have one term that dominates over the others.
Next, you are wasting a lot of CPU by performing repeated calculations. Unless you have enabled -ffast-math
, which lets the compiler break some of the rules of IEEE floating point, the compiler will not (in fact, must not) simplify that expression for you. It will instead do exactly what you told it to do. At a minimum, you should calculate l1 * l2 * l3
prior to computing that mess.
Finally, you are making a lot of calls to pow
, which is extremely slow. Note that several of those calls are of the form (l1*l2*l3)(1/3). Many of those calls to pow
could be performed with a single call to std::cbrt
:
l123 = l1 * l2 * l3;
l123_pow_1_3 = std::cbrt(l123);
l123_pow_4_3 = l123 * l123_pow_1_3;
With this,
X * pow(l1 * l2 * l3, 0.1e1 / 0.3e1)
becomes X * l123_pow_1_3
.X * pow(l1 * l2 * l3, -0.1e1 / 0.3e1)
becomes X / l123_pow_1_3
.X * pow(l1 * l2 * l3, 0.4e1 / 0.3e1)
becomes X * l123_pow_4_3
.X * pow(l1 * l2 * l3, -0.4e1 / 0.3e1)
becomes X / l123_pow_4_3
.Maple did miss the obvious.
For example, there's a much easier way to write
(pow(l1 * l2 * l3, -0.1e1 / 0.3e1) - l1 * l2 * l3 * pow(l1 * l2 * l3, -0.4e1 / 0.3e1) / 0.3e1)
Assuming that l1
, l2
, and l3
are real rather than complex numbers, and that the real cube root (rather than the principle complex root) are to be extracted, the above reduces to
2.0/(3.0 * pow(l1 * l2 * l3, 1.0/3.0))
or
2.0/(3.0 * l123_pow_1_3)
Using cbrt_l123
instead of l123_pow_1_3
, the nasty expression in the question reduces to
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T =
mu/(3.0*l123)*( pow(l1/cbrt_l123,a)*(2.0*N1-N2-N3)
+ pow(l2/cbrt_l123,a)*(2.0*N2-N3-N1)
+ pow(l3/cbrt_l123,a)*(2.0*N3-N1-N2))
+K*(l123-1.0)*(N1+N2+N3);
Always double check, but always simplify as well.
Here are some of my steps in arriving at the above:
// Step 0: Trim all whitespace.
T=(mu*(pow(l1*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a*(pow(l1*l2*l3,-0.1e1/0.3e1)-l1*l2*l3*pow(l1*l2*l3,-0.4e1/0.3e1)/0.3e1)*pow(l1*l2*l3,0.1e1/0.3e1)/l1-pow(l2*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a/l1/0.3e1-pow(l3*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a/l1/0.3e1)/a+K*(l1*l2*l3-0.1e1)*l2*l3)*N1/l2/l3+(mu*(-pow(l1*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a/l2/0.3e1+pow(l2*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a*(pow(l1*l2*l3,-0.1e1/0.3e1)-l1*l2*l3*pow(l1*l2*l3,-0.4e1/0.3e1)/0.3e1)*pow(l1*l2*l3,0.1e1/0.3e1)/l2-pow(l3*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a/l2/0.3e1)/a+K*(l1*l2*l3-0.1e1)*l1*l3)*N2/l1/l3+(mu*(-pow(l1*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a/l3/0.3e1-pow(l2*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a/l3/0.3e1+pow(l3*pow(l1*l2*l3,-0.1e1/0.3e1),a)*a*(pow(l1*l2*l3,-0.1e1/0.3e1)-l1*l2*l3*pow(l1*l2*l3,-0.4e1/0.3e1)/0.3e1)*pow(l1*l2*l3,0.1e1/0.3e1)/l3)/a+K*(l1*l2*l3-0.1e1)*l1*l2)*N3/l1/l2;
// Step 1:
// l1*l2*l3 -> l123
// 0.1e1 -> 1.0
// 0.4e1 -> 4.0
// 0.3e1 -> 3
l123 = l1 * l2 * l3;
T=(mu*(pow(l1*pow(l123,-1.0/3),a)*a*(pow(l123,-1.0/3)-l123*pow(l123,-4.0/3)/3)*pow(l123,1.0/3)/l1-pow(l2*pow(l123,-1.0/3),a)*a/l1/3-pow(l3*pow(l123,-1.0/3),a)*a/l1/3)/a+K*(l123-1.0)*l2*l3)*N1/l2/l3+(mu*(-pow(l1*pow(l123,-1.0/3),a)*a/l2/3+pow(l2*pow(l123,-1.0/3),a)*a*(pow(l123,-1.0/3)-l123*pow(l123,-4.0/3)/3)*pow(l123,1.0/3)/l2-pow(l3*pow(l123,-1.0/3),a)*a/l2/3)/a+K*(l123-1.0)*l1*l3)*N2/l1/l3+(mu*(-pow(l1*pow(l123,-1.0/3),a)*a/l3/3-pow(l2*pow(l123,-1.0/3),a)*a/l3/3+pow(l3*pow(l123,-1.0/3),a)*a*(pow(l123,-1.0/3)-l123*pow(l123,-4.0/3)/3)*pow(l123,1.0/3)/l3)/a+K*(l123-1.0)*l1*l2)*N3/l1/l2;
// Step 2:
// pow(l123,1.0/3) -> cbrt_l123
// l123*pow(l123,-4.0/3) -> pow(l123,-1.0/3)
// (pow(l123,-1.0/3)-pow(l123,-1.0/3)/3) -> 2.0/(3.0*cbrt_l123)
// *pow(l123,-1.0/3) -> /cbrt_l123
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T=(mu*(pow(l1/cbrt_l123,a)*a*2.0/(3.0*cbrt_l123)*cbrt_l123/l1-pow(l2/cbrt_l123,a)*a/l1/3-pow(l3/cbrt_l123,a)*a/l1/3)/a+K*(l123-1.0)*l2*l3)*N1/l2/l3+(mu*(-pow(l1/cbrt_l123,a)*a/l2/3+pow(l2/cbrt_l123,a)*a*2.0/(3.0*cbrt_l123)*cbrt_l123/l2-pow(l3/cbrt_l123,a)*a/l2/3)/a+K*(l123-1.0)*l1*l3)*N2/l1/l3+(mu*(-pow(l1/cbrt_l123,a)*a/l3/3-pow(l2/cbrt_l123,a)*a/l3/3+pow(l3/cbrt_l123,a)*a*2.0/(3.0*cbrt_l123)*cbrt_l123/l3)/a+K*(l123-1.0)*l1*l2)*N3/l1/l2;
// Step 3:
// Whitespace is nice.
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T =
(mu*( pow(l1/cbrt_l123,a)*a*2.0/(3.0*cbrt_l123)*cbrt_l123/l1
-pow(l2/cbrt_l123,a)*a/l1/3
-pow(l3/cbrt_l123,a)*a/l1/3)/a
+K*(l123-1.0)*l2*l3)*N1/l2/l3
+(mu*(-pow(l1/cbrt_l123,a)*a/l2/3
+pow(l2/cbrt_l123,a)*a*2.0/(3.0*cbrt_l123)*cbrt_l123/l2
-pow(l3/cbrt_l123,a)*a/l2/3)/a
+K*(l123-1.0)*l1*l3)*N2/l1/l3
+(mu*(-pow(l1/cbrt_l123,a)*a/l3/3
-pow(l2/cbrt_l123,a)*a/l3/3
+pow(l3/cbrt_l123,a)*a*2.0/(3.0*cbrt_l123)*cbrt_l123/l3)/a
+K*(l123-1.0)*l1*l2)*N3/l1/l2;
// Step 4:
// Eliminate the 'a' in (term1*a + term2*a + term3*a)/a
// Expand (mu_term + K_term)*something to mu_term*something + K_term*something
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T =
(mu*( pow(l1/cbrt_l123,a)*2.0/(3.0*cbrt_l123)*cbrt_l123/l1
-pow(l2/cbrt_l123,a)/l1/3
-pow(l3/cbrt_l123,a)/l1/3))*N1/l2/l3
+K*(l123-1.0)*l2*l3*N1/l2/l3
+(mu*(-pow(l1/cbrt_l123,a)/l2/3
+pow(l2/cbrt_l123,a)*2.0/(3.0*cbrt_l123)*cbrt_l123/l2
-pow(l3/cbrt_l123,a)/l2/3))*N2/l1/l3
+K*(l123-1.0)*l1*l3*N2/l1/l3
+(mu*(-pow(l1/cbrt_l123,a)/l3/3
-pow(l2/cbrt_l123,a)/l3/3
+pow(l3/cbrt_l123,a)*2.0/(3.0*cbrt_l123)*cbrt_l123/l3))*N3/l1/l2
+K*(l123-1.0)*l1*l2*N3/l1/l2;
// Step 5:
// Rearrange
// Reduce l2*l3*N1/l2/l3 to N1 (and similar)
// Reduce 2.0/(3.0*cbrt_l123)*cbrt_l123/l1 to 2.0/3.0/l1 (and similar)
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T =
(mu*( pow(l1/cbrt_l123,a)*2.0/3.0/l1
-pow(l2/cbrt_l123,a)/l1/3
-pow(l3/cbrt_l123,a)/l1/3))*N1/l2/l3
+(mu*(-pow(l1/cbrt_l123,a)/l2/3
+pow(l2/cbrt_l123,a)*2.0/3.0/l2
-pow(l3/cbrt_l123,a)/l2/3))*N2/l1/l3
+(mu*(-pow(l1/cbrt_l123,a)/l3/3
-pow(l2/cbrt_l123,a)/l3/3
+pow(l3/cbrt_l123,a)*2.0/3.0/l3))*N3/l1/l2
+K*(l123-1.0)*N1
+K*(l123-1.0)*N2
+K*(l123-1.0)*N3;
// Step 6:
// Factor out mu and K*(l123-1.0)
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T =
mu*( ( pow(l1/cbrt_l123,a)*2.0/3.0/l1
-pow(l2/cbrt_l123,a)/l1/3
-pow(l3/cbrt_l123,a)/l1/3)*N1/l2/l3
+ (-pow(l1/cbrt_l123,a)/l2/3
+pow(l2/cbrt_l123,a)*2.0/3.0/l2
-pow(l3/cbrt_l123,a)/l2/3)*N2/l1/l3
+ (-pow(l1/cbrt_l123,a)/l3/3
-pow(l2/cbrt_l123,a)/l3/3
+pow(l3/cbrt_l123,a)*2.0/3.0/l3)*N3/l1/l2)
+K*(l123-1.0)*(N1+N2+N3);
// Step 7:
// Expand
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T =
mu*( pow(l1/cbrt_l123,a)*2.0/3.0/l1*N1/l2/l3
-pow(l2/cbrt_l123,a)/l1/3*N1/l2/l3
-pow(l3/cbrt_l123,a)/l1/3*N1/l2/l3
-pow(l1/cbrt_l123,a)/l2/3*N2/l1/l3
+pow(l2/cbrt_l123,a)*2.0/3.0/l2*N2/l1/l3
-pow(l3/cbrt_l123,a)/l2/3*N2/l1/l3
-pow(l1/cbrt_l123,a)/l3/3*N3/l1/l2
-pow(l2/cbrt_l123,a)/l3/3*N3/l1/l2
+pow(l3/cbrt_l123,a)*2.0/3.0/l3*N3/l1/l2)
+K*(l123-1.0)*(N1+N2+N3);
// Step 8:
// Simplify.
l123 = l1 * l2 * l3;
cbrt_l123 = cbrt(l123);
T =
mu/(3.0*l123)*( pow(l1/cbrt_l123,a)*(2.0*N1-N2-N3)
+ pow(l2/cbrt_l123,a)*(2.0*N2-N3-N1)
+ pow(l3/cbrt_l123,a)*(2.0*N3-N1-N2))
+K*(l123-1.0)*(N1+N2+N3);
Note that this is stricken. It's wrong.
Update
Maple did miss the obvious. For example, there's a much easier way to write
(pow(l1 * l2 * l3, -0.1e1 / 0.3e1) - l1 * l2 * l3 * pow(l1 * l2 * l3, -0.4e1 / 0.3e1) / 0.3e1)
Assuming that l1
, l2
, and l3
are real rather than complex numbers, and that the real cube root (rather than the principle complex root) are to be extracted, the above reduces to zero. This calculation of zero is repeated many times over.
Second update
If I've done the math right (there is no guarantee that I've done the math right), the nasty expression in the question reduces to
l123 = l1 * l2 * l3;
cbrt_l123_inv = 1.0 / cbrt(l123);
nasty_expression =
K * (l123 - 1.0) * (N1 + N2 + N3)
- ( pow(l1 * cbrt_l123_inv, a) * (N2 + N3)
+ pow(l2 * cbrt_l123_inv, a) * (N1 + N3)
+ pow(l3 * cbrt_l123_inv, a) * (N1 + N2)) * mu / (3.0*l123);
The above assumes that l1
, l2
, and l3
are positive real numbers.