your f(x) is just ᴨ (PI cumulative multiplication) squared
- it is hard to write it in here so i will deifine g(x0,x1) instead
g(x0,x1)=x0*(x0+1)*(x0+2)*...*x1
- so:
f(x)=g(1,x)^2
computing h(x,r,c)=f(x)/(f(x-r)*f(r))%c
- when you rewrite it to g() you get:
h(x,r,c)=((g(1,x)/(g(1,x-r)*g(1,r)))^2)%c
- now simplify (and lower magnitude) as much as you can
- so compute
sqr
as last (no need to have in sub-results)
- get rid of duplicate therms
- there are two options get rid of
g(1,x-r)
or g(1,r)
- choose what is better for you I would chose whichever is bigger
- so if
(x-r>r)
then:
h(x,r,c)=(g(x-r+1,x)/g(1,r)^2)%c
- else:
h(x,r,c)=(g(r+1,x)/g(1,x-r)^2)%c
some math tweaks
- you should compute both therms a,b (from (a/b)%c) parallel
- when both can be divided by any of first few primes then divide them booth to keep the magnitude low
- something like::
- `if ((a&1==0)&&(b&1==0)) { a>>=1; b>>=1; } // dividing by prime = 2 ... this is usually enough
- but if your magnitudes are very big then may be you should add also some like 3,5,7,...
- but always measure the performance drop and stop if it drops too much
if x
is big and r
is small
- then compute
b
first
- and while computing
a
- in addition to primes check also if the sub-result is divisible also by
b
- if it is then divide
a
by b
and add the result to the final division result
some speed boost
- you can compute partial
a,b
- and start testing the divisibility only if booth have magnitudes bigger then some treshold
- also you can make the computation waiting for each other
- so if
a>1000000
then wait until the b
is also so big or whole
- and do the same also in revers (if
b>100000
....)
- the bigger treshold the better speed but you are limited by your integer implementation
- if using bigints then you should use treshold smaller then halve the bits of the base number ...
Hope I did not make some silly math mistake...