问题
NOTE This is a theoretical question. I'm happy with the performance of my actual code as it is. I'm just curious about whether there is an alternative.
Is there a trick to do an integer division of a constant value, which is itself an integer power of two, by an integer variable value, without having to use do an actual divide operation?
// The fixed value of the numerator
#define SIGNAL_PULSE_COUNT 0x4000UL
// The division that could use a neat trick.
uint32_t signalToReferenceRatio(uint32_t referenceCount)
{
// Promote the numerator to a 64 bit value, shift it left by 32 so
// the result has an adequate number of bits of precision, and divide
// by the numerator.
return (uint32_t)((((uint64_t)SIGNAL_PULSE_COUNT) << 32) / referenceCount);
}
I've found several (lots) of references for tricks to do division by a constant, both integer and floating point. For example, the question What's the fastest way to divide an integer by 3? has a number of good answers including references to other academic and community materials.
Given that the numerator is constant, and it's an integer power of two, is there a neat trick that could be used in place of doing an actual 64 bit division; some kind of bit-wise operation (shifts, AND, XOR, that kind of stuff) or similar?
I don't want any loss of precision (beyond a possible half bit due to integer rounding) greater than that of doing the actual division, as the precision of the instrument relies on the precision of this measurement.
"Let the compiler decide" is not an answer, because I want to know if there is a trick.
Extra, Contextual Information
I'm developing a driver on a 16 bit data, 24 bit instruction word micro-controller. The driver does some magic with the peripheral modules to obtain a pulse count of a reference frequency for a fixed number of pulses of a signal frequency. The required result is a ratio of the signal pulses to the reference pulse, expressed as an unsigned 32 bit value. The arithmetic for the function is defined by the manufacturer of the device for which I'm developing the driver, and the result is processed further to obtain a floating point real-world value, but that's outside the scope of this question.
The micro-controller I'm using has a Digital Signal Processor that has a number of division operations that I could use, and I'm not afraid to do so if necessary. There would be some minor challenges to overcome with this approach, beyond the putting together the assembly instructions to make it work, such as the DSP being used to do a PID function in a BLDC driver ISR, but nothing I can't manage.
回答1:
You cannot use clever mathematical tricks to not do a division, but you can of course still use programming tricks if you know the range of your reference count:
- Nothing beats a pre-computed lookup table in terms of speed.
- There are fast approximate square root algorithms (probably already in your DSP), and you can improve the approximation by one or two Newton-Raphson iterations. If doing the computation with floating-point numbers is accurate enough for you, you can probably beat a 64bit integer division in terms of speed (but not in clarity of code).
You mentioned that the result will be converted to floating-point later, it might be beneficial to not compute the integer division at all, but use your floating point hardware.
回答2:
I worked out a Matlab version, using fixed point arithmetic.
This method assumes that a integer version of log2(x)
can be calculated efficiently, which is true for dsPIC30/33F and TI C6000 that have instruction to detect the most significant 1 of an integer.
For this reason, this code has strong ISA depency and can not be written in portable/standard C and can be improved using instructions like multiply-and-add, multiply-and-shift, so I won't try translating it to C.
nrdiv.m
function [ y ] = nrdiv( q, x, lut)
% assume q>31, lut = 2^31/[1,1,2,...255]
p2 = ceil(log2(x)); % available in TI C6000, instruction LMBD
% available in Microchip dsPIC30F/33F, instruction FF1L
if p2<8
pre_shift=0;
else
pre_shift=p2-8;
end % shr = (p2-8)>0?(p2-8):0;
xn = shr(x, pre_shift); % xn = x>>pre_shift;
y = shr(lut(xn), pre_shift); % y = lut[xn]>pre_shift;
y = shr(y * (2^32 - y*x), 30); % basic iteration
% step up from q31 to q32
y = shr(y * (2^33 - y*x), (64-q)); % step up from q32 to desired q
if q>39
y = shr(y * (2^(1+q) - y*x), (q)); % when q>40, additional
% iteration is required,
end % no step up is performed
end
function y = shr(x, r)
y=floor(x./2^r); % simulate operator >>
end
test.m
test_number = (2^22-12345);
test_q = 48;
lut_q31 = round(2^31 ./ [1,[1:1:255]]);
display(sprintf('tested 2^%d/%d, diff=%f\n',test_q, test_number,...
nrdiv( 39, (2^22-5), lut_q31) - 2^39/(2^22-5)));
sample output
tested 2^48/4181959, diff=-0.156250
reference:
Newton–Raphson division
回答3:
A little late but here is my solution.
First some assumptions:
Problem:
X=N/D where N is a constant ans a power of 2.
All 32 bit unsigned integers.
X is unknown but we have a good estimate (previous but no longer accurate solution).
An exact solution is not required.
Note: due to integer truncation this is not an accurate algorithm!
An iterative solution is okay (improves with each loop).
Division is much more expensive than multiplication:
For 32bit unsigned integer for Arduino UNO:
'+/-' ~0.75us
'*' ~3.5us
'/' ~36us 4 We seek to replace the Basically lets start with Newton's method:
Xnew=Xold-f(x)/(f`(x)
where f(x)=0 for the solution we seek.
Solving this I get:
Xnew=XNew*(C-X*D)/N
where C=2*N
First trick:
Now that the Numerator (constant) is now a Divisor (constant) then one solution here (which does not require the N to be a power of 2) is:
Xnew=XNew*(C-X*D)*A>>M
where C=2*N, A and M are constants (look for dividing by a constant tricks).
or (staying with Newtons method):
Xnew=XNew*(C-X*D)>>M
where C=2>>M where M is the power.
So I have 2 '*' (7.0us), a '-' (0.75us) and a '>>' (0.75us?) or 8.5us total (rather than 36us), excluding other overheads.
Limitations:
As the data type is 32 bit unsigned, 'M' should not exceed 15 else there will be problems with overflow (you can probably get around this using a 64bit intermediate data type).
N>D (else the algorithm blows up! at least with unsigned integer)
Obviously the algorithm will work with signed and float data types)
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
int main(void)
{
unsigned long c,d,m,x;
// x=n/d where n=1<<m
m=15;
c=2<<m;
d=10;
x=10;
while (true)
{
x=x*(c-d*x)>>m;
printf("%ld",x);
getchar();
}
return(0);
}
回答4:
Having tried many alternatives, I ended up doing normal binary long division in assembly language. However, the routine does use a few optimisations that bring the execution time down to an acceptable level.
/*
* Converts the reference frequency count for a specific signal frequency
* to a ratio.
* Xs = Ns * 2^32 / Nr
* Where:
* 2^32 is a constant scaling so that the maximum accuracy can be achieved.
* Ns is the number of signal counts (fixed at 0x4000 by hardware).
* Nr is the number of reference counts, passed in W1:W0.
* @param W1:W0 The number of reference frequency pulses.
* @return W1:W0 The scaled ratio.
*/
.align 2
.global _signalToReferenceRatio
.type _signalToReferenceRatio, @function
; This is the position of the most significant bit of the fixed Ns (0x4000).
.equ LOG2_DIVIDEND, 14
.equ DIVISOR_LIMIT, LOG2_DIVIDEND+1
.equ WORD_SIZE, 16
_signalToReferenceRatio:
; Create a dividend, MSB-aligned with the divisor, in W2:W3 and place the
; number of iterations required for the MSW in [W14] and the LSW in [W14+2].
LNK #4
MUL.UU W2, #0, W2
FF1L W1, W4
; If MSW is zero the argument is out of range.
BRA C, .returnZero
SUBR W4, #WORD_SIZE, W4
; Find the number of quotient MSW loops.
; This is effectively 1 + log2(dividend) - log2(divisor).
SUBR W4, #DIVISOR_LIMIT, [W14]
BRA NC, .returnZero
; Since the SUBR above is always non-negative and the C flag set, use this
; to set bit W3<W5> and the dividend in W2:W3 = 2^(16+W5) = 2^log2(divisor).
BSW.C W3, W4
; Use 16 quotient LSW loops.
MOV #WORD_SIZE, W4
MOV W4, [W14+2]
; Set up W4:W5 to hold the divisor and W0:W1 to hold the result.
MOV.D W0, W4
MUL.UU W0, #0, W0
.checkLoopCount:
; While the bit count is non-negative ...
DEC [W14], [W14]
BRA NC, .nextWord
.alignQuotient:
; Shift the current quotient word up by one bit.
SL W0, W0
; Subtract divisor from the current dividend part.
SUB W2, W4, W6
SUBB W3, W5, W7
; Check if the dividend part was less than the divisor.
BRA NC, .didNotDivide
; It did divide, so set the LSB of the quotient.
BSET W0, #0
; Shift the remainder up by one bit, with the next zero in the LSB.
SL W7, W3
BTSC W6, #15
BSET W3, #0
SL W6, W2
BRA .checkLoopCount
.didNotDivide:
; Shift the next (zero) bit of the dividend into the LSB of the remainder.
SL W3, W3
BTSC W2, #15
BSET W3, #0
SL W2, W2
BRA .checkLoopCount
.nextWord:
; Test if there are any LSW bits left to calculate.
MOV [++W14], W6
SUB W6, #WORD_SIZE, [W14--]
BRA NC, .returnQ
; Decrement the remaining bit counter before writing it back.
DEC W6, [W14]
; Move the working part of the quotient up into the MSW of the result.
MOV W0, W1
BRA .alignQuotient
.returnQ:
; Return the quotient in W0:W1.
ULNK
RETURN
.returnZero:
MUL.UU W0, #0, W0
ULNK
RETURN
.size _signalToReferenceRatio, .-_signalToReferenceRatio
来源:https://stackoverflow.com/questions/35063224/trick-to-divide-a-constant-power-of-two-by-an-integer