Further optimisation
We saw that if B = 1 << b, then
- A * B == A << b
- A / B == A >> b
- A % B == A & (B - 1) == A & ((1U << b) - 1)
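The three identities above can be checked mechanically. The following is a minimal sketch, assuming `uint` is a 32-bit `unsigned int` and b < 32; the names `pow2_identities_hold` and `check_pow2_identities` are hypothetical helpers, not part of the original code. Note that the identities are only tested here for unsigned operands: for negative signed values, `/` truncates toward zero while `>>` usually does not, so the results can differ.

```c
#include <assert.h>

typedef unsigned int uint;   /* assumption: uint is a 32-bit unsigned int */

/* Returns 1 when the three identities hold for a and B = 1 << b (b < 32). */
int pow2_identities_hold(uint a, uint b)
{
    uint B = 1U << b;
    return (a * B == a << b)           /* multiplication is a left shift  */
        && (a / B == a >> b)           /* division is a right shift       */
        && (a % B == (a & (B - 1)));   /* modulo is a mask of the low bits */
}

/* Checks a few sample values against every valid exponent. */
void check_pow2_identities(void)
{
    const uint samples[] = { 0U, 1U, 1000U, 12345U, 0xFFFFFFFFU };
    for (uint i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (uint b = 0; b < 32; b++)
            assert(pow2_identities_hold(samples[i], b));
}
```

Calling `check_pow2_identities()` from a test program aborts on the first mismatch and returns silently otherwise.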
But there are more interesting properties when we have two powers of 2, B1 = 1 << b1 and B2 = 1 << b2.
- B1 * B2 == (1 << b1) * (1 << b2) == 1 << (b1 + b2)
- B1 / B2
  - if B1 >= B2, (1 << b1) / (1 << b2) == 1 << (b1 - b2)
  - if B1 < B2, 0
- A / (B1 / B2) == A / (1 << (b1 - b2)) == A >> (b1 - b2), because (B1 / B2) cannot be zero here: division by zero is undefined in C, so b1 >= b2 can be assumed
- A * (B1 / B2)
  - if b1 - b2 >= 0, A * (1 << (b1 - b2)) == A << (b1 - b2)
  - if b1 - b2 < 0, 0
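The list above can be written out as pairs of functions, one using the division and one using the precomputed shift. This is only a sketch under stated assumptions: `uint` is a 32-bit `unsigned int`, both exponents are below 32, and the four function names are hypothetical, chosen for illustration.

```c
typedef unsigned int uint;   /* assumption: uint is a 32-bit unsigned int */

/* A / (B1 / B2), written with the powers of two themselves.
   Requires b1 >= b2, otherwise B1 / B2 is 0 and the division is undefined. */
uint div_by_ratio(uint a, uint b1, uint b2)
{
    return a / ((1U << b1) / (1U << b2));
}

/* Same result with the exponent difference fed in directly (b1 >= b2). */
uint div_by_ratio_shift(uint a, uint b1, uint b2)
{
    return a >> (b1 - b2);
}

/* A * (B1 / B2), written with the powers of two themselves. */
uint mul_by_ratio(uint a, uint b1, uint b2)
{
    return a * ((1U << b1) / (1U << b2));
}

/* Same result with a shift; the b1 < b2 case yields 0, as in the list. */
uint mul_by_ratio_shift(uint a, uint b1, uint b2)
{
    return (b1 >= b2) ? a << (b1 - b2) : 0U;
}
```

For example, with a = 1000, b1 = 5 and b2 = 2, both division variants give 125 and both multiplication variants give 8000.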
This means that a macro alone is not enough: the compiler is rarely clever enough to detect these cases by itself. To get efficient code, it is better to feed the compiler precomputed values, such as the shift amount rather than the power of 2.
int divu3(uint a, uint b) { return a / ((1U<<b) / 4); }
int divu300(uint a, uint b) { return a / (1U<<(b-2)); }
divu3:
    stmfd sp!, {r3, lr}
    mov r3, #1
    mov r1, r3, asl r1
    mov r1, r1, lsr #2
    bl __aeabi_uidiv
    ldmfd sp!, {r3, pc}
divu300:
    sub r1, r1, #2
    mov r0, r0, lsr r1
    mov pc, lr
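The two functions only agree on a limited range of b, which is worth checking before swapping one for the other. The sketch below assumes `uint` is a 32-bit `unsigned int` and returns `uint` rather than `int` for simplicity; `check_divu_equivalence` is a hypothetical helper. For b < 2 the divisor in `divu3` is zero and the shift count in `divu300` is out of range, so both are undefined there.

```c
#include <assert.h>

typedef unsigned int uint;   /* assumption: uint is a 32-bit unsigned int */

/* Division by a power of two computed at run time: the compiler
   typically falls back to the generic division routine. */
uint divu3(uint a, uint b)   { return a / ((1U << b) / 4); }

/* Same value with the exponent precomputed as b - 2. */
uint divu300(uint a, uint b) { return a / (1U << (b - 2)); }

/* Both functions must agree for every exponent in the valid range 2..31. */
void check_divu_equivalence(uint a)
{
    for (uint b = 2; b < 32; b++)
        assert(divu3(a, b) == divu300(a, b));
}
```

With a = 1000 and b = 5, both functions divide by 8 and return 125.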
PS: the ARM compiler is not able to optimize A / B and A * B …