Further optimisation

We saw that if B = 1 << b, then, for unsigned A:

  1. A * B == A << b
  2. A / B == A >> b
  3. A % B == A & (B - 1) == A & ((1U << b) - 1)

There are also interesting properties when we have two powers of two, B1 = 1 << b1 and B2 = 1 << b2:

  1. B1 * B2 == (1 << b1) * (1 << b2) == 1 << (b1 + b2)
  2. B1 / B2
    1. if B1 >= B2, (1 << b1) / (1 << b2) == 1 << (b1 - b2)
    2. if B1 < B2, 0
  3. A / (B1 / B2) == A / (1 << (b1 - b2)) == A >> (b1 - b2), because B1 / B2 can't be zero in C (division by zero is undefined), so B1 >= B2 can be assumed
  4. A * (B1 / B2)
    1. if b1 - b2 >= 0, A * (1 << (b1 - b2)) == A << (b1 - b2)
    2. if b1 - b2 < 0, 0

This means a macro is not enough: the compiler often isn't clever enough to detect these identities. To get efficient code, it is better to feed the compiler precomputed values, such as the shift count b1 - b2.

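typedef unsigned int uint;

/* a / 2^(b-2), with the divisor built at run time */
uint divu3(uint a, uint b)
{
        return a / ((1U<<b) / 4);
}

/* same quotient, with the shift count b - 2 given directly */
uint divu300(uint a, uint b) { return a / (1U<<(b-2)); }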

divu3:
        stmfd   sp!, {r3, lr}
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        bl      __aeabi_uidiv
        ldmfd   sp!, {r3, pc}
divu300:
        sub     r1, r1, #2
        mov     r0, r0, lsr r1
        mov     pc, lr

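divu3 still calls the library division routine, while divu300 is reduced to a subtraction and a shift.

PS: the ARM compiler is not able to optimize A / B and A * B …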
