Theory
Suppose we have 2 unsigned integer A and B. B is a power of 2 ie there is a number b such as B = 1U << b
It can be checked with the following code
static inline ispowerof2(int b) { return b & !(b & (b - 1)); }
b can be computed by ilog2 function
int ilog2(unsigned int v) { int i = -1; while(v) { i++; v >>= 1; } return v; }
or from http://en.wikipedia.org/wiki/Binary logarithm#Integer
/ * Returns the floor form of binary logarithm for a 32 bit integer. * −1 is returned if ''n'' is 0. */ int floorLog2(unsigned int n) { if (n == 0) return -1;int pos = 0; if (n >= 1<<16) { n >>= 16; pos += 16; } if (n >= 1<< 8) { n >>= 8; pos += 8; } if (n >= 1<< 4) { n >>= 4; pos += 4; } if (n >= 1<< 2) { n >>= 2; pos += 2; } if (n >= 1<< 1) { pos += 1; } return pos; }
In this case we have :
- A * B == A << b
- A / B == A >> b
- A % B == A & (B - 1) == A & ((1U << b) - 1)
Usage in C
Either we can define your own function for * / % or we can use compiler optimisation. If we replace B by 1 << b (with a macro), then
int divu(uint a, uint b) { return a / (1U << b); }int modu(uint a, uint b) { return a % (1U << b); }
int mulu(uint a, uint b) { return a * (1U << b); }
will give you in arm assembly (arm-linux-gnueabi-gcc -Os -S )
divu: mov r0, r0, lsr r1 mov pc, lr modu: mvn r3, #0 bic r0, r0, r3, asl r1 mov pc, lr mulu: mov r0, r0, asl r1 mov pc, lr
That's pretty fast without ugly operator redefinition !!!
But add and sub are slower [1], because we need to compute the shift.
Note a gcc missoptimisation (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48812)
int divu3(uint a, uint b) { return a / ((1U<<b) / 4); }
divu3: stmfd sp!, {r3, lr} mov r3, #1 mov r1, r3, asl r1 mov r1, r1, lsr #2 bl {aeabi_uidiv ldmfd sp!, {r3, pc}
This is unfortunate, the compiler could have generated :
divu3: mov r3, #1 mov r1, r3, asl r1 mov r1, r1, lsr #2 mov r0, r0, lsr r1
[1]
int addu(uint a, uint b) { return a + (1U << b); }
addu: mov r3, #1 add r0, r0, r3, asl r1 mov pc, lr
FATFS
A tiny fatfs implementation can be found at http://elm-chan.org/fsw/ff/00index_e.html and petit implementation.
After looking at the code, it seems there is a problem in the code : it assumes fat sector size should match hardware sector size.
This is not true, fat sector size only need to be >= hardware sector size
Also the power of 2 trick can be used.
Note the guy got interresting SPI mmc stuff http://elm-chan.org/docs/mmc/mmc_e.html
futher optimisation
We saw than if B = 1 << b, then
- A * B == A << b
- A / B == A >> b
- A % B == A & (B - 1) == A & ((1U << b) - 1)
But there are interesting property if we have 2 power of 2.
- B1 * B2 == (1 << b1) * (1 << b2) == 1 << (b1 + b2)
- B1 / B2
- if B1 >= B2, (1 << b1) / (1 << b2) == 1 << (b1 - b2)
- if B1 < B2, 0
- A / (B1 / B2) == A / (1 << (b1 - b2)) == A >> (b1 - b2) because (B1 / B2) can't be null in C
- A * (B1 / B2)
- if b1 - b2 >= 0, A * (1 << (b1 - b2)) == A << (b1 - b2)
- if b1 - b2 < 0, 0
This means macro is not enough, but compiler isn't often clever to detect this. To have efficient code, better feed compiler with precomputed stuff.
int divu3(uint a, uint b) { return a / ((1U<<b) / 4); }int divu300(uint a, uint b) { return a / (1<<(b-2)); }
divu3: stmfd sp!, {r3, lr} mov r3, #1 mov r1, r3, asl r1 mov r1, r1, lsr #2 bl {aeabi_uidiv ldmfd sp!, {r3, pc} divu300: sub r1, r1, #2 mov r0, r0, lsr r1 mov pc, lr
PS : arm compiler is not able to optimize A / B and A * B …