04

←	Apr 2011	→
S	M	T	W	T	F	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30

Matthieu Castet

Theory

Suppose we have 2 unsigned integer A and B. B is a power of 2 ie there is a number b such as B = 1U << b

It can be checked with the following code

static inline ispowerof2(int b) {
	return b & !(b & (b - 1));
}

b can be computed by ilog2 function

int ilog2(unsigned int v) {
	int i = -1;
	while(v) {
		i++;
		v >>= 1;
	}
	return v;
}

or from http://en.wikipedia.org/wiki/Binary logarithm#Integer

/
 * Returns the floor form of binary logarithm for a 32 bit integer.
 * −1 is returned if ''n'' is 0.
 */
int floorLog2(unsigned int n) {
  if (n == 0)
    return -1;

int pos = 0;
  if (n >= 1<<16) { n >>= 16; pos += 16; }
  if (n >= 1<< 8) { n >>=  8; pos +=  8; }
  if (n >= 1<< 4) { n >>=  4; pos +=  4; }
  if (n >= 1<< 2) { n >>=  2; pos +=  2; }
  if (n >= 1<< 1) {           pos +=  1; }
  return pos;
}

In this case we have :

A * B == A << b
A / B == A >> b
A % B == A & (B - 1) == A & ((1U << b) - 1)

Usage in C

Either we can define your own function for * / % or we can use compiler optimisation. If we replace B by 1 << b (with a macro), then

int divu(uint a, uint b)
{
        return a / (1U << b);
}

int modu(uint a, uint b)
{
        return a % (1U << b);
}

int mulu(uint a, uint b)
{
        return a * (1U << b);
}

will give you in arm assembly (arm-linux-gnueabi-gcc -Os -S )

divu:
        mov     r0, r0, lsr r1
        mov     pc, lr
modu:
        mvn     r3, #0
        bic     r0, r0, r3, asl r1
        mov     pc, lr
mulu:
        mov     r0, r0, asl r1
        mov     pc, lr

That's pretty fast without ugly operator redefinition !!!

But add and sub are slower [1], because we need to compute the shift.

Note a gcc missoptimisation (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48812)

int divu3(uint a, uint b)
{
        return a / ((1U<<b) / 4);
}

divu3:
        stmfd   sp!, {r3, lr}
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        bl      {aeabi_uidiv
        ldmfd   sp!, {r3, pc}

This is unfortunate, the compiler could have generated :

divu3:
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        mov     r0, r0, lsr r1

[1]

int addu(uint a, uint b)
{
        return a + (1U << b);
}

addu:
        mov     r3, #1
        add     r0, r0, r3, asl r1
        mov     pc, lr

Posted Tue Apr 26 22:38:20 2011 Tags: embeded

FATFS

A tiny fatfs implementation can be found at http://elm-chan.org/fsw/ff/00index_e.html and petit implementation.

After looking at the code, it seems there is a problem in the code : it assumes fat sector size should match hardware sector size.

This is not true, fat sector size only need to be >= hardware sector size

Also the power of 2 trick can be used.

Note the guy got interresting SPI mmc stuff http://elm-chan.org/docs/mmc/mmc_e.html

Posted Thu Apr 28 23:19:15 2011 Tags: embeded

Matthieu Castet

futher optimisation

We saw than if B = 1 << b, then

A * B == A << b
A / B == A >> b
A % B == A & (B - 1) == A & ((1U << b) - 1)

But there are interesting property if we have 2 power of 2.

B1 * B2 == (1 << b1) * (1 << b2) == 1 << (b1 + b2)
B1 / B2

if B1 >= B2, (1 << b1) / (1 << b2) == 1 << (b1 - b2)
if B1 < B2, 0

A / (B1 / B2) == A / (1 << (b1 - b2)) == A >> (b1 - b2) because (B1 / B2) can't be null in C
A * (B1 / B2)

if b1 - b2 >= 0, A * (1 << (b1 - b2)) == A << (b1 - b2)
if b1 - b2 < 0, 0

This means macro is not enough, but compiler isn't often clever to detect this. To have efficient code, better feed compiler with precomputed stuff.

int divu3(uint a, uint b)
{
        return a / ((1U<<b) / 4);
}

int divu300(uint a, uint b)
{
        return a / (1<<(b-2));
}

divu3:
        stmfd   sp!, {r3, lr}
        mov     r3, #1
        mov     r1, r3, asl r1
        mov     r1, r1, lsr #2
        bl      {aeabi_uidiv
        ldmfd   sp!, {r3, pc}
divu300:
        sub     r1, r1, #2
        mov     r0, r0, lsr r1
        mov     pc, lr

PS : arm compiler is not able to optimize A / B and A * B …

Posted Fri Apr 29 23:05:24 2011 Tags: embeded