Power of number algorithm

Another example for recursive call is calculating the power of a number. The recursion requires that formal params,local variables and return address are stored onto the stack for each recursive call. However,  having in mind that RAM is a limited resources in AVR devices it is conceivable why recursive algorithms are not widely accepted in avr assembly programming. There is a need for optimization or twisting the stack frame build up requirement. The following rules will be used in building a recursive avr algorithm

  •  Stack frame limit - calculate the maximum stack size in accordance with input params. RAM is a limited resource so beware how deep the rabbit hole may go. Depending on AVR device you may have different stack frame’s size.
  • Save in stack frame only the minimum amount of data, use global data whenever possible. This reduce code usage and stack frame size.
  • CPU saves the return address on each recursive call so apart from this, frame build up and maintenance is our code responsibility.

Two methods of finding the power of a number are presented. Both assume unsigned byte size input - word size output.

Naive solution

The most straightforward solution to the problem would be to implement  x2=x.x.x.x...x , multiplying x  n times with itself. Following implementation does not even use local stack frame variables. All calculations are done over global ones.

;************************************Power of a number***************
;* Naive slow approach stack usage N*4
;* Unsigned arithmetic
;*@INPUT: xx -> number
;*                  n -> power
;*************************************************************************
naive_power:
    //EXIT CONDITION
    tst n
    brne nextframe
    ret
nextframe:
        //BUILD NEXT STACK FRAME
    dec n
    rcall naive_power
    //CALCULATE REWINDING THE STACK
    clr ah
    mov al,xx
    movw bh:bl,output3:output4
        rcall mul16x16_32   ;no stack manipulation
ret

Optimized solution

Suppose we want to compute xn , where x is a real number and n is any integer. It's easy if n is 0, since x0=1 no matter what x is. That's a good base condition case. Only positive numbers are considered so when you multiply powers of x, you add the exponents: xa+xb=xa+b for any base x and any exponents a and b. Therefore, if n is positive and even, then xn=xn/2 .xn/2. If we were to compute y=xn/2 recursively, then we could compute xn=y.y . What if n is positive and odd? Then xn=xn-1.x and and n-1 either is 0 or is positive and even. We just saw how to compute powers of x when the exponent either is 0 or is positive and even. Therefore, we could compute xn-1 recursively , and then use this result to compute xn=xn-1.x

What about when n is negative? Then xn=1/x-n , and the exponent −n is positive. So we can compute x-n recursively and take its reciprocal.

Putting these observations together, we get the following recursive algorithm for computing xn.

  • The base case is when n=0,and x0=1.
  •  If n is positive and even, recursively compute y=xn/2 and then xn=y.y. Notice that you can get away with making just one recursive call in this case, computing xn/2 just once, and then you multiply the result of this recursive call by itself.
  •  If n is positive and odd, recursively compute xn-1 , so that the exponent either is 0 or is positive and even. Then, xn=xn-1.x.
  •  If n is negative, recursively compute x-n , so that the exponent becomes positive. Then, xn=1/x-n.

The following acorn kernel task is based  on positive numbers only. Default task stack is 20 and each stack frame keeps 1 byte and return call of 2 bytes. There is however an internal call to multiply function in each frame after 1 byte is popped off the stack. So total stack frame usage is 4 bytes. Calculating 85 will require (5/2+1)*4=12. The default task stack size is less than 20 bytes so NO increase of the variable TASK_STACK_DEPTH in kernel.inc is required.

;************************************Power of a number***************
;* Optimized fast approach
;* Unsigned arithmetic
;*@INPUT: xx -> number
;*                  n -> power
;*************************************************************************
power:
        //EXIT CONDITION
    tst n
    brne nextframeex
    ret
nextframeex:
    //BUILD NEXT STACK FRAME
        push n
        lsr n
    rcall power
    //CALCULATE REWINDING THE STACK
    pop n
    sbrc n,0   ;is n odd
    rjmp odd
even:
    ;result*result    
    movw ah:al,output3:output4
    movw bh:bl,output3:output4
        rcall mul16x16_32
    ret
odd:
    ;result*result    
    movw ah:al,output3:output4
    movw bh:bl,output3:output4
        rcall mul16x16_32  
    ;x*result
    clr ah
    mov al,xx
    movw bh:bl,output3:output4
        rcall mul16x16_32  
ret

;****************MULTIPLY unsigned 16x16=32bit
;* INPUT= r23:r22 * r21:r20
;* USAGE=r0,r1,r2
;* STACK=0
;* OUTPUT = r19:r18
;*              r17:r16
;******************************************
mul16x16_32:
clr r2
mul r23, r21 ; ah * bh
movw r19:r18, r1:r0
mul r22, r20 ; al * bl
movw r17:r16, r1:r0
mul r23, r20 ; ah * bl
add r17, r0
adc r18, r1
adc r19, r2
mul r21, r22 ; bh * al
add r17, r0
adc r18, r1
adc r19, r2
ret


.def output1=r19   ;not used
.def output2=r18   ;not used
.def output3=r17
.def output4=r16  


.def    bh=r23
.def    bl=r22    

.def    ah=r21
.def    al=r20



.def    n=r24
.def    xx=r25

Task_16:
    ldi bl,0
    ldi bh,0

    ldi output4,1
    
    clr ah
    clr al

    ldi n,5
    ldi xx,8   ;3^6
    
    rcall power
main16:
    nop    
rjmp main16
ret