Re: nanoFORTH


>: Myron Plichota
>( @)            ( addr -- n)                    ( T = memory[T])
>( !)            ( n addr --)                    ( memory[T] = NOS)

This (regular) "!" costs 2 pops, which makes it clumsy to implement in hardware
(see how the NOVIX/RTX2000 need 2 clock cycles per instruction).

The x21 family gets around this by keeping the address off the stack, in an
"address register" (which may be post-incremented), and keeping the data in T,
which is popped/stored into memory or pushed/fetched from memory.
Three address registers may be used:
- P, the Program pointer:  fetchPostIncr[P]>push[T] (LIT instruction)
- R, the Return stack top: fetchPostIncr[R]>push[T], pop[T]>storePostIncr[R]
  (R may be popped/loaded with >R, or pushed/copied with R> instruction)
- A, the Address register: fetchPostIncr[A]>push[T], pop[T]>storePostIncr[A]
  or without postIncr:     fetch[A]>push[T],         pop[T]>store[A]
  (A itself may be loaded/popped from the stack, or copied/pushed to the stack)
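
The A-register operations listed above can be sketched as a small Python model.
This is only an illustration: the class and method names are made up here (they
are not x21 mnemonics), and memory is assumed to be word-addressed.  The point
to notice is that every memory access pops or pushes at most one stack item.

```python
class Machine:
    """Toy model of T plus the address register A (hypothetical names)."""

    def __init__(self, memory):
        self.mem = list(memory)   # word-addressed memory (assumption)
        self.stack = []           # data stack; stack[-1] plays the role of T
        self.a = 0                # address register A

    def a_store(self):            # load/pop A from the stack
        self.a = self.stack.pop()

    def a_fetch(self):            # copy/push A to the stack
        self.stack.append(self.a)

    def fetch_a_plus(self):       # fetchPostIncr[A]>push[T]
        self.stack.append(self.mem[self.a])
        self.a += 1

    def store_a_plus(self):       # pop[T]>storePostIncr[A]
        self.mem[self.a] = self.stack.pop()
        self.a += 1

    def fetch_a(self):            # fetch[A]>push[T]
        self.stack.append(self.mem[self.a])

    def store_a(self):            # pop[T]>store[A]
        self.mem[self.a] = self.stack.pop()


m = Machine([10, 20, 30, 0])
m.stack.append(0)
m.a_store()                       # A = 0
m.fetch_a_plus()                  # push mem[0], A = 1
m.fetch_a_plus()                  # push mem[1], A = 2
m.store_a()                       # mem[2] = 20, A stays at 2
```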

This allows very efficient memory transfers:
instructions are fetched (fetch+) at P, and they alternately fetch+ at R and
store+ at A.
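
A rough Python sketch of such a transfer, with R as the source pointer and A as
the destination pointer.  The mnemonics "@R+" and "!A+" and the function name
are invented for this illustration, and the instruction stream is kept in a
separate list rather than in the same memory, which is a simplification.

```python
def run_copy(mem, src, dst, count):
    """Copy count words from src to dst; returns the final (R, A) values."""
    program = ["@R+", "!A+"] * count   # unrolled instruction stream
    p, r, a = 0, src, dst              # P, R and A registers
    stack = []                         # data stack; stack[-1] is T
    while p < len(program):
        op = program[p]                # fetch+ at P
        p += 1
        if op == "@R+":                # fetch+ at R, push into T
            stack.append(mem[r])
            r += 1
        else:                          # pop T, store+ at A
            mem[a] = stack.pop()
            a += 1
    return r, a


mem = [1, 2, 3, 0, 0, 0]
run_copy(mem, 0, 3, 3)                 # mem becomes [1, 2, 3, 1, 2, 3]
```

Each word moved costs one fetch+ and one store+, with no address ever passing
through the data stack.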

Addressing from R is also very convenient for accessing data following a CALL
(create/does> constructs, or inline data structures such as ." strings).

Addressing from R is required anyway for the subroutine return instruction,
which does the following (in parallel):
- R drives the memory address bus (and its attached address incrementer)
- the memory data bus is latched into I (the multi-instruction register)
- the memory address incrementer is latched into P (the program pointer)
- the return stack is popped/latched into R
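
The four parallel steps of the return can be written out as a Python sketch.
The state layout (a dict with keys "P", "I", "R" and "rstack") is an assumption
made for this example; in hardware all four latches happen in the same cycle,
so the old value of R is captured first.

```python
def subroutine_return(state, mem):
    """Model one return: state holds P, I, R and the rest of the return stack."""
    addr = state["R"]                    # R drives the memory address bus
    state["I"] = mem[addr]               # data bus latched into I
    state["P"] = addr + 1                # incrementer output latched into P
    state["R"] = state["rstack"].pop()   # return stack popped/latched into R
    return state


mem = [0, 0, 0, 0, 0, 77, 88]
state = {"P": 3, "I": 0, "R": 5, "rstack": [9]}
subroutine_return(state, mem)            # I = 77, P = 6, R = 9
```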

"A" is also sometimes convenient as a temporary register in computations, or to
simplify complex stack manipulations.  But it must be saved on the stack if
used during interrupts, and it requires two dedicated instructions: one to
load/pop it from the stack and one to copy/push it to the stack.
For a 16-instruction machine, A must be sacrificed; addressing from R is
enough.  Memory transfers will be less efficient, but switching between source
and destination addresses may be minimized by using the stack as an intermediate
buffer: group N fetch+/push in a row, switch R to the destination address,
group N pop/store+ in a row, switch R back to the source address, and so on.
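
A Python sketch of this buffered transfer, counting how often R has to be
re-pointed.  The function name and the switch counter are made up for the
example, and the regions are assumed not to overlap.  One detail is an
assumption rather than something stated above: because the stack is LIFO, the
sketch writes each burst back to front (a decrementing store) to keep the words
in order.

```python
def copy_via_stack(mem, src, dst, count, n):
    """Copy count words using a single address register, n words per burst.

    Returns how many times R was re-pointed.
    """
    stack = []                          # data stack used as the buffer
    switches = 0
    done = 0
    while done < count:
        k = min(n, count - done)
        r = src + done                  # point R at the source
        switches += 1
        for _ in range(k):
            stack.append(mem[r])        # fetch+ at R, push into T
            r += 1
        r = dst + done + k - 1          # point R at the end of the burst
        switches += 1
        for _ in range(k):
            mem[r] = stack.pop()        # pop T, store (decrementing here)
            r -= 1
        done += k
    return switches


mem = list(range(1, 9)) + [0] * 8
copy_via_stack(mem, 0, 8, 8, 4)         # 4 switches instead of 16 with n = 1
```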

CL