No Subject


Dear MISC Readers,

Barry Kauler asks:

>I've got the MuP21 Programming Manual, but it doesn't
>tell me how the stack works -- I mean, does it rotate
>around the ends?

Penio Penev replies,

>If you put more items, than the stack can hold, you loose the oldest ones.
>If you get more items, than you put, you get 0es. 

Not quite.  MuP21 has 6 registers in the data stack, T, N, S2, S3, S4, S5.

When you remove an item from the stack, S5 remains the same.  So whatever
was in S5 moves into S4 and a copy remains in S5.  You get a 0 only if
S5 held a 0.
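
A minimal sketch of this behavior in Python (a model, not MuP21 code)
may make it concrete.  The class name DataStack, the list layout, and
the all-zero starting contents are assumptions made for illustration.
The model captures only two rules: pushing a seventh item drops the
oldest one, as Penio says, and removing an item leaves S5 unchanged.

```python
class DataStack:
    """Toy model of a six-register data stack: T, N, S2, S3, S4, S5."""

    DEPTH = 6

    def __init__(self):
        # Index 0 is T, index 5 is S5. Starting at all zeros is an assumption.
        self.regs = [0] * self.DEPTH

    def push(self, value):
        # Every register moves down one place; the old S5 is lost.
        self.regs = [value] + self.regs[:-1]

    def pop(self):
        top = self.regs[0]
        # Every register moves up one place; S5 keeps its value.
        self.regs = self.regs[1:] + [self.regs[-1]]
        return top


stack = DataStack()
for value in range(1, 8):      # seven pushes into six registers
    stack.push(value)
print(stack.regs)              # [7, 6, 5, 4, 3, 2]  (the 1 is gone)
print([stack.pop() for _ in range(8)])  # [7, 6, 5, 4, 3, 2, 2, 2]
```

Once the real items are used up, every further pop returns the value
left in S5, which is 2 here rather than 0.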

Chuck actually uses this feature quite frequently in OK and OKAD.

He does not do bit-block transfers; instead he does a rectangle flood.
That is, he block-transfers a single word to fill a rectangle.

It is interesting to contrast how he does this, and see why it
is much faster than a conventional bit-block-transfer.

In P21Forth I do a conventional bblt.  GPUT ( x1 y1 x y a -- )
transfers a rectangle of size x y from address a to x1 y1 in the
current window on the screen.  In the inner loop it loads one word
from the linear array and writes one word to the screen.  At the end
of each line it adds an offset to the screen address to get to the
next line.
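
The loop structure just described can be sketched in Python.  This is
not the P21Forth source.  The function name gput, the flat word-addressed
screen list, the screen_width parameter, and the choice to count x in
words are all assumptions made for illustration, and the window origin
is ignored.

```python
def gput(screen, screen_width, memory, x1, y1, x, y, a):
    """Copy an x-by-y rectangle of words from memory[a:] to (x1, y1).

    screen is modeled as a flat list with screen_width words per line.
    """
    src = a
    dst = y1 * screen_width + x1
    line_offset = screen_width - x   # from the end of one line to the next
    for _ in range(y):
        for _ in range(x):
            screen[dst] = memory[src]   # one read and one write per word
            src += 1
            dst += 1
        dst += line_offset


screen = [0] * 12                    # 4 words wide, 3 lines tall
gput(screen, 4, [1, 2, 3, 4], 1, 1, 2, 2, 0)
print(screen)   # [0, 0, 0, 0, 0, 1, 2, 0, 0, 3, 4, 0]
```

Every word moved costs a load from the source array as well as a
store to the screen, plus the loop bookkeeping around it.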

In OK Chuck fills all six data stack registers with the pattern
to flood, puts the screen address in A, and jumps into a table
of inlined !A+ !A+ !A+ !A+ instructions!
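
A rough Python model of one flooded line, for comparison.  This is a
sketch under assumptions, not Chuck's code.  The name flood_line, the
table_size parameter, and the idea that the entry point is chosen so
that exactly `width` stores remain in the table are all guesses made
for illustration.  Each !A+ is modeled as storing the top of the stack
at address A, removing that item, and incrementing A.

```python
def flood_line(screen, a, pattern, width, table_size=16):
    """Write `width` copies of `pattern` to screen starting at address a."""
    if not 0 <= width <= table_size:
        raise ValueError("width must fit in the table of stores")
    # All six registers hold the pattern. Removing an item leaves S5
    # unchanged, so the stack never runs out of copies.
    stack = [pattern] * 6
    entry = table_size - width       # hypothetical jump target in the table
    # The Python loop below stands in for straight-line inlined stores;
    # the modeled machine code has no loop here.
    for _ in range(entry, table_size):
        screen[a] = stack[0]                     # store T at address A
        stack = stack[1:] + [stack[-1]]          # pop, S5 duplicated
        a += 1                                   # A incremented
    return a


screen = [0] * 8
end = flood_line(screen, 2, 0x155, 4)
print(screen, end)   # [0, 0, 341, 341, 341, 341, 0, 0] 6
```

Note that there is no source array at all: the only memory traffic
for data is the stores to the screen.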

In my code there are two offpage memory accesses for data, two
offpage memory accesses for instructions, a couple of onpage
instruction references, loop overhead, and several times as
many instructions to execute for EACH word block transferred.

Chuck has no loop overhead because the code to move a line of
video is inlined, there is one offpage data access, one offpage
instruction access, and three onpage data references per four
words moved to the screen.  This makes Chuck's code about ten
times faster than the bblt code in P21Forth.

P21Forth could be sped up several times.  It could inline the
code to move a line like Chuck's code, and it could transfer more
than one word at a time.  But it must do a read and a write for
each transfer, and Chuck only needs writes, so no matter what, Chuck's
code will have much less overhead, and much less offpage access overhead.

The same sort of thing will happen with the return stack.  If you
keep returning or popping from that stack you get lots of copies of R3.

Jeff Fox