
Re: questions (about x21)


Dear MISC readers:

Greg wrote:
>	As a side note, it looks like the F21 and P21 will be drastically
>different chips from a programmer's point of view.  On the F21 I'd be
>able to keep my entire call and data stacks on chip, and just accept the
>limitations of them being only 17/18 deep.  But on the P21, you really
>don't have much choice other than to use those stacks like P21Forth does
>except in tight words, which seems somehow like quite a waste.  I guess
>the chip is more than fast enough to handle the overhead for my uses,
>anyways, but if I'm going to be using it almost as a register chip then
>I'd be tempted to just straight up buy a register chip instead.  Oh well.
>I guess you already know about all those issues because the F21 has such
>deep stacks.

My early ideas were from working with Novix and RTX, from Phil
Koopman's book, and from discussions with Chuck and other people
working on chip hardware.  At the next stage my opinions were formed from
writing dozens of compilers and benchmarks for dozens of simulated
designs for comparison.

In those early days my opinion was similar to the one expressed by
Greg.  However, after spending a few years programming MuP21 and
a number of years programming a chip with the same core as F21,
and working with other programmers, I changed a number of opinions.

At first I had assumed that the 6-cell-deep hardware data stack on
MuP21 was not sufficient for what I considered Forth.  I assumed that
the "assembler" would be a little strange compared to an implementation
with stacks in memory and of adjustable size.  I examined many of the
ways compilers of various types deal with this.

After a few years of writing code I found that for the most part
MuP21 was big enough to do native Forth.  I found that it was about
as easy to code in what Chuck called "Machine Forth" as it was to
code in any of the ANS Forths with stacks in memory.  The main
differences were that the only libraries for Machine Forth were
written by Chuck, me, Dr. Ting, Dr. Montvelishsky and a few others,
and that the architecture "forced you to write good code," according
to Chuck.

I learned what he meant.  Standard Forth had standardized on common
practices, including many rather inefficient constructs, as it grew
to be more and more like other programming languages.  Chuck's
Machine Forth was stripped down to match the machines Chuck was
designing.  One of the differences was that in a conventional Forth
with stacks in memory each primitive required several memory accesses
for instruction words plus accesses to the data and return stacks,
while Machine Forth primitives were fast native opcodes that often
cost only 1/4 of a memory access.
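The "1/4 of a memory access" figure follows from packing several
opcodes into one instruction word.  As a minimal Python sketch,
assuming a 20-bit word split into four 5-bit opcode slots (the layout
usually described for MuP21), the code below packs and unpacks such
words and computes instruction fetches per primitive.  The opcode
values, the zero used for padding, and the function names are
hypothetical, and real encodings for branches and literals are ignored.

```python
WORD_BITS = 20                    # assumed instruction word width
SLOT_BITS = 5                     # assumed opcode width
SLOTS = WORD_BITS // SLOT_BITS    # 4 opcodes per word
SLOT_MASK = (1 << SLOT_BITS) - 1

def pack(opcodes):
    """Pack up to four opcodes into one word, first opcode in the high slot."""
    if len(opcodes) > SLOTS:
        raise ValueError("too many opcodes for one word")
    word = 0
    for i in range(SLOTS):
        # Unused slots are padded with 0, a placeholder for a real no-op.
        op = opcodes[i] if i < len(opcodes) else 0
        if not 0 <= op <= SLOT_MASK:
            raise ValueError("opcode out of range")
        word = (word << SLOT_BITS) | op
    return word

def unpack(word):
    """Return the four opcode slots of a word, high slot first."""
    return [(word >> (SLOT_BITS * (SLOTS - 1 - i))) & SLOT_MASK
            for i in range(SLOTS)]

def fetches_per_primitive(n_primitives):
    """Instruction-word fetches per primitive for a straight-line run."""
    words = -(-n_primitives // SLOTS)   # ceiling division
    return words / n_primitives
```

For a straight run of 8 primitives, `fetches_per_primitive(8)` gives
0.25: two word fetches for eight opcodes, versus at least one fetch per
primitive (plus stack traffic) in a threaded Forth with stacks in memory.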

Good Machine Forth minimizes memory accesses, so programs are by
definition smaller than in a standard Forth and usually 50 to 100
times faster.  You can give up that 50 to 100x by using a standard
Forth approach, just as you could on any other processor.  But you
give up most of the idea behind P21 if you try to force it into
being a Standard Forth engine, a 'C' engine, or a Java engine.

I saw good and bad Machine Forth.  But in general it does almost
force you to write good Forth.  By that I mean pay attention to
the resources at hand (all systems consist of some balance of
resources), pay attention to the problem, and write the code
that is needed.  There wasn't a huge difference between plain
Machine Forth and tweaked Machine Forth; there are only a few
optimization rules.

On the other hand, the problem with Standard Forth was not that
you gave up 50 to 100x up front.  It was that Standard Forth
supports a lot of what Chuck calls "abominations," and common
practice among a lot of people is to abuse these features rather
badly.  The standard provides few guidelines about what is good and
what is bad.  Many people assume that what is good in standard Forth
is whatever is in the libraries, whatever they normally do, whatever
looks like other languages, etc.  There is a huge range of what
people do in standard Forth, so the performance ratios usually
involved the extra multiplication factor that Chuck talked about in
1x Forth.  When people are not concerned about giving up 100x up
front they often give up another 100x or 1000x along the way.
In most cases it is hard to justify choosing a solution that is
100,000x slower or 1000x bigger.  It is even harder to justify when
a decade of accumulated evidence suggests that the two environments
provide a similar level of programmer productivity.

Because P21 has no interrupts it really doesn't need many registers.
Chuck designed it to have enough registers to do good Forth in
native mode.  That is what he considers good Forth.  After a number
of years I came to understand what he meant about not putting too
much on the stack, factoring a lot, etc.  Still, I also observed
that he used tricks that do the opposite, like unrolled and inlined
inner loops with computed entry points, in places where he wanted
maximum speed.

Chuck feels that MuP21 has a healthy number of stack levels for
good code, and that F21 has "virtually infinite" stack depth.  It
should be noted that F21 has three interrupts, and if you use
them you may need extra stack cells.
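To see what a fixed hardware stack depth means in practice, here is a
minimal Python model of a bounded stack.  It assumes that pushing onto
a full stack silently discards the deepest cell, as in a simple
push-down register file; the real MuP21 and F21 stacks may behave
differently (for example by wrapping around circularly), so treat this
as an illustration only.  The class and attribute names are
hypothetical.  The high-water mark shows how much headroom is left,
which is what matters if an interrupt handler also uses the stack.

```python
from collections import deque

class HardwareStack:
    """Fixed-depth stack model: overflow drops the deepest cell (assumed)."""

    def __init__(self, depth):
        self.cells = deque(maxlen=depth)  # left end = deepest cell
        self.lost = 0                     # cells silently discarded on overflow
        self.high_water = 0               # deepest use seen so far

    def push(self, value):
        if len(self.cells) == self.cells.maxlen:
            self.lost += 1                # deque(maxlen) drops the leftmost cell
        self.cells.append(value)
        self.high_water = max(self.high_water, len(self.cells))

    def pop(self):
        # In this model underflow raises IndexError; hardware would
        # simply return whatever is in the cell.
        return self.cells.pop()
```

With a MuP21-sized data stack, `HardwareStack(6)`, pushing eight values
loses the first two, and the stack then holds only the last six.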

At the time of the original F20 specification it had 32 data
stack cells, which was the "minimum" specified in draft version 3
of the ANS Forth Standard at that time.  Interestingly enough,
ANS removed that requirement from the final version, so there is
no minimum (or maximum) specified to be standard.

I thought it was interesting that Dr. Montvelishsky said that he
thought P21 was actually a little more "fun" to program than
F21 or I21 because of the smaller stacks.  I think I understand
what he means.  You can be a little more mentally lazy on F21
and still get working code because you have more stack depth to
use (or waste).

I think from the standpoint of Machine Forth the two architectures
are very similar.  The main differences are the additional instructions,
the additional branching modes, and the change to 2/.  Making 2/
operate on 21 bits instead of 20 only affects coding math functions.
Of the three differences, the availability of two address registers
changes the way you code a lot of things and is much cleaner than
having only one.
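Why the width of 2/ matters for math code can be shown with a small
sketch.  Assuming 2/ is an arithmetic shift right that copies the top
bit of the word (I am not asserting exactly which bit each chip
preserves), the same bit pattern gives different results at 20 and 21
bits.  The function names are hypothetical.

```python
def two_slash(x, width):
    """Arithmetic shift right by one on a width-bit two's-complement word."""
    mask = (1 << width) - 1
    x &= mask
    sign = x & (1 << (width - 1))   # top bit is kept (sign extension)
    return (x >> 1) | sign

def to_signed(x, width):
    """Interpret a width-bit pattern as a signed integer."""
    return x - (1 << width) if x & (1 << (width - 1)) else x
```

The pattern `0x80000` is negative as a 20-bit value, so
`two_slash(0x80000, 20)` fills in the sign bit and yields `0xC0000`;
as a 21-bit value the same pattern is positive and
`two_slash(0x80000, 21)` yields `0x40000`.  Math routines that rely on
sign extension have to be written for the width the chip actually uses.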

The additional branching comes into play when working with
larger programs (larger than 1K in Machine Forth).  If you are
doing native code then programs bigger than 1K must use call and
jump macros to branch to other pages.  This makes programs larger
and slower.  This is significantly reduced on F21, since it also
supports both 14 paged jumps and home page jumps, where the home
page is page 0 in DRAM or SRAM.  This factor is almost as important
as larger stacks from the standpoint of generating code, and it
supports Greg's contention that the chips are significantly
different from a programmer's point of view.
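A code generator has to decide, for every branch, whether the target
is reachable directly.  Here is a minimal Python sketch, assuming a
branch can only name an address inside the current 1K-word page, and
that F21-style home page jumps can reach page 0 from anywhere.  It does
not model the other F21 paged-jump modes, and the function names and
return labels are hypothetical.

```python
PAGE_BITS = 10   # assumed 1K-word pages, matching the 1K figure above

def page(addr):
    """Page number of a word address."""
    return addr >> PAGE_BITS

def branch_kind(pc, target, has_home_page_jumps=False):
    """Classify how a branch from pc to target could be encoded."""
    if page(pc) == page(target):
        return "short"       # fits in the in-page address field
    if has_home_page_jumps and page(target) == 0:
        return "home"        # direct jump into the home page
    return "far macro"       # needs a longer call/jump macro sequence
```

A branch from `0x405` to `0x010` needs a far macro on a chip with only
in-page branches, but becomes a single home page jump when
`has_home_page_jumps=True`.  Every "far macro" result is where programs
grow larger and slower, which is why this matters for generated code.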

From the point of view of a programmer who has done a lot of
Machine Forth, they appear very similar.  From the point of view
of a programmer generating code in a different environment, or
generating a large amount of code, they start to look more different.

But that is all just looking at the CPU.  If you look at the overall
chip, then they become quite different as systems.  The huge difference
is that P21 has no I/O other than video output.  F21 has I/O devices
on chip and three sources of interrupts.  As a result, on P21 one must
build memory-mapped I/O hardware and poll it with the CPU.  If you are
clever you can get good real-time operation, but it may be tricky.
The CPU must poll all I/O devices (other than composite video) in real
time and execute whatever processing functions are required.  There
will be a limit to the number of devices you can place on the P21 bus
before you get loading problems.  It is all up to the system designer,
board designer, I/O subsystem designer and programmer to work out the
details of what the chip can and can't do in real time.

For instance, one device I worked on had P21 connected to an LCD video
display, a serial touch screen, a serial voice I/O chip, and a serial
port to a PC.  P21 could poll these devices, generate video, and run
the programs they had in real time.  But it should be understood that
having to poll all of these device interfaces was the main limiting
factor.
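Why polling becomes the limiting factor can be sketched as a
back-of-envelope budget.  Assuming a simple round-robin loop in which
each device's handler runs at most once per pass, an event can wait
roughly one full pass of the loop before it is serviced, so every
device's deadline must exceed the worst-case loop time.  The device
names, microsecond figures, and function name below are hypothetical,
not measurements from the device described above.

```python
def polling_budget(devices):
    """devices maps name -> (worst_case_service_us, deadline_us).

    Returns the worst-case time for one pass of a round-robin polling
    loop and the devices whose deadline that pass can exceed.
    """
    loop_us = sum(service for service, _ in devices.values())
    at_risk = [name for name, (_, deadline) in devices.items()
               if loop_us > deadline]
    return loop_us, at_risk

if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    example = {
        "touch screen": (20, 5000),
        "voice chip": (40, 125),
        "PC serial": (15, 87),
    }
    print(polling_budget(example))
```

Adding a device adds its service time to every pass, which tightens the
budget for all the others; that is the cost an interrupt-driven design
or an I/O coprocessor avoids.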

On F21 things are much simpler in the sense that a lot of I/O hardware
is already connected up, and there are some interrupt instructions
to make the programming cleaner.  To begin with, this means that you
can still bit-bang multiple-megabit data streams as you could on
P21, but the programming might be much easier if you make it interrupt
driven.  You also have the ability to just feed data to I/O coprocessors
to get much higher performance than you ever could with a polled
interface.

Take the F21-in-a-mouse project.  One could have made a stamp-sized
board for P21 with memory, video out, parallel and serial ports, and
a timer, and then programmed it to replace the microcontroller in a
mouse.  The design of the program, however, would be tied to the
design of the board.  The software would be a bit more difficult
using P21 than using F21, but the hardware is much more difficult than
with F21, where there is a handy port on chip and the hardware part
took a couple of minutes.

So with either P21 or F21 most projects will require a combination
of hardware and software design.  The main advantage of either chip
is for a company that needs millions of widgets and has to keep
costs down.  Either chip could be had for a couple of bucks or less
in large quantity.  Complete systems will also require boards, memory,
and perhaps extra I/O hardware, so adding custom I/O hardware will
consume some of the low-cost advantage.

If programming will also require designing, building, and testing
custom I/O hardware subsystems, then on-chip hardware with an
intelligent I/O coprocessor and interrupt-driven interfaces is really
quite different from custom off-chip hardware with a polled CPU
interface.  In my opinion the difference is far greater than the size
of the stacks or the branching modes.

Many Forth programmers are familiar with the idea that there is a
tradeoff between hardware and software.  You can do a lot with a
little hardware and some software, especially with Forth.  But when
you need high performance you need specialized hardware.  The MISC
idea is that you can use a minimal (resource) cost CPU with custom
I/O hardware to get 1000x better economic efficiency than with the
usual high (resource) cost generic solutions.

The idea was that P21 demonstrated this for Chuck's OKAD application.
To replace a $25K workstation all Chuck needed was a CPU that could
execute the code for his CAD system, generate video, and poll a
keyboard interface fast enough to keep up with him.  Thus P21
was sufficient for many problems requiring moderate to high
computation and/or composite video out.  It could be coaxed into
other things by adding extra external I/O hardware and polled 
software interfaces.

I sort of expected to see a few people make PALs or other ASICs
to connect to their P21s to extend them beyond the capability of
plain P21.  I also was not very impressed with the P21 boards
provided by Offete.  Of the three I prefer the original development
system, but I modified it and replaced the horrible parallel I/O chip
that was in it with a header carrying 3 CMOS chips for parallel I/O.
The second board from Offete used a similar but different arrangement
of 3 CMOS chips for parallel I/O, and the third board has the serial
I/O chip.  I found the parallel and serial I/O chips that Ting used
much harder to program than the P21.

As for programming the CPU to get a feeling for what it can and
cannot do, that was the purpose of the free simulator.  Programming
real applications requires a lot more, such as a real board with real
I/O limitations, or extending the simulator to also simulate your
custom I/O hardware.  We had a lot of success doing that for iTV
boards and could run board ROMs on the PC.  I did a similar thing when
I added support for a simulated mouse interface to the emulator so it
could run the ROMs in the F21 in a mouse.

What is needed with F21 is a toolkit of software to program the
analog I/O coprocessor as a sound card, as a laboratory controller
interface, as a digital recording oscilloscope, as an acoustic
modem supporting ... standards, as a cable modem at xxx megahertz,
etc., for each of the hardware devices provided on chip.  It would
be sort of like the one Chuck provided with his Novix Forth kits,
and like the electronics lab that I described in the Physics
department in college. 

I figured lots of people could contribute lots of things like
that.  It could still happen.  Stamp-sized boards are not difficult
to engineer or build if you have a little money.  Software is
easy to write.  As I mentioned before, UltraTechnology only has
a part-time programmer (me, about 1 hour/week these days), a
part-time chip tester (me), a part-time documentation team
(me), etc.  The good news is that I recently got an oscilloscope
so I can do the testing on the analog I/O coprocessor.

I still think F21 is very different from P21.  F21 has five
processors on the chip and only one is the CPU!  The CPU on
F21 is more similar to P21 than not, but the chips as a whole
are very different IMHO.

Jeff Fox