
Re: Recent dialog and old initiatives


Ben,

I enjoyed checking out your website. It looks like you are a fan of the
PDP-8 and Nova CPUs. I think they are kind of cool too. From what I can tell,
your goal is not to produce a Forth chip. I am sometimes torn between a pure
Forth and a more general-purpose approach, but I always seem to come back to
Forth, or at least some kind of stack machine.

Ben Franchuk wrote:


> Myron Plichota wrote:
> > I can see how the
> > presence of an A-register stack could affect programming style as
> > fundamentally as the original single A-register does.
>
> Hmm stacks in Forth, what is next? Index registers.!?
> I think Forth needs a few more 'conventional' cpu instructions
> because any unusual word definitions previously could be handled
> with assembly code of the host machine now only has the high level forth
> instructions in hardware.
>

What I like about the A-register stack idea is that it is consistent with
Forth MISC design precepts and could prove to be an elegant bottleneck
opener when, for instance, you can't avoid using static variables within a
loop. Although I don't like everything about any of the x21 chips, they
show a different approach where you find that you don't need a lot of the
things you always thought you did, such as index registers. The original
question of "What remains to be taken away?" was answered with the MuP21
(Nothing!). The current question is "What would be the most valuable
enhancements to invest real estate on?", and the opinions of the guys who
have all this experience in the x21 arena are of keen interest to me.

If you implement a Forth architecture and instruction set in hardware then
it's best to stick to that philosophy in your software. Forth puts an
additional onus on a programmer to take into account the sequential nature
of data presentation, unlike a random-access C-style stack frame accessed
via indexed addressing modes. This additional concern may require extra time
and effort when writing code at least until experience and good programming
habits are built up. A real Forth machine which has both stacks implemented
on-chip as shift registers yields very fast execution of fundamental Forth
stack operators with a minimum of transistors but at the price of making
stack indexing or re-ordering operators such as PICK very inefficient and to
be avoided at all costs. A virtual Forth machine coded on a general-purpose
CPU with code, statics, and both stacks off-chip in unified memory is
inherently less efficient and therefore doesn't provide the incentives for
such rigor, but will never scream along like a real Forth machine can. In my
experience, it is relatively inefficient to implement virtual Forth on a
general-purpose C-oriented machine, or conversely, to ape C methodology on a
real Forth machine. I have not had the pleasure of running a real Forth
machine yet, but I now write Forth on my PC in a style heavily influenced by
x21 precepts. There are so many good general-purpose architecture CPUs
available that I am not very interested in building my own.
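
The cost of PICK on a shift-register stack, mentioned above, can be
illustrated with a toy model. This is only a sketch under stated
assumptions: the class ShiftStack, the function pick, the depth of 8 and the
"one shift = one cycle" accounting are all hypothetical and do not describe
any particular x21 chip. On real hardware the temporary storage would more
likely be the return stack than a Python list.

```python
class ShiftStack:
    """Toy model of a fixed-depth on-chip stack built from shift registers.

    Only the top cell is directly reachable. Every push or pop shifts the
    whole stack by one position and is counted as one cycle (an assumption).
    """

    def __init__(self, depth=8):
        self.cells = [0] * depth   # cells[0] is the top of stack
        self.shifts = 0            # shift cycles used so far

    def push(self, value):
        # Shift everything down one place; the bottom cell falls off.
        self.cells = [value] + self.cells[:-1]
        self.shifts += 1

    def pop(self):
        # Shift everything up one place; a zero enters at the bottom.
        top = self.cells[0]
        self.cells = self.cells[1:] + [0]
        self.shifts += 1
        return top


def pick(stack, n):
    """Copy the n-th cell (0 = top) to the top using only push and pop."""
    saved = [stack.pop() for _ in range(n + 1)]   # dig down to the target
    target = saved[-1]
    for value in reversed(saved):                 # put everything back
        stack.push(value)
    stack.push(target)                            # finally push the copy
    return target


s = ShiftStack()
for v in (10, 20, 30, 40):
    s.push(v)
before = s.shifts
pick(s, 3)                    # copy the fourth item (10) to the top
cost = s.shifts - before      # 2*n + 3 shifts in this model
print(s.cells[:5], cost)
```

In this model a push or pop costs one shift, while picking the n-th item
costs 2n+3 shifts, which is the kind of penalty that rewards keeping only
the top few items in play.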

> > If I were made of money, I'd produce a 64-bit x21-like MISC with a
> > full-lookahead carry adder, one or two additional instructions to augment
> > DSP capabilities, six or more serial links, and a glueless SDRAM interface
> > as the toppers in the features list. Putting a stake in the ground at 64
> > bits may seem excessive and arbitrary, but it banishes integer queasiness in
> > DSP algorithms and makes problems like pixel boffing and astrogation math
> > within better reach. A 64-bit address is not so far fetched when you also
> > consider virtual reality as a potential application. Maybe (probably) we'll
> > always want more.
>
> I think the problem is not that we need 64bit address space, but because
> the current bunch of cpu's and software is very wasteful of resources.
> They have two sizes of index offsets, 8 bit signed and 32 bit signed.
> Yet 90%? of memory access is for local variables. The number of local/global
> variables is small. One could get by with <8k offsets if one did not
> place arrays in the middle of local data. ie: int foo,foobar,fooie[1000];
> Pointers to arrays are fine. This goes for both static and local stack data,
> that really only need have a small amount of space. A memory map could look
> like this for a conventional cpu.
> <Static simple variables>
> <Fixed arrays>
> <Small stack>
> <code segment>
> <heap for local/global arrays >
> <heap for random  memory>
> Virtual memory and object oriented programming both are wasteful in resources
> if used as a cure-all for programming problems. While useful they need to be
> used with care. BTW with 4kb (2^12) pages the pagetables for 31 bit virtual
> memory (2^31) is 2^19 long words or 2 MB per process. With 64 bit memory you
> need virtual memory for the page tables alone.
>
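
The page-table arithmetic quoted above can be checked with a short sketch.
It assumes a flat, single-level table and 4-byte "long word" entries, which
is what the quoted 2 MB figure implies. The 8-byte entry size used for the
64-bit case and the function name page_table_bytes are my own assumptions,
and real MMUs typically use multi-level tables precisely to avoid this.

```python
def page_table_bytes(address_bits, page_bits=12, entry_bytes=4):
    """Size in bytes of a flat, single-level page table (assumed layout)."""
    entries = 2 ** (address_bits - page_bits)   # one entry per page
    return entries * entry_bytes


small = page_table_bytes(31)                  # 2**19 entries * 4 bytes
large = page_table_bytes(64, entry_bytes=8)   # 2**52 entries * 8 bytes (assumed)

print(small // 2**20, "MiB per process")      # 31-bit case
print(large // 2**50, "PiB per process")      # 64-bit case, flat table
```

Under these assumptions the 31-bit case gives 2 MiB, matching the quoted
figure, while a flat table for a full 64-bit space would need 32 PiB.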

The C++ approach is guaranteed to bloat everything, and because proper
debugging of such overblown systems has never been satisfactorily
demonstrated (at least to my knowledge), it's a good thing they have MMUs
(Mommy, why does Windoze act so funny sometimes?). What I'm thinking of is a
real Forth machine with flat addressing and no need for an MMU or any other
complications or obstacles to retaining transparent control of the system. I
intend to adhere to the concept of using human-readable Forth source as the
sole method of program distribution (How's that for open sourcing?). The
compiler will be the only code generator and simply compiles at the current
location as the source streams in and therefore doesn't need to be
complicated with relocatability issues in the first place. The reason for
considering >4Gbyte physical addressing valuable in the first place would be
to handle huge chunks of data, like stereo retina-quality streaming video
for virtual reality displays. I dislike MMUs and protection schemes because
they punish good programmers for the failings of bad ones. IMHO, it is a
better thing to have an undeniable crash than the pretense that the OS has
isolated the rest of the system from any damage.

> > The Steamer16 CPLD design I posted a while back has been revamped to pack
> > five 3-bit instructions into each 16-bit cell, and HLL-support macros have
> > been implemented for the assembler. Its first application is a machine
> > vision system which will hopefully earn its keep in the real world of
> > CAD/CAM.
>
> I think packing of a few (3,4?) opcodes per instruction word and more
> local (on chip) memory access is the way to go with external memory getting
> slower compared to internal memory.
>

Packing several opcodes per word obviates the need for an instruction cache
in a big way. A real Forth
chip with both stacks on-chip also obviates the need for a data cache as
long as you avoid overuse of static variables.
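
To make the packing idea concrete, here is a minimal sketch of putting five
3-bit opcodes into one 16-bit cell, so that a single memory fetch delivers
five instructions. The slot order (slot 0 in the low bits), the unused top
bit, the opcode values and the names pack and unpack are assumptions for
illustration only and are not taken from the actual Steamer16 design.

```python
SLOTS = 5                       # instructions per cell
OP_BITS = 3                     # bits per instruction
OP_MASK = (1 << OP_BITS) - 1    # 0b111


def pack(opcodes):
    """Pack five 3-bit opcodes into one 16-bit cell (slot 0 in the low bits)."""
    if len(opcodes) != SLOTS:
        raise ValueError("expected exactly 5 opcodes")
    cell = 0
    for slot, op in enumerate(opcodes):
        if not 0 <= op <= OP_MASK:
            raise ValueError("opcode does not fit in 3 bits")
        cell |= op << (slot * OP_BITS)
    return cell                 # bit 15 stays unused in this sketch


def unpack(cell):
    """Recover the five opcodes from a cell, slot 0 first."""
    return [(cell >> (slot * OP_BITS)) & OP_MASK for slot in range(SLOTS)]


cell = pack([1, 7, 0, 5, 2])    # arbitrary opcode numbers
print(hex(cell), unpack(cell))
```

Five slots use 15 of the 16 bits, so one bit per cell is left over in this
layout.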

> > BTW, I strongly encourage anyone out there who wants to design their own
> > silicon to learn VHDL and use any one of the various low-cost tools
> > available. I have used 4 different vendors' tools so far, and they are all
> > _pretty good_. You can learn a lot and gain confidence from simulations even
> > if you never actually burn or otherwise fab a chip.
>
> For now schematic capture makes more sense to me because you can see what you
> get. With the high level hardware languages I am never sure what logic comes
> out. Also after about 20 pages of schematic you know your do all, know all,
> cpu could use some simplification.:)
>

Actually, you probably can't be sure of what you are getting just by looking
at your schematic. It depends whether an optimizer is part of the toolchain
or not, and it always is in my experience. There is no substitute for taking
a good look at the report files after compiling your design. I have found
that there is an advantage to being able to use your favorite text editor to
work on the design at any workstation you please, including ones which do
not have the chip design tools installed. Also, it's usually quicker to
simply type "x <= a and not b;" than to browse for the equivalent schematic
symbol (and create it if it doesn't exist), and then point and click all
over the place to establish its place in the design. All of the VHDL
compilers I have used have been very good at their job and I have never felt
that the results were deficient in any way. I have done it both ways and
I'll never use the schematic entry method again if I can help it.

Myron Plichota