
F21 multiprocessing ideas


Regarding the questions about F21 multiprocessing I would like to say
a few things.  First of all, the biggest problem is that none of the
F21 prototype chips has worked well enough to do anything yet.  That is
still the first hurdle.  I can still only demo Ultra Technology simulators
and compilers running on PCs, and Ultra Technology software on Offete's
MuP21 and on iTV's i21.  iTV has been doing many demos for a long time
and I have a few demos like F21chess.  It seems likely F21 will be working
well enough to do demos pretty soon, but I have said that before.

As for multiprocessing software, it can be done in many ways, just as
it is possible to interconnect F21s in many ways.  The single serial/
network interface makes for a very simple high speed single path.
This means a ring is trivial.  The serial/network pins are also on
the parallel port, so they can be tri-stated and read or driven by
the CPU via bit banging.  All the parallel port pins can be bit banged,
and with the availability of programmable CPU interrupts from various
coprocessors this can be done with a little CPU power plus the
interrupt and parallel port hardware.  With a single very high speed
network path and a bunch of slower speed connections on each chip, many
different network topologies can be supported.
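
To make "bit bang" concrete, here is a minimal sketch in Python of
shifting a value out over two software-driven pins and reading it back.
Everything specific in it is an assumption for illustration: the
clock/data pin pair, the 8-bit width, the MSB-first order and the
sample-on-rising-edge rule are not the F21's actual pin protocol.

```python
def bitbang_send(value, width=8):
    """Return the (clock, data) pin states that shift `value` out MSB-first.

    Hypothetical two-wire scheme: data is set while the clock is low,
    then the clock is raised so the receiver can sample it.
    """
    samples = []
    for i in reversed(range(width)):
        bit = (value >> i) & 1
        samples.append((0, bit))  # set data while clock is low
        samples.append((1, bit))  # raise clock: receiver samples here
    samples.append((0, 0))        # return both lines to idle
    return samples


def bitbang_receive(samples):
    """Rebuild the value by sampling data on each rising clock edge."""
    value = 0
    prev_clock = 0
    for clock, data in samples:
        if clock == 1 and prev_clock == 0:  # rising edge
            value = (value << 1) | data
        prev_clock = clock
    return value


if __name__ == "__main__":
    print(hex(bitbang_receive(bitbang_send(0xA5))))
```

The point is only that the CPU does all the work one pin transition at
a time, which is why such a path is slower than the dedicated
serial/network interface.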

If you want fault tolerance you can simply provide
some alternate bypasses to the main network paths.  If a node
fails then an alternate, slower speed, bit banged path can always
be taken around it.  General purpose multidimensional hypernode
interconnects can be supported or special purpose dedicated
i/o processor designs can be configured.  Hot insertion or removal
of nodes could be easily supported.
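
As a rough sketch of the bypass idea, the Python below routes a message
around a one-way ring in which each node has a fast link to its
neighbor and a slow bypass link to the node after that.  The layout
(one bypass per node, skipping exactly one neighbor), the function name
and the "fast"/"bypass" labels are assumptions made up for the example,
not a description of any real F21 board.

```python
def route(src, dst, n, failed=frozenset()):
    """Return the hops from src to dst on a one-way ring of n nodes.

    Each hop is (from_node, to_node, kind) where kind is "fast" for the
    main network path or "bypass" for the slow path around a dead node.
    """
    if src in failed or dst in failed:
        raise ValueError("source or destination node has failed")
    hops = []
    cur = src
    while cur != dst:
        nxt = (cur + 1) % n
        if nxt not in failed:
            hops.append((cur, nxt, "fast"))
            cur = nxt
        else:
            skip = (cur + 2) % n
            if skip in failed:
                # A single bypass per node cannot cover two dead neighbors.
                raise RuntimeError("two adjacent nodes failed, no path")
            hops.append((cur, skip, "bypass"))
            cur = skip
    return hops


if __name__ == "__main__":
    print(route(0, 3, 5))               # all nodes up
    print(route(0, 3, 5, failed={2}))   # node 2 down, bypass around it
```

With more bypass links per node the same loop could tolerate more
adjacent failures, at the cost of more parallel port pins.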

Memory bandwidth has always been the bottleneck on the F21 design.
There is a hierarchy of memory sizes and speeds available on each
node and various internode connects for a hierarchy of speeds
for distributed memory access.  The CPU is internally running at
500 mips but there is no interface providing that speed through
memory.  With only the CPU running and no I/O coprocessors the
speeds look like 333 mips in internal ROM, 200 in external SRAM, 100
in external DRAM, and 6 in external ROM.  However when you give up some
bus bandwidth to I/O coprocessing the CPU gets reduced bandwidth and it
is a big factor.  With the video coprocessor generating low res
composite video the CPU speeds drop to something like 280 in internal
ROM, 150 in SRAM, 25 in DRAM, and 4 in external ROM.
Notice the dramatic drop when the CPU and video coprocessor are fighting
over DRAM.  You don't get the offpage access problems if the CPU is in
SRAM or internal ROM, because the video will generate mostly sequential,
mostly onpage DRAM accesses and only need a small part of the bus bandwidth.
The cheapest nodes would not have DRAM and would not be able to support
video at all anyway.
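
To put the drop in proportion, this short Python sketch works out what
fraction of its speed the CPU keeps in each kind of memory once video
is running.  The figures are the approximate ones quoted above, and
treating all of them as mips is an assumption carried over from the
500 mips internal rate.

```python
# Approximate CPU speeds quoted in the text (assumed to be in mips).
cpu_only = {"internal ROM": 333, "SRAM": 200, "DRAM": 100, "external ROM": 6}
with_video = {"internal ROM": 280, "SRAM": 150, "DRAM": 25, "external ROM": 4}


def retained_fraction(before, after):
    """Fraction of the CPU-only speed that is left in each memory type."""
    return {name: after[name] / before[name] for name in before}


retained = retained_fraction(cpu_only, with_video)

if __name__ == "__main__":
    for name, frac in retained.items():
        print(f"{name:13s} keeps {frac:.0%} of its CPU-only speed")
```

On these numbers DRAM keeps only a quarter of its CPU-only speed while
SRAM keeps three quarters, which is the contention effect described
above.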

For these reasons I imagine that in most F21 designs most F21s will not
need to waste bus bandwidth on video generation.  An OS can easily provide
I/O for any process on any node to system displays wherever they are
on the network, or even on other systems via some interconnect.  It
might be useful to include video for each node on some designs but
it is certainly not a requirement.  I do like the way Macs can configure
multiple displays into a large virtual display and let you drag stuff
around on one big virtual monitor.  I also think F21 is suitable
for many applications that have no video output anywhere.

Jeff Fox