multi/tasking/processing


Eugen Leitl says that, thanks to the MMU, bugs in non-OS code are trapped by
the OS, which makes for non-halting machines.  What about OS bugs?  My Alpha
crashes or halts about every two weeks, and I must reboot my Pentium about
every day.  For how many man-centuries have their OSes been "debugged"?  VMUNIX
is about 6 megs on the Alpha, and how many megs is MS-Windows?

Small, simple code allowing full control is the best way I know to reduce bugs.
  I have personally run into several bugs in C compilers (MS, Borland and
Sun) and had no way to correct the compiler.  Even with GNU-C, you need a
lifetime to find your way around the 80,000 lines of C that implement it.
  By contrast, I have written several complete Forth cross-development
environments, for 8051, multi-RTX and multi-P21, each with a target processor
simulator/debugger, dis/assembler, native code optimizing compiler, and an
umbilical interface.  Each of them requires less than 30 Kbytes on top of a
Forth on the PC host side and is up and working interactively with less than a
few hundred bytes on each target side.  And if there are any bugs, I _can_
correct them, and it doesn't take long.

I have also long been fighting with C compilers and "real-time distributed
OSes" on Transputers and DSPs, trying to find out why they do things
differently from what I expect, and how I can make them do what I want.  And
worse than everything else is the nightmare of debugging multiprocessor code,
with a black-box OS and debugger between me and the hardware.

This is why we are developing here at INRIA the A3 methodology, and the SynDEx
software that supports it, to help real-time application designers implement
their algorithms efficiently on their target architectures, and to free them
from the burden of writing and debugging distributed code.
  The designer specifies his algorithm and its potential parallelism as a
dependency graph composed of operations (such as a simple addition or a
complete FFT, plus special operations that condition and factorize/loop
other operations).  He also specifies his hardware architecture and its
available parallelism as a graph of automata (instruction sequencers or
mono-operation wired circuits/FPGAs/ASICs) interconnected by communication
media (serial/parallel point-to-point links or multipoint buses, with or
without memory), each characterized by the time and space it requires to
execute the operations or the communications it is able to perform.  The
designer codes, debugs and characterizes each operation and communication
primitive separately, only once for each new architecture component.
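
  To make the two graphs concrete, here is a minimal Python sketch of how such
a specification could be written down.  It is not the SynDEx input format: the
class names, the operation and automaton names, and the durations are all
hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmGraph:
    """Dependency graph: an edge (producer, consumer) is a data dependency."""
    dependencies: set = field(default_factory=set)

    def add_dependency(self, producer, consumer):
        self.dependencies.add((producer, consumer))

    def operations(self):
        return {op for edge in self.dependencies for op in edge}

    def predecessors(self, op):
        return {p for (p, c) in self.dependencies if c == op}


@dataclass
class ArchitectureGraph:
    """Automata connected by media, plus the measured characterization."""
    media: dict = field(default_factory=dict)      # medium -> set of automata
    durations: dict = field(default_factory=dict)  # (op, automaton) -> time

    def connect(self, medium, *automata):
        self.media[medium] = set(automata)

    def characterize(self, op, automaton, duration):
        self.durations[(op, automaton)] = duration

    def executors(self, op):
        """Automata able to execute the given operation."""
        return {a for (o, a) in self.durations if o == op}


# Hypothetical algorithm: "fft" and "add" do not depend on each other,
# so they are potentially parallel.
algorithm = AlgorithmGraph()
algorithm.add_dependency("input", "fft")
algorithm.add_dependency("input", "add")
algorithm.add_dependency("fft", "output")
algorithm.add_dependency("add", "output")

# Hypothetical architecture: one sequencer and one wired FFT circuit.
architecture = ArchitectureGraph()
architecture.connect("link0", "dsp", "fpga")
for op, automaton, time_units in [
    ("input", "dsp", 1), ("add", "dsp", 1), ("output", "dsp", 1),
    ("fft", "dsp", 40), ("fft", "fpga", 10),   # made-up durations
]:
    architecture.characterize(op, automaton, time_units)
```

  Here the FFT may run on either automaton, while the other operations are
only characterized for the sequencer, so only the FFT leaves a distribution
choice open.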
  Then the designer interacts with a graphic optimization heuristic to
distribute and schedule the operations on the automata and the resulting
inter-automata communications on the communication media, until the resulting
predicted real-time behaviour matches the real-time constraints.
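
  As an illustration of what distributing and scheduling means here, the
following Python sketch uses a plain greedy list-scheduling rule: each
operation goes to the automaton where it would finish earliest.  This is a
stand-in, not the SynDEx heuristic; the function name, the fixed communication
cost (which ignores contention on the medium) and the example durations are
assumptions.

```python
def greedy_schedule(ops, deps, durations, comm_time):
    """Place each operation on the automaton where it finishes earliest.

    ops       -- operations listed in an order compatible with dependencies
    deps      -- dict: operation -> list of its predecessor operations
    durations -- dict: (operation, automaton) -> execution time
    comm_time -- fixed cost of one inter-automata transfer (simplification)
    Returns a dict: operation -> (automaton, start, end).
    """
    automata = sorted({a for (_, a) in durations})
    free_at = {a: 0 for a in automata}   # when each automaton becomes idle
    placed = {}
    for op in ops:
        best = None
        for a in automata:
            if (op, a) not in durations:
                continue                 # this automaton cannot execute op
            ready = 0
            for p in deps.get(op, []):
                p_automaton, _, p_end = placed[p]
                transfer = 0 if p_automaton == a else comm_time
                ready = max(ready, p_end + transfer)
            start = max(ready, free_at[a])
            end = start + durations[(op, a)]
            if best is None or end < best[2]:
                best = (a, start, end)
        if best is None:
            raise ValueError(f"no automaton can execute {op!r}")
        placed[op] = best
        free_at[best[0]] = best[2]
    return placed


# Hypothetical example: two identical sequencers P1 and P2.
ops = ["in", "f", "g", "out"]
deps = {"f": ["in"], "g": ["in"], "out": ["f", "g"]}
durations = {(op, a): t
             for op, t in [("in", 1), ("f", 4), ("g", 4), ("out", 1)]
             for a in ("P1", "P2")}

schedule = greedy_schedule(ops, deps, durations, comm_time=1)
predicted_latency = max(end for (_, _, end) in schedule.values())
```

  With these numbers the predicted latency is 7 time units, against 10 on a
single sequencer; the designer would compare such a prediction with the
real-time constraint and adjust the distribution if it is not met.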
  Finally, an optimized deadlock-free executive is automatically generated that
encodes the distribution and the scheduling of the algorithm operations on the
target architecture.  The generated executive allocates memory for data
transferred between operations, calls operations sequentially on each
automaton, and synchronizes parallel sequences either through semaphores when
the shared communication medium has memory or through message passing when the
medium is synchronous.  There is no need for an "OS" duplicated on each "node":
the generated executive is as much a "dedicated OS" as the application itself.
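
  The shape of such an executive, for the shared-memory case, can be sketched
in Python as follows.  Threads stand in for two automata and the operations
are trivial placeholders; none of this is actual SynDEx output (which is C),
and all names are hypothetical.

```python
import threading


def run_executive(samples):
    """Two fixed sequences synchronized by semaphores on a shared buffer."""
    samples = list(samples)
    buffer = [None]                  # one-slot buffer, allocated up front
    empty = threading.Semaphore(1)   # buffer may be written
    full = threading.Semaphore(0)    # buffer holds unread data
    outputs = []

    def automaton_a():
        # Sequence of automaton A: compute, then hand the result over.
        for x in samples:
            y = x * 2                # placeholder operation "scale"
            empty.acquire()          # wait until B has read the last value
            buffer[0] = y
            full.release()           # signal that data is available

    def automaton_b():
        # Sequence of automaton B: receive, then compute.
        for _ in samples:
            full.acquire()           # wait for data from A
            y = buffer[0]
            empty.release()          # let A overwrite the buffer
            outputs.append(y + 1)    # placeholder operation "offset"

    threads = [threading.Thread(target=automaton_a),
               threading.Thread(target=automaton_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs


result = run_executive([1, 2, 3])
```

  Each sequence is a straight-line loop fixed in advance; the only run-time
decisions are the semaphore waits, so no scheduler or general-purpose OS
service is involved.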
  As the generated distributed executive is guaranteed, thanks to the formal
methodology, to provide the same I/O behaviour on any number of automata, the
debugging of the complete algorithm may be done on a monosequencer target (out
of real-time constraints) with a sequential debugger, avoiding the nightmare of
distributed debugging.

We, INRIA and a French private company, are collaborating with Chuck to combine
MISC architectural simplicity with the A3 methodology to produce multiprocessor
MISC architectures that are efficient, yet small and simple to program.  Work
is underway to integrate synchronization primitives into the instruction set
and to integrate memory on-chip to increase instruction bandwidth while
decreasing the requirements for off-chip bandwidth.  Work is also underway to
make SynDEx generate stack-oriented (Forth-like) executives instead of the
present C executives, and to optimize the on-chip memory allocation for
programs and data at compile time.

Christophe.