Re: lean + mean


Eugene,

This thread is no longer MISC-specific, so should we continue it off the list?
However, it may be of interest for the multiprocessing aspects of F21.
Who is interested in following this thread, privately or on the misc list?
--
Reading between your lines, I understand that you want to attack a massive
problem, with massive data arrays processed by very repetitive operations,
allowing massive data parallelism, that you want to accelerate (or even
process in real-time?) with a big lattice of SHARCs, each simply executing a
slice of the repetitive loops.

Obviously, if you are looking at multi-DSP, it's because you want very fast
processing.  So don't waste processing power on a (Forth) virtual machine; use
the full power of the DSP directly, in assembly.  You say that you'll have just
a single simple loop on each DSP, so this loop shouldn't be very hard to code
efficiently in assembly.

You don't need to bother with a SHARC-resident Forth interpreter/compiler;
this job is done much more easily on the host computer by an umbilical
cross-assembler/compiler written in Forth.  The only thing you need resident
on each SHARC is a minimal monitor accepting remote subroutine calls, with
at least one initial subroutine for downloading new subroutines.

All my umbilical cross-assemblers work this way.  More precisely, the host and
target processors interact such that the host may be seen as a server for the
target.  Initially, the target executes the monitor, which asks the host:
  "Hey, I'm ready to accept a new job, ask the user what he wants me to do."
Then the host lets the user enter some line(s) of code, compiles them on the
fly in a memory-image of the target code memory, marking which range of memory
addresses has been updated, until the user asks for some code to be executed.
Then the host asks the target to execute its side of the code downloader, and
downloads that memory range which has been marked as updated, i.e. subroutines
including the one the user asked to be executed by the target, and resets the
updated-range marks.
When the target's downloader returns to the monitor, the monitor requests:
  "Hey, I'm ready to accept a new job, ask the user what he wants me to do."
Then the host asks the target to execute the user requested subroutine, and
from there acts as an i/o (display, keyboard, disk) server for the target.
The downloaded user subroutine executed by the target may then request
services from the host (such as "print me this"), until it returns to the
monitor, which again asks the host:
  "Hey, I'm ready to accept a new job, ask the user what he wants me to do."
And we're back to the initial state, apart from the downloaded code and any
side effects of the executed user subroutine.
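
The host side of this exchange can be sketched in a few lines.  This is only
a Python simulation, not the code of the actual cross-assemblers: the names
Host, Target, compile_word, flush and download are hypothetical, and keeping a
single contiguous updated range is an assumption made to keep the sketch short.

```python
# Hypothetical simulation of the umbilical download step; all names are
# illustrative and do not come from the real cross-assemblers.

class Target:
    """Simulated target: code memory plus the one built-in subroutine."""

    def __init__(self, size):
        self.mem = [0] * size

    def download(self, start, words):
        # Target side of the downloader: store received words in code memory.
        self.mem[start:start + len(words)] = words


class Host:
    """Host side: image of the target code memory with updated-range marks."""

    def __init__(self, target, size):
        self.target = target
        self.image = [0] * size
        self.here = 0      # compilation pointer into the image
        self.lo = None     # lowest updated address (None: nothing marked)
        self.hi = None     # one past the highest updated address

    def compile_word(self, word):
        # Compile one word into the image and widen the updated range.
        addr = self.here
        self.image[addr] = word
        self.here += 1
        self.lo = addr if self.lo is None else min(self.lo, addr)
        self.hi = addr + 1 if self.hi is None else max(self.hi, addr + 1)
        return addr

    def flush(self):
        # Download only the updated range, then reset the marks.
        if self.lo is None:
            return 0
        words = self.image[self.lo:self.hi]
        self.target.download(self.lo, words)
        self.lo = self.hi = None
        return len(words)
```

Calling flush just before each remote call means that only the words compiled
since the previous execution cross the umbilical link.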

How does the user specify which code the target must execute?  Very simple.

": name  some words ;" is a named subroutine, therefore it may be referenced
later by its name, so it's simply compiled and kept in code memory, i.e. it's
compiled in the host memory image of the target code memory, and its name is
kept in a host dictionary of the target subroutine entries.

"some words ;" is an anonymous subroutine, therefore it may not be referenced
later by name, so the only sensible thing to do with it is to compile it,
execute it, and forget it on the fly, i.e. it's compiled in the host memory
image of the target code memory, then downloaded into the target code memory
with other newly compiled subroutines, then the compilation pointer is reset
to the beginning of the anonymous subroutine (i.e. to the end of the previous,
in fact last, named subroutine) so that its code space is recovered for the
next named or anonymous subroutine, then the downloaded anonymous subroutine
is remotely called and executed by the target, during which the host acts as
i/o server, then the host resumes processing the user input.
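
The difference between the two forms can be simulated as follows.  This is a
toy Python model under stated assumptions, not the real tool: the Session
class, its tuple "instructions" and the three built-in words (numbers, + and .)
are hypothetical stand-ins for a real target instruction set.

```python
# Toy model of "compile, download, execute, forget"; all names hypothetical.

class Session:
    def __init__(self):
        self.code = []     # host image of the target code memory
        self.names = {}    # host dictionary: name -> entry address
        self.target = []   # simulated target code memory
        self.synced = 0    # addresses below this are already on the target
        self.output = []   # values the target asked the host to print

    def _compile(self, words):
        entry = len(self.code)
        for w in words:
            if w in self.names:
                self.code.append(("call", self.names[w]))
            elif w == "+":
                self.code.append(("add", None))
            elif w == ".":
                self.code.append(("print", None))
            else:
                self.code.append(("lit", int(w)))
        self.code.append(("ret", None))
        return entry

    def _download(self):
        # Send everything compiled since the last download.
        self.target[self.synced:] = self.code[self.synced:]
        self.synced = len(self.code)

    def _run(self, entry):
        stack, rstack, pc = [], [], entry
        while True:
            op, arg = self.target[pc]
            pc += 1
            if op == "lit":
                stack.append(arg)
            elif op == "add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "print":
                self.output.append(stack.pop())   # host acts as i/o server
            elif op == "call":
                rstack.append(pc)
                pc = arg
            elif not rstack:                      # "ret" back to the monitor
                return
            else:                                 # "ret" from a nested call
                pc = rstack.pop()

    def feed(self, line):
        tokens = line.split()
        if not tokens or tokens[-1] != ";":
            raise ValueError("every input must end with ;")
        if tokens[0] == ":":
            # Named: compile and keep, download is deferred.
            self.names[tokens[1]] = self._compile(tokens[2:-1])
        else:
            # Anonymous: compile, download, forget on the host, execute.
            entry = self._compile(tokens[:-1])
            self._download()
            del self.code[entry:]
            self.synced = entry
            self._run(entry)
```

After feed(": seven 3 4 + ;") nothing runs and nothing is downloaded yet; a
following feed("seven . ;") downloads both subroutines, executes the anonymous
one, and leaves the host compilation pointer where it was before that line.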

Note that whether the subroutines are named or anonymous, we're always
compiling, so there is no need for a "compiling/interpreting" state, and no
more beginner-puzzling interpretation-forbidden words (IF ." POSTPONE and so
on).
Native cross-compilation, with subroutine-threading and primitive-inlining,
is also much simpler, because you don't need to provide a separate subroutine
for each primitive just for the interpreting mode: most primitives are
host-resident macros which compile their few instructions inline, and may even
look back at one or a few previously compiled instructions for possible
peephole optimizations.  Simple and efficient, very minimalist, I enjoy it.
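
To illustrate primitives as host-resident macros with look-back, here is a
small Python sketch.  The instruction names (push, add, addi, dup, call) are
hypothetical and do not belong to any real DSP, and the two peephole rules
shown are only examples of the kind of rewriting that becomes possible.

```python
# Sketch: primitives as host-side macros that emit inline code and may look
# back at previously compiled instructions. Instruction names are hypothetical.

def lit(code, n):
    # Literal: one inline "push" instruction.
    code.append(("push", n))


def plus(code):
    if len(code) >= 2 and code[-1][0] == "push" and code[-2][0] == "push":
        # push a, push b, add  ->  push a+b  (constant folding)
        b = code.pop()[1]
        a = code.pop()[1]
        code.append(("push", a + b))
    elif code and code[-1][0] == "push":
        # push n, add  ->  addi n  (add immediate)
        n = code.pop()[1]
        code.append(("addi", n))
    else:
        code.append(("add",))


def dup(code):
    code.append(("dup",))


MACROS = {"+": plus, "dup": dup}


def compile_words(source):
    """Compile a line of words into a list of instructions."""
    code = []
    for word in source.split():
        if word in MACROS:
            MACROS[word](code)            # primitive: inlined by its macro
        else:
            try:
                lit(code, int(word))      # number: inline literal
            except ValueError:
                code.append(("call", word))   # anything else: subroutine call
    return code
```

A real compiler must also refuse to look back across a branch target, since
the earlier instructions are then not guaranteed to have just executed; the
sketch ignores control flow entirely.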

I presented this idea for the first time at the EuroFORML'92 conference in
Southampton, with Rod Crawford ("Who needs the interpreter anyway?"), but
nobody seemed to see it as an improvement.  I presented it again at the
FORML'95 conference in Asilomar, in an impromptu talk (I didn't take the time
to write and present a paper), with a short demo on 3 different simulated
targets (8051, RTX2000, muP21), which was rewarded for being, for a long time,
the only proposed modification to _remove_ a feature (the interpreter) from
Forth.  But it seems that the idea has not spread any further.

Since then, I have been playing a lot with ADSP2181 targets (www.analog.com)
and maybe one day I'll play with a SHARC ADSP21065 target, or with a Lucent.
My umbilical cross-assembler/compilers are used by friends of mine:
FF51 for i8051 is used for trailer embedded applications,
FF2K for RTX2000 is used for industrial video quality control applications,
FF21 for muP21 and v21 is the smallest, prettiest, but was just a dream,
FF86 for i80x86 is a project to extend the idea to the host itself,
FF96 for i80196 was abandoned along with the idea of using this processor,
FF81 for ADSP2181 is used for home alarm systems applications.
I develop them during summers or "lost" nights, because I'm pretty fully
occupied with INRIA's research and development on distributed, embedded,
optimized real-time applications on multi-workstation/DSP/microcontroller/CAN
(http://www-rocq.inria.fr/syndex/).

Can you give me some pointers to your research work?
CL