Re: bits, timings, mem maps, etc.


Andrew Sieber <kd4jtv@bbs.wa4yse.ampr.org>:
>the processor must still load operands from memory every fifth cycle, it
>seems to me that even with 15 PICOsecond sram chips the processor would
>still only run at 80 MIPS.

There is no fifth cycle.
The next instruction word is prefetched as soon as possible.

Look at the instruction encoding: all instructions which reference memory
(call ret T0 C0 jump # @ @+ @R ! !+ !R) have their bit4 set, whereas all the
other instructions (arithmetic and stack handling) have their bit4 clear.

During the execution of the 1st (resp. 2nd, 3rd and 4th) instruction slot,
the bit4 of all 4 (resp. the last 3, last 2, last) instructions are ORed
together. If the result is set, there is still a memory-referencing instruction
to execute before the next instruction word can be loaded into the instruction
register; otherwise, the next instruction word is fetched from memory while the
remaining instructions (if any) are executing.
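
The ORing rule above can be sketched in a few lines of Python. This is only an
illustration, not the actual chip logic: it assumes 5-bit opcodes packed four
to a word, that "bit4" is the bit of weight 16, and the opcode values ALU_OP
and MEM_OP are made-up placeholders, not real encodings.

```python
MEM_BIT = 0x10   # assumption: "bit4" is the bit of weight 16 in a 5-bit opcode

ALU_OP = 0x03    # hypothetical opcode with bit4 clear (arithmetic/stack)
MEM_OP = 0x12    # hypothetical opcode with bit4 set (references memory)


def can_prefetch(word_slots, current_slot):
    """True if the next word may be fetched while `current_slot` executes.

    `word_slots` holds the 4 opcodes of the current instruction word and
    `current_slot` is 0-based. The bit4 of the current and all remaining
    slots are ORed together; prefetch is allowed only if the result is clear.
    """
    pending = 0
    for opcode in word_slots[current_slot:]:
        pending |= opcode & MEM_BIT
    return pending == 0


def first_prefetch_slot(word_slots):
    """0-based slot during which the prefetch can start, or None if the
    fetch has to wait until the last slot has finished."""
    for slot in range(len(word_slots)):
        if can_prefetch(word_slots, slot):
            return slot
    return None


if __name__ == "__main__":
    print(first_prefetch_slot([ALU_OP, ALU_OP, ALU_OP, ALU_OP]))
    print(first_prefetch_slot([MEM_OP, ALU_OP, ALU_OP, ALU_OP]))
    print(first_prefetch_slot([ALU_OP, ALU_OP, ALU_OP, MEM_OP]))
```

Note that a single memory-referencing instruction in the last slot is enough
to delay the fetch until the whole word has executed.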

So in the best case, where none of the 4 instructions references memory, the
next instruction word is fetched in parallel with the execution of the 4
instructions, which execute faster than the memory access. This is why a + in
the 4th slot does not require a nop in the 1st slot of the next instruction
word: there is time after the + for the carry to propagate before the next
instruction word is loaded and its 1st instruction slot is executed.

In the worst case, where all 4 instructions reference memory, the total
execution time is the sum of the 5 memory references, one for each instruction
plus one to fetch the next instruction word.
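
The best and worst cases can be checked with a rough timing model. It is a
sketch under simplifying assumptions, not a datasheet: every memory reference
is taken to cost the same t_mem, every other instruction the same t_alu, the
time units are arbitrary, and the opcode values are made-up placeholders. On
the real chip the timings also depend on voltage and on memory addresses.

```python
MEM_BIT = 0x10   # assumption: "bit4" is the bit of weight 16 in a 5-bit opcode

ALU_OP = 0x03    # hypothetical opcode with bit4 clear
MEM_OP = 0x12    # hypothetical opcode with bit4 set


def word_time(word_slots, t_alu, t_mem):
    """Estimated time from the start of slot 1 until the next word is loaded.

    The fetch of the next word starts at the beginning of the first slot
    from which no memory-referencing instruction remains, or after the last
    slot if there is none. The word is done when both the execution of the
    slots and the fetch are finished.
    """
    now = 0.0
    fetch_start = None
    for slot, opcode in enumerate(word_slots):
        remaining = word_slots[slot:]
        if fetch_start is None and not any(op & MEM_BIT for op in remaining):
            fetch_start = now
        now += t_mem if opcode & MEM_BIT else t_alu
    if fetch_start is None:
        fetch_start = now          # fetch only after the last slot finished
    return max(now, fetch_start + t_mem)


if __name__ == "__main__":
    t_alu, t_mem = 1, 10           # arbitrary illustrative values
    print(word_time([ALU_OP] * 4, t_alu, t_mem))   # best case: 1 memory time
    print(word_time([MEM_OP] * 4, t_alu, t_mem))   # worst case: 5 memory times
```

With these illustrative values the best case costs one memory access and the
worst case five; mixed words fall in between, depending on where the last
memory-referencing instruction sits in the word.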

MIPS has never been a good scale for comparing processor performance, and in
the case of MISC processors it's worse, because the instruction set is so
different and instruction timing depends on voltage and on memory addresses.

Understanding the timing of a processor is vital if you write applications to
be run under real-time constraints. It is dead simple with the RTX2000, the
8501 or the SHARC; it's still very easy to compute with the x21s (but beware,
it depends on the power supply voltage); but it's a nightmare with the
TMS320C40, the i80x86, or other pipelined/cached/complex processors, for which
the only practical way to get (approximate) timings is to measure the execution
duration of big enough sequences of instructions (such as subroutines).

CL
--
email: Christophe.Lavarenne@inria.fr		tel: +33.1.39.63.55.80
INRIA, Domaine de Voluceau Rocquencourt		Institut National de Recherche
B.P.105 - 78153 LE CHESNAY CEDEX FRANCE		en Informatique et Automatique

SynDEx, CAD tool for the distributed implementation of real-time applications.
Take a look at our Web Server: http://www-rocq.inria.fr/syndex