
Re: adder and carry propagation


On Thu, 3 Apr 1997, alex lasky wrote:

> Begin quote:
> ------------------------------
> The rule of thumb is: carry propagates n bits for the time of one
> instruction slot, where n is, I guess, 8 for P21.  Therefore, if you need
> fast carry propagation all the way up for numbers of small precision,
> you should _left_ justify them, so carry starts as far left as possible and
> reaches T20 quickly.  Adding nops at 8 bits/nop is helpful.  You need two
> nops to cover the entire possible range.  An instruction fetch from DRAM
> counts as many, many nops, so placing a + in the _first_ slot in DRAM is
> safe.                                            ^^^^^^^
> 
> As usual, I remind you that instructions execute in parallel; that is, if you
> put a nop, the carry propagates behind the scenes.  The instruction itself
> _latches_ the result of the respective unit into TOS, that is, you need nops
> _before_ the "instruction". 
> ^^^^^^^^
> Penio Penev <Penev@pisa.Rockefeller.edu> 1-212-327-7423
> 
> ------------------------------
> Isn't there a contradiction here? First you say put it in the first slot,
> then you say put nops before the instruction, which implies the last slot.

I should have been clearer, and maybe this should go in as
FAQ #1 :-)

The moment the stack changes, the adder, or-er, and-er, xor-er,
right-shifter and left-shifter, and what have you start working behind the
scenes.  If, after the last change to the stacks, you put a long series of
nops, eventually all of the results stabilize behind the scenes in their
respective result registers.  At some point you "execute" an ALU
"instruction," whose effect is merely to _latch_ one of the result registers
into TOS (the "opcode" of the instruction serves as an index for the desired
result register).  At that point the stacks change again and the story
repeats.
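A minimal Python sketch of this model, for illustration only: every unit
recomputes from T and S whenever the stacks change, and the ALU
"instruction" just selects one precomputed result register by its opcode.
The unit names, the masks and the class are my own assumptions, not P21
mnemonics or exact semantics; stack popping and settling time are ignored.

```python
WORD_MASK = (1 << 20) - 1   # assumed 20-bit data word (P21)
SUM_MASK = (1 << 21) - 1    # adder output keeps the carry in bit 20 (T20)


class BehindTheScenesALU:
    """Hypothetical model: all units run in parallel; an 'instruction' only latches."""

    def __init__(self) -> None:
        self.result_registers: dict[str, int] = {}

    def stacks_changed(self, t: int, s: int) -> None:
        # Every unit starts working the moment T or S changes.
        self.result_registers = {
            "add": (t + s) & SUM_MASK,
            "and": t & s,
            "or": t | s,
            "xor": t ^ s,
            "shl": (t << 1) & WORD_MASK,
            "shr": t >> 1,
        }

    def latch(self, opcode: str) -> int:
        # The opcode is only an index selecting which result goes to TOS.
        return self.result_registers[opcode]


alu = BehindTheScenesALU()
alu.stacks_changed(t=0b1100, s=0b1010)
print(alu.latch("xor"))  # 6
```

The point is that "executing" xor does no work of its own; the xor was
already sitting in its result register.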

Now, the statement is the following:  10ns on P21 (5ns on F21) is enough
for most results to stabilize.  That is, the time spent decoding the
instruction is enough for the results to be ready, and you don't need
additional nops -- you can latch the results immediately after the
instruction is decoded.

One of the reasons for the positive logic on even bits and negative logic on
odd bits could be exactly this -- a shift would require two steps to complete
for logic of the same type (all positive or all negative), which could mean
that you would need additional nops for shifts as well.  [The real reason is
that you would need twice as many transistors -- die area, energy -- for
same-polarity logic.]
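One way to read the polarity remark, sketched in Python under my own
assumptions (this is not taken from the P21 netlist): odd bits are stored
inverted, so moving a bit one place left always crosses from one polarity to
the other, and each output wire can be driven by a single inverter from its
neighbour.  With same-polarity logic the same wire would need a
non-inverting buffer, i.e. two inverters.

```python
WIDTH = 20
MASK = (1 << WIDTH) - 1
ODD_BITS = 0xAAAAA  # bits 1, 3, ..., 19: assumed to use negative logic


def encode(x: int) -> int:
    """Logical value -> physical wire levels (odd wires inverted)."""
    return x ^ ODD_BITS


def decode(p: int) -> int:
    """Physical wire levels -> logical value."""
    return p ^ ODD_BITS


def shift_left_physical(p: int) -> int:
    """Left shift done with one inverter per bit on the physical wires.

    Wire i+1 is driven by NOT(wire i): the bit changes polarity as it moves
    between even and odd positions, so the inversion is exactly what is needed.
    Wire 0 receives 0, which encodes a shifted-in logical 0 on an even
    (positive-logic) wire.
    """
    return (~p << 1) & MASK


x = 0x12345
assert decode(shift_left_physical(encode(x))) == (x << 1) & MASK
```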

The statement about the + is the following: carry propagates at a rate of
8 (?) places per 10ns (5ns for F21), so if it has to travel farther than
that you need additional time _before_ the latch.  This can be achieved
either with nops _before_ the latch, or with "nops" in the form of the
memory-access delay that precedes the first slot of a newly fetched
instruction word.  [This assumes DRAM timing with sufficient delays; SRAM on
F21 would again be too fast.]
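To make the arithmetic concrete, a small Python sketch, assuming a 20-bit
word with the carry landing in bit 20 (T20), carries that start from zero
when the stacks change, and the guessed 8 places per slot.  It steps a
ripple-carry adder one bit position at a time and counts how many slots pass
before the sum settles; the + slot itself counts as one, so nops needed =
slots - 1.  The function names are hypothetical.

```python
WIDTH = 20          # data bits; bit 20 (T20) receives the carry out
BITS_PER_SLOT = 8   # assumed carry reach per slot (10 ns on P21, 5 ns on F21)


def ripple_add(a: int, b: int, steps: int) -> int:
    """Value on the adder's outputs after `steps` carry-propagation steps.

    carry[i] is the carry into bit i; each step lets every carry advance one
    position. Carries are assumed to start at 0 when the stacks change.
    """
    carry = [0] * (WIDTH + 1)
    for _ in range(steps):
        nxt = [0] * (WIDTH + 1)
        for i in range(WIDTH):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            nxt[i + 1] = (ai & bi) | (ai & carry[i]) | (bi & carry[i])
        carry = nxt
    out = carry[WIDTH] << WIDTH
    for i in range(WIDTH):
        out |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carry[i]) << i
    return out


def nops_before_plus(a: int, b: int) -> int:
    """Nops to put before + so the latched sum (including T20) is correct.

    a and b are assumed to be 20-bit values.
    """
    steps = 0
    while ripple_add(a, b, steps) != a + b:
        steps += 1
    slots = max(1, -(-steps // BITS_PER_SLOT))  # ceil, at least the + slot itself
    return slots - 1


print(nops_before_plus(0xFFFFF, 0x00001))  # carry ripples from bit 0 to T20: 2
print(nops_before_plus(0x00FFF, 0x00001))  # carry ripples 12 places: 1
print(nops_before_plus(0xFF000, 0x01000))  # 8-bit value left-justified, carry reaches T20: 0
```

The first case matches "two nops to cover the entire possible range", and
the last shows why left-justifying small-precision numbers gets the carry to
T20 without nops.  Starting the carries at zero is a simplification; on the
chip the carry lines start from whatever the previous operands left there.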

Am I clear this time?

--
Penio Penev <Penev@pisa.Rockefeller.edu> 1-212-327-7423