Re: pipelining


Penio Penev:
   There are three timings involved -- elementary operation (EO) which
   is ~ 300ps on the F21, Internal Clock (IC) ~ 3,600ps = 3.6ns or
   roughly 12 EO, and memory access time = memory setup time (~3ns)
   plus waiting for the SRAM (~12ns).

   OR takes 1 EO to produce the result and several EO to latch it to
   TOS, thus executing within 1 IC.

   8 add-with-carry steps take 8 EOs plus several to latch the result
   to TOS, thus executing in 1 IC. If one waits 1 IC more _before_
   latching the result, 12 more EO are available for 12 more adc
   steps, giving the carry time to propagate further.
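
The quoted figures can be restated as arithmetic.  This is only a
sketch in Python: the 300ps, 3,600ps and 8-step numbers come from the
quote, while LATCH_EO (reading "several" as 12 - 8 = 4) and the helper
name adc_steps are assumptions made for illustration.

```python
# Figures taken from the quoted text.
EO_PS = 300                    # one elementary operation, in picoseconds
IC_PS = 3600                   # one internal clock, in picoseconds
EO_PER_IC = IC_PS // EO_PS     # elementary operations per internal clock
ADC_STEPS_ONE_IC = 8           # add-with-carry steps that fit in 1 IC

# Assumption: "several EO to latch" is whatever is left of the first IC.
LATCH_EO = EO_PER_IC - ADC_STEPS_ONE_IC


def adc_steps(extra_ics):
    """Add-with-carry steps available when latching waits extra_ics more ICs."""
    return ADC_STEPS_ONE_IC + extra_ics * EO_PER_IC


for extra in range(3):
    print(extra, "extra IC:", adc_steps(extra), "adc steps")
```

With one extra IC this gives 8 + 12 = 20 carry steps before the result
has to be latched.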

I screwed up my description of carry prediction in my last letter.
A single big AND gate isn't going to cut it: -1 -1 + doesn't involve
much rippling.  It's calculations like -1 1 + that involve lots of
rippling.  For this you want to OR the x and y bits together (to see
if carry would ripple) then AND a sequence of these with the incoming
carry to anticipate the ripple effect.
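
To check the two examples, here is a rough Python model of a 32 bit
ripple adder.  It is a sketch, not the F21 circuit: the function names,
the 32 bit width and the group boundaries in the test calls are all
assumptions.  longest_ripple counts how far a carry is passed along
without being regenerated, and predicted_carry_out is the OR-then-AND
test described above.  Note that the predictor only anticipates the
pure ripple case: a 0 from it does not rule out a carry generated
inside the group.

```python
WIDTH = 32                      # assumed word size
MASK = (1 << WIDTH) - 1         # bit pattern of -1 in two's complement


def longest_ripple(x, y, carry_in=0):
    """Longest run of adjacent bits that merely pass an incoming carry on."""
    carry = carry_in
    run = best = 0
    for i in range(WIDTH):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        if (xi ^ yi) and carry:
            run += 1            # the carry ripples through this bit
            best = max(best, run)
        else:
            run = 0             # carry is generated here, or dies here
        carry = (xi & yi) | ((xi | yi) & carry)
    return best


def predicted_carry_out(x, y, lo, hi, carry_in):
    """AND together (x OR y) for bits lo..hi-1 and the incoming carry."""
    p = carry_in
    for i in range(lo, hi):
        p &= ((x >> i) | (y >> i)) & 1
    return p


print("-1 -1 + ripple:", longest_ripple(MASK, MASK))
print("-1  1 + ripple:", longest_ripple(MASK, 1))
print("predictor, bits 1..6 of -1 1 +:", predicted_carry_out(MASK, 1, 1, 7, 1))
```

In this model -1 -1 + generates a fresh carry at every bit, so nothing
ripples, while -1 1 + pushes a single carry through the 31 bits above
bit 0.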

Using Penio's notation (above), my full adder takes 3 EO, ripple carry
takes 1 or 2 EO (roughly) per bit rippled -- I'll assume 1.  The ripple
predictor takes (say) 1 EO total setup and 1 EO to anticipate a carry
per stage.

So, with a 32 bit design, let's say we have ripple predictors of width
6, for all but the first two bits.  The first two could have
their own specialized ripple predictor (x0 y0 x1 y1 or 3and).  Here,
worst case propagation time is two adjacent pairs of five bits of
unpredicted ripple.  This takes 10 EOs (plus setup).

So, it should be possible, in the F32, to do + with at most 1 nop.
With a little fiddling, it might be possible to even get rid of that
nop.
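
One way to tally the "at most 1 nop" estimate, as a Python sketch.  The
EO costs are the ones given above; how they add up is an assumption
(full adder, predictor setup and worst-case ripple taken in series),
as is the 4 EO latch time (read off the quoted "8 adc steps in 1 IC")
and the helper name nops_needed.

```python
import math

EO_PER_IC = 12            # 3,600ps / 300ps, from the quoted timings
LATCH_EO = 4              # assumption: "several" EO to latch = 12 - 8
FULL_ADDER_EO = 3         # full adder cost
PREDICTOR_SETUP_EO = 1    # ripple predictor setup
WORST_RIPPLE_EO = 10      # two adjacent runs of five bits at 1 EO per bit


def nops_needed(total_eo):
    """Extra ICs (nops) needed before the result can be latched."""
    ics = math.ceil((total_eo + LATCH_EO) / EO_PER_IC)
    return max(ics - 1, 0)


# Assumption: the three costs simply add in series.
total = FULL_ADDER_EO + PREDICTOR_SETUP_EO + WORST_RIPPLE_EO
print("worst case EO:", total)
print("nops needed:", nops_needed(total))
```

Under these assumptions the worst case comes to 14 EO plus latching,
which fits in two ICs, i.e. + followed by one nop.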

I *think* I have it right this time.  I'm going to try out a few
examples to test out this concept.

[p.s. I'll probably be off the net June 4-June 8]

-- 
Raul D. Miller