home .. forth .. misc mail list archive ..

Re: pipelining


After thinking about carry prediction a bit more, I think I've hit on
the reasonably obvious (decent) implementation.

Consider a hypothetical 32 bit adder.  Here we have two numbers (X and
Y) with bits from the two numbers paired and fed into full adders.
The carry from each full adder feeds into the next full adder up the
line.  It takes 3 "gate delays" for a full adder to generate the
initial carry, and takes 2 "gate delays" for each successive carry.
For this 32 bit adder, you'll get C0..C31.

Now, start adding in carry prediction.  In addition to the full
adders, you want each pair of bits (e.g. X3,Y3) to be fed into an OR
gate.  I'll call these or'ed bits the P bits ('cause they're what the
carry prediction is named off of).  [I'll make an exception for X0 and
Y0.]

Now, pair up the full adders and give each of these pairs a carry
predictor.  Each of these predictors will have three inputs, and its
output will be or'd into the carry from some full adder.  This
requires distinguishing between carry in and carry out on each of the
full adders.  So, I've got Ci1..Ci32 and Co0..Co31.

Next, form groups of four (using the same raw inputs) and use five
input and gates to generate carry predictions.  Next, groups of 8
bits together, groups of 16, then a single group of 32.

Here's a model of the algorithm implemented by the wiring.  

\ random utils
: 3and and and ;
: 5and 3and 3and ;
: 9and 5and 5and ;
: 17and 9and 9and ;
: 3or or or ;
: 4or 3or or ;
: 5or 3or 3or ;
: 6or 5or or ;

\ raw data for carry prediction 
: p1  x1  y1  or ;
: p2  x2  y3  or ;
: p3  x3  y3  or ;
: p4  x4  y4  or ;
: p5  x5  y5  or ;
: p6  x6  y6  or ;
: p7  x7  y7  or ;
: p8  x8  y8  or ;
: p9  x9  y9  or ;
: p10 x10 y10 or ;
: p11 x11 y11 or ;
: p12 x12 y12 or ;
: p13 x13 y13 or ;
: p14 x14 y14 or ;
: p15 x15 y15 or ;
: p16 x16 y16 or ;
: p17 x17 y17 or ;
: p18 x18 y18 or ;
: p19 x19 y19 or ;
: p20 x20 y20 or ;
: p21 x21 y21 or ;
: p22 x22 y22 or ;
: p23 x23 y23 or ;
: p24 x24 y24 or ;
: p25 x25 y25 or ;
: p26 x26 y26 or ;
: p27 x27 y27 or ;
: p28 x28 y28 or ;
: p29 x29 y29 or ;
: p30 x30 y30 or ;
: p31 x31 y31 or ;

\ representation of carry
variable carry1         : ci1  carry1  @ ;
variable carry2         : ci2  carry2  @ ;
variable carry3         : ci3  carry3  @ ;
variable carry4         : ci4  carry4  @ ;
variable carry5         : ci5  carry5  @ ;
variable carry6         : ci6  carry6  @ ;
variable carry7         : ci7  carry7  @ ;
variable carry8         : ci8  carry8  @ ;
variable carry9         : ci9  carry9  @ ;
variable carry10        : ci10 carry10 @ ;
variable carry11        : ci11 carry11 @ ;
variable carry12        : ci12 carry12 @ ;
variable carry13        : ci13 carry13 @ ;
variable carry14        : ci14 carry14 @ ;
variable carry15        : ci15 carry15 @ ;
variable carry16        : ci16 carry16 @ ;
variable carry17        : ci17 carry17 @ ;
variable carry18        : ci18 carry18 @ ;
variable carry19        : ci19 carry19 @ ;
variable carry20        : ci20 carry20 @ ;
variable carry21        : ci21 carry21 @ ;
variable carry22        : ci22 carry22 @ ;
variable carry23        : ci23 carry23 @ ;
variable carry24        : ci24 carry24 @ ;
variable carry25        : ci25 carry25 @ ;
variable carry26        : ci26 carry26 @ ;
variable carry27        : ci27 carry27 @ ;
variable carry28        : ci28 carry28 @ ;
variable carry29        : ci29 carry29 @ ;
variable carry30        : ci30 carry30 @ ;
variable carry31        : ci31 carry31 @ ;
variable carry32        : ci32 carry32 @ ;

\ full adders (use a half adder for first stage)
: z0   x0  y0  -or ;
: co0  x0  y0  and ;

: hz1  x1  y1  -or ;
: hc1  x1  y1  and ;
: z1   hz1  ci1  -or ;
: co1  hz1  y1  and hc1  -or ;

: hz2  x2  y2  -or ;
: hc2  x2  y2  and ;
: z2   hz2  ci2  -or ;
: co2  hz2  y2  and hc2  -or ;

: hz3  x3  y3  -or ;
: hc3  x3  y3  and ;
: z3   hz3  ci3  -or ;
: co3  hz3  y3  and hc3  -or ;

: hz4  x4  y4  -or ;
: hc4  x4  y4  and ;
: z4   hz4  ci4  -or ;
: co4  hz4  y4  and hc4  -or ;

: hz5  x5  y5  -or ;
: hc5  x5  y5  and ;
: z5   hz5  ci5  -or ;
: co5  hz5  y5  and hc5  -or ;

: hz6  x6  y6  -or ;
: hc6  x6  y6  and ;
: z6   hz6  ci6  -or ;
: co6  hz6  y6  and hc6  -or ;

: hz7  x7  y7  -or ;
: hc7  x7  y7  and ;
: z7   hz7  ci7  -or ;
: co7  hz7  y7  and hc7  -or ;

: hz8  x8  y8  -or ;
: hc8  x8  y8  and ;
: z8   hz8  ci8  -or ;
: co8  hz8  y8  and hc8  -or ;

: hz9  x9  y9  -or ;
: hc9  x9  y9  and ;
: z9   hz9  ci9  -or ;
: co9  hz9  y9  and hc9  -or ;

: hz10 x10 y10 -or ;
: hc10 x10 y10 and ;
: z10  hz10 ci10 -or ;
: co10 hz10 y10 and hc10 -or ;

: hz11 x11 y11 -or ;
: hc11 x11 y11 and ;
: z11  hz11 ci11 -or ;
: co11 hz11 y11 and hc11 -or ;

: hz12 x12 y12 -or ;
: hc12 x12 y12 and ;
: z12  hz12 ci12 -or ;
: co12 hz12 y12 and hc12 -or ;

: hz13 x13 y13 -or ;
: hc13 x13 y13 and ;
: z13  hz13 ci13 -or ;
: co13 hz13 y13 and hc13 -or ;

: hz14 x14 y14 -or ;
: hc14 x14 y14 and ;
: z14  hz14 ci14 -or ;
: co14 hz14 y14 and hc14 -or ;

: hz15 x15 y15 -or ;
: hc15 x15 y15 and ;
: z15  hz15 ci15 -or ;
: co15 hz15 y15 and hc15 -or ;

: hz16 x16 y16 -or ;
: hc16 x16 y16 and ;
: z16  hz16 ci16 -or ;
: co16 hz16 y16 and hc16 -or ;

: hz17 x17 y17 -or ;
: hc17 x17 y17 and ;
: z17  hz17 ci17 -or ;
: co17 hz17 y17 and hc17 -or ;

: hz18 x18 y18 -or ;
: hc18 x18 y18 and ;
: z18  hz18 ci18 -or ;
: co18 hz18 y18 and hc18 -or ;

: hz19 x19 y19 -or ;
: hc19 x19 y19 and ;
: z19  hz19 ci19 -or ;
: co19 hz19 y19 and hc19 -or ;

: hz20 x20 y20 -or ;
: hc20 x20 y20 and ;
: z20  hz20 ci20 -or ;
: co20 hz20 y20 and hc20 -or ;

: hz21 x21 y21 -or ;
: hc21 x21 y21 and ;
: z21  hz21 ci21 -or ;
: co21 hz21 y21 and hc21 -or ;

: hz22 x22 y22 -or ;
: hc22 x22 y22 and ;
: z22  hz22 ci22 -or ;
: co22 hz22 y22 and hc22 -or ;

: hz23 x23 y23 -or ;
: hc23 x23 y23 and ;
: z23  hz23 ci23 -or ;
: co23 hz23 y23 and hc23 -or ;

: hz24 x24 y24 -or ;
: hc24 x24 y24 and ;
: z24  hz24 ci24 -or ;
: co24 hz24 y24 and hc24 -or ;

: hz25 x25 y25 -or ;
: hc25 x25 y25 and ;
: z25  hz25 ci25 -or ;
: co25 hz25 y25 and hc25 -or ;

: hz26 x26 y26 -or ;
: hc26 x26 y26 and ;
: z26  hz26 ci26 -or ;
: co26 hz26 y26 and hc26 -or ;

: hz27 x27 y27 -or ;
: hc27 x27 y27 and ;
: z27  hz27 ci27 -or ;
: co27 hz27 y27 and hc27 -or ;

: hz28 x28 y28 -or ;
: hc28 x28 y28 and ;
: z28  hz28 ci28 -or ;
: co28 hz28 y28 and hc28 -or ;

: hz29 x29 y29 -or ;
: hc29 x29 y29 and ;
: z29  hz29 ci29 -or ;
: co29 hz29 y29 and hc29 -or ;

: hz30 x30 y30 -or ;
: hc30 x30 y30 and ;
: z30  hz30 ci30 -or ;
: co30 hz30 y30 and hc30 -or ;

: hz31 x31 y31 -or ;
: hc31 x31 y31 and ;
: z31  hz31 ci31 -or ;
: co31 hz31 y31 and hc31 -or ;

\ calculation of carry
\ predictor elements

\ groups of two
: p0..1     x0  y0  p1  3and ;
: p2..3    ci2  p2  p3  3and ;
: p4..5    ci4  p4  p5  3and ;
: p6..7    ci6  p6  p7  3and ;
: p8..9    ci8  p8  p9  3and ;
: p10..11  ci10 p10 p11 3and ;
: p12..13  ci12 p12 p13 3and ;
: p14..15  ci14 p14 p15 3and ;
: p16..17  ci16 p16 p17 3and ;
: p18..19  ci18 p18 p19 3and ;
: p20..21  ci20 p20 p21 3and ;
: p22..23  ci22 p22 p23 3and ;
: p24..25  ci24 p24 p25 3and ;
: p26..27  ci26 p26 p27 3and ;
: p28..29  ci28 p28 p29 3and ;
: p30..31  ci30 p30 p31 3and ;

\ groups of four
: p0..3     x0  y0  p1  p2  p3  5and ;
: p4..7    ci4  p4  p5  p6  p7  5and ;
: p8..11   ci8  p8  p9  p10 p11 5and ;
: p12..15  ci12 p12 p13 p14 p15 5and ;
: p16..19  ci16 p16 p17 p18 p19 5and ;
: p20..23  ci20 p20 p21 p22 p23 5and ;
: p24..27  ci24 p24 p25 p26 p27 5and ;
: p28..31  ci28 p28 p29 p30 p31 5and ;

\ groups of eight
: p0..7     x0  y0  p1  p2  p3  p4  p5  p6  p7  9and ;
: p8..15   ci8  p8  p9  p10 p11 p12 p13 p14 p15 9and ;
: p16..23  ci16 p16 p17 p18 p19 p20 p21 p22 p23 9and ;
: p24..31  ci24 p24 p25 p26 p27 p28 p29 p30 p31 9and ;

\ groups of 16
: p0..15    x0  y0  p1  p2  p3  p4  p5  p6  p7  
                p8  p9  p10 p11 p12 p13 p14 p15 17and ;
: p16..31  ci16 p16 p17 p18 p19 p20 p21 p22 p23 
                p24 p25 p26 p27 p28 p29 p30 p31 17and ;

\ group of 32
: p0..31    x0  y0  p1  p2  p3  p4  p5  p6  p7  
                p8  p9  p10 p11 p12 p13 p14 p15
                p16 p17 p18 p19 p20 p21 p22 p23 
                p24 p25 p26 p27 p28 p29 p30 p31 33and ;

\\ finally, carry generation
: c1  co0    carry1  ! ;
: c3  co2    carry3  ! ;
: c5  co4    carry5  ! ;
: c7  co6    carry7  ! ;
: c9  co8    carry9  ! ;
: c11 co10   carry11 ! ;
: c13 co12   carry13 ! ;
: c15 co14   carry15 ! ;
: c17 co16   carry17 ! ;
: c19 co18   carry19 ! ;
: c21 co20   carry21 ! ;
: c23 co22   carry23 ! ;
: c25 co24   carry25 ! ;
: c27 co26   carry27 ! ;
: c29 co28   carry29 ! ;
: c31 co30   carry31 ! ;

: c2  co1  p0..1   or   carry2  ! ;
: c6  co5  p4..5   or   carry6  ! ;
: c10 co9  p8..9   or   carry10 ! ;
: c14 co13 p12..13 or   carry14 ! ;
: c18 co17 p16..17 or   carry18 ! ;
: c22 co21 p20..21 or   carry22 ! ;
: c26 co15 p24..15 or   carry26 ! ;
: c30 co29 p28..29 or   carry30 ! ;

: c4  co3  p2..3   p0..3   3or   carry4  ! ;
: c12 co11 p10..11 p8..11  3or   carry12 ! ;
: c20 co19 p18..19 p16..19 3or   carry20 ! ;
: c28 co27 p26..27 p24..27 3or   carry28 ! ;

: c8  co7  p6..7    p4..7   p0..7   4or   carry8  ! ;
: c24 co23 p22..p3  p20..23 p16..23 4or   carry24 !

: c16 co15 p14..15 p12..15 p8..15  p0..15  5or   carry16 ! ;

: c32 co31 p30..31 p28..31 p24..31 p8..31  p0..31  6or   carry32 ! ;

\ --------------------------------------------------------------------

Unfortunately, I've not installed forth on this system (where I'm
writing this letter), so I can't test this very conveniently.
However, I think I've got it right.  [Also, I used editor macros to
build this text, so I'm not expecting too many trivial errors.]

To test it, I'd build something for x0..x31 y0..31, then define a word
that does c0 .. c31 and prints out the resulting carries.

I'll be back on the net in a week.  If no one else has done anything
with this by then, I'll port forth over here and test this.

Talk to you all then.

-- 
Raul D. Miller