home .. forth .. misc mail list archive ..

Re: P64 Math


>> 
>>> When you hit 32 bits I think it's worthwhile to use some sort of carry
>>> propagate acceleration. Something like a carry skip adder doesn't cost too
>>> much silicon, and will be *much* faster than a ripple adder for wide words.
>>
>>Can you name numbers? At least a qualitative factor. It still scales
>>linearly to bit length, I presume?
>> 

CLA:		Pure carry lookahead
Pyramid:	Don't ask
Conditional sum:Carry select with 1bit groups, effectively
BCLA(i):	Ripple within blocks of size i, CLA across blocks
RCLA(i):	CLA across blocks of size i, ripple between blocks
SBCLA(i,j):	CLA across blocks of size i, and superblocks of j blocks
ISBCLA(i,j):	SBCLA variant - more parallelism
SRCLA(i,j):	RCLA with superblocks
Carry complete:	Nasty, non-deterministic thing
Serial:		One bit per cycle, single full adder & a latch.

Note that T(n) in the following table ignores fanout restrictions,
propagation delay and lots of other things, so it's only an estimate, and
will be badly wrong for some architectures (eg CLA).


Adder			Time for n-bit add: T(n)

CLA			4
Carry select		6
Pyramid			2*log2(n)+1
Conditional sum		2*log2(n)+2
SRCLA(4,4)		n/8+6
SRCLA(2,8)		n/8+6
SRCLA(4,8)		n/16+6
Carry complete		2*log2(n)+4 (average)
SBCLA			6 * n^(1/3) - 2
BCLA			4 * n^0.5 - 2
1-level Skip		~4 * n^0.5
RCLA(4)			n/2 + 1
RCLA(8)			n/4 + 1
Ripple			2n - 1
Serial			3n

And the numbers:
(  Cost(and) == Cost(or) == 2
   Cost(xor) == 4
   Cost(latch) == 8

   T is in units of one gate-delay  )

For 16 bit words
Adder			T(16)	Cost	Cost:Perf

CLA			4	1264	51
Carry Select		6	440	26
Pyramid			9	342	31
RCLA(4)			10	336	34
RCLA(8)			6	560	34
Conditional-sum		10	698	70
SRCLA(2,4)		10	392	39
ISBCLA(2,4)		10	360	36
Carry-complete		12(avg)	368	44
2-level skip(2,4)	15	240	36
BCLA(4)			14	300	42
SBCLA(2,2)		14	368	52
1-level skip		15	240	36
Ripple			31	224	69
Serial			48	22	11

For 32 bit words
Adder			T(32)	Cost	Cost:Perf

CLA			4	7392	296
Carry Select		6	992	60
SRCLA(2,4)		10	904	90
ISBCLA(2,4)		10	944	94
Pyramid			11	774	85
Conditional-sum		12	1594	191
Carry-complete		14(avg)	736	103
SBCLA(2,2)		18	748	135
RCLA(4)			18	672	121
RCLA(8)			10	700	70
2-level skip(4,2)	20	572	114
BCLA(4)			22	600	132
1-level skip		23	480	110
Ripple			63	448	282
Serial			96	22	21

For 64 bit words
Adder			T(64)	Cost	Cost:Perf

CLA			4	50624	2025
Carry Select		6	2688	161
Pyramid			13	1734	225
SRCLA(4,4)		14	1808	253
SRCLA(4,8)		10	2032	203
Conditional-sum		14	3578	501
Carry-somplete		16(avg)	1472	236
ISBCLA(4,4)		18	1344	242
ISBCLA(4,8)		14	1568	220
SBCLA(4,4)		22	1548	341
2-level skip(4,4)	28	1140	319
1-level skip		29	960	250
RCLA(4)			34	1344	457
RCLA(8)			18	1400	403
BCLA(4)			38	1299	456
BCLA(8)			30	1320	396
Ripple			127	896	1138
Serial			192	22	42

Points to notice.....

The performance values for the CLA, Pyramid & carry-select are completely
wrong - the fan-out & fan-in needed to build a 64bit CLA doesn't bear
thinking about.

The pure CLA and pure pyramid adders are infeasible to build (due to fan-out
problems), but can be used as part of a hybrid adder scheme.

The cost:performance ratio for the serial adder is way, way better than
anything else, though it has a high latency.

Compare areas & times for the ripple adder and the 1-level skip. At all three
word lengths the skip adder is at least twice as fast as the ripple, and only
marginally bigger. The skip adder begins to look like a very good buy.

Take a look at `Computer Arithmetic Systems', Omondi, Prentice-Hall
ISBN 0-13-334301-4 for more details about this sort of stuff.



--
-- Steve Atkins -- <atkinss@inmos.co.uk> -- +44 454 611439 --

<Squeak!> <Squeak!> <Squeak!>....... <Splash!>