



Random Rant (P6; Hello, again).

As a result of the last debate ("multitasking being actually
unnecessary") I grew curious and looked up the number of tasks
(total, including the system ones) running on my home machine,
an Amiga 2000 (now positively _ancient_, being about 8 years old
and still in heavy use, albeit OS-upgraded and significantly
PD-augmented; this is a 6+1 MByte, 7.1 MHz 68000 (gasp!)
machine). The result (I was surprised myself): about 50-60 tasks
on average, with no noticeable speedup at the front end if most
of the tasks are killed. Each task runs smoothly; no jerks (as
e.g. in Win 3.1) are visible. All this on a 0.3-0.6 MIPS
weakling. There is absolutely no doubt that all these tiny tasks
contribute to the Amiga's hitherto unsurpassed functionality; in
fact I bought the machine at the end of '87 mostly because it
had reentrant multitasking (superior gfx/sound were only
secondary reasons).

Considering the large number (16) of wide (32-bit) registers the
MC68xxx architecture commands, all of which must be flushed upon
a context switch, I very much doubt that flushing all F21
registers (particularly into SRAM) would eat very much time.
(But I still claim that zero-cost context switching/hardware
stack protection would be highly valuable for a VM/OS. Too bad
it would vastly reduce runtime performance, according to CM.)
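
For a feel of the magnitudes involved, here is a minimal
back-of-the-envelope sketch in Python; the cycle counts, cell
counts and clock rates below are my assumptions for
illustration, not 68000 or F21 datasheet figures.

  # Rough estimate of how long flushing registers on a context switch
  # might take. Every number below is an assumption for illustration.

  def flush_time_us(n_cells, cycles_per_cell, overhead_cycles, clock_mhz):
      """Time to save (or restore) one register/stack set, in microseconds."""
      cycles = overhead_cycles + n_cells * cycles_per_cell
      return cycles / clock_mhz            # MHz == cycles per microsecond

  # 68000-class CPU: 16 wide (32-bit) registers, MOVEM-style block save,
  # assumed ~8 cycles per register plus a little setup, at 7.1 MHz.
  m68k = flush_time_us(n_cells=16, cycles_per_cell=8,
                       overhead_cycles=8, clock_mhz=7.1)

  # F21-style stack machine: assume ~40 cells (both stacks plus a few
  # registers) written to fast SRAM at ~2 cycles each; the clock rate
  # is likewise assumed.
  f21 = flush_time_us(n_cells=40, cycles_per_cell=2,
                      overhead_cycles=4, clock_mhz=200.0)

  print(f"68000 register flush: ~{m68k:.1f} us")   # about 19 us with these numbers
  print(f"F21 register flush:   ~{f21:.2f} us")    # well under a microsecond

Even on the slow machine a full 16-register flush costs only on
the order of tens of microseconds, which supports the point that
register flushing alone is not what makes a context switch
expensive.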

While rummaging through my old books I found a quite interesting
graph. It showed process geometry (um; linear scale) and the
number of transistors used (decadic logarithmic scale) versus
the year of production (linear scale, of course ;) for several
Motorola processors. There are nine entries, ranging from the
68000 to the 68060. The first time I saw a 68000 mentioned was
in a 1979 Motorola brochure. The table starts, however, with the
68000 (1984), using a 3.0 um process, and ends in 1992 with a
0.5 um process (the 68060 actually arrived several years later).
Structure size goes down asymptotically; the number of
transistors rises exponentially (linearly on a logarithmic
scale).

Having grabbed iX #9, Sep 1995, I looked up the corresponding
Intel graph, covering a period of about 25 years: starting with
the 4004 (1971, 2300 transistors (the F21 has only about 12k!))
and ending with the P6 (4004, 8086, 80286, i386DX, i486DX,
Pentium, P6 - seven entries in toto). At least here Intel is a
paragon of precision: each CPU fits a straight line _exactly_
(Motorola is much sloppier here), provided the P6 is not taken
into account. The transistor count doubles every 25 months (2.1
yr). The P6 CPU contains 5.5 Mtransistors, the L2 cache 15-16
MTransistors (depending on the source). (Intel obviously pays 6
transistors/cell (!) instead of 4 here.) Sans L2 cache the P6
lies noticeably below the extrapolated line, with cache
_significantly_ above it (three times as many transistors as
anticipated). A CPU of the combined P6+L2 complexity on a single
die wouldn't have arrived until the end of 1998 according to
Moore (the name is Moore, Gordon Moore - not Chuck Moore =).
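
That extrapolation is easy to check; a minimal sketch in Python,
assuming the trend line is anchored at the 4004's 2300
transistors in 1971 and doubles every 25 months, as quoted
above:

  import math

  # Moore's-law trend line as quoted above: anchored at the 4004,
  # doubling every 25 months.
  START_YEAR = 1971
  START_TRANSISTORS = 2300
  DOUBLING_MONTHS = 25

  def year_reached(target_transistors):
      """Year at which the trend line reaches the given transistor count."""
      doublings = math.log2(target_transistors / START_TRANSISTORS)
      return START_YEAR + doublings * DOUBLING_MONTHS / 12

  print(f"P6 CPU alone  (5.5 M): trend year ~{year_reached(5.5e6):.1f}")
  print(f"P6 CPU + L2  (~21 M): trend year ~{year_reached(21e6):.1f}")

With these numbers the 5.5 M point falls around 1994 (so the
bare P6 CPU is indeed slightly below the 1995 line), and the ~21
M combined figure lands near the end of 1998, matching the date
above.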

The P6 CPU die is square, almost three times the size of the L2
(256 K) cache die, which is elongated (about 1:1.5 aspect ratio)
and shows two domains (probably so that a random defect hit
kills only half of it). Due to its larger structures the CPU
appears rainbow-coloured, acting as a diffraction grating. The
L2 die looks uniformly yellowish, indicating that its lattice
periodicity lies around 600 nm (0.6 um structures? whatever?),
the wavelength of visible yellow light.
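
The grating equation gives a rough feel for that observation; a
minimal sketch, where the two pitches are assumed for
illustration rather than measured:

  import math

  # Grating-equation check of the "rainbow vs. uniformly yellow die"
  # observation: sin(theta) = m * wavelength / pitch (first order, m = 1).
  def first_order_angle_deg(pitch_um, wavelength_um):
      s = wavelength_um / pitch_um
      return math.degrees(math.asin(s)) if s <= 1.0 else None

  for pitch in (3.0, 0.6):   # coarse CPU-era pitch vs. fine L2 pitch (assumed)
      print(f"pitch {pitch} um:")
      for wavelength, colour in ((0.45, "blue"), (0.59, "yellow"), (0.70, "red")):
          angle = first_order_angle_deg(pitch, wavelength)
          if angle is None:
              print(f"  {colour:6s} ({wavelength} um): no first-order diffraction")
          else:
              print(f"  {colour:6s} ({wavelength} um): {angle:5.1f} deg")

Coarse ~3 um structures spread the whole visible spectrum over a
narrow fan of angles (hence the rainbow), whereas at ~0.6 um
pitch red no longer diffracts at all and yellow only at
near-grazing angles.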

John Wharton, writing in Microprocessor Report (V9N9, May 1995;
the iX source), enumerates a lot of (partially nontrivial)
reasons why INTeL did what it did, namely:

- Intelll commands its production capacities/load at will.

- now cache RAM production becomes Inetl's domain, thus
  preying upon the cache manufacturers' monopoly grounds.

- the chosen design allows easy future expansion.

- Intetl has beaten Moore's law by almost one year (!).

- die complexity has been deliberately reduced for economic
  reasons, splitting off the cache being the wiser move.
  (Obviously, a large, fast L2 cache can beat a small but
  extra-fast primary on-chip cache).

- even more motherboard functionality moves into the CPU,
  hence a higher sales value (OEMs pay more $$s)

- the L2 accounts for only 40 % of the CPU price (the L2 has
  more transistors but 1/3 of the area, and prices grow with
  the 2nd or 3rd power of chip area. A 128 kByte cache (a
  semi-defective L2) would cost Intail virtually nothing at
  all).

- CPU plug-in upgradeability is enhanced for the above
  reasons.

- less flexibility for the system designer, more power for
  the marketing (uck!) - levelled end-product performance in
  arbitrary environments.

- Intl can fine-tune the exact chip makeup on demand -
  increased flexibility being the result. (Even now 128 k
  L2's are being planned; 512 k L2's in a 0.35 um process
  will come next year).

- As the production costs of a state-of-the-art chip are
  negligible compared to the costs of the production tools
  (in 1991 Intehl invested 6 billion $ into 0.5 u (and
  smaller) production outfits), these must now run at full
  steam to foster profits.

- Whatever the demand, Inteyl can produce 512 k L2's (at much
  lower yield) to hold the production load constant (just in
  case demand is overwhelming, the L2 can be produced by
  external companies).

- Int#?'s SmartDie programme can hold down the burn-in costs
  of the double-deckers.

Notice that most of the above reasons are nontechnical.
Obviously, chip design has now progressed into the realm of
politics even more than before. Chip design and production have
become even more congruent with the consumer's twisted demands.
Marketing doth reign supreme - cheers!

Some interesting figures: the author assumes (though he calls it
"a deep, dark mystery") a defect density of 0.6/cm^2. (I deem
this a bit low. Any real-world data?) Then one 8" wafer (64
dies?) yields 14 good P6's or 32 good L2's. Hence a wafer yields
about 2.3 good L2's per good CPU, and iNtEl must balance its
wafer starts accordingly to hit the correct 1:1 ratio.
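
For what it's worth, those die counts are roughly what a simple
Poisson yield model gives; a minimal sketch, where the die areas
and the gross-die approximation are my assumptions (the article
quotes neither):

  import math

  # Simple Poisson yield model: yield = exp(-defect_density * die_area).
  # Die areas below are assumptions, not figures from the article.
  DEFECT_DENSITY = 0.6        # defects per cm^2 (the article's assumption)
  WAFER_DIAMETER_CM = 20.0    # 8" wafer

  def gross_dies(die_area_cm2):
      """Common approximation for the number of die sites on a round wafer."""
      d = WAFER_DIAMETER_CM
      return (math.pi * d**2 / (4 * die_area_cm2)
              - math.pi * d / math.sqrt(2 * die_area_cm2))

  def good_dies(die_area_cm2):
      poisson_yield = math.exp(-DEFECT_DENSITY * die_area_cm2)
      return gross_dies(die_area_cm2) * poisson_yield

  for name, area_cm2 in (("P6 CPU", 3.1), ("256 K L2", 2.0)):  # assumed areas
      print(f"{name}: ~{good_dies(area_cm2):.0f} good dies per wafer")

With those assumed areas the model gives on the order of a dozen
good CPUs and a few dozen good L2's per wafer - the same
ballpark as the 14 and 32 quoted above.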