
Re: rambus


Dear MISC readers:

> interesting news announcement from rambus at:
>  http://news.stockmaster.com/display_news.asp?mode=news&doc_id=BW20000616BW2562&ticker=RMBS&UPT=1497
> 
> maybe designing such a rambus interface for the f21 would remove
> much of the memory bottleneck.

Yes, more modern memories can provide much higher bandwidth.  I enjoyed
reading the info at the site: chipset descriptions, benchmarks, etc.
However, it was mostly marketing information, with pages of features
listed but nothing at the technical level needed to design an
interface to a particular chipset, which is the level where we work.

We had a presentation about RAMBUS at the Parallel Processing
Connection a few years ago by the folks who created it when it
first came out.  I have looked into some of the technical
implementation issues on some of the memory chipsets like
SDRAM and RAMBUS and Chuck has examined them in more detail.

My impression is that SDRAM would be a better choice for several
reasons.  The chips are available and are the cheapest thing these
days.  What I think is more important is that SDRAM is many
times less complex than RAMBUS from what I can see.  RAMBUS parts
just are not a good match to a simple processor, while SDRAM is a
reasonable step up.

Chuck has been working on improving the chip pinout so that it
could mate to SDRAM on a single layer board.  (Once you get the
CPU and so much other stuff onto a $1 chip you have to focus on
the things that now become a larger percent of the total cost,
like connectors, PCB, clock circuits etc.)

For comparison the old memories we are using were chosen when
the specs said 200 mips.  Intel was getting 64mips out of
those memories with the help of expensive cache and cache
controllers.  We wanted to get several times the throughput
out of the same cheap memories without the expensive interface
chips needed in a PC.  

Speeding up our processor to 500 mips internally exposed the
memory bandwidth limitation even further.  The CPU can get more
mips out of those same memories than the original 200 mips
design but cannot get 500 mips because the memory chips
just won't pull it.  With the current memory interface and
DRAM the best we do is about 30MHz, since onpage access can
go as low as 35ns.  The maximum throughput is therefore
only about 120 mips with this DRAM, and less if you are giving
up memory bandwidth to I/O coprocessors.  It is about 250 mips
with the current SRAM interface.
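
The arithmetic behind those DRAM figures can be sketched in Python.
The factor of four instructions per memory word is an assumption
inferred from the 30MHz and 120 mips numbers above, not something
stated in this message, and the function names are illustrative only.

```python
# Rough throughput arithmetic for a memory interface with no cache.
# ASSUMPTION: 4 instructions are fetched per memory word (inferred from
# "30MHz" giving "120 mips" in the text, not stated there explicitly).
INSTRUCTIONS_PER_WORD = 4


def access_rate_mhz(cycle_ns: float) -> float:
    """Memory accesses per microsecond (MHz) for a cycle time in ns."""
    return 1000.0 / cycle_ns


def peak_mips(rate_mhz: float,
              instr_per_word: int = INSTRUCTIONS_PER_WORD) -> float:
    """Upper bound on mips when every access fetches a full word."""
    return rate_mhz * instr_per_word


onpage_rate = access_rate_mhz(35.0)  # roughly 28.6, i.e. "about 30MHz"
dram_mips = peak_mips(30.0)          # 120.0
```

This is a ceiling, not a measurement: offpage accesses and bandwidth
given to the I/O coprocessors both pull the real number down.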

The current MISC memory interface has no cache but relies
on the property of DRAM to treat a page sort of like cache
and give faster access to onpage memory references than to
offpage.  SDRAM has a similar sort of behavior but has a
more complex hierarchy of variations in timing.  There
are several pages sort of cached inside of the SDRAM chips
so that you can be loading instructions from one page
and data from a couple of pages and get almost the fastest
access to all of them.  Rather than just onpage and offpage,
the memory controller has to keep track of several
pages and predict the timing needed by the memories
for different accesses, so the interface is significantly
more complex.
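
The bookkeeping described above can be sketched as a toy model in
Python.  Everything numeric here is hypothetical (bank count, page
size, cycle costs) and is not taken from any datasheet; real SDRAM
timing has many more cases than the two modeled.

```python
# Toy model of open-page tracking in a memory controller.
# ASSUMPTIONS (hypothetical values): 512-word pages, 1 cycle for an
# access to an already open page, 5 cycles for any other access.
PAGE_WORDS = 512
ONPAGE_CYCLES = 1
OFFPAGE_CYCLES = 5


class OpenPageTracker:
    def __init__(self, num_banks: int):
        self.num_banks = num_banks
        # One open page remembered per bank; None means nothing open yet.
        self.open_page = [None] * num_banks

    def access(self, addr: int) -> int:
        """Return the cycle cost of this access and update the open page."""
        page = addr // PAGE_WORDS
        bank = page % self.num_banks
        if self.open_page[bank] == page:
            return ONPAGE_CYCLES
        self.open_page[bank] = page
        return OFFPAGE_CYCLES


def total_cycles(addresses, num_banks: int) -> int:
    """Sum the cycle cost of a sequence of word addresses."""
    tracker = OpenPageTracker(num_banks)
    return sum(tracker.access(a) for a in addresses)


# Instruction fetches from page 0 interleaved with data from page 1.
mixed = [0, 512, 1, 513, 2, 514]
one_page = total_cycles(mixed, num_banks=1)    # every access is offpage
four_pages = total_cycles(mixed, num_banks=4)  # only the first two are
```

With a single remembered page the interleaved stream keeps evicting
itself; with several, both streams stay onpage after the first touch.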

But as a result, going from a 30MHz memory with only one
page cached (in the memories) to a 100MHz or 133MHz bus
with several fast access pages provides enough bandwidth
that it is no longer the same sort of bottleneck.  With
a similar process we approach the maximum throughput of
the CPU.  However, there is no reason to stick with .8u,
especially since Mosis shut down their .8u HP process
prototype fab line.

Going to a significantly faster process for the MISC
chip would once again expose a memory bandwidth
bottleneck, but it is not much and goes away again
if you switch to double speed SDRAM, which may also
be cheap in the future.
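
A back-of-envelope Python sketch of that argument follows.  It
assumes four instructions per memory word (inferred from the 30MHz
and 120 mips figures earlier in this message), one word per bus
transfer, and every access running at the peak rate.  The 1000 mips
figure for a faster process is purely hypothetical.

```python
# Memory-bound versus CPU-bound throughput for different bus speeds.
INSTRUCTIONS_PER_WORD = 4  # ASSUMPTION, inferred from 30MHz -> 120 mips
CPU_PEAK_MIPS = 500        # internal rate quoted in the text


def memory_bound_mips(bus_mhz, transfers_per_clock=1):
    """Most instructions per microsecond the memory bus could feed."""
    return bus_mhz * transfers_per_clock * INSTRUCTIONS_PER_WORD


def delivered_mips(bus_mhz, transfers_per_clock=1, cpu_peak=CPU_PEAK_MIPS):
    """Throughput is capped by whichever of memory or CPU is slower."""
    return min(cpu_peak, memory_bound_mips(bus_mhz, transfers_per_clock))


old_dram = delivered_mips(30)    # 120, memory bound
sdram_100 = delivered_mips(100)  # 400, still memory bound
sdram_133 = delivered_mips(133)  # 500, now CPU bound

# Hypothetical faster process with a 1000 mips CPU (made-up figure):
fast_single = delivered_mips(133, cpu_peak=1000)     # 532, memory bound
fast_double = delivered_mips(133, 2, cpu_peak=1000)  # 1000, CPU bound
```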

The problem that I see with RAMBUS is that it appears to
be about an order of magnitude (or two) more complex than
SDRAM.  I think both Chuck and I dismissed it long ago
as requiring an enormous degree of complexity to operate.

I was very impressed with the technical and marketing info
at the site about the chips that these memories are designed
to support.  I hadn't realized that Intel was up to 28M
transistors just in the CPU core.  That of course is not enough
to be able to use the high end memories.  You also need substantial
first and second level cache and a fantastically complex glue chipset
from Intel, the 820.  Most of the press releases were really about
the improved performance using the 820 chip along with Pentium III
rather than older Intel chips.

I have always said that it isn't fair to compare an F21 to
a Pentium because you have to compare to a Pentium, bus
controller, interrupt controller, video card, network card,
serial card, parallel port etc. since the Pentium is only
a processor.  (I have also been told that it isn't fair to
make a comparison because the F21 is just so much nicer
to deal with when everything is 1000x less complicated.)

That site made me realize that to make a fair comparison between the
Pentium III and F21 you have to include not only the Pentium
CPU core transistors but the cache and 820 chipset transistors
as part of the interface.  And of course we would also have to
talk about an F21 implemented in .18u technology to make a
level playing field.

So a gigahertz Pentium III would be roughly equivalent to
3000 MISC chips.  I am talking about metrics like
manufacturing cost, power consumption etc.  If I had a choice
of systems to play with I would pick the one with 3 million
Forth mips over the one with 1 thousand Pentium mips.  But
it is after all pie in the sky unless the MISC chip were
made in large quantity and in the most modern process.
(However, the system prices would not be 3000/1 since
memory would be needed on each F21 node; that is
the ratio for everything except memory.)
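
Spelled out in Python, the comparison works like this.  The three
inputs are the figures from the paragraph above; the per-chip rate
is only what those figures imply, not a measured number.

```python
# Inputs taken from the text above.
MISC_CHIPS = 3000             # chips judged equivalent to one Pentium III
FORTH_MIPS_TOTAL = 3_000_000  # "3 million Forth mips"
PENTIUM_MIPS = 1_000          # "1 thousand Pentium mips"

# Derived values (implied by the inputs, not stated in the text).
mips_per_misc_chip = FORTH_MIPS_TOTAL / MISC_CHIPS   # 1000.0 per chip
throughput_ratio = FORTH_MIPS_TOTAL / PENTIUM_MIPS   # 3000.0 to 1
```

The 3000 to 1 ratio leaves out memory, which each node would need.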

The point is that RAMBUS provides a higher bandwidth at
a very high design cost.  That cost doesn't look like much
if you are talking about 50 million transistors in the
processor and interface, but if the processor and interface
are designed to be 3000 times smaller and cheaper it is
going to be pretty hard to get a good match to RAMBUS.
In fact I don't think it could be done.  I suspect that
the RAMBUS interface would be bigger than the present
F21.  Just look at the specs and the size of the busses
inside of the memory controller chips.  They dwarf the
F21 processor.  Find the technical specs on RAMBUS and
you may be amazed.  SDRAM is more complex than what we have
but a fraction of the complexity of RAMBUS.  There just isn't
any point in making it bigger and more expensive if the
diminishing returns don't make it worth it.

As for the pie in the sky stuff, if you had access to
.18u and prototyping with 28M transistors on the chip
you could split it up as sets of processors and memory
on chip with nothing but I/O pins.  The issue isn't
whether or not you could get incredible performance;
you could.  The issue is just that prototyping something
thousands of times bigger will tend to cost thousands of
times more to do.  The bottom line is that costs are
roughly proportional to transistor count.  Intel spends
billions developing each new chip.  Our intention was
not to compete in that arena.  The idea of our chips
was to make them 10000x cheaper to develop and 1000x
cheaper to manufacture and power etc., so we can't just
copy what they do.
  
Jeff Fox