RE: F21/P21 "improvements" | memory cost

To: "'misc@xxxxxxxxxxxxxxxxxxxx'" <misc>
Subject: RE: F21/P21 "improvements" | memory cost
From: Jeff Fox <Jfox@xxxxxxxxx>
Date: Mon, 5 Jun 2000 17:38:52 -0700
Dear MISC readers:

>> Chuck has said that on chip memory is generally speaking expensive
>> compared to memory chips for the above reasons.  Figure out what
>> percentage of most big processors are on chip cache and what
>> percentage of the cost it is and you see what he means.
>
>so what are the numbers actually ?  have you figured this out ?

Actual numbers for an actual prototype with an actual amount of memory
on chip can be calculated easily. Projections for various sizes for a
given chip and process can be calculated easily too. It isn't easy to give
numbers for hundreds of different chips, but one can figure out what the
likely costs were.

>are you basically saying that because a large amount of chip real
>estate on an (expensive) intel processor goes towards cache means that
>this memory is very expensive in absolute terms ?

RAM is expensive to make.  The development costs for modern memory chips
are quite incredible.  After you spend the billions up front, if you make
enough of them they become relatively cheap: far cheaper than memory
built with processes designed to fab CPUs. This is true not only for Intel
but for anyone making chips. CPU processes don't make cheap RAM.  You
could spend $1000 for RAM on a CPU chip that would cost you a few dollars
as RAM chips.  The trick is therefore to drop the CPU onto a modified version
of the RAM chip, not to add $1000 worth of memory to your CPU chip.

>if yes, i would find that this reasoning gets it the wrong way
>around.  fact is: fast processors have to deal with the classic
>von-neumann-bottleneck, which occurs because large memory banks tend
>to be a lot slower than CPU logic.  short of very radical changes,
>there is currently no easy way around this.

You lost me.  First you were talking about the economics of making RAM
with RAM processes versus CPU-specific processes.  Then things suddenly
switched to something about bottlenecks and RAM.  You are saying that
cheap memories are slower than the fastest CPUs.  Of course they are.

>this problem is dealt with by various caching strategies.  in other

In machines that use cache.

It is interesting that Phil Koopman notes that the entire program on
a stack machine may be smaller than the amount of cache memory a RISC
machine needs to execute the same code at a reasonable speed.  In
other words, if you spend hundreds of times as much on expensive techniques
like deep pipelines and speculative execution, you will also need to
spend huge amounts on expensive cache to make up for the problems you
introduced by going in that direction.

>words, a lot of the transistors (which you might call "bloat") on a
>modern fast CPU chip are devoted to these "logistics" problems of
>shoving instructions and data in and out of the CPU with as little
>delay as possible.  

Yes, this is one of the main reasons for the 1000x and 100x differences in
manufacturing cost, power dissipation, heat generation, etc.  The main
difference is that by simply using tiny instructions and a zero-operand
architecture you can run the CPU at some multiple (4, 6, 7) of the memory
clock most of the time (on good code).

>while this "overhead" is not directly involved in
>the "real" computations that the CPU is supposed to be doing, it is
>nevertheless essential to sustain high bandwidth.  in some sense, the
>whole RISC paradigm of needing streams of a lot of simple instructions
>in place of fewer complex ones has only exacerbated this problem.  

Yes, and then you have pipeline stalls and cache misses that complicate
the timing equations with events that take hundreds of cycles.  That
approach has these problems built in.
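To make the "timing equations" point concrete, here is the standard textbook average-memory-access-time model (hit time plus miss rate times miss penalty). This is a minimal sketch: the 1-cycle hit time, the 200-cycle miss penalty and the miss rates are hypothetical values chosen only to show how a few rare, slow events dominate the average; they are not measurements of any machine in this thread.

```python
def average_access_cycles(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Textbook AMAT model: every access pays the hit time,
    and the fraction that misses also pays the miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical numbers: 1-cycle hits, 200-cycle miss penalty.
for miss_rate in (0.01, 0.05, 0.20):
    cycles = average_access_cycles(1, miss_rate, 200)
    print(f"miss rate {miss_rate:.0%}: about {cycles:.0f} cycles per access")
```

Even a 5% miss rate turns a 1-cycle access into roughly 11 cycles on average under these assumed numbers, which is why the miss penalty, not the hit time, ends up governing performance.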

>is probably even worse for a MISC approach.

Not exactly.  Most of the problem has gone away.  You still have to
slow down to the speed of memory for random access, whether you spend
thousands on cache or not.

>the problem that a chip like the F21 has is that it lacks all this
>essential memory-interface-infrastructure, which means that while the
>core CPU might be bloody fast, it chokes instantaneously as soon as it
>needs to fetch any kind of data or instruction from memory.  

Not at all. Perhaps you are picturing code written in C.  In well-written
Forth code things stay on the stack and run as fast as, say, 4x
the memory clock.

Now the difference between good Machine Forth and bad Forth was that the
good Forth had the inner loop of the application (say, JPEG) 100% on chip,
so that only instructions are being run in the inner loop and the speed
is 4x the memory bus.  With old, slow DRAM that is only 140 MIPS max, well
below the 500 MIPS the CPU core can do.  With faster SRAM it goes up to 250
MIPS.  With a faster memory interface, good code could stay close to 500 MIPS.
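The MIPS figures above follow from a simple fetch-bound model: when the inner loop's data stays on the stack, throughput is the smaller of the core's peak rate and (instructions per memory word) times (memory words per second). This is a minimal sketch under stated assumptions: the four-instructions-per-fetch factor is taken from the "4x the memory bus" figure in the post, and the memory word rates (35, 62.5 and 125 million words/s) are hypothetical values back-computed from the 140, 250 and 500 MIPS figures, not published F21 specifications.

```python
def effective_mips(core_peak_mips: float, instructions_per_word: int,
                   memory_mwords_per_s: float) -> float:
    """Fetch-bound throughput: the core cannot execute instructions faster
    than memory delivers them, nor faster than its own peak rate."""
    return min(core_peak_mips, instructions_per_word * memory_mwords_per_s)


CORE_PEAK_MIPS = 500   # core speed quoted in the post
INSTR_PER_WORD = 4     # assumed from the "4x the memory bus" figure

# Hypothetical memory word rates (millions of words per second),
# back-computed from the 140 / 250 / 500 MIPS figures in the post.
for label, rate in [("slow DRAM", 35.0), ("faster SRAM", 62.5), ("faster interface", 125.0)]:
    print(f"{label}: {effective_mips(CORE_PEAK_MIPS, INSTR_PER_WORD, rate):.0f} MIPS")
```

The `min` captures the design point being argued: once each fetch delivers several instructions and data stays on chip, raising the memory rate moves the machine toward its core peak without any cache in between.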

Now I have seen bad code.  Translations of C directly to Forth.  It could
easily be 100x bigger and 1000x slower than the well written Machine Forth
program. If you are thinking of that kind of code then it would be very
slow on a Forth machine. ;-)

>while the 
>folks behind the F21 seem to dismiss the need for the complexity of
>on-chip memory and caching, it seems like the F21 is limited by
>exactly the same problem, and no great alternative solution has been
>demonstrated either.

We dismiss the problems of RISC as not applying to what we are doing.  You
can spend thousands of dollars on on-chip cache, and it is easy to write
a program that makes it useless: write a program that behaves like
real-world applications, not tailored benchmarks.  You can spend all that
on cache and then squeeze artificial benchmarks into it to create the
illusion that the RISC machine will stay in cache.  

My Pentium II, which has 128MB, can run a document through the word
processor without running out of memory.  It has over 40MB of code running
when that happens.  Do you think it is staying in cache with 40MB of code?

What we did was show that we could eliminate 99.9% of the cost and
complexity.  We can run some applications at about the same speed as
machines with 1000x as much invested in the hardware.  Of course, we can't
run totally bloated software that needs 1000x just to do anything.

Jeff Fox
>does chuck moore ever read this mailing list, and if not, why not ?
>
>-- 
>greetings
>markus krummenacker