[NOSC] Chuck Moore website and new Forth chips


Mark Sandford wrote:
> Agreed, but a chip (processor farm), that can't do a
> significant/interesting demo, isn't much of a
> technology demonstration. 

Can't?  I am curious why you say that.

But from what I have seen, the demos that people want
to see are usually moronic and have nothing to do
with what chips are good for.

Real-time compression and decompression of data
streams is pretty much an open-ended problem.  Things
like protein folding, gene sorting, simulations and
problem modeling, AI, and a lot of other things
that need computing power are not the sort of things
the investors want to see.  They want to see a
dancing baby doing the latest popular dance.  Then
they don't pay for the demo and don't invest anyway.

> There have been many instances of this in the
> past: if you have to wait for the demo and then wait
> for an implementation that does something real, people
> lose interest.  So you can say that this chip will only
> work in one class of problems, but if those problems
> aren't of interest then the whole technology gets
> dismissed.

True.  I think the real problem there is that the only
problem of interest to most people is how to
do anything while carrying a 99.9% overhead built
into their PC.  They are only concerned with how to
get a PC to do much of anything while it is hamstrung
by terrible hardware and software overhead kept for
backwards-compatibility reasons.  Most people think
that is the only real problem worth addressing: how
to get a few percent increase while carrying the
excess overhead of PC hardware or popular software.
Few are even willing to consider simply removing
the overhead and starting with a clean slate to
get a 1000x improvement.

> What is described above is the classic problem, and
> one that has plagued the CPU industry for years.  This
> has become a main mantra of mine: a system isn't limited
> nearly as much by MIPS as by memory bandwidth, and

Very true.  And by the programs being 100 times larger
than they need to be.  The overhead is built into the
systems to create an artificial problem that can
be improved in little steps for marketing purposes.
The easiest problems to solve are these sorts of
artificial problems, but they are what drives the
industry.

> as CPU speeds increase at a rate faster than memory
> speeds, this problem grows.  The classic
> case is the Sieve, which used to be a speed test,
> but as processor speeds increased beyond what
> memory could provide, the test became useless.  So
> while processor designs get faster clocks every year,
> their performance is dominated
> by cache size and design.  I understand that part of
> the MISC concept is that Machine Forth is that much
> smaller and thus faster than traditional bloatware,
> but if the chip can only run very small routines,
> code or data must be loaded and stored, and the speed
> of the processor is limited by the available
> bandwidth.

Most programs only need a little memory for code.
If you have lots of memory you can run larger programs.

If small programs need megabytes of code then large
programs are not possible.  You kind of have it backwards.
The problem with 99.9% overhead is that it limits the
machines to only trivial problems.  The idea of low
overhead is to be able to solve serious problems.
Anyone can solve trivial problems, but for marketing
reasons the solutions are bloated up to fill the
machine and require hardware and software upgrades
to even do trivial things.
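The Sieve mentioned in the quoted text is a good illustration of how little code a classic benchmark needs: its inner loop does little more than stride through a flag array with loads and stores, so its timing is sensitive to how fast that array can be accessed. A minimal Python sketch, assuming the layout popularized by the old BYTE benchmark (one flag per odd number, 8190 flags by default); the function name `byte_sieve` is hypothetical and this is not code from the post.

```python
def byte_sieve(size: int = 8190) -> int:
    """Count odd primes with a BYTE-style sieve (assumed layout).

    flags[i] stands for the odd number 2*i + 3, so size + 1 flags
    cover the odd numbers 3 .. 2*size + 3.
    """
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3      # odd number represented by slot i
            k = i + prime          # slot holding 3 * prime
            while k <= size:
                flags[k] = False   # strike out each odd multiple of prime
                k += prime         # stepping by prime slots skips even multiples
            count += 1
    return count


print(byte_sieve())  # 1899 odd primes between 3 and 16383
```

The whole program is a dozen lines; with the default size it touches an 8 KB-scale array over and over, which is why the benchmark ends up measuring the memory system as much as the instruction rate.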

Look at the fact that the 80386 and 68020 have
been classified as not powerful enough to keep up
with a fast typist. ;-)  I read in c.l.f last year
that it was only recently, with >500MHz 32-bit deeply
pipelined CPUs and sophisticated optimizing native-code
compilers, that they were able to solve the
same problems that they could solve twenty years
ago with 5MHz 8-bit machines running threaded Forth.
To me this says that in twenty years they have
more or less canceled out, with hardware and software,
the 99.9% overhead that was introduced along the way.

The faster peripherals and larger storage and bigger
displays are the big difference.  The 1000x increase
in processing power is more or less canceled out by
a similar increase in processing overhead.  SUVs
get better mileage than they used to also.  The
improvements in the technology are used to cancel
out the introduced overhead to keep profit margins
high and give the consumers the impression that
things are getting better.

> As you mentioned workstation farms are bandwidth
> limited (with fast, wide memory and large caches, with
> one, two or four processors), how is a much faster
> set of 25 processors supposed to survive?  

Sometimes the overhead is such a joke that I can't
believe it doesn't wave a red flag to more people.  I
listened to a lot of presentations at the Parallel
Processing Connection over the years.  When people
would say that they needed X megabytes on each node
for overhead or X gigabytes total overhead to run
a hello world program I always found it simply amazing.

> The technology
> could be proven more effectively with a better match between
> memory bandwidth and bandwidth requirements.  This can be
> addressed with faster, wider external memories, and
> more on-chip memory so that more routines
> can be stored on-chip, reducing the program-load
> portion of the memory bandwidth equation.

This is the classic image of parallel processing where
they see node communications as the limiting factor and
thus want the biggest nodes, with the biggest processors and
biggest caches possible, to reduce the level of
parallelism.  But a lot of research over the last
few decades has been into how biological systems can
do so many things so well that these machines can't.
The answer is lots more smaller nodes.

Instead of a single 1000MHz processor with a huge
cache (that is dwarfed by the size of the software
overhead required) and a huge amount of memory, a
design optimized to carry the marketing-introduced
overhead, the same number of transistors can
be 1000x more efficient on problems that are
parallel.

Almost all problems, certainly almost all interesting
problems, are embarrassingly parallel.  The only
problems that are not are the ones we artificially
created for ourselves in our antiquated serial
computers with absurd computational overhead.
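One standard way to see why "embarrassingly parallel" matters so much is Amdahl's law, which the post does not mention: if a fraction p of the work can be spread over N nodes and the rest stays serial, the best-case speedup is 1 / ((1 - p) + p / N). A minimal Python sketch; the function name `amdahl_speedup` is hypothetical and the model ignores communication cost entirely.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: best-case speedup on `nodes` processors when only
    `parallel_fraction` of the work can be divided among them.
    Communication and memory effects are deliberately ignored."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)


print(amdahl_speedup(1.0, 1000))             # 1000.0 (fully parallel)
print(round(amdahl_speedup(0.99, 1000), 1))  # 91.0 (1% serial work)
```

A perfectly parallel problem gets the full 1000x from 1000 small nodes, while even 1% of serial work caps it near 91x, which is why the argument for many small nodes rests on the problems themselves having little inherently serial work.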

Humans don't look like Pentiums, they have 2*10^11
processing nodes.  They don't run Unix or Windows.

> 60,000 MIPS that can't be used is worthless, 

If it is considered useless it may never be made.
If people keep repeating that it is useless other
people will keep thinking it is useless.  If none
are ever made the only value will be the educational
value to the few people who study the good ideas
that are there.

Some of the most brilliant people I have met love
the idea of cheap chips with millions of MIPS.  But
convincing people with money is a more difficult
problem.  Convincing most people seems to simply
be a matter of showing them that it has become
mainstream.  They equate good idea with mainstream
pure and simple.  Followers not leaders.

> that can be used is worth while.  If there isn't
> enough bandwidth or the requirements can't be reduced
> the 60,000 MIPS don't have value.

180 billion bits per second bandwidth between nodes, and
1,200 billion bits per second memory bandwidth in a $1
chip makes a $1000 PC look pretty sick.  But you have
to compare 100 25x chips to a PC to get the picture.  Does
your PC have 18,000 billion bps network and 120,000
billion bps memory bandwidth?  That hasn't stopped it
from being marketed.
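The 100-chip totals above are just the per-chip figures multiplied by 100. A minimal Python sketch of that arithmetic, using the per-chip numbers as stated in this post and assuming they simply add up with no contention or overhead; the names `NODE_LINK_BPS`, `MEMORY_BPS` and `aggregate_bandwidth` are hypothetical.

```python
# Per-chip figures for the 25x as stated in the post (not independently measured).
NODE_LINK_BPS = 180e9    # bandwidth between nodes, bits per second
MEMORY_BPS = 1_200e9     # memory bandwidth, bits per second


def aggregate_bandwidth(chips: int) -> tuple[float, float]:
    """Total (network, memory) bandwidth for `chips` chips, assuming
    the per-chip figures add linearly with no contention."""
    return chips * NODE_LINK_BPS, chips * MEMORY_BPS


net, mem = aggregate_bandwidth(100)
print(f"network: {net / 1e9:,.0f} billion bps")  # network: 18,000 billion bps
print(f"memory:  {mem / 1e9:,.0f} billion bps")  # memory:  120,000 billion bps
```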
 
> A 36-bit chip helps bandwidth, while keeping
> the size small and on one chip, and more on-chip memory
> helps reduce requirements by having more code on-chip.
> My suggestions are aimed at making
> the demonstration chip more viable.

Not really. It cuts it by at least a factor of 2.  It
would be useful if the idea is that you have to carry
more overhead on each node.

When I brought the idea of parallel processing to Chuck
more than ten years ago he was slow to embrace it. It
took him time to understand and appreciate the issues.

When he brought his ideas of Forth and MISC designs to
me it took me time to understand and appreciate the
issues.  For instance I just didn't understand it
when he said, "Most programs fit in one K."

I didn't understand because I was picturing programs
with overhead built in for marketing purposes.  After
watching Chuck for years I began to see that with
his approach most programs fit in one K or less.

Programs that other people felt required 10 megabytes
became 1K for Chuck.  His VLSI CAD software is only
500 lines of code.  He doesn't need megabytes to do
a hello world program.

> Are there really any ...

Yes.  Most problems, and most programs.  But most
problems are beyond machines that are busy with artificial,
self-imposed problems, so most people have
never looked at how they could be solved.

> As engineers we can often think of many things
> that could be done, but as much as we hate to admit it,
> if nobody wants or can use what you develop, it's nothing
> more than a paperweight.

That is what the people who hate it, or are threatened
by it, or want to see it fail have kept repeating
for the last decade.  But there have been a few hundred
people who have been influenced by the good ideas and
say it has been a benefit to them.  So even if no chips
get made, the ideas have been recognized as good
ideas by more people than you might realize.

I am always amazed by the profiles of the people
downloading stuff from my site.  It is popular with
Intel, it is popular with the US Gov, it is popular
with NASA.  And I see the ideas spreading even if
our chips are not being produced by anyone.

But there still are people chanting that it is 
worthless or bad.  It seems that the biggest resistance
comes from the people who feel threatened by change.  The
mainframe types said all the same things about micros
in the old days: worthless toys, not real computers,
nothing more than paperweights that will never amount
to anything but a curiosity.  I have been hearing
that for over thirty years now.
 
> I have a strong belief that the future of processors
> will be dominated by the intelligent RAM concept, 

I like that idea too.  I have wanted to use Chuck's
CAD technology to make cheap content addressable RAM.
But we would like to sell something to get funding first.

> where you take the relatively small
> CPU and put it inside the RAM, which can then be very
> wide (128 or 256 bits), and center the chip on the
> memory availability, which will be the limiting
> factor anyway.  The old "if Muhammad won't go to the
> mountain, bring the mountain to him" concept: it sounds
> backwards, but you need to overcome your problems
> via the simplest route.

The idea of dropping MISC processors into a corner
of conventional memory chips, and being able to access
1000 words in parallel at once has appealed to a 
lot of people.   When iTV had large, well-funded
corporate partners in Asia who were manufacturing
the memory chips that we all use those companies
wanted to do some of that.  Then the Asian economies
collapsed and the projects died.

> It seems a little misleading to say that the
> prototyping cost with MOSIS is $14K when it may take
> 2, 4 or even 8 tries to get things working.  If it really
> takes 8 tries the prototyping cost is $112K and 32 months; this
> doesn't sound that attractive.  Chuck's models and thus
> experience have been (as far as I know) at 0.8um, and while
> his software may be getting better he will have a whole
> new set of issues to deal with as the geometry gets smaller.

This is all very true.  But any further work rides on
the work already done and the fab runs that other people,
such as I, have already paid for.  As all the CAD problems
seem to have been solved a few years ago, Chuck's optimism
may not be overly optimistic and my pessimism may be
overly pessimistic.

But what you say isn't quite right regarding the
constraints.  If you can only afford the lowest budgets
then you have a 4-month turnaround.  Pay more and get
a 4-day turnaround.  If you want the project to be
completed in 2 months instead of 32 months, that
is really just a budget issue.  The professional paths
are the more expensive ones, and our work has been funded
on hobby budgets.  Still, on those mostly hobby budgets we have kept up
with or passed the companies spending billions of
dollars on each round of chip development.
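The quoted estimate ($14K and about 4 months per fab run, repeated 8 times) is straightforward multiplication, and the reply's point is that the time term depends on which turnaround you pay for. A minimal Python sketch, assuming runs are done one after another with no overlap; the name `prototyping_budget` is hypothetical, and no per-run price for the faster service is given in the post, so none is assumed here.

```python
def prototyping_budget(tries: int, cost_per_run: float = 14_000,
                       months_per_run: float = 4) -> tuple[float, float]:
    """Total cost and elapsed months if each fab run is paid for and
    waited on in sequence (no overlapping runs assumed)."""
    if tries < 1:
        raise ValueError("tries must be >= 1")
    return tries * cost_per_run, tries * months_per_run


print(prototyping_budget(8))  # (112000, 32)
print(prototyping_budget(2))  # (28000, 8)
```

Paying for a faster turnaround shrinks `months_per_run` at a higher `cost_per_run`, which is the budget-versus-schedule trade the reply describes.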

The problem is always that if you say you can do
100 times better on 100 times lower budget you
will be asked to do 1000 times better on a 1,000,000
times lower budget.  Then when you do that they
just say they don't care anyway.

One thing that appealed to me ten years ago was that Chuck's
approach solved the big problem that other people are
now struggling with: scale.  With Chuck's tiled approach
and hand layout, and with simulation that accounts for transistor
size, load, path length, and temperature effects
used to get the tiled design right, his designs scale almost
without effort.  Problem solved.

With a schematic or high-level functional description
and reliance on automated tools to place and route,
they never have any idea what to expect until the last
minute, and if they change the scale they have to start
over from scratch.  This is the major difference between
Chuck's approach and other people's approach to CAD:
they must have schematic capture and trust in tools,
while Chuck doesn't need or want it.

> This transition has been pretty difficult for the
> traditional CAD software vendors.  The term deep sub-micron
> refers to the problems that are seen as geometries drop
> below 0.3um and the gate delays that historically defined
> performance stop being dominant.  At 0.35um gate delays
> rule, and wire delays can be ignored.  At 0.25um gate delays
> and wire delays are nearly equal and both must be considered.
> At 0.18um wires dominate; gate delays can't be ignored,
> but placement, and thus wire length, now becomes the
> determining factor.

Exactly!  That is why it was the first problem that Chuck
solved ten years ago.
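A first-order way to see why wire length takes over at small geometries: the Elmore delay of an unbuffered distributed RC wire is about half its total resistance times its total capacitance, so it grows with the square of the length, and a placement that doubles a wire roughly quadruples its delay. A minimal Python sketch of that textbook estimate, which is not from the post; the function name `elmore_wire_delay` and its per-unit-length inputs are hypothetical, and real values depend on the process.

```python
def elmore_wire_delay(r_per_unit: float, c_per_unit: float,
                      length: float) -> float:
    """First-order Elmore delay of an unbuffered distributed RC wire.

    r_per_unit, c_per_unit: resistance and capacitance per unit length
    (hypothetical inputs in any consistent units). Ohms times farads
    gives seconds.
    """
    total_r = r_per_unit * length
    total_c = c_per_unit * length
    return 0.5 * total_r * total_c   # equals 0.5 * r * c * length**2


base = elmore_wire_delay(1.0, 1.0, 1.0)
print(elmore_wire_delay(1.0, 1.0, 2.0) / base)  # 4.0: double the length, 4x the delay
```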

> As Chuck's transistors are faster and he isn't playing
> the safe, must-work technology game that the traditional
> EDA firms are, he will see these issues in a different
> fashion, but these problems will still exist and their nature
> will change with geometries.  So his software may have
> improved with Chuck's understanding of the issues, but he will
> need multiple tries to calibrate his technology when
> operating with his new geometries.

Yes, he still has to make chips and see what happens the
same as everyone else.  But instead of billions per new
chip the costs are much lower.  If you reduce the costs
by a factor of 1000 he can do it 10 times faster.  If
you reduce the funding by 1000000 he can do it about
as fast but it is more work.  And we do get tired
of doing it that way.

> Given the above, his best attack may be to put the
> processor design to the side for a moment and build
> a test chip with various transistor and gate designs,
> and use this to calibrate his designs before trying
> a new processor on a new technology.  He could try
> various parameters and find either which ones line up with
> his models, or tune his models to work with the given
> transistors.  Once his models are correct, getting a
> processor to work should be much easier (Murphy's
> Law still applies, unfortunately).

Yes.  Chuck's doing that was essential to solve the
industry-wide thermal bug in the transistor models.
The details were fascinating but proprietary.
 
> This said, while I would like to see Chuck succeed,
> it doesn't seem like it would be easy to find investors
> to contribute to a technology that requires significant
> tuning through multiple iterations to work.  The MISC
> ideas are very powerful and it seems that

Very true.  But don't kid yourself that Pentium
or Alpha designs don't require significant tuning or that
some billion dollar efforts don't just get written
off as development costs for designs that didn't work
at all.  They just pick up the pieces and try again.