[NOSC] Chuck Moore website and new Forth chips
- Subject: [NOSC] Chuck Moore website and new Forth chips
- From: Mark Sandford <pagercam@xxxxxxxxx>
- Date: Fri, 6 Jul 2001 19:41:15 -0700 (PDT)
Jeff Fox wrote:
>Mark Sandford wrote:
-- Space/NASA stuff deleted for length considerations
>> I would suggest that this be retargeted somewhat, as 25
>> processors seem a little overkill; 16 or 9 (assuming
>> you like squared numbers) seems more reasonable, and
>> the SRAM at 4ns (250 MHz before timing margins
>> on-chip) would need to be shared between 25
>> processors.  Assuming they are doing similar things,
>> this leaves only an effective 10MHz per processor
>> while they are running at 2400MHz, so unless the
>> application is heavily, heavily inner loops they will
>> spend a great amount of time twiddling their thumbs
>> awaiting their turn on the bus.
>
>Of course.  The same thing applies to workstation farms.
>All problems have a balance between node processing
>and node communication.  The design was not created
>for problems that are essentially serial and are
>limited by communication bandwidth or serial processing.
>
>Instead this design is for computationally intense
>problems that can use 60,000 MIPS per $1 cluster chip
>and not for software or problems that would limit
>it to 250MIPS.  A single X18 is capable of 2400MIPS
>so why limit 25 of them to a total of 250MIPS?
>
>The proper model for F21 or 25X is a workstation
>farm, but without the hardware and software overhead
>needed to put C or Unix on each node.  A very small,
>very cheap, Forth workstation farm.
Agreed, but a chip (processor farm) that can't do a
significant/interesting demo isn't much of a technology
demonstration.  There have been many instances of this
in the past: if you have to wait for the demo and then wait
for an implementation that does something real, people lose
interest.  So you can say that this chip will only work on
one class of problems, but if those problems aren't of
interest then the whole technology gets dismissed.
What is described above is the classic problem, and
one that has plagued the CPU industry for years.  It
has become a mantra of mine: a system isn't limited
nearly as much by MIPS as by memory bandwidth, and
as CPU speeds increase faster than memory speeds,
this problem grows.  The classic case is the Sieve,
which used to be a speed test, but as processor speeds
increased beyond what memory could provide, the test
became useless.  So although processor clocks get faster
every year, performance is now dominated by cache size
and design.  I understand that part of the MISC concept
is that Machine Forth is that much smaller and thus
faster than traditional bloatware, but if the chip can
only run very small routines, code or data must be
loaded and stored, and the speed of the processor is
limited by the available bandwidth.
As you mentioned, workstation farms are bandwidth
limited (with fast, wide memory and large caches, and
one, two or four processors), so how is a much faster
set of 25 processors supposed to survive?  The technology
could be proven more effectively with a better match
between memory bandwidth and bandwidth requirements.
This can be addressed with faster, wider external
memories, and with more on-chip memory so that more
routines can be stored on-chip, reducing the program-load
portion of the memory bandwidth equation.  60,000 MIPS
that can't be used is worthless; about 20,000 MIPS
(9 processors) that can be used is worthwhile.  If there
isn't enough bandwidth, or the requirements can't be
reduced, the 60,000 MIPS have no value.
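A rough roofline-style sketch of the numbers in this exchange, assuming (as the quoted text does) that all cores share one external SRAM doing one 18-bit word access per 4 ns, that each X18 core peaks at 2400 MIPS, and that the x36 bus simply doubles the word rate. `attainable_mips` and `per_core_bandwidth` are hypothetical helpers for illustration, not anything from Chuck's tools:

```python
# Roofline-style estimate: a chip sustains the lower of its compute peak
# and the rate at which external memory can feed it.

def attainable_mips(cores, mips_per_core, mwords_per_s, ops_per_word):
    """Sustained MIPS if every `ops_per_word` instructions need one
    external memory word (memory assumed shared by all cores)."""
    peak = cores * mips_per_core
    memory_bound = mwords_per_s * ops_per_word
    return min(peak, memory_bound)


def per_core_bandwidth(cores, mwords_per_s):
    """Millions of external words per second each core gets if the
    bus is split evenly between cores."""
    return mwords_per_s / cores


X18_MIPS = 2400   # per-core peak quoted in the thread
SRAM_X18 = 250    # 4 ns SRAM: about 250 M 18-bit words/s
SRAM_X36 = 500    # assumed x36 bus: two 18-bit words per 250 MHz cycle

for cores in (25, 16, 9):
    share = per_core_bandwidth(cores, SRAM_X18)
    # instructions a core must execute per external word to stay busy
    needed = X18_MIPS / share
    print(f"{cores:2d} cores: {share:5.1f} Mwords/s each, "
          f"needs {needed:4.0f} instructions per word")

# Whole-chip view: 25 cores, one external word every 10 instructions.
print(attainable_mips(25, X18_MIPS, SRAM_X18, 10))  # 2500 (memory bound)
print(attainable_mips(25, X18_MIPS, SRAM_X36, 10))  # 5000 (x36 doubles it)
```

Under these assumptions a 25-core chip only reaches its 60,000 MIPS peak when code executes about 240 instructions per external word (120 with the x36 bus), which is the ratio Mark is questioning.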
A 36-bit chip helps bandwidth while keeping the chip small
and single, and more on-chip memory helps reduce
requirements by holding more code on-chip.  My suggestions
are aimed at making the demonstration chip more viable.
Are there really any 60,000 MIPS applications that run in
384 words and require less than 250 Mwords/s of data
bandwidth?  I can't think of any, and without a compelling
application, no matter how powerful, this technology will
go nowhere.  We engineers can often think of many things
that could be done, but as much as we hate to admit it,
if nobody wants or can use what you develop, it's nothing
more than a paper-weight.
I have a strong belief that the future of processors will be
dominated by the intelligent RAM concept, where you take the
relatively small CPU and put it inside the RAM, which can then
be very wide (128 or 256 bits), and center the chip on memory
availability, which will be the limiting factor anyway.  It's
the old "if Muhammad won't go to the mountain, bring the
mountain to him" concept; it sounds backwards, but you need to
overcome your problems via the simplest route.
>
>> Even running solid
>> multiplies at 125M this still leaves a large margin
>> for data transfers.  So firstly I'd trim down the
>> number of processors and might suggest looking at
>> pairing the processor 
>
>Like P21, F21, i21, and others, the X18 design was
>picked to reduce the prototyping cost and get a
>chip with pins that fit the prototyping constraints.
>So if someone has their own fab line and is not
>restricted by such constraints and is also not
>concerned with budget constraints, the number of
>processors per die is completely variable, from
>1 to thousands.  There is interest in thousands
>of processors per chip.
>
>The width is also variable from 5 bits to whatever.  
>Chuck's designs are in columns so scaling the
>width is mostly trivial.  Chuck said that making
>a P32 from a P21 was about a day's work in OKAD.
>
>But the timing of the + and +* instructions is proportional
>to bus width, so those opcodes would be slower with a
>wider bus.  Also the pin count and costs go up.  Pins
>are more expensive than silicon in high volume.  That
>is why a 60,000 MIPS 25x can cost about the same
>as a 2400 MIPS X18.
>
>> with a x36 chip instead of the
>> x18 to get two "18 bit words" per cycle and
>> effectively running the memory at 500MHz x18.
>
>It could be done, and still get 2400MIPS from
>the internal memories.  Larger amounts of
>internal memory could be put on larger more
>expensive chips if prototyping costs are not
>an issue.  But these have not been billion dollar
>type funding projects so far, so things have been
>kept small to make it possible.
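On Jeff's point that + and +* get slower as the word widens: the thread doesn't say how the X18 adder is built, so this sketch assumes a plain ripple-carry adder, the classic case where add time grows with width. `ripple_add` is a hypothetical illustration, not OKAD or X18 logic; it reports the longest run of consecutive carries as a stand-in for the critical path:

```python
# Ripple-carry addition: the carry out of bit i feeds bit i+1, so in the
# worst case a carry has to ripple through every bit of the word.

def ripple_add(a, b, width):
    """Add two `width`-bit unsigned values one bit at a time.

    Returns (sum modulo 2**width, carry out, longest carry run), where the
    longest carry run is the most consecutive bit positions that produced
    a carry, a rough stand-in for the adder's critical path.
    """
    carry = 0
    total = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i        # sum bit uses the incoming carry
        carry = (x & y) | (carry & (x ^ y))  # generate, or propagate
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return total, carry, longest


# Worst case: all ones plus one carries through the whole word.
for width in (18, 36):
    print(width, ripple_add(2**width - 1, 1, width))
# 18 (0, 1, 18)
# 36 (0, 1, 36)
```

Doubling the width doubles the worst-case carry chain, which is one plausible reason add timing would scale with bus width; faster adder structures exist but cost more transistors and wiring.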
>
>> Another area where I might suggest a change is the
>> memory per processor: 384 words might be 1K words if
>> the number of processors is trimmed down to 16 or 9, so
>> you would be more likely to run without needing to
>> load or store data as frequently.
>
>True.  Have you a particular application in mind where
>you have determined that twice as much on-chip
>memory is needed?  I spent years doing that sort of
>thing to tweak F21 before it was fabbed.
>
>I suggest that anyone with a particular idea simulate
>it extensively to be able to tweak the design to
>do what you really find best suited to your needs.
>Chuck is in the custom silicon business.  He can
>make it work in many ways depending on what the client
>wants.  It is a little like picking items from a
>menu.  Chuck would love to make many custom versions.
>But he would also really like to make a production
>run and get some chips into some product somewhere.
>It is sort of a key element that hasn't happened.
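For scale, a quick count of total on-chip RAM for the configurations discussed in this exchange. It ignores the area of the cores themselves and any layout overhead, both simplifying assumptions; `total_words` is a hypothetical helper:

```python
# Total on-chip RAM for a few core-count / per-core-memory choices.

def total_words(cores, words_per_core):
    """Words of on-chip RAM summed across all cores."""
    return cores * words_per_core


configs = {
    "25 x 384 (as proposed)": (25, 384),
    "16 x 1K": (16, 1024),
    "9 x 1K": (9, 1024),
}
for name, (cores, words) in configs.items():
    print(f"{name:24s} {total_words(cores, words):6d} words")
```

On this count, 9 cores with 1K words each need slightly less RAM (9,216 words) than the proposed 25 x 384 (9,600 words), while 16 x 1K (16,384 words) needs about 1.7 times as much.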
>
>> I might be interested in contributing to such an
>> effort, but I would need to know more about Chuck's
>> experience and how likely the first try is to
>> work (Murphy's Law and all).  I bought one of the
>> original P21 chips and I believe that those didn't
>> function until the 8th run, so this is never a slam
>> dunk, especially if 0.18 and TSMC are new to his
>> techniques.
>
>It did take 8 tries to get P21 completely working.
>It had the thermal bug like all conventional chips,
>but at only 100MHz in 1.2u Chuck didn't bump into
>it and didn't find it.  When he scaled down to .8u
>and went to 500MHz he discovered a bug in the
>transistor model.  There were almost thirty
>prototypes made at iTV and four by UltraTechnology.
>The modeling in OKAD got closer and closer to
>what the fabs actually produced.
-- other Chip history comments deleted for space
It seems a little misleading to say that the prototyping
cost with MOSIS is $14K when it may take 2, 4 or even
8 tries to get things working.  If it really takes 8 tries,
the prototyping cost is $112K and 32 months, which doesn't
sound that attractive.

Chuck's models, and thus his experience, have been (as far
as I know) at 0.8um, and while his software may be getting
better, he will have a whole new set of issues to deal with
as the geometry gets smaller.  This transition has been
pretty difficult for the traditional CAD software vendors.
The term deep sub-micron refers to the problems that are
seen as geometries drop below 0.3um and the gate delays
that historically defined performance stop being dominant.
At 0.35um gate delays rule and wire delays can be ignored.
At 0.25um gate delays and wire delays are nearly equal and
both must be considered.  At 0.18um wires dominate; gate
delays can't be ignored, but placement, and thus wire
length, becomes the determining factor.  As Chuck's
transistors are faster and he isn't playing the safe,
must-work technology game that the traditional EDA firms
are, he will see these issues in a different fashion, but
the problems will still exist and their nature will change
with geometry.  So his software may have improved with
Chuck's understanding of the issues, but he will need
multiple tries to calibrate his technology when operating
at his new geometries.
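The gate-versus-wire shift described above can be illustrated with the standard Elmore estimate for an unbuffered, uniform RC wire, whose delay grows with the square of its length. The per-millimetre resistance, capacitance and gate delay below are made-up round numbers for illustration, not data for any real 0.18um process, and driver and load effects are ignored:

```python
import math

# Elmore delay of a uniform distributed RC wire of length L with
# resistance r and capacitance c per unit length: r * c * L**2 / 2.

def wire_delay(r_per_mm, c_per_mm, length_mm):
    """Elmore delay in seconds (r in ohm/mm, c in F/mm), ignoring the
    driver resistance and the load capacitance."""
    return 0.5 * r_per_mm * c_per_mm * length_mm ** 2


def crossover_length(r_per_mm, c_per_mm, gate_delay_s):
    """Wire length (mm) at which the wire delay equals one gate delay."""
    return math.sqrt(2 * gate_delay_s / (r_per_mm * c_per_mm))


# Hypothetical values, for illustration only.
R = 100.0      # ohm per mm
C = 0.2e-12    # farad per mm
GATE = 50e-12  # seconds per gate

for length in (0.5, 1.0, 2.0, 4.0):
    print(f"{length:3.1f} mm wire: {wire_delay(R, C, length) * 1e12:6.1f} ps")
print(f"wire matches a gate at {crossover_length(R, C, GATE):.2f} mm")
# Narrower wires raise R and faster gates lower GATE; both shrink the
# length at which wires start to dominate.
print(f"with 2x R: {crossover_length(2 * R, C, GATE):.2f} mm")
```

Because the crossover length falls as wire resistance rises and gate delay drops, placement and wire length matter more at each smaller geometry, which is the trend described in the paragraph above.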
Given the above, his best attack may be to put the
processor design to the side for a moment and build
a test chip with various transistor and gate designs,
and use it to calibrate his designs before trying
a new processor on a new technology.  He could try
various parameters and either find which ones line up
with his models or tune his models to work with the
given transistors.  Once his models are correct, getting
a processor to work should be much easier (Murphy's
Law still applies, unfortunately).
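One simple way to tune a model to match the transistors on a test chip is a least-squares scale factor between model-predicted and measured delays (for example from ring-oscillator stages, a common test structure). This is only a sketch with a single fitted parameter; the delay figures are invented placeholders, not measurements, and `fit_scale` and `worst_relative_error` are hypothetical helpers:

```python
# Fit one correction factor k so that measured ~= k * predicted, choosing
# k to minimise sum((measured - k * predicted)**2).  Setting the
# derivative to zero gives k = sum(m * p) / sum(p * p).

def fit_scale(predicted, measured):
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need equal-length, non-empty lists")
    num = sum(p * m for p, m in zip(predicted, measured))
    den = sum(p * p for p in predicted)
    return num / den


def worst_relative_error(predicted, measured, k):
    """Largest |measured - k * predicted| / measured after correction."""
    return max(abs(m - k * p) / m for p, m in zip(predicted, measured))


# Placeholder stage delays in picoseconds (model vs. test chip); not real data.
model_ps = [40.0, 55.0, 70.0, 90.0]
chip_ps = [51.0, 71.0, 90.0, 118.0]

k = fit_scale(model_ps, chip_ps)
print(f"scale factor k = {k:.3f}")
print(f"worst error after correction: "
      f"{worst_relative_error(model_ps, chip_ps, k):.1%}")
```

A real calibration would fit several parameters (drive strength, wire capacitance and so on) rather than one global factor, but the idea of adjusting the model until it matches the silicon is the same.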
This said, while I would like to see Chuck succeed,
it doesn't seem like it would be easy to find investors
to contribute to a technology that requires significant
tuning through multiple iterations to work.  The MISC
ideas are very powerful and it seems that