Re: MeshSP vs. P21



First, I am sorry for the confusing way I began my previous post.  It appeared as
though I had misattributed what had been said to Christophe Lavarenne.  I should
have been more precise.

Second, I seem to be more of a critic than a contributor.  Maybe that is what I
do best.  It certainly is easier.  But I do think that it is worthwhile, if not
for others, at least to have things clarified for me.

Finally, my comments on what Jeff Fox has written.
> == Jeff Fox
>> == Me


>>How about...
>>"Only a small class of problems are cost-effectively parallel."
                                 ^^^
>How about "have been" instead of "are"?  Parallel machines are
>historically expensive so that has limited them to only a very small
>range of things that our tax dollars are not paying for.  Just
>because the only parallel machines have been "grand challenge"
>type state of the art national debt sort of platforms doesn't
>really mean that that kind of expense is inherent in parallel
>approaches.

My take on cost-effective parallel computing starts from a naive
assumption: an n-processor parallel machine will cost n times as much
as a serial machine.

There are several factors that make this an overestimate.  An
n-processor parallel machine will not have n of everything (n monitors,
n disks).  It may also have less RAM per CPU, and the memory in a
16 Meg PC costs more than the CPU does.

However, other overheads of parallel computers offset this reduction in
cost.  Fast parallel computers require an efficient means of
communication; latency and bandwidth demands are higher than on a
serial computer.

Given a linear, or near-linear cost-up for a parallel computer, I would seek a
linear, or near-linear speedup for my parallel applications (unless I were a 
special customer for whom speed at almost any cost was acceptable, or I had a
problem that was too large to solve on a serial machine).  
 
My cynicism about the wonders of parallel computing is that linear, or
near-linear, speedups are difficult to achieve, and become increasingly
difficult as the number of processors grows.  Communication between
processors is far less efficient than computation within one, and
improving the communication is expensive.
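
To make that concrete, here is a toy fixed-size speedup model in C.  A
sketch only: the serial fraction, run time, and communication cost are
invented numbers, not measurements of any real machine.

/* Fixed-size speedup on n processors when a larger machine also
 * means more communication.  All constants are assumptions. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double t1     = 100.0;  /* serial run time, seconds (assumed)       */
    double serial = 0.05;   /* fraction that cannot be parallelized     */
    double t_comm = 0.5;    /* cost per log2(n) exchange step (assumed) */

    for (int n = 1; n <= 1024; n *= 4) {
        double tn = t1 * (serial + (1.0 - serial) / n)  /* Amdahl term */
                  + t_comm * log2((double)n);           /* comm term   */
        double speedup = t1 / tn;
        printf("n=%4d  speedup=%6.1f  efficiency=%.2f\n",
               n, speedup, speedup / n);
    }
    return 0;
}

With these made-up constants, efficiency falls below 50% by 16
processors and keeps falling; the exact numbers change with the
constants, but the shape of the curve does not.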

The weakness in my argument is my assumption of a near-linear cost-up for parallel
computers in the future (while we have had superlinear cost-up in the past and
present).

>I see this as a consequence of the fact that the parallel machines
>were designed for grand challenge problems.  It is expensive to
>port and run on these machines and that is what limits the range
>of applications to those where money is no object.  I wince when
>I hear things like "our entry level machine with two processors
>is really inexpensive, only $500,000.00!"

This addresses another issue, and perhaps the primary issue in the cost
of parallel computing: the software costs far more than the hardware.
A machine that is more expensive than a cluster of F21s, but which
supports F90, HPF, and PVM, may be the less expensive solution for a
parallel processing customer once the cost of porting applications to a
new system is counted.

>This I would agree with.  This is currently one of the things
>that distinguishes F21 from the other chips Chuck is working on.
>It is designed for SMP.

What features have been added to support SMP?   or,
What features have been added to support efficient communication between CPUs?

>>Generalization:
>>High-performance parallel systems are built by wiring together
>>high-performance scalar processors.

>This has been the trend.  It has been shown that just by running
>software on their already in place networks of workstations that
>institutions can get an order of magnitude better price performance
>on their large parallel applications than they can on their big
>super computers.  It has been generally accepted that many problems
>can be solved quite well on systems that are really nothing more
>than workstations wired together.  Sure some grand challenge
>computing projects really need those hundreds of megabytes of
>shared memory on the big machines, but hey we have a rather big
>federal budget deficit already guys!

For structured floating point intensive scientific codes, networks of 
workstations cannot approach the performance of vector supercomputers.

Networks of workstations impersonating parallel computers can
efficiently solve problems requiring infrequent communication.  These
do not make up the majority of parallel codes.  For parallel computing
to become ubiquitous, it must be useful for applications that require
frequent communication (so that all of the traditional serial
applications can be converted).
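
The dividing line can be put in rough numbers.  Assuming roughly 1 ms
of ethernet latency and 1 MB/s of usable bandwidth (invented, but
plausible for the hardware under discussion), efficiency hinges on how
much computation happens between messages:

/* The ratio of computation to communication per step decides whether
 * a workstation cluster pays off.  All numbers are assumptions. */
#include <stdio.h>

int main(void)
{
    double latency   = 1.0e-3;   /* message latency, seconds (~1 ms) */
    double bandwidth = 1.0e6;    /* usable bandwidth, bytes/second   */
    double msg_bytes = 8.0e3;    /* data exchanged per step          */
    double t_comm    = latency + msg_bytes / bandwidth;

    double t_coarse = 1.0;       /* 1 s of computation per message   */
    double t_fine   = 1.0e-3;    /* 1 ms of computation per message  */

    printf("coarse-grained efficiency: %.3f\n",
           t_coarse / (t_coarse + t_comm));
    printf("fine-grained efficiency:   %.3f\n",
           t_fine / (t_fine + t_comm));
    return 0;
}

A code that computes for a second between messages hardly notices the
network; one that communicates every millisecond spends 90% of its time
waiting.  Only the first kind thrives on a PVM-style cluster.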

>Most super computing today is being done on workstation farms
>connected via ethernet.  There is no special provision for memory
>latency, synchronization, and cache coherency problems in hardware.
>This is just done in software.  Many times even big machines like
>Cray machines are also connected on these very slow (not high-speed)
>communication backbones.

I would not call a PVM cluster of workstations a supercomputer.  A
Cray is a supercomputer.  A CM-5 was a supercomputer.

Having no provision for high-bandwidth and low-latency communication
between nodes greatly limits the problems that such a system can solve.
The communication between nodes over ethernet will be measured in 
milliseconds.  Such a high latency will restrict both the size of the
cluster and the set of solvable problems. 
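
Rough arithmetic on the size restriction (assumed numbers): on a shared
ethernet segment the messages of a global exchange serialize, so the
cost of synchronizing grows with the node count:

/* Serialized global exchange on a shared broadcast medium.
 * The 1 ms per-message latency is an assumption. */
#include <stdio.h>

int main(void)
{
    double latency = 1.0e-3;              /* seconds per message  */
    for (int nodes = 8; nodes <= 512; nodes *= 4) {
        double sync = nodes * latency;    /* one message per node */
        printf("%3d nodes: %6.1f ms per exchange, at most %.0f per second\n",
               nodes, sync * 1e3, 1.0 / sync);
    }
    return 0;
}

A code that needs one global exchange per timestep is capped at a few
timesteps per second on a large cluster, no matter how fast the
individual nodes are.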

>F21 should provide better high-speed communication than this.
>In combination with the fact that the interconnect is virtually
>free you get a very low price per node.  Just like a workstation
>farm, but instead of $25,000 to $100,000 per node you can pay
>$25 to $100 per node and get one to two orders of magnitude
>improvement in the interconnect speed!

How is the interconnect free?  Or am I getting what I pay for?
I am assuming that ethernet is the interconnect.
Clusters of workstations are popular because people have workstations
sitting around.  They would have these workstations even if they were
not running PVM, and many of the workstations have idle cycles.

The interconnect on such a cluster is also free, since the workstations
required ethernet anyway.

How is a cluster of F21s going to compete against something that was
already free?  The cluster won't be any faster, since it uses the same
interconnect (ethernet).

>You also don't have the problem of resolving cache coherency issues
>like you do on a network of Alphas with three levels of cache at
>each node since F21 does not use cache memory.

Does this mean that the F21 cluster won't provide shared memory?  
Shared memory is an important model for many parallel programmers. 
They don't all want to write message passing code.
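
For contrast, here is what moving a single integer between nodes looks
like under PVM (a sketch; the peer task id and the message tag are
invented for illustration):

/* Message passing makes every shared value explicit: pack, send,
 * receive, unpack.  Sketch only; tag 42 and peer_tid are arbitrary. */
#include <pvm3.h>

void send_result(int peer_tid, int value)
{
    pvm_initsend(PvmDataDefault);   /* start a fresh send buffer */
    pvm_pkint(&value, 1, 1);        /* pack one int              */
    pvm_send(peer_tid, 42);         /* deliver with tag 42       */
}

int recv_result(int peer_tid)
{
    int value;
    pvm_recv(peer_tid, 42);         /* block until it arrives    */
    pvm_upkint(&value, 1, 1);       /* unpack one int            */
    return value;
}

On a shared-memory machine the same exchange is a store and a load,
plus whatever synchronization the algorithm needs.  That convenience is
what shared-memory programmers are reluctant to give up.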

>>Why?

>For the reasons above. Nodes are cheap and scaled integer math may be
>fast enough and interconnect should be pretty fast.

I don't think there is much of a market for an F21 cluster if it will 
only compete with PVM-style clusters.  That is why I have asked about
SMP-specific features:
 * cache-coherency
 * high-bandwidth communication
 * elimination of interconnect and memory hot spots
 * low-latency communication 

The memory latency problem is even worse for parallel processors than
for serial processors, but no mention has been made of how this problem
will be addressed.

>>       Are there quantitative reasons (SPEC ratings, simulations of F21
>>clusters demonstrating scalable speedups,

>No SPEC marks, but of course there have been many simulations.  There
>are people like Penio, Eugene, and Michael who are running parallel
>apps and are familiar with the effects of parameters on performance.
>If you can show similar performance per node and reduce the cost
>per node by a factor of 100 or 1000 and increase the interconnect speed
>this is pretty strong quantitative evidence to me.

Yes, that would be strong quantitative evidence, but...

* You have not shown similar performance per node.
  (I am assuming that the F21 is being compared to a modern commodity
  CPU.)  How can this be shown without SPECmarks?  It cannot be shown
  by comparing peak MIPS, which I suspect is what you have done.

* You have not shown how this system will provide low-latency,
  high-bandwidth communication between CPUs.  This is not solely a
  feature of the CPU; the interconnect/memory system is equally
  important.  What type of interconnect is being designed for the F21
  cluster?

While current and past parallel machines are not the only way to design parallel
machines, they do point out problems inherent in parallel computing, and these
problems have not been addressed:

* efficient synchronization methods
* high-bandwidth communication
* low-latency communication
* elimination of interconnect and memory hot-spots
* the non-scalability of broadcast networks for parallel machines
* the difficulty of writing large message passing applications

>You can call it a belief, but unless you think that the only problems
>are grand challenge problems, or unless you think like Bill Gates that
>your toaster needs to be running MS WINDOWS you would see that for
>many things modern (high end) processors are wasteful.  I think there
>is some pretty strong quantitative evidence for this.  Do you really
>need a 90 MHz Pentium with 32M of RAM to read an i/o device 150 times
>per second?  Modern processors are also defined by what people are
>doing with them, and a lot of that is wasteful.

>This is not a religious argument.  It has no more to do with belief
>than any other opinion.  All opinions must be based on some metaphysical
>belief system.      

My use of a workstation is a waste when there is an acceptable
alternative.  Is the F21 an alternative to a workstation or to a
parallel computer?  If so, then it is fair to ask for a quantitative
comparison against what I currently use.  It is also fair to ask how it
addresses the problems I have with my current systems.

It becomes a religious argument when the alternative is promoted as better
with no means of evaluation against my current system.  

>As for the Forth model of computation being the right one, well I
>think there is strong evidence that the Forth model is a good one
>for this architecture.  I am not convinced that it is the only or
>even the best.  I will be happy to be convinced that other approaches
>can produce high quality software as well.  It may be to some extent
>that these chips' use will be restricted to the Forth model unless
>people show that you can effectively apply other models to these
>fairly unusual processors.

We would not be communicating with one another if it were not for other
models of computation.  I find it very amusing when C and the design of
CPUs are demeaned on comp.lang.forth.  The simplest way to show value
is by demonstration.  Of course, I hate it when DOS lovers use this
argument to show the utility of their systems.  Then I liken its
widespread use to a virus.

>such concept.  This has always been one of my buttons.  I like to
>point out that the people who tell me that I am a (Forth) religious
>zealot usually also tell me that although we cannot understand
>the inner workings of modern chips and C compilers, we have
>"faith" in these things.

More important to me is that we can evaluate compilers and architectures
and implementations of architectures.  I brought up 'quantitative' versus
'qualitative' because the MISC chips were being judged to be superior
solely in qualitative terms.

>I was able to get Russ Hersh to update his information about the Forth
>language in the Microcontroller FAQ listed in a number of newsgroups
>on the internet last year.  He even included a section on the MuP21
>and F21 even though they are not really microcontrollers. (just priced
>like them)  But most people on the net who have requested information
>on processors have declined to include MuP21 or F21 in their lists
>of compiled information.  This is not because they simply don't believe
>the data, but it is meaningless to them.  They typically say, "I will
>include information when you supply SPECint, SPECfp, Winmarks, etc,
>until then the information is not useful."  If the only evidence that
>is meaningful is "how fast will it run my MS WINDOWS?" then no one
>may ever even notice F21 or P32 or whatever.  It doesn't matter if
>you can deliver 1000 mips for $1; if it is not politically correct,
>most people will never hear about it or consider it.

I am a user of workstations.  The SPEC benchmarks are a reasonable
(not perfect) means for me to judge the performance of a workstation.
They reflect usage similar to mine.

Others are consumers of software.  They have much money invested directly
(in purchase costs) and indirectly (in training) in Windows.  Spending
$1 for an F21 or $400 for a 486 is minor compared to the cost of the 
rest of the system.

Even if performance were the only metric used to evaluate a system, 
MIPS per dollar is still irrelevant.  What is relevant is execution time
for applications that I want to run.
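
A made-up illustration of why: execution time is instruction count
divided by instruction rate, and instruction counts differ across
architectures, so the machine with more MIPS per dollar can still lose.
Both machines below are hypothetical.

/* Execution time = instructions / (MIPS * 1e6).
 * Machine A: high native MIPS, but simple instructions mean the
 * compiled application executes more of them.  Machine B: lower
 * MIPS, denser instructions.  All numbers invented. */
#include <stdio.h>

int main(void)
{
    double mips_a = 500.0, insns_a = 5.0e9;
    double mips_b = 100.0, insns_b = 0.8e9;

    printf("A: %.1f s\n", insns_a / (mips_a * 1e6));  /* 10.0 s */
    printf("B: %.1f s\n", insns_b / (mips_b * 1e6));  /*  8.0 s */
    return 0;
}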

Heck, there is a product out now that translates binaries from SPARCs to
Alphas.  Workstations are not sold on CPU performance alone.

>If some of the people who consider F21 actually do something and
>we can publish the results then some people may notice.  Most people
>will only recognize a product.  They have no idea that it has an
>embedded computer in it at all, let alone what model or microprocessor
>is making it work.

I cannot talk about the embedded systems market.  I am a consumer in the
workstation/academic market.  I want to see SPEC benchmark ratings if
MISC chips are going to be evaluated in this area.  I also want to hear
about your interconnect and SMP-specific features if MISC chips are
going to be proposed as a parallel system.  Nothing about MISC chips
will be accepted by a referee for publication in this area without
quantitative information such as this.

mark