
Re: MeshSP vs. P21



Some comments on the last post of Cristophe Lavarenne:

The post makes an interesting point, but it also contains some
generalizations and vague terms that I would like to point out.
I have discussed this issue in the past, and I think it is
important to discuss it as quantitatively as possible.

>the MeshSP is the very antithesis of P21/F21. There are a lot
>of problems merely limited by the memory bandwidth while having
>no or very low data locality, while being highly parallelizable.
>
>In fact, most algorithms can be written in this way, scientific
>and real-world as well. Only a very tiny class of problems is
>intrinsically sequential. These are nonparallizable.

Point:
Memory bandwidth is becoming an increasingly significant factor
in processor performance, and future CPU enhancements may be wasted
if memory bandwidth is not improved.

Generalization:
Only a tiny class of problems is intrinsically serial?

How about...
"Only a small class of problems is cost-effectively parallel."

How is a program classified as not intrinsically serial?
  a) Can the algorithm be encoded on a parallel machine?
  b) Can I afford to run the algorithm on a parallel machine?
 
The second definition is a more concrete means of classifying an
application as parallel or serial.  If it is cost effective for
me to run my application on a parallel machine, the application
is parallel. The cost effectiveness will be determined, in large
part, by the speedup that I can obtain on the parallel machine.

I will use a parallel machine rather than a serial machine when
I can:
a) solve larger problems
b) solve problems faster

and the cost of using the parallel machine is offset by any
gain from the faster solution, or a solution to a larger problem.
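
To make the criterion above concrete, here is a minimal sketch of one
way to put numbers on it.  It assumes Amdahl's law as the speedup
model, which is my choice for illustration and not something either
post relies on.  The function names, the serial fraction and the cost
ratio are all hypothetical.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Ideal speedup under Amdahl's law (an assumed model).

    serial_fraction is the share of the run time that cannot be
    parallelized; communication and synchronization costs are ignored.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def is_cost_effectively_parallel(serial_fraction: float,
                                 processors: int,
                                 cost_ratio: float) -> bool:
    """Hypothetical test: does the speedup exceed the extra cost?

    cost_ratio is the cost of using the parallel machine divided by
    the cost of using the serial machine for the same job.
    """
    return amdahl_speedup(serial_fraction, processors) > cost_ratio


if __name__ == "__main__":
    # Illustrative numbers only, not measurements of any real machine.
    for nodes in (4, 16, 64):
        print(nodes, round(amdahl_speedup(0.1, nodes), 2))
```

With these assumptions, an application that is 10% serial cannot
exceed a speedup of 10 no matter how many nodes are added, so whether
it counts as "parallel" depends on what those nodes cost.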

Parallelizing applications is an open research problem, and
a successful parallelization of an important application, or
an improvement on a previous parallelization, is usually worth
publishing.

The difficulty of developing parallel versions of sequential
applications, and the difficulty of achieving an acceptable
speedup, keep many applications from being run on parallel
machines.

A low-cost parallel machine would make more applications 
cost-effectively parallel.  But the performance of this system
(a cluster of F21s perhaps) must be demonstrated.

>Hence, for most problems, even a single F21 is no slower than 
>a Pentium, an Alpha AXP or whatever, if adequately programmed. 
>Even better: using mid-grain (1-2 MByte/node) multinode desktop 
>machine we can have 1-2 orders of magnitude the performance of 
>a big machine (a workstation) at the same hardware price.

Generalization:
An F21 is no slower than a Pentium or an Alpha AXP.

What is the basis for the comparison between the F21 and the AXP?
I can look up the SPEC ratings for the AXP and see how it performs
for integer and FP codes, and I can compare the ratings for the AXP
to the ratings for other processors.  Currently I cannot do this
for the F21.  I was unaware that the F21 is being proposed as a
general purpose processor that will compete with the Alpha AXP and
Pentium.

Generalization:
High-performance parallel systems are built by wiring together
high-performance scalar processors.

Building a parallel machine is not simply a matter of wiring CPUs
together.  Memory latency, synchronization, memory coherency, and
high-speed communication must be supported efficiently.
What features does the F21 provide in this area?

> _Especially_ for scientific number crunching, a F21 cluster is great.
 
Why?  Are there quantitative reasons (SPEC ratings, simulations of F21
clusters demonstrating scalable speedups, architectural features that
will improve its performance for scalar or parallel processing and
that other architectures do not provide),
or are the reasons qualitative (a belief that modern processors are
wasteful, a belief that the Forth model of computation is the right one)?

>Of course, an off shelf Fortran compiler won't run on it.
>Some adaptation on the side of programmer will be needed. 
>Alas, most scientists are nonprogrammers and _very_ 
>conservative. Bad luck.

If it is fast enough, they will use it.

mark