networked processors


Dear MISC readers:

sagalore wrote:

> I know there are several topologies that could be used - token ring,
> parallel, star... what do you think should be used for the best 
> performance gain?  What overhead is involved?

It depends on the problem and how it is parallel.  The hardware
and software topology should ideally match the parallelism
in the problem.

Even for a ring topology I describe two different methods of
operating the software for different kinds of problems.  One
type of problem is when the same computation is applied
to a large data set.  The data set is split up so that nodes
get subsets of the data, all perform more or less the same
operation, and then the results get collected somewhere.  For this
kind of problem it makes sense to use general purpose node addressing
on the ring and arbitrate control so that only one processor
writes to the network at a time, and control is passed between them.

A different class of problem requires that a number of stages
of calculation be applied to the data.  If the logical operation is
that the data is passed down a series of processing stages, and
it is different data after each stage of the processing, then
the problem has a higher ratio of communication to processing
bandwidth and a different kind of parallelism than the first
class of problem.  For this class of problem it would be most
efficient to operate the same ring topology in a different
way.  In this configuration each node performs a step in the
computation and passes the data on to the next node.  This
allows 1/2 of the processors to be transmitting at any
time.  Packets would just go from one processor to the next
rather than around the ring.
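
As a rough illustration of this second mode, here is a minimal Python
sketch of one way the "1/2 of the processors transmitting" figure can
arise.  It is not F21 code.  The function name and the even/odd slot
scheme are assumptions made for the example: a node is taken to be
unable to send and receive in the same time slot, so even-numbered
nodes send in even slots and odd-numbered nodes send in odd slots.

```python
def pipelined_schedule(num_nodes, num_slots):
    """Return, for each time slot, a list of (sender, receiver) pairs.

    Assumption (not from the original post): a node cannot send and
    receive in the same slot, so senders alternate by parity.
    The sketch also assumes an even number of nodes.
    """
    if num_nodes % 2 != 0:
        raise ValueError("this sketch assumes an even number of nodes")
    schedule = []
    for slot in range(num_slots):
        parity = slot % 2
        # Each active node sends only to its immediate neighbor.
        pairs = [(node, (node + 1) % num_nodes)
                 for node in range(num_nodes)
                 if node % 2 == parity]
        schedule.append(pairs)
    return schedule


for slot, pairs in enumerate(pipelined_schedule(8, 2)):
    print(slot, pairs)
```

With 8 nodes, each slot has 4 senders, and every packet travels exactly
one hop to the next node.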

For the first class of problem, if the number of nodes is N
and the network bandwidth is B, then each node will only
see B/N transmission bandwidth on the network, but the ring can
provide a high degree of symmetric parallelism.  For the
second class of problem each node will see B/2 transmission
bandwidth on the network.  In the first case the total network
bandwidth is still B  (B/N * N = B).  In the second case the
total network bandwidth is NB/2  (B/2 * N = NB/2).
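
The arithmetic above can be checked with a few lines of Python.  This
is only a sketch: the function names are made up for the example, and
the values B = 100 and N = 8 are arbitrary illustrative numbers, not
F21 measurements.

```python
def arbitrated_bandwidth(total_b, num_nodes):
    """First mode: one writer at a time, so nodes share B equally."""
    per_node = total_b / num_nodes
    return per_node, per_node * num_nodes


def pipelined_bandwidth(total_b, num_nodes):
    """Second mode: each node sends to its neighbor half of the time."""
    per_node = total_b / 2
    return per_node, per_node * num_nodes


B, N = 100.0, 8  # illustrative values only
print("arbitrated:", arbitrated_bandwidth(B, N))
print("pipelined: ", pipelined_bandwidth(B, N))
```

For these numbers the per-node figures are B/N = 12.5 and B/2 = 50,
and the totals are B = 100 and NB/2 = 400, a factor of N/2 = 4 apart.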

Likewise parallelism may be expressed in a way that is most
efficiently matched by a combination of general purpose symmetric
parallelism and specialized dedicated processing nodes with
complex multi-dimensional network topologies that closely
match the processing needed.

It depends on the problem.  With F21 one can configure a network
topology and configure software support to get the maximum
possible efficiency for a given problem.  Happily enough most
problems are parallel at many levels making it possible to 
slice them up many different ways to match machines that do
not offer a lot of flexibility in the way network nodes
connect or how they are supported in software.

We have discussed various possible changes to the network
coprocessor for an f21e.  Chuck likes the idea of making
it hardware compatible with Ethernet. I like the idea of
simplifying the internals of the network coprocessor by
making a simpler active message router.  A possible client
likes the idea of multiple ports capable of dealing with
their overwhelming data streams.

The hardware issue is that F21 can broadcast to multiple
nodes in one packet, but the timing will depend on how
many hops there are between the source and destination
nodes (and on the length of the wires between nodes).

In a star one packet can go to N nodes that
are all only one step away from where the broadcast
took place.  However, going the other way, N nodes
all doing output to one node will have to take
turns.  They would have to take turns on
a ring in the general purpose symmetric parallel mode of
operation anyway, but as I described above that
same ring can provide an N/2 increase in total
transmission bandwidth using different software to
provide a different mode of operation of the hardware.
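
The hop and turn counts in this comparison can be sketched in Python.
The model is an assumption made for illustration, not a description of
the F21 network coprocessor: the ring is taken to be one-way, the star
broadcast is taken to start at the hub, and wire length is ignored.
All function names are hypothetical.

```python
def ring_hops(source, dest, num_nodes):
    """Hops from source to dest on a one-way ring (assumed direction)."""
    return (dest - source) % num_nodes


def ring_broadcast_hops(source, num_nodes):
    """Hop count from source to every other node on the ring."""
    return {dest: ring_hops(source, dest, num_nodes)
            for dest in range(num_nodes) if dest != source}


def star_broadcast_hops(num_leaves):
    """Every leaf is one hop from the hub."""
    return {leaf: 1 for leaf in range(num_leaves)}


def star_gather_turns(num_leaves):
    """Leaves sending to the hub must take turns, one turn per leaf."""
    return num_leaves


print(ring_broadcast_hops(0, 8))
print(star_broadcast_hops(8))
print(star_gather_turns(8))
```

Under these assumptions the farthest node on an 8-node ring is 7 hops
from the source, while every node on the star is 1 hop from the hub
but collecting results at the hub takes 8 turns.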

A problem can be sliced up in different
ways.  These will all have slightly different computation
requirements, communication requirements (both speed and
bandwidth), and computation and communication overhead.
Similarly, the different possible network hardware topologies,
with different software schemes to drive them, will each
have a different balance of these same computation and
communication capabilities.  When the two match up well,
the architecture will be as efficient (depending on your
definitions) as it can be for that problem.

Jeff Fox