here 'they' go again


This is interesting because their well-funded results might be implemented
more cheaply and more quickly with a MISC processor. The author of the post
quoted below works for IBM.
===========================================
Newsgroups: comp.arch
Subject: Re: Blue Gene (IBM research project)
Date: 9 Dec 1999 16:22:51 GMT
From: Del Cecchi <cecchi@signa.rchland.ibm.com>

Here is some information about the new project, from an internal web site.

Sorry for the press release tone.  I edited some of the puffery, as denoted
by the .....

del cecchi

......

For this $100 million research initiative, IBM will build a supercomputer
-- nicknamed "Blue Gene" -- 500 times more powerful than today's fastest
supercomputers. Approximately 50 researchers throughout the division will
work on Blue Gene and the protein folding Grand Challenge. 

When completed, the new supercomputer will be capable of more than one
quadrillion operations per second (one petaflop), making it 1,000 times
more powerful than the Deep Blue machine that beat Garry Kasparov, and 2
million times more powerful than today's typical desktop PCs.  ....  Blue
Gene's performance will be made possible by a new approach to computer
architecture -- SMASH, which stands for Simple, Many and Self-Healing.
"Simple" refers to the elemental architecture; "Many" includes Blue Gene's
one million processors working in parallel; and "Self-Healing" means
greater fault-tolerance and system stability. This simplified architecture
will allow for more processors in less space using less power, resulting
in more overall operations. This may be the first major revolution in how
computers are built since the mid-1980s.

Blue Gene's one million processors will each be capable of one billion
floating point operations per second (1 gigaflop). Thirty-two of these
ultra-fast processors will be placed on a single chip (32 gigaflops). A
compact two-foot by two-foot board containing 64 of these chips will be
capable of 2 teraflops, making it as powerful as the 8,000-square-foot ASCI
Blue computers.

Eight of these boards will be placed in 6-foot-high racks (16 teraflops),
and the final 2,000-square-foot machine will consist of 64 racks linked
together to achieve the one petaflop performance.
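
The hierarchy above can be checked with a little arithmetic. The sketch
below is not from the press release; it only multiplies out the quoted
figures, and every name in it is made up for illustration.

```python
# Figures quoted in the press release; the names are illustrative only.
GFLOPS_PER_PROCESSOR = 1
PROCESSORS_PER_CHIP = 32
CHIPS_PER_BOARD = 64
BOARDS_PER_RACK = 8
RACKS = 64


def machine_totals():
    """Multiply the quoted per-level figures up to the whole machine."""
    chip_gflops = GFLOPS_PER_PROCESSOR * PROCESSORS_PER_CHIP
    board_gflops = chip_gflops * CHIPS_PER_BOARD
    rack_gflops = board_gflops * BOARDS_PER_RACK
    machine_gflops = rack_gflops * RACKS
    chips = CHIPS_PER_BOARD * BOARDS_PER_RACK * RACKS
    processors = chips * PROCESSORS_PER_CHIP
    return {
        "chip_gflops": chip_gflops,
        "board_gflops": board_gflops,
        "rack_gflops": rack_gflops,
        "machine_gflops": machine_gflops,
        "chips": chips,
        "processors": processors,
    }


totals = machine_totals()
for name, value in totals.items():
    print(f"{name}: {value:,}")
```

The products come out to 2,048 gigaflops per board, 16,384 per rack and
1,048,576 for the machine, on 32,768 chips and 1,048,576 processors. The
release's "2 teraflops", "16 teraflops", "one petaflop", "32,000" chips
and "one million processors" are those numbers rounded.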

.....

The details on SMASH 

This approach will require researchers to explore significant departures
from traditional computer design, including: 

Embedded memory: The dynamic random access memory (DRAM) will be on the
chips, making the memory much more accessible to the processors and
radically improving access time and bandwidth. Tightly integrating the
logic and memory like this also significantly reduces power requirements. 

Minimalist design: By simplifying the overall architecture and using so
many processors in parallel, Blue Gene will achieve incredibly fast system
performance.  Also, because the entire system will be built by replicating
one chip 32,000 times, the overall project complexity is greatly reduced. 

Multi-threading: Each processor is like a cook preparing eight recipes at
once -- the cook starts one dish mixing, then moves to the next recipe,
and so on. By the time the cook gets the eighth recipe started, the first
is ready for its next step. In computer design, this is called
multi-threading, and a primary goal for computer scientists is to keep the
"cook" as busy as possible. Blue Gene will have one million of these
"cooks," working on eight million concurrent "recipes" or threads. 

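The cook analogy can be turned into a toy model. The sketch below is only
an illustration, not a description of the real Blue Gene hardware: it
assumes each thread issues one operation and then waits a fixed number of
cycles, and that the processor picks the next ready thread round-robin.
The latency of 7 cycles is an assumption chosen so that eight threads
exactly cover the wait, as in the eight-recipe story.

```python
def utilization(num_threads, latency, cycles=10_000):
    """Fraction of cycles in which the processor issues an operation.

    Toy model (an assumption, not Blue Gene's actual design): a thread
    issues one operation in one cycle, then is stalled for `latency`
    cycles before its next operation is ready.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    next_thread = 0               # round-robin starting point
    busy = 0
    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + 1 + latency
                next_thread = (t + 1) % num_threads
                busy += 1
                break  # at most one operation issued per cycle
    return busy / cycles


for n in (1, 4, 8):
    print(f"{n} thread(s): processor busy {utilization(n, latency=7):.1%}")
```

In this model a single thread keeps the "cook" busy one cycle in eight,
four threads keep it busy half the time, and eight threads keep it busy
every cycle.
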
High communication bandwidth: Blue Gene will have extremely high
communication bandwidth among its chips, allowing them to exchange massive
amounts of data faster than ever before. With six channels on each chip,
each channel sending data at two gigabytes per second, the total
communication is 300 terabytes per second. In fact, the aggregate
communication bandwidth of Blue Gene is roughly equal to every one of the
six billion people in the world operating four ISDN modems simultaneously.
If you could harness Blue Gene's bandwidth to download the entire contents
of the Internet -- all 100 terabytes -- it would take less than a second. 
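
The bandwidth claims can be checked the same way. This sketch is not from
the press release and its names are illustrative. It assumes the chip
count implied by the board and rack figures (64 x 8 x 64), decimal
terabytes, and 128 kbit/s per ISDN line (the basic-rate figure with both
B channels in use); the release states none of these.

```python
# Quoted figures, plus the assumptions labeled below.
CHIPS = 64 * 8 * 64              # chips per board x boards per rack x racks
CHANNELS_PER_CHIP = 6
GB_PER_S_PER_CHANNEL = 2

# Sum of every chip's outgoing channels, in terabytes per second.
chip_links_tb_per_s = CHIPS * CHANNELS_PER_CHIP * GB_PER_S_PER_CHANNEL / 1000

PEOPLE = 6_000_000_000
MODEMS_PER_PERSON = 4
ISDN_BITS_PER_S = 128_000        # assumption: basic rate, two 64 kbit/s channels

isdn_tb_per_s = PEOPLE * MODEMS_PER_PERSON * ISDN_BITS_PER_S / 8 / 1e12

QUOTED_TB_PER_S = 300            # aggregate figure as quoted
INTERNET_TB = 100                # "entire contents of the Internet" as quoted
download_seconds = INTERNET_TB / QUOTED_TB_PER_S

print(f"chip channels summed: {chip_links_tb_per_s:.1f} TB/s")
print(f"ISDN comparison:      {isdn_tb_per_s:.1f} TB/s")
print(f"100 TB at 300 TB/s:   {download_seconds:.2f} s")
```

Under these assumptions the channels sum to roughly 393 TB/s and the ISDN
comparison to 384 TB/s, both somewhat above the quoted 300 terabytes per
second. The release does not say how its figure was derived. At the quoted
rate, 100 terabytes would take about a third of a second.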

Self-Healing: The self-healing aspect of Blue Gene's architecture is one
of the biggest challenges the research team faces. The hardware -- with
its tremendous redundancy in processor and communication paths -- makes
this concept feasible since it provides many routes to access a processor,
and many processors over which to distribute calculations. Self-management
is a huge challenge for a machine of this scale. IBM's software
researchers are exploring advanced technologies for distributed control
and recovery to ensure that Blue Gene runs continuously.