DATE: |
2006-05-22 13:30 - 15:00 |
PLACE: |
KEK Computing Center Meeting Room 1 |
TITLE: |
Personal Lessons from Parallel Computing |
CONTACT: |
Takashi Sasaki |
LANGUAGE: |
English |
ABSTRACT: |
One of the lessons of the last two decades of parallel computing has been
the immense influence of commodity components. This often seems to create
a world turned upside down: a world in which technologically superior
computers sometimes fail in the marketplace, while a Sony or Nintendo video
game machine may have more computing power than scientific workstations.
This is a world in which scientists sometimes buy a high-end video card for
their workstation, and then run their CPU-intensive computations on the video
board instead of on the CPU. (Note, for example, Stanford University's
BrookGPU general-purpose compiler for the graphics processing unit.)
In one lesson from my own work, I found that the following excerpt of code
from my parallel shared memory program was running too slowly.
Object *Z[n], *Y[k];
int X[n];
for (int i=0; i < n; i++) Z[i] = Y[X[i]];
This led my student and me to invent a new algorithm for the above problem
that runs twice as fast as the code above. The key observation behind the
new algorithm was that the nature of commodity components had changed:
for this program, faster CPUs provide no advantage, because the running
time is limited purely by the speed of RAM.
In a second lesson from my work, I wished to use a cluster to execute
a certain orbit computation from computational algebra, for a mathematical
group called the Baby Monster group. Prior algorithms already required many
thousands of CPU hours to enumerate the 173,067,389 points of an orbit of
the J4 group. The orbit of the Baby Monster group is almost a hundred times
as large: 13,571,955,000 points. Yet, my student and I enumerated this larger
orbit in only 750 CPU hours. The solution was to use disk instead of RAM
for the primary data structures. The key observation was that the problem
allowed for a space-time tradeoff. By using the larger space provided
by disks, we could correspondingly lower the time requirements.
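The abstract does not describe the actual algorithm, but the general technique of moving the primary data structure to disk can be sketched. The code below is a hypothetical illustration, assuming points fit in 64-bit words and all names are illustrative: a breadth-first orbit enumeration whose visited set is a sorted file on disk, with each level's duplicates removed by a merge against that file.

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical sketch (not the speaker's actual algorithm): the visited
// set lives in a sorted binary file on disk; only one BFS level's
// frontier and candidates need fit in RAM at a time.

using Point = std::uint64_t;

// Read a sorted file of Points. (A real implementation would stream it.)
static std::vector<Point> read_all(const char* path) {
    std::ifstream in(path, std::ios::binary);
    std::vector<Point> v;
    Point p;
    while (in.read(reinterpret_cast<char*>(&p), sizeof p)) v.push_back(p);
    return v;
}

static void write_all(const char* path, const std::vector<Point>& v) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    for (Point p : v) out.write(reinterpret_cast<const char*>(&p), sizeof p);
}

// Enumerate the orbit of `start` under the generators `gens`, keeping
// the visited set in the file `seen_path`. Returns the orbit size.
template <class Gen>
std::size_t orbit_size(Point start, const std::vector<Gen>& gens,
                       const char* seen_path) {
    write_all(seen_path, {start});
    std::vector<Point> frontier{start};
    std::size_t total = 1;
    while (!frontier.empty()) {
        // Apply every generator to the frontier; dedup this level in RAM.
        std::vector<Point> cand;
        for (Point p : frontier)
            for (const Gen& g : gens) cand.push_back(g(p));
        std::sort(cand.begin(), cand.end());
        cand.erase(std::unique(cand.begin(), cand.end()), cand.end());
        // Merge against the sorted on-disk visited set: keep only
        // genuinely new points, then write back the merged set.
        std::vector<Point> seen = read_all(seen_path);
        std::vector<Point> fresh, merged;
        std::set_difference(cand.begin(), cand.end(),
                            seen.begin(), seen.end(),
                            std::back_inserter(fresh));
        std::merge(seen.begin(), seen.end(), fresh.begin(), fresh.end(),
                   std::back_inserter(merged));
        write_all(seen_path, merged);
        total += fresh.size();
        frontier = std::move(fresh);
    }
    return total;
}
```

The point of the design is the space-time tradeoff: a sorted file supports duplicate elimination by sequential merging, which disks handle well, so the visited set can grow far beyond RAM without a hash table ever having to fit in memory.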
As time permits, I will then discuss my own favorite parallel software
architecture, task-oriented parallelism, and its large benefits for certain
types of applications. |
|