Distributed computing for the KDI project

Until recently, Allard and Trangenstein used the Cray T3E, pictured on the right, to perform parallel computations. This machine has a custom network but uses commodity Alpha chips for computation. The work we propose for the KDI project will require substantial computational resources. Runs of modest-sized two-dimensional simulations on machines with several processors can take days to complete, and we will need to run many such simulations and analyze them statistically. For this reason, we need a dedicated high-performance computing facility at our disposal.

There are two types of platform that are reasonable to use: a symmetric multiprocessor (SMP) or a network of workstations (NOW). An SMP is far easier to program and has, at least at the time of this writing, better node performance than most NOWs; however, the cost per node can be high and the machines do not scale past 16 or 32 nodes. A NOW, on the other hand, is more difficult to program and its node performance can be disappointing; moreover, until recently, the communication bandwidth between nodes was not sufficient for many applications, including those we propose. Nonetheless, we believe that the NOW architecture has a very bright future and that a NOW is the better solution for us, because gigabit connections between nodes are now available and affordable and because the price-performance ratio of the nodes themselves continues to improve. At this time it is very likely that we will opt for the Alpha 21264/EV6 architecture for our nodes and connect them with gigabit-plus network interfaces supported by wormhole-routing switches provided by Myricom.

We have developed software to support distributed computing. The Deferred Execution Tool is designed to handle communication, and execution that depends on communication, on a distributed memory machine. It does so in a fashion that is extremely efficient and that also facilitates the conversion of serial codes for certain applications, such as adaptive mesh refinement and molecular dynamics, so that they run on distributed machines. In particular, we have used the Deferred Execution Tool to achieve very satisfactory speedup of an adaptive mesh refinement code running on a Cray T3E; see On the Performance of a Distributed Object Oriented Adaptive Mesh Refinement Code (with John Trangenstein). Because the Deferred Execution Tool is written in C++, we have been able to use it on a variety of platforms and applications. Myricom provides the GM package, which achieves very good performance on its hardware. It is possible, though, that even better performance may be achieved by building on the Trapeze system developed here at Duke. The Deferred Execution Tool is compatible with this custom software because, by design, it relies only on extremely low-level communication primitives.
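
The general idea of execution that depends on communication can be sketched in a few lines of C++. The sketch below is an illustration only and is not the Deferred Execution Tool's interface: the class DeferredScheduler, its methods defer and deliver, the message tag, and the function demo_order are all hypothetical names chosen for this example, and message arrival is simulated by a direct function call rather than by a real network layer. Work that needs remote data is registered against a message tag and runs only once that message has arrived, while work that needs no remote data proceeds immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: none of these names come from the Deferred Execution Tool.
class DeferredScheduler {
public:
    using Task = std::function<void(const std::vector<double>&)>;

    // Register work that must wait for the message identified by `tag`.
    void defer(int tag, Task task) {
        // Assumption of this sketch: at most one waiting task per tag.
        assert(pending_.find(tag) == pending_.end());
        auto ready = arrived_.find(tag);
        if (ready != arrived_.end()) {
            // The data arrived earlier, so the task can run right away.
            std::vector<double> payload = std::move(ready->second);
            arrived_.erase(ready);
            task(payload);
        } else {
            pending_[tag] = std::move(task);
        }
    }

    // Called when a message arrives. A real code would call this from its
    // communication layer; here the caller simulates the arrival.
    void deliver(int tag, std::vector<double> payload) {
        auto waiting = pending_.find(tag);
        if (waiting != pending_.end()) {
            Task task = std::move(waiting->second);
            pending_.erase(waiting);
            task(payload);
        } else {
            // No task is waiting yet, so buffer the data until one is deferred.
            arrived_[tag] = std::move(payload);
        }
    }

    // Number of tasks still waiting for their message.
    std::size_t pending() const { return pending_.size(); }

private:
    std::map<int, Task> pending_;
    std::map<int, std::vector<double>> arrived_;
};

// Simulated time step on one processor: the interior update does not wait
// for the neighbor's boundary data, but the boundary update does.
std::vector<std::string> demo_order() {
    std::vector<std::string> log;
    DeferredScheduler sched;
    const int kLeftGhost = 1;  // hypothetical message tag

    sched.defer(kLeftGhost, [&log](const std::vector<double>& ghost) {
        log.push_back("boundary update with " +
                      std::to_string(ghost.size()) + " ghost values");
    });

    log.push_back("interior update");

    // The neighbor's data arrives later and triggers the deferred task.
    sched.deliver(kLeftGhost, {1.0, 2.0, 3.0});
    return log;
}
```

Calling demo_order() records the interior update first and the boundary update second, even though the boundary task was registered first. Keying tasks by message tag keeps the scheduler independent of how messages are actually transported, which is one plausible way for a layer of this kind to sit on top of low-level communication primitives.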