Distributed-memory programming

The "art" of distributed-memory programming

If you have not already discovered this, you will probably soon realize that programming a distributed-memory (DM) machine differs significantly from programming a conventional machine. In fact, some might say there is a real "art" to DM programming, and a way of thinking that is simply not required elsewhere.

The primary reason DM machines are more difficult to use is that not only is the data in memory distributed but, in general, the programmer is responsible for ensuring that data is in the right place at the right time, typically by using a message-passing library to send and receive data across a network to and from the processing nodes of the machine. (A notable exception, of course, is virtual shared-memory machines, such as the KSR, which have operating systems designed to manage distributed data without explicit user control.)

This responsibility is far from trivial, particularly because data movement across a network does not always behave predictably. When messages are delayed by a backlog on the network, for example, program synchronization becomes an issue, and a given program may not behave deterministically. Determinism is a characteristic that many programmers have always taken for granted and counted on as an indisputable fact.

Where does the "art" come in? Primarily in finding the right way to view an application so that a data distribution which maximizes efficiency comes to the fore. It is likely that, with enough effort, virtually any distribution of data across a machine can be made to work. However, if the goal is to have a program that actually runs