Automatic parallelization
Distributed-memory multiprocessing systems (DMSs) such as Intel's hypercubes, the Intel Paragon, Thinking Machines' CM-5, and the Meiko Computing Surface have gained popularity because of their potential to tackle large problems in science and engineering. These systems are relatively cost-effective and scale to large numbers of processors. Programming them is difficult, however, because memory access is non-uniform: access to local data is fast, while non-local data must be transferred via message passing, which is much slower. Achieving good performance therefore requires exploiting the locality inherent in an algorithm. Effective data management is crucial: the computational workload must be balanced across the processors while the delays incurred waiting for non-local data are minimized.

When parallelizing code by hand, the programmer must distribute both the work and the data among the processors. A common strategy exploits the regularity found in numerical computations and is known as the Single Program Multiple Data (SPMD) model. In this approach, the data arrays of the original program are partitioned and distributed across the processors, establishing an ownership relation: computations on a data item are executed by the processor that owns it.
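The ownership relation and the resulting work distribution can be sketched concretely. The following is a minimal illustration, not taken from the text: it assumes a simple block distribution of a one-dimensional array over `p` processors and simulates the SPMD "owner-computes" discipline sequentially, so that each (simulated) process executes only the iterations whose left-hand-side element it owns. All function names here are hypothetical.

```python
# Sketch of SPMD work distribution via an ownership relation,
# assuming a 1-D block distribution (illustrative, not from the text).

def block_owner(i, n, p):
    """Rank that owns element i of an n-element array block-distributed
    over p processors; the first `extra` ranks hold one extra element."""
    base, extra = divmod(n, p)
    boundary = extra * (base + 1)  # last index covered by the larger blocks
    if i < boundary:
        return i // (base + 1)
    return extra + (i - boundary) // base

def owned_indices(rank, n, p):
    """Global indices owned by a given rank under the block distribution."""
    return [i for i in range(n) if block_owner(i, n, p) == rank]

def spmd_step(rank, n, p, a, b, c):
    """One SPMD step: every process runs the same loop nest, but under the
    owner-computes rule each executes only the iterations i for which it
    owns the assigned element a[i]."""
    for i in owned_indices(rank, n, p):
        a[i] = b[i] + c[i]

if __name__ == "__main__":
    n, p = 10, 3
    a, b, c = [0] * n, list(range(n)), [1] * n
    for rank in range(p):  # simulate the p processes sequentially
        spmd_step(rank, n, p, a, b, c)
    print(a)  # each element computed exactly once, by its owner
```

On a real DMS the loop body would also have to fetch `b[i]` and `c[i]` by message passing whenever they reside on another processor; the sketch omits communication to isolate the ownership and work-partitioning idea.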
