Compilation Techniques for PGAS Languages
Kathy Yelick
EECS Department, UC Berkeley and
Computational Research Division, Lawrence Berkeley National Lab

Partitioned global address space (PGAS) languages have emerged as a viable alternative to message passing programming models for large-scale parallel machines and clusters. They also offer an alternative to shared memory programming models (such as threads and OpenMP) and the possibility of a single programming model that will work well across a wide range of shared and distributed memory platforms. Several of these languages, including UPC, CAF, and Titanium, are based on a static model of parallelism, which gives programmers direct control over the underlying processor resources. In this talk I will describe some of the analysis and optimization techniques used in the Berkeley UPC and Titanium compilers, both of which are source-to-source translators based on a common runtime system. Both compilers are publicly released and run on most serial, parallel, and cluster platforms. Building on the strong typing of the underlying Java language, the Titanium compiler includes several forms of type-based analyses, used both to detect errors and to enable code transformations. The Berkeley UPC compiler extends the Open64 analysis framework on which it is built to handle the language features of UPC. Both compilers perform communication optimizations to overlap, aggregate, and schedule communication, as well as pointer localization and other optimizations on the parallelism constructs of the languages.
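
A concrete illustration of pointer localization may help. The short UPC sketch below shows the idiom that the optimization automates: when a pointer-to-shared is statically known to reference data with affinity to the accessing thread, it can be cast to an ordinary C pointer, so that accesses use cheap local loads and stores instead of general shared-pointer arithmetic. The array and function names here are illustrative, not drawn from either compiler.

    #include <upc.h>

    #define N 1024
    /* Blocked layout: row t has affinity to thread t. */
    shared [N] double grid[THREADS][N];

    void scale_my_row(double factor) {
        /* Localization: this row lives on MYTHREAD, so the
           pointer-to-shared may be cast to a plain C pointer. */
        double *mine = (double *) &grid[MYTHREAD][0];
        for (int i = 0; i < N; i++)
            mine[i] *= factor;
    }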
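
Communication overlap can similarly be sketched with split-phase bulk transfers: initiate a get, perform independent local work, and synchronize only when the remote data is needed. The function and header names below (bupc_memget_async, bupc_waitsync, bupc_extensions.h) follow the Berkeley UPC non-blocking extensions but should be treated as assumptions, as should the landing buffer and the stand-in compute kernel.

    #include <upc.h>
    #include <bupc_extensions.h>  /* assumed: Berkeley UPC NB extensions */

    #define N 1024
    shared [N] double rows[THREADS][N];
    double buf[N];  /* private landing buffer on each thread */

    /* Stand-in for independent local computation. */
    static void compute(double *row, int n) {
        for (int i = 0; i < n; i++) row[i] += 1.0;
    }

    void exchange_and_overlap(void) {
        int peer = (MYTHREAD + 1) % THREADS;
        /* Start the transfer, overlap it with local work, then wait. */
        bupc_handle_t h =
            bupc_memget_async(buf, &rows[peer][0], N * sizeof(double));
        compute((double *) &rows[MYTHREAD][0], N);  /* local data only */
        bupc_waitsync(h);                           /* buf now valid  */
        compute(buf, N);
    }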

The restricted nature of the static parallelism model in these three languages has advantages in terms of implementation simplicity, analyzability, and performance transparency, but some applications demand a more dynamic execution model, similar to that of Charm++ or the recently developed HPCS languages (X10, Chapel, and Fortress). These languages offer opportunities for expressiveness and suggest new open questions related to compiler and runtime support, especially as machines scale towards a petaflop. I will describe some of my experience working with such applications in UPC, and some of the challenges and opportunities that exist for such managed runtime systems in general.