The material covered in this section applies to all four parallel processing models for Linux.
I am primarily known as a compiler researcher, so I'd like to be able to say that there are lots of really great compilers automatically generating efficient parallel code for Linux systems. Unfortunately, the truth is that it is hard to beat the performance obtained by expressing your parallel program using various explicit communication and other parallel operations within C code that is compiled by GCC.
The following language/compiler projects represent some of the best efforts toward producing reasonably efficient code from high-level languages. Generally, each is reasonably effective for the kinds of programming tasks it targets, but none is the powerful general-purpose language and compiler system that will make you forever stop writing C programs to compile with GCC... which is fine. Use these languages and compilers as they were intended, and you'll be rewarded with shorter development times, easier debugging and maintenance, etc.
There are plenty of languages and compilers beyond those listed here (in alphabetical order). A list of freely available compilers (most of which have nothing to do with Linux parallel processing) is at http://www.idiom.com/free-compilers/.
At least in the scientific computing community, there will always be Fortran. Of course, now Fortran doesn't mean the same thing it did in the 1966 ANSI standard. Basically, Fortran 66 was pretty simple stuff. Fortran 77 added tons of features, the most noticeable of which were the improved support for character data and the change of
DO loop semantics. PCF (Parallel Computing Forum) Fortran attempted to add a variety of parallel processing support features to 77. Fortran 90 is a fully-featured modern language, essentially adding C++-like object-oriented programming features and parallel array syntax to the 77 language. HPF (High-Performance Fortran, http://www.crpc.rice.edu/HPFF/home.html), which has itself gone through two versions (HPF-1 and HPF-2), is essentially the enhanced, standardized, version of what many of us used to know as CM Fortran, MasPar Fortran, or Fortran D; it extends Fortran 90 with a variety of parallel processing enhancements, largely focussed on specifying data layouts. Finally, Fortran 95 represents a relatively minor enhancement and refinement of 90.
What works with C generally can also work with
g77 (a nice Linux-specific overview is at http://linux.uni-regensburg.de/psi_linux/gcc/html_g77/g77_91.html), or the commercial Fortran 90/95 products from http://extweb.nag.co.uk/nagware/NCNJNKNM.html. This is because all of these compilers eventually come down to the same code-generation used in the back-end of GCC.
Commercial Fortran parallelizers that can generate code for SMPs are available from http://www.kai.com/ and http://www.psrv.com/vast/vast_parallel.html. It is not clear if these compilers will work for SMP Linux, but it should be possible given that the standard POSIX threads (i.e., LinuxThreads) work under SMP Linux.
The Portland Group, http://www.pgroup.com/, has commercial parallelizing HPF Fortran (and C, C++) compilers that generate code for SMP Linux; they also have a version targeting clusters using MPI or PVM. FORGE/spf/xHPF products at http://www.apri.com/ might also be useful for SMPs or clusters.
Freely available parallelizing Fortrans that might be made to work with parallel Linux systems include:
I'm sure that I have omitted many potentially useful compilers for various dialects of Fortran, but there are so many that it is difficult to keep track. In the future, I would prefer to list only those compilers known to work with Linux. Please email comments and/or corrections to email@example.com.
GLU (Granular Lucid) is a very high-level programming system based on a hybrid programming model that combines intensional (Lucid) and imperative models. It supports both PVM and TCP sockets. Does it run under Linux? More information is available at http://www.csl.sri.com/GLU.html.
Jade is a parallel programming language that extends C to exploit coarse-grain concurrency in sequential, imperative programs. It assumes a distributed shared memory model, which is implemented by SAM for workstation clusters using PVM. More information is available at http://suif.stanford.edu/~scales/sam.html.
Mentat is an object-oriented parallel processing system that works with workstation clusters and has been ported to Linux. Mentat Programming Language (MPL) is an object-oriented programming language based on C++. The Mentat run-time system uses something vaguely resembling non-blocking remote procedure calls. More information is available at http://www.cs.virginia.edu/~mentat/.
Legion http://www.cs.virginia.edu/~legion/ is built on top on Mentat, providing the appearance of a single virtual machine across wide-area networked machines.
Not to be confussed with Mentat's MPL, this language was originally developed as the native parallel C dialect for the MasPar SIMD supercomputers. Well, MasPar isn't really in that business any more (they are now NeoVista Solutions, http://www.neovista.com, a data mining company), but their MPL compiler was built using GCC, so it is still freely available. In a joint effort between the University of Alabama at Huntsville and Purdue University, MasPar's MPL has been retargeted to generate C code with AFAPI calls (see section 3.6), and thus runs on both Linux SMPs and clusters. The compiler is, however, somewhat buggy... see http://www.math.luc.edu/~laufer/mspls/papers/cohen.ps.
Myrias is a company selling a software product called PAMS (Parallel Application Management System). PAMS provides very simple directives for virtual shared memory parallel processing. Networks of Linux machines are not yet supported. See http://www.myrias.com/ for more information.
Parallaxis-III is a structured programming language that extends Modula-2 with "virtual processors and connections" for data parallelism (a SIMD model). The Parallaxis software comprises compilers for sequential and parallel computer systems, a debugger (extensions to the gdb and xgbd debugger), and a large variety of sample algorithms from different areas, especially image processing. This runs on sequential Linux systems... an old version supported various parallel targets, and the new version also will (e.g., targeting a PVM cluster). More information is available at http://www.informatik.uni-stuttgart.de/ipvr/bv/p3/p3.html.
pC++/Sage++ is a language extension to C++ that permits data-parallel style operations using "collections of objects" from some base "element" class. It is a preprocessor generating C++ code that can run under PVM. Does it run under Linux? More information is available at http://www.extreme.indiana.edu/sage/.
SR (Synchronizing Resources) is a concurrent programming language in which resources encapsulate processes and the variables they share; operations provide the primary mechanism for process interaction. SR provides a novel integration of the mechanisms for invoking and servicing operations. Consequently, all of local and remote procedure call, rendezvous, message passing, dynamic process creation, multicast, and semaphores are supported. SR also supports shared global variables and operations.
It has been ported to Linux, but it isn't clear what parallelism it can execute with. More information is available at http://www.cs.arizona.edu/sr/www/index.html.
ZPL is an array-based programming language intended to support engineering and scientific applications. It generates calls to a simple message-passing interface called IronMan, and the few functions which constitute this interface can be easily implemented using nearly any message-passing system. However, it is primarily targeted to PVM and MPI on workstation clusters, and Linux is supported. More information is available at http://www.cs.washington.edu/research/projects/orca3/zpl/www/.
There are a lot of people who spend a lot of time benchmarking particular motherboards, network cards, etc., trying to determine which is the best. The problem with that approach is that by the time you've been able to benchmark something, it is no longer the best available; it even may have been taken off the market and replaced by a revised model with entirely different properties.
Buying PC hardware is like buying orange juice. Usually, it is made with pretty good stuff no matter what company name is on the label. Few people know, or care, where the components (or orange juice concentrate) came from. That said, there are some hardware differences that you should pay attention to. My advice is simply that you be aware of what you can expect from the hardware under Linux, and then focus your attention on getting rapid delivery, a good price, and a reasonable policy for returns.
An excellent overview of the different PC processors is given in http://www.pcguide.com/ref/cpu/fam/; in fact, the whole WWW site http://www.pcguide.com/ is full of good technical overviews of PC hardware. It is also useful to know a bit about performance of specific hardware configurations, and the Linux Benchmarking HOWTO http://sunsite.unc.edu/LDP/HOWTO/Benchmarking-HOWTO.html is a good place to start.
The Intel IA32 processors have many special registers that can be used to measure the performance of a running system in exquisite detail. Intel VTune, http://developer.intel.com/design/perftool/vtune/, uses the performance registers extensively in a very complete code-tuning system... that unfortunately doesn't run under Linux. A loadable module device driver, and library routines, for accessing the Pentium performance registers is available from http://www.cs.umd.edu/users/akinlar/driver.html. Keep in mind that these performance registers are different on different IA32 processors; this code works only with Pentium, not with 486, Pentium Pro, Pentium II, K6, etc.
Another comment on performance is appropriate, especially for those of you who want to build big clusters and put them in small spaces. At least some modern processors incorporate thermal sensors and circuits that are used to slow the internal clock rate if operating temperature gets too high (an attempt to reduce heat output and improve reliability). I'm not suggesting that everyone should go buy a peltier device (heat pump) to cool each CPU, but you should be aware that high operating temperature does not just shorten component life - it also can directly reduce system performance. Do not arrange your computers in physical configurations that block airflow, trap heat within confined areas, etc.
Finally, performance isn't just speed, but also reliability and availability. High reliability means that your system almost never crashes, even when components fail... which generally requires special features like redundant power supplies and hot-swap motherboards. That usually isn't cheap. High availability refers to the concept that your system is available for use nearly all the time... the system may crash when components fail, but the system is quickly repaired and rebooted. There is a High-Availability HOWTO that discusses many of the basic issues. However, especially for clusters, high availablity can be achieved simply by having a few spares. I recommend at least one spare, and prefer to have at least one spare for every 16 machines in a large cluster. Discarding faulty hardware and replacing it with a spare can yield both higher availability and lower cost than a maintenance contract.
So, is anybody doing parallel processing using Linux? Yes!
It wasn't very long ago that a lot of people were wondering if the death of many parallel-processing supercomputer companies meant that parallel processing was on its way out. I didn't think it was dead then (see http://dynamo.ecn.purdue.edu/~hankd/Opinions/pardead.html for a fun overview of what I think really happened), and it seems quite clear now that parallel processing is again on the rise. Even Intel, which just recently stopped making parallel supercomputers, is proud of the parallel processing support in things like MMX and the upcoming IA64 EPIC (Explicitly Parallel Instruction Computer).
If you search for "Linux" and "parallel" with your favorite search engine, you'll find quite a few places are involved in parallel processing using Linux. In particular, Linux PC clusters seem to be popping-up everywhere. The appropriateness of Linux, combined with the low cost and high performance of PC hardware, have made parallel processing using Linux a popular approach to supercomputing for both small, budget-constrained, groups and large, well-funded, national research laboratories.
Various projects listed elsewhere in this document maintain lists of "kindred" research sites that have similar parallel Linux configurations. However, at http://yara.ecn.purdue.edu/~pplinux/Sites/, there is a hypertext document intended to provide photographs, descriptions, and contact information for all the various sites using Linux systems for parallel processing. To have information about your site posted there:
There are 14 clusters in the current listing, but we are aware of at least several dozen Linux clusters world-wide. Of course, listing does not imply any endorsement, etc.; our hope is simply to increase awareness, research, and collaboration involving parallel processing using Linux.