The spectral element method is often characterized as a coarse-grained
algorithm since heavy computations need to be performed before
inter-element communication is required. An order-of-magnitude
analysis shows that the speed-up in a typical
two-dimensional application is:
\[
  S = \frac{P}{1 + T_c/(K\,T_o)}, \qquad
  \frac{T_c}{K\,T_o} \sim O\!\left(\frac{1}{\sqrt{K}\,N^{2}}\right)
\]
where P is the number of processors, K the number of spectral
elements within each processor, N the spectral truncation,
T_o the computational time, and T_c the communication
time. The above
formula shows that the speed-up deteriorates as the second term in the
denominator increases. This second term decreases quadratically with
the spectral truncation, and as the square root of the number of
elements in the partition. The formula also shows the
distinguishing property of the spectral element method which gives it
its coarse grain character: the communication cost increases only
linearly with the order of the method while its computational cost
increases cubically, yielding a quadratic ratio between the two.
High-order finite difference methods, by contrast, show a
quadratic increase of the communication cost with the order, since the
halo of points that must be passed between processors grows with the order.
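To make the order-of-magnitude argument concrete, the short Python sketch below evaluates the speed-up model for a few choices of K and N. The proportionality constant c in the communication-to-computation ratio is a hypothetical placeholder, not a measured value.

    import numpy as np

    def speedup(P, K, N, c=1.0):
        """Speed-up model S = P / (1 + T_c/(K*T_o)), with the ratio
        T_c/(K*T_o) approximated as c / (sqrt(K) * N**2).  The prefactor
        c is a hypothetical, machine-dependent constant (assumed here)."""
        comm_to_comp = c / (np.sqrt(K) * N**2)
        return P / (1.0 + comm_to_comp)

    # The communication term shrinks as either the number of elements per
    # processor (K) or the spectral truncation (N) grows.
    for K, N in [(4, 5), (16, 5), (16, 9)]:
        print(f"K={K:3d}  N={N}  S/P = {speedup(40, K, N) / 40:.4f}")

With the nominal c = 1 and the configuration described below (roughly 388/40, or about 10, elements per processor at 7th order), the communication term is well under one percent, consistent with the good scalability illustrated in Figure 1.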
The good scalability characteristics of SEOM are illustrated in the
speed-up and MFLOPS curves shown in Figure 1. The
computations consist of simulating the oceanic circulation in the
North Atlantic Basin using a grid with 388 elements in the horizontal
with a 7th order polynomial, and 8 elements in the vertical with a 3rd
order interpolation.
These performance curves were obtained on a CRAY-T3E, after removing
the initialization cost. The peak measured floating point rate is
2.7 GFLOPS on 40 processors, corresponding to a per-processor speed
of about 67.5 MFLOPS.
SEOM's parallelization is based on a static decomposition of the grid which is then distributed to the processors. Due to the irregularity of the grid and the decomposition, we have adopted a message passing implementation that allows precise control over the data decomposition and the overlapping of communications and computations. The explicit message passing approach also frees us from relying on compiler sophistication.
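As a rough illustration of this overlap strategy (and not the model's actual implementation, which is Fortran-era message passing code), the Python/mpi4py sketch below posts non-blocking sends and receives for inter-element edge data, performs element-interior work while the messages are in flight, and waits for the edge data only when the boundary terms are needed. The neighbour list, buffer sizes, and the stand-in interior computation are all hypothetical.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Hypothetical neighbour list; in the real code it would come from the
    # static decomposition of the spectral element grid.
    neighbours = sorted({(rank - 1) % size, (rank + 1) % size} - {rank})
    edge_len = 8  # points on a shared element edge (illustrative only)

    send_bufs = {n: np.full(edge_len, float(rank)) for n in neighbours}
    recv_bufs = {n: np.empty(edge_len) for n in neighbours}

    # 1. Post non-blocking sends/receives for the inter-element edge data.
    requests = []
    for n in neighbours:
        requests.append(comm.Isend(send_bufs[n], dest=n, tag=0))
        requests.append(comm.Irecv(recv_bufs[n], source=n, tag=0))

    # 2. Heavy element-interior work proceeds while messages are in flight.
    interior = np.random.rand(16, edge_len, edge_len)          # stand-in data
    interior_result = np.einsum('kij,kjl->kil', interior, interior)

    # 3. Wait for the edge data only when the boundary terms are needed.
    MPI.Request.Waitall(requests)
    boundary_contribution = sum(recv_bufs[n].sum() for n in neighbours)

Posting the receives before the interior work is what allows the communication to be hidden behind computation; how much of it is actually hidden depends on the MPI implementation and the hardware.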
For further information on the parallelization please refer to
the article:
E. N. Curchitser, M. Iskandarani, and D. B. Haidvogel,
``A spectral element solution of the shallow water
equations on multiprocessor computers'',
Journal of Atmospheric and Oceanic Technology,
15(2), 510--521, 1998.