ITERATIVE(3S)ITERATIVE(3S)NAME
DIterative, DIterative_DropTol, DIterative_DropStorage - Parallel sparse
iterative linear system solver
SYNOPSIS
Fortran synopsis:
SUBROUTINE DITERATIVE (n, pointers, indices, values, storage, x,
b, method, precond, maxiters, convtol, iters, finalres)
INTEGER n, storage, method, precond, maxiters, iters
INTEGER pointers(*), indices(*)
DOUBLE PRECISION values(*), x(*), b(*)
DOUBLE PRECISION convtol, finalres
SUBROUTINE DITERATIVE_DROPTOL (DropTolerance)
DOUBLE PRECISION DropTolerance
SUBROUTINE DITERATIVE_DROPSTORAGE (Storage_Multiplier)
DOUBLE PRECISION Storage_Multiplier
C/C++ synopsis:
#include <scsl_sparse.h>
void DIterative (int n, int pointers[], int indices[], double
values[], int storage, double x[], double b[], int method, int
precond, int maxiters, double convtol, int *iters, double *finalres)
void DIterative_DropTol (double drop_tolerance );
void DIterative_DropStorage (double storage_multiplier );
IMPLEMENTATION
These routines are part of the SCSL Scientific Library and can be loaded
using either the -lscs or the -lscs_mp option. The -lscs_mp option
directs the linker to use the multi-processor version of the library.
When linking to SCSL with -lscs or -lscs_mp, the default integer size is
4 bytes (32 bits). Another version of SCSL is available in which integers
are 8 bytes (64 bits). This version allows the user access to larger
memory sizes and helps when porting legacy Cray codes. It can be loaded
by using the -lscs_i8 option or the -lscs_i8_mp option. A program may
use only one of the two versions; 4-byte integer and 8-byte integer
library calls cannot be mixed.
The C and C++ prototypes shown above are appropriate for the 4-byte
integer version of SCSL. When using the 8-byte integer version, the
variables of type int become long long and the <scsl_sparse_i8.h> header
file should be included.
Page 1
ITERATIVE(3S)ITERATIVE(3S)DESCRIPTION
DITERATIVE uses iterative techniques to solve the sparse system of
equations
A x = b.
Four different parallel preconditioned iterative solvers can be chosen:
conjugate gradient (CG) and conjugate residual (CR) for symmetric
systems, and conjugate gradient squared (CGS) and BiCGSTAB, a variant of
CGS with smoother convergence properties, for unsymmetric systems.
Several preconditioning schemes are available: Jacobi, symmetric
successive over-relaxation (SSOR), incomplete LU (ILU) by pattern, also
known as no-fill ILU, and incomplete LU by value, also known as
thresholded ILU. In this release, the incomplete LU preconditioners are
only available for symmetric matrices, specifically, they are incomplete
LDLT (ILDLT) preconditioners. The ILDLT by value currently does not run
in parallel.
Two additional routines control parameters for the LDLT by value
preconditioner:
* DIterative_DropTol() allows the user to set the drop tolerance for
the incomplete factorization.
* DIterative_DropStorage() allows the user to control the amount of
storage used for the incomplete factor.
Sparse Matrix Format
Sparse matrix A must be input to DIterative in Compressed Sparse Column
Storage format (CSC) (also known as Harwell-Boeing format) or Compressed
Sparse Row Storage format (CSR).
The matrix is held in three arrays: pointers, indices, and values. In CSC
format, the indices array contains the row indices of the non-zeros in A.
The values array holds the corresponding non-zero values. The pointers
array contains the index in indices for the first non-zero in each column
of A. Thus, the row indices for the non-zeros in column i can be found
in locations indices[pointers[i]] through indices[pointers[i+1]-1]. The
corresponding values can be found in location values[pointers[i]] through
values[pointers[i+1]-1].
For a symmetric matrix A, the user must input either the lower or upper
triangle of A, but not both. Non-zeroes within a column of A can be
stored in any order.
In the following example, the symmetric matrix:
1.0
0.0 3.0
2.0 0.0 5.0
0.0 4.0 0.0 6.0
Page 2
ITERATIVE(3S)ITERATIVE(3S)
stored in CSC format would be represented in FORTRAN as follows:
INTEGER pointers(5), indices(6), i
DOUBLE PRECISION values(6)
DATA (pointers(i), i = 1, 5) / 1, 3, 5, 6, 7 /
DATA (indices(i), i = 1, 6) / 1, 3, 2, 4, 3, 4 /
DATA (values(i), i = 1, 6) / 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 /
Zero-based indexing is used in C, so the pointers and indices arrays
would instead contain the following:
int pointers[] = {0, 2, 4, 5, 6}
int indices[] = {0, 2, 1, 3, 2, 3}
double values[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0}
In CSR format, the indices array contains the column indices of the non-
zeros in A. The values array holds the corresponding non-zero values.
The pointers array contains the index in indices for the first non-zero
in each row of A. Thus, the colunm indices for the non-zeros in row i
can be found in locations indices[pointers[i]] through
indices[pointers[i+1]-1]. The corresponding values can be found in
location values[pointers[i]] through values[pointers[i+1]-1].
Using the same symmetric matrix as in the above example:
1.0
0.0 3.0
2.0 0.0 5.0
0.0 4.0 0.0 6.0
the corresponding CSR format would be represented in FORTRAN as follows:
INTEGER pointers(5), indices(6), i
DOUBLE PRECISION values(6)
DATA (pointers(i), i = 1, 5) / 1, 2, 3, 5, 7 /
DATA (indices(i), i = 1, 6) / 1, 2, 1, 3, 2, 4 /
DATA (values(i), i = 1, 6) / 1.0, 3.0, 2.0, 5.0, 4.0, 6.0 /
Zero-based indexing is used in C, so the pointers and indices arrays
would instead contain the following:
int pointers[] = {0, 1, 2, 4, 6}
int indices[] = {0, 1, 0, 2, 1, 3}
double values[] = {1.0, 3.0, 2.0, 5.0, 4.0, 6.0}
Page 3
ITERATIVE(3S)ITERATIVE(3S)
These routines have the following arguments:
n (input) Integer. The number of rows and columns in the matrix
A. n>=0.
pointers, indices, values
(input) The pointers and indices arrays store the non-zero
structure of sparse input matrix A in Compressed Sparse Column
(CSC) or Compressed Sparse Row (CSR) format.
In CSC format, the pointers array stores n+1 integers, where
pointers[i] gives the index in indices of the first non-zero in
column i of A. The indices array stores the row indices of the
non-zeros in A. The values array stores the non-zero values in
the matrix A.
storage (input) An integer. Specifies if the matrix is stored by
columns or by rows. If storage=0, the matrix is stored by
columns (CSC), if storage=1, the matrix is stored by rows
(CSR).
x (input/output) The initial guess and final solution vector.
b (input) The right-hand-side vector in a DIterative call.
method (input) An integer specifying the iterative method used.
method = 0: conjugate gradient
method = 1: conjugate residual
method = 10: conjugate gradient squared
method = 11: BiCGSTAB
precond (input) An integer specifying the preconditioner used. 0 <=
method <= 3.
precond = 0: use Jacobi preconditioner. This option is not yet
supported for unsymmetric matrices stored in CSC format.
precond = 1: use SSOR preconditioner
precond = 2: use no-fill ILU preconditioner
precond = 3: use thresholded ILU preconditioner
maxiters (input) An integer. The solver terminates once it has performed
maxiters iterations.
Page 4
ITERATIVE(3S)ITERATIVE(3S)
convtol (input) A double precision number. The solver terminates when
the norm of the residual relative to the norm of the right-
hand-side is less than convtol.
iters (output) The number of iterations performed by DIterative.
finalres (output) A double precision number containing the final 2-norm
of the residual.
drop_tolerance
(input) A double precision argument to DIterative_DropTol. In
thresholded factorization, an entry is discarded if it is
smaller than drop_tolerance times the corresponding diagonal
element.
storage_multiplier
(input) A double precision argument to DIterative_DropStorage.
In thresholded factorization, the drop tolerance is
automatically increased if the incomplete factor matrix
contains more than storage_multiplier times the number of non-
zero values in A.
ENVIRONMENT VARIABLES
Environment variables can control various run-time features:
* ITERATIVE_VERBOSE prints messages about steps taken during
factorization.
* ITERATIVE_DUMP prints the matrix into the file "ppcr.mat" in CSC
(Harwell-Boeing) format.
* ITERATIVE_RCM controls matrix reordering. Ordering of the matrix is
NOT done be default. This variable, if defined, must be set to one of
0 no reordering of the matrix is done (equivalent to NOT setting
ITERATIVE_RCM).
1 trimmed down version of Cuthill-McKee with only search for
peripheral nodes and level sets is done.
-1 reverse Cuthill-McKee reordering is performed.
* OMP_NUM_THREADS determines the number of processors that are used by
the iterative solver.
NOTES
These routines are optimized and parallelized for the SGI R8000, R10000,
R12000 and R14000 platforms.
Page 5
ITERATIVE(3S)ITERATIVE(3S)SEE ALSOINTRO_SCSL(3S), INTRO_SOLVERS(3S)
Page 6