INTRO_SCSL(3S)
NAME
INTRO_SCSL - Introduction to Scientific Computing Software Library (SCSL)
routines
IMPLEMENTATION
See individual man pages for implementation details
DESCRIPTION
The SGI Scientific Computing Software Library (SCSL) contains the
following routines:
* Signal processing routines (see the INTRO_FFT(3S) introductory man page)
- Fast Fourier Transform (FFT) routines
- Convolution routines
- Correlation routines
* Direct linear equation solvers for real and complex sparse systems
with symmetric non-zero structure, and iterative solvers for real
sparse systems with arbitrary structure (see the INTRO_SOLVERS(3S)
introductory man page)
* 64-bit thread-safe parallel random number generators (see the
SRAND64(3S) man page)
* Vector-vector linear algebra subprograms (see the INTRO_BLAS1(3S)
introductory man page)
- Level 1 Basic Linear Algebra Subprograms (Level 1 BLAS)
* Matrix-vector linear algebra subprograms (see the INTRO_BLAS2(3S)
introductory man page)
- Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS)
* Matrix-matrix linear algebra subprograms (see the INTRO_BLAS3(3S)
introductory man page)
- Level 3 Basic Linear Algebra Subprograms (Level 3 BLAS)
* LAPACK routines (see the INTRO_LAPACK(3S) introductory man page)
The SCSL routines can be loaded by using the -lscs option or the -lscs_mp
option. The -lscs_mp option directs the linker to use the multi-
processor version of the library.
The multi-processor version of SCSL, libscs_mp, is a Shared Memory (SMP)
version that is based on libmp. libmp uses IRIX lightweight processes
(sproc) to implement parallel execution. POSIX threads (pthreads) are
incompatible with sproc calls. Pthreads and sproc calls have
fundamentally different characteristics that prevent coexistence, such as
process identity, memory, and parent-child relationships. Therefore, a
program that uses POSIX threads cannot use the multi-processor
version of SCSL.
When linking to SCSL with -lscs or -lscs_mp, the default integer size is
4 bytes (32 bits). Another version of SCSL is available in which integers
are 8 bytes (64 bits). This version allows the user access to larger
memory sizes and helps when porting legacy Cray codes. It can be loaded
by using the -lscs_i8 option or the -lscs_i8_mp option. A program may use
only one of the two versions; 4-byte integer and 8-byte integer library
calls cannot be mixed.
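The effect of the integer size can be sketched in plain C. The helper
names below (element_count, fits_in_i4) are hypothetical and are not part
of SCSL; the sketch only checks whether the element count of a dense
array can still be held in a 4-byte signed integer, which is the kind of
limit the 8-byte integer version is meant to lift.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of SCSL. */

/* Number of elements in a dense rows x cols array, computed in 64 bits
   so that the product itself cannot overflow for realistic sizes. */
static int64_t element_count(int64_t rows, int64_t cols)
{
    return rows * cols;
}

/* Returns 1 if count is representable as a 4-byte signed integer,
   0 otherwise. */
static int fits_in_i4(int64_t count)
{
    return count >= 0 && count <= INT32_MAX;
}
```

Under these assumptions, a 40000 x 40000 array (1.6e9 elements) still
fits in a 4-byte count, while a 50000 x 50000 array (2.5e9 elements) does
not and would call for the 8-byte integer version.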
NOTES
Many of the Scientific Library routines are multitasked or multithreaded.
This means that a program that calls a multitasked routine will run in
parallel mode and take advantage of multiple processors whenever
possible, even if the program has not specifically requested
multitasking. If the program spends a large share of its run time in such
a routine, this feature can significantly reduce wall-clock time.
The following lists show the routines that are multitasked. In many
cases, a real variable (single-precision) routine is paired with its
complex variable equivalent.
LAPACK routines are not listed. Most LAPACK routines do not perform
multiprocessing, but almost all LAPACK routines call Level 2 BLAS and
Level 3 BLAS that do multiprocessing.
The following are the multitasked Level 2 BLAS routines:
SGEMV DGEMV CGEMV ZGEMV
SGBMV DGBMV CGBMV ZGBMV
CHEMV ZHEMV
CHBMV ZHBMV
CHPMV ZHPMV
SSPMV DSPMV
STRSV DTRSV CTRSV ZTRSV
The following are the multitasked Level 3 BLAS routines:
SGEMM DGEMM CGEMM ZGEMM
CGEMM3M ZGEMM3M
STRMM DTRMM ZTRMM
STRSM DTRSM CTRSM ZTRSM
CHERK ZHERK
The following are the GEMM-based Level 3 BLAS:
SSYMM DSYMM CSYMM ZSYMM
CHEMM ZHEMM
SSYRK DSYRK
CHERK ZHERK
SSYR2K DSYR2K CSYR2K ZSYR2K
CHER2K ZHER2K
All FFT routines are multithreaded for problem sizes in which
parallelization provides a performance benefit. Single one-dimensional
FFTs run in parallel only if the data size exceeds the size of the L2
cache. Convolution and correlation routines having two-dimensional input
sequences are also multithreaded. See INTRO_FFT(3S) for a list of all
signal processing routines.
The direct sparse solver routines perform multithreaded factorizations
and solves of linear systems of equations; the iterative sparse solver is
also parallelized. All solver routines are thread-safe, so they will
operate correctly and use only a single thread if called from a parallel
region of an OpenMP or libmp program.
Multiple-routine Man Pages
The following data types are used in these routines:
* Single precision: Fortran "real" data type, C/C++ "float" data type,
32-bit floating point; these routine names begin with S.
* Single precision complex: Fortran "complex" data type, C/C++
"scsl_complex" data type (defined in <scsl_blas.h>), C++ STL
"complex<float>" data type (defined in <complex.h>), two 32-bit
floating point reals; these routine names begin with C.
* Double precision: Fortran "double precision" data type, C/C++
"double" data type, 64-bit floating point; these routine names begin
with D.
* Double precision complex: Fortran "double complex" data type, C/C++
"scsl_zomplex" data type (defined in <scsl_blas.h>), C++ STL
"complex<double>" data type (defined in <complex.h>), two 64-bit
floating point doubles; these routine names begin with Z.
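The storage implied by the four data types above can be sketched in C.
The struct types here (demo_complex, demo_zomplex) are stand-ins written
for this example; the real scsl_complex and scsl_zomplex definitions come
from <scsl_blas.h> and may differ in field names or layout. The helper
element_size is likewise hypothetical.

```c
#include <stddef.h>

/* Stand-in types for illustration only; not the <scsl_blas.h>
   definitions. */
typedef struct { float  re, im; } demo_complex;  /* two 32-bit reals */
typedef struct { double re, im; } demo_zomplex;  /* two 64-bit reals */

/* Size in bytes of one element for a routine-name prefix (S, D, C, Z).
   Returns 0 for an unrecognized prefix. */
static size_t element_size(char prefix)
{
    switch (prefix) {
    case 'S': return sizeof(float);
    case 'D': return sizeof(double);
    case 'C': return sizeof(demo_complex);
    case 'Z': return sizeof(demo_zomplex);
    default:  return 0;
    }
}
```

On a platform with 32-bit float and 64-bit double, this gives 4, 8, 8,
and 16 bytes per element for the S, D, C, and Z routines respectively.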
Often little or no difference exists between these versions, other than
the data types of some inputs and outputs. In this case, the routines
are described on the same man page, and that man page is named after the
real or complex routine.
The man(1) command can find a man page online by any of the real,
complex, double precision, or double complex routine names.
The following table describes the naming conventions for these routines:
-------------------------------------------------------------
                                      Single       Double
           Single       Double        Precision    Precision
           Precision    Precision     Complex      Complex
-------------------------------------------------------------
form:      Sname        Dname         Cname        Zname
example:   SGEMM        DGEMM         CGEMM        ZGEMM
-------------------------------------------------------------
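The naming convention in the table can be sketched as a small C helper
that prepends the precision prefix to a base name. The function
routine_name is hypothetical and not part of SCSL; it only illustrates
how names such as SGEMM and ZGEMM are formed from the base name GEMM.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of SCSL.
   Writes prefix + base (for example 'D' + "GEMM" -> "DGEMM") into out.
   Returns 0 on success, or -1 if the prefix is not one of S, D, C, Z
   or if the result does not fit in out_size bytes. */
static int routine_name(char prefix, const char *base,
                        char *out, size_t out_size)
{
    /* strchr also matches the terminating NUL, so reject it explicitly. */
    if (prefix == '\0' || strchr("SDCZ", prefix) == NULL)
        return -1;

    int n = snprintf(out, out_size, "%c%s", prefix, base);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With a 16-byte buffer, calling routine_name('D', "GEMM", buf, sizeof buf)
leaves "DGEMM" in buf, and a prefix outside S, D, C, Z is rejected.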
NOTES
SCSL does not currently support reshaped arrays.
SEE ALSO
The introductory man pages for each topic: INTRO_FFT(3S),
INTRO_SOLVERS(3S), INTRO_BLAS(3S), INTRO_BLAS1(3S), INTRO_BLAS2(3S),
INTRO_BLAS3(3S), INTRO_CBLAS(3S), INTRO_LAPACK(3S)