INTRO_SHMEM(3)                                                  INTRO_SHMEM(3)
NAME
intro_shmem - Introduction to shared memory access routines
DESCRIPTION
The shared memory access (SHMEM) routines provide low-latency, high-
bandwidth communication for use in highly parallelized scalable
programs. The routines in the SHMEM application programming interface
(API) provide a programming model for exchanging data between
cooperating parallel processes. The resulting programs are similar in
style to Message Passing Interface (MPI) programs. The SHMEM API can
be used either alone or in combination with MPI routines in the same
parallel program.
A SHMEM program is SPMD (single program, multiple data) in style. The
SHMEM processes, called processing elements or PEs, all start at the
same time, and they all run the same program. Usually the PEs perform
computation on their own subdomains of the larger problem, and
periodically communicate with other PEs to exchange information on
which the next computation phase depends.
The SHMEM routines minimize the overhead associated with data transfer
requests, maximize bandwidth, and minimize data latency. Data latency
is the period of time that starts when a PE initiates a transfer of
data and ends when a PE can use the data.
SHMEM routines support remote data transfer through put operations,
which transfer data to a different PE, get operations, which transfer
data from a different PE, and remote pointers, which allow direct
references to data objects owned by another PE. Other operations
supported are collective broadcast and reduction, barrier
synchronization, and atomic memory operations. An atomic memory
operation is an atomic read-and-update operation, such as a
fetch-and-increment, on a remote or local data object.
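As an illustration of these operation classes, the following C sketch
(the variable names are illustrative only, and a job with at least two
PEs is assumed) combines a put, a get, and an atomic
fetch-and-increment:
    #include <stdio.h>
    #include <mpp/shmem.h>

    static long counter;        /* symmetric: remotely accessible on every PE */
    static long buf[4];         /* symmetric target of the put */

    int main(void)
    {
        long local[4] = { 10, 20, 30, 40 };
        long copy[4];

        start_pes(0);
        if (_my_pe() == 0) {
            shmem_long_put(buf, local, 4, 1);  /* put: write 4 longs into buf on PE 1 */
            shmem_long_finc(&counter, 1);      /* atomic fetch-and-increment on PE 1  */
        }
        shmem_barrier_all();                   /* make the put visible everywhere     */
        shmem_long_get(copy, buf, 4, 1);       /* get: read buf back from PE 1        */
        if (_my_pe() == 1)
            printf("counter on PE 1 is %ld\n", counter);
        return 0;
    }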
SHMEM Routines
This section lists the significant SHMEM message-passing routines.
* PE queries:
C/C++ only: _num_pes(3I), _my_pe(3I)
Fortran only: NUM_PES(3I), MY_PE(3I)
* Elemental data put routines:
C/C++ only: shmem_double_p, shmem_float_p, shmem_int_p,
shmem_long_p, shmem_short_p
* Block data put routines:
C/C++ and Fortran: shmem_put32, shmem_put64, shmem_put128
C/C++ only: shmem_double_put, shmem_float_put,
shmem_int_put, shmem_long_put,
shmem_short_put
Fortran only: shmem_complex_put, shmem_integer_put,
shmem_logical_put, shmem_real_put
* Elemental data get routines:
C/C++ only: shmem_double_g, shmem_float_g, shmem_int_g,
shmem_long_g, shmem_short_g
* Block data get routines:
C/C++ and Fortran: shmem_get32, shmem_get64, shmem_get128
C/C++ only: shmem_double_get, shmem_float_get,
shmem_int_get, shmem_long_get,
shmem_short_get
Fortran only: shmem_complex_get, shmem_integer_get,
shmem_logical_get, shmem_real_get
* Strided put routines:
C/C++ and Fortran: shmem_iput32, shmem_iput64, shmem_iput128
C/C++ only: shmem_double_iput, shmem_float_iput,
shmem_int_iput, shmem_long_iput,
shmem_short_iput
Fortran only: shmem_complex_iput, shmem_integer_iput,
shmem_logical_iput, shmem_real_iput
* Strided get routines:
C/C++ and Fortran: shmem_iget32, shmem_iget64, shmem_iget128
C/C++ only: shmem_double_iget, shmem_float_iget,
shmem_int_iget, shmem_long_iget,
shmem_short_iget
Fortran only: shmem_complex_iget, shmem_integer_iget,
shmem_logical_iget, shmem_real_iget
* Point-to-point synchronization routines:
C/C++ only: shmem_int_wait, shmem_int_wait_until,
shmem_long_wait, shmem_long_wait_until,
shmem_longlong_wait,
shmem_longlong_wait_until, shmem_short_wait,
shmem_short_wait_until
Fortran: shmem_int4_wait, shmem_int4_wait_until,
shmem_int8_wait, shmem_int8_wait_until
* Barrier synchronization routines:
C/C++ and Fortran: shmem_barrier_all, shmem_barrier
* Atomic memory fetch-and-operate (fetch-op) routines:
C/C++ and Fortran: shmem_swap
* Reduction routines:
C/C++ only: shmem_int_and_to_all, shmem_long_and_to_all,
shmem_longlong_and_to_all,
shmem_short_and_to_all,
shmem_double_max_to_all,
shmem_float_max_to_all, shmem_int_max_to_all,
shmem_long_max_to_all,
shmem_longlong_max_to_all,
shmem_short_max_to_all,
shmem_double_min_to_all,
shmem_float_min_to_all, shmem_int_min_to_all,
shmem_long_min_to_all,
shmem_longlong_min_to_all,
shmem_short_min_to_all,
shmem_double_sum_to_all,
shmem_float_sum_to_all, shmem_int_sum_to_all,
shmem_long_sum_to_all,
shmem_longlong_sum_to_all,
shmem_short_sum_to_all,
shmem_double_prod_to_all,
shmem_float_prod_to_all,
shmem_int_prod_to_all,
shmem_long_prod_to_all,
shmem_longlong_prod_to_all,
shmem_short_prod_to_all, shmem_int_or_to_all,
shmem_long_or_to_all,
shmem_longlong_or_to_all,
shmem_short_or_to_all, shmem_int_xor_to_all,
shmem_long_xor_to_all,
shmem_longlong_xor_to_all,
shmem_short_xor_to_all
Fortran only: shmem_int4_and_to_all, shmem_int8_and_to_all,
shmem_real4_max_to_all,
shmem_real8_max_to_all,
shmem_int4_max_to_all, shmem_int8_max_to_all,
shmem_real4_min_to_all,
shmem_real8_min_to_all,
shmem_int4_min_to_all, shmem_int8_min_to_all,
shmem_real4_sum_to_all,
shmem_real8_sum_to_all,
shmem_int4_sum_to_all, shmem_int8_sum_to_all,
shmem_real4_prod_to_all,
shmem_real8_prod_to_all,
shmem_int4_prod_to_all,
shmem_int8_prod_to_all, shmem_int4_or_to_all,
shmem_int8_or_to_all, shmem_int4_xor_to_all,
shmem_int8_xor_to_all
* Broadcast routines:
C/C++ and Fortran: shmem_broadcast32, shmem_broadcast64
* Generalized barrier synchronization routine:
C/C++ and Fortran: shmem_barrier
* Cache management routines:
C/C++ and Fortran: shmem_udcflush, shmem_udcflush_line
* Byte-granularity block put and get routines:
C/C++ and Fortran: shmem_putmem and shmem_getmem
Fortran only: shmem_character_put and shmem_character_get
* Collect routines:
C/C++ and Fortran: shmem_collect32, shmem_collect64,
shmem_fcollect32, shmem_fcollect64
* Atomic memory fetch-and-operate (fetch-op) routines:
C/C++ only: shmem_double_swap, shmem_float_swap,
shmem_int_cswap, shmem_int_fadd,
shmem_int_finc, shmem_int_swap,
shmem_long_cswap, shmem_long_fadd,
shmem_long_finc, shmem_long_swap,
shmem_longlong_cswap, shmem_longlong_fadd,
shmem_longlong_finc, shmem_longlong_swap
Fortran only: shmem_int4_cswap, shmem_int4_fadd,
shmem_int4_finc, shmem_int4_swap,
shmem_int8_swap, shmem_real4_swap,
shmem_real8_swap, shmem_int8_cswap
* Atomic memory operation routines:
Fortran only: shmem_int4_add, shmem_int4_inc
* Remote memory pointer function:
C/C++ and Fortran: shmem_ptr
* Reduction routines:
C/C++ only: shmem_longdouble_max_to_all,
shmem_longdouble_min_to_all,
shmem_longdouble_prod_to_all,
shmem_longdouble_sum_to_all
Fortran only: shmem_real16_max_to_all,
shmem_real16_min_to_all,
shmem_real16_prod_to_all,
shmem_real16_sum_to_all
* Accessibility query routines:
C/C++ and Fortran: shmem_pe_accessible, shmem_addr_accessible
Symmetric Data Objects
Consistent with SHMEM's SPMD programming style is the concept of
symmetric data objects, which are arrays or variables that exist with
the same size, type, and relative address on all PEs. Another term
for symmetric data objects is "remotely accessible data objects." In
the interface definitions for SHMEM data transfer routines, one or
more of the parameters are typically required to be symmetric or
remotely accessible.
The following kinds of data objects are symmetric:
* Fortran data objects in common blocks or with the SAVE attribute.
These data objects must not be defined in a dynamic shared object
(DSO).
* Non-stack C and C++ variables. These data objects must not be
defined in a DSO.
* Fortran arrays allocated with shpalloc(3F)
* C and C++ data allocated by shmalloc(3C)
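The distinction can be seen in the following C sketch (the names and
sizes are illustrative, not taken from the lists above):
    #include <mpp/shmem.h>

    long global_buf[100];            /* non-stack C variable: symmetric   */
    static double static_buf[100];   /* file-scope static data: symmetric */

    int main(void)
    {
        long stack_buf[100];         /* automatic (stack) variable: NOT symmetric */
        long *heap_buf;

        start_pes(0);

        /* Symmetric heap data is symmetric provided every PE performs
           the same allocation. */
        heap_buf = (long *) shmalloc(100 * sizeof(long));

        /* global_buf, static_buf, and heap_buf may appear as the remotely
           accessed argument of SHMEM data transfer routines; stack_buf may
           be used only as the local argument. */

        shfree(heap_buf);
        return 0;
    }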
Collective Routines
Some SHMEM routines, for example, shmem_broadcast(3) and
shmem_float_sum_to_all(3), are classified as collective routines
because they distribute work across a set of PEs. They must be called
concurrently by all PEs in the active set defined by the PE_start,
logPE_stride, PE_size argument triplet. The following man pages
describe the SHMEM collective routines:
* shmem_and(3)
* shmem_barrier(3)
* shmem_broadcast(3)
* shmem_collect(3)
* shmem_max(3)
* shmem_min(3)
* shmem_or(3)
* shmem_prod(3)
* shmem_sum(3)
* shmem_xor(3)
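As a rough sketch of a collective call, the following C fragment sums
one integer across all PEs; the active set triplet (0, 0, _num_pes())
selects every PE, and the pWrk and pSync sizes use the constants
defined in <mpp/shmem.h> (see shmem_sum(3) for the exact requirements):
    #include <stdio.h>
    #include <mpp/shmem.h>

    /* Every argument passed to a collective routine must be symmetric. */
    static int  src, dst;
    static int  pWrk[_SHMEM_REDUCE_MIN_WRKDATA_SIZE];
    static long pSync[_SHMEM_REDUCE_SYNC_SIZE];

    int main(void)
    {
        int i;

        start_pes(0);
        for (i = 0; i < _SHMEM_REDUCE_SYNC_SIZE; i++)
            pSync[i] = _SHMEM_SYNC_VALUE;
        src = _my_pe();
        shmem_barrier_all();    /* ensure pSync is initialized on all PEs */

        /* Active set: PE_start = 0, logPE_stride = 0, PE_size = _num_pes().
           Every PE in the active set must make this call concurrently.   */
        shmem_int_sum_to_all(&dst, &src, 1, 0, 0, _num_pes(), pWrk, pSync);

        printf("PE %d: sum of PE numbers = %d\n", _my_pe(), dst);
        return 0;
    }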
Using the Symmetric Work Array, pSync
Multiple pSync arrays are often needed if a particular PE calls a
SHMEM collective routine twice without intervening barrier
synchronization. Problems would occur if some PEs in the active set
for call 2 arrive at call 2 before processing of call 1 is complete by
all PEs in the call 1 active set. You can use shmem_barrier() or
shmem_barrier_all(3) to perform a barrier synchronization between
consecutive calls to SHMEM collective routines.
There are two special cases:
* The shmem_barrier(3) routine allows the same pSync array to be used
on consecutive calls as long as the active PE set does not change.
* If the same collective routine is called multiple times with the
same active set, the calls may alternate between two pSync arrays.
The SHMEM routines guarantee that a first call is completely
finished by all PEs by the time processing of a third call begins
on any PE.
Because the SHMEM routines restore pSync to its original contents,
multiple calls that use the same pSync array do not require that pSync
be reinitialized after the first call.
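A minimal C sketch of the alternating-pSync technique described above
(the data sizes are arbitrary; the broadcast arguments follow
shmem_broadcast(3)):
    #include <mpp/shmem.h>

    static long src[8], dst[8];

    /* Two pSync arrays used on alternate calls, so that call n+1 cannot
       interfere with call n when no barrier separates them. */
    static long pSync[2][_SHMEM_BCAST_SYNC_SIZE];

    int main(void)
    {
        int i, iter;

        start_pes(0);
        for (i = 0; i < _SHMEM_BCAST_SYNC_SIZE; i++)
            pSync[0][i] = pSync[1][i] = _SHMEM_SYNC_VALUE;
        shmem_barrier_all();   /* pSync initialized everywhere before first use */

        for (iter = 0; iter < 10; iter++)
            /* Same routine, same active set, no intervening barrier:
               alternate between the two work arrays. */
            shmem_broadcast64(dst, src, 8, 0, 0, 0, _num_pes(),
                              pSync[iter % 2]);

        return 0;
    }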
ENVIRONMENT VARIABLES
This section describes the variables that specify the environment
under which your SHMEM programs will run. On IRIX, these also affect
the way 64-bit MPI programs will run. Environment variables have
predefined values. You can change some variables to achieve
particular performance objectives.
Barrier Related Environment Variables
The default behavior of the SHMEM barrier can be modified using the
following environment variables:
SMA_BAR_COUNTER (IRIX systems only)
Specifies the use of a simple counter barrier algorithm.
Default: Enabled for jobs with PE counts less than 64
SMA_BAR_DISSEM (IRIX systems only)
Specifies the use of the alternate barrier algorithm, the
dissemination/butterfly, within the shmem_barrier_all(3)
function. This alternate algorithm provides better performance
on jobs with larger PE counts.
Default: Enabled for jobs with PE counts of 64 or higher
SMA_NO_FETCHOP (IRIX systems only)
Disables the use of hardware "fetchops" in the barrier algorithm.
Default: Not enabled
Symmetric Heap Related Environment Variables
The default behavior of the symmetric heap can be modified using the
following environment variables:
SMA_SYMMETRIC_SIZE (Also available on SGI Altix 3000 systems)
Specifies the size, in bytes, of the symmetric heap memory per
PE.
Default: On IRIX systems, 67108864 bytes (64 MB) per PE. On Altix
systems, the total machine memory divided by the number of processors
on the system.
SMA_SYMMETRIC_SHATTR_OFF (IRIX systems only)
Starting with IRIX 6.5.18, the symmetric heap makes use of system
V shared memory segments with shared page tables to reduce kernel
memory requirements. Setting this environment variable will
disable the use of this feature.
Default: Not enabled for IRIX 6.5.18 and higher
SMA_SYMMETRIC_PREATTACH (IRIX systems only)
Starting with IRIX 6.5.18, the symmetric heap is implemented with
system V shared memory segments. To minimize kernel resource
requirements, these segments are normally attached only when
necessary. Setting this environment variable might lead to some
improvement in runtime performance at the expense of longer job
startup and shutdown times.
Default: Not enabled for IRIX 6.5.18 and higher
SMA_SYMMETRIC_METHOD (IRIX systems only)
Allows for controlling the method used to implement the symmetric
heap. This environment variable can be set to one of the
following values:
Value Action
mmap With this setting, a shared
/dev/zero mapping is used to
implement the symmetric heap. This
is the default method for IRIX
6.5.17 and older releases.
sysv With this setting, system V shared
memory segments with the shared
page table attribute are used to
implement the symmetric heap. This
is the default method for IRIX
6.5.18 and higher.
Static Cross Mapping Related Environment Variables
The default behavior of the SHMEM static cross mapping procedure can
be modified using the following environment variables:
SMA_STATIC_PREATTACH (IRIX systems only)
Starting with IRIX 6.5.2, static cross mapping is implemented
primarily using system V shared memory segments. To minimize
kernel resource requirements, these segments are normally
attached only when necessary. Setting this environment variable
might lead to some improvement in runtime performance at the
expense of significantly increased startup and shutdown times for
high PE count jobs.
Default: Not enabled for IRIX 6.5.2 and higher
SMA_STATIC_SHATTR_OFF (IRIX systems only)
Starting with IRIX 6.5.15, system V shared memory segments with
shared page tables are used to cross map static memory when
possible. The static section must be significantly larger than
32 MBytes in order to use shared page tables. Setting this
environment variable disables the use of this feature.
Default: Not enabled for IRIX 6.5.15 and higher
SMA_STATIC_METHOD (IRIX systems only)
Allows for controlling the method used to implement the static
cross mapping. This environment variable can be set to one of
the following values:
Value Action
mmap With this setting, memory mapped
files are used to implement the
static cross mapping. This is the
default for early IRIX 6.5
releases.
sysv With this setting, System V shared
memory segments are used to implement
the static cross mapping. This is the
default method for IRIX 6.5.2 and higher.
SMA_PREATTACH (IRIX systems only)
Setting this shell variable is equivalent to setting both
SMA_SYMMETRIC_PREATTACH and SMA_STATIC_PREATTACH.
Default: Not enabled for IRIX 6.5.2 and higher
SMA_SHATTR_OFF (IRIX systems only)
Setting this shell variable is equivalent to setting both
SMA_SYMMETRIC_SHATTR_OFF and SMA_STATIC_SHATTR_OFF.
Default: Not enabled for IRIX 6.5.2 and higher
Debugging Related Environment Variables
Several environment variables are available to assist in debugging
SHMEM applications:
SMA_COREFILE (IRIX systems only)
Setting this environment variable causes the SHMEM library to
generate a corefile if an error is encountered at job startup.
Default: Not enabled
SMA_DEBUG (Also available on SGI Altix 3000 systems)
Prints out copious data at job startup and during job execution
about SHMEM internal operations.
Default: Not enabled
SMA_DBX (IRIX systems only)
Specifies the PE number to be debugged. If you set SMA_DBX to n,
PE n prints a message during program startup, describing how to
attach to it with the DBX debugger. PE n sleeps for seven
seconds. If you set SMA_DBX to n,s, PE n will sleep for s
seconds.
Default: Not enabled
SMA_INFO (Also available on SGI Altix 3000 systems)
Prints information about environment variables that can control
libsma execution.
Default: Not enabled
SMA_MALLOC_DEBUG
Activates debug checking of the symmetric heap. With this
variable set, the symmetric heap is checked for consistency upon
each invocation of a symmetric heap related routine. Setting
this variable significantly increases the overhead associated
with symmetric heap management operations.
Default: Not enabled
SMA_STATIC_VERBOSE (IRIX systems only)
Prints out information relevant to the static cross mapping
procedure at job startup.
Default: Not enabled
SMA_SYMMETRIC_VERBOSE (IRIX systems only)
Prints out information relevant to the symmetric heap
initialization at job startup.
Default: Not enabled
SMA_VERBOSE (IRIX systems only)
Prints out additional information relevant to the SHMEM startup
procedure.
Default: Not enabled
SMA_VERSION (Also available on SGI Altix 3000 systems)
Prints the libsma library release version.
Default: Not enabled
Memory Placement Related Environment Variables
On non-uniform memory access (NUMA) systems, such as Origin series
systems, SHMEM start-up processing ensures that the process associated
with a SHMEM PE executes on a processor near the memory associated
with that PE.
On Altix systems, the available MPI memory placement environment
variables should be used.
The following environment variables allow you to control the placement
of the SHMEM application on the system:
Variable Description
PAGESIZE_DATA (IRIX systems only)
Specifies the desired page size in kilobytes for
program data areas. You must specify an integer
value. On Origin series systems, supported values
include 16, 64, 256, 1024, and 4096.
SMA_DPLACE_INTEROP_OFF (IRIX systems only)
Disables a SHMEM/dplace interoperability feature
available beginning with IRIX 6.5.13. By setting
this variable, you can obtain the behavior of
SHMEM with dplace on older releases of IRIX. By
default, this variable is not enabled.
SMA_DSM_CPULIST (IRIX systems only)
Specifies a list of CPUs on which to run a SHMEM
application. To ensure that processes are linked
to CPUs, this variable should be used in
conjunction with SMA_DSM_MUSTRUN.
For an explanation of the syntax for this
environment variable, see the section entitled
"Using a CPU List."
SMA_DSM_MUSTRUN (IRIX systems only)
Enforces memory locality for SHMEM processes. Use
of this feature ensures that each SHMEM process
will get a CPU and physical memory on the node to
which it was originally assigned. This variable
has been observed to improve program performance
on IRIX systems running release 6.5.7 and earlier,
when running a program on a quiet system. With
later IRIX releases, under certain circumstances,
setting this variable is not necessary.
Internally, this feature directs the library to
use the process_cpulink(3) function instead of
process_mldlink(3) to control memory placement.
SMA_DSM_MUSTRUN should not be used when the job is
submitted to miser (see miser_submit(1)) because
program hangs may result. By default, this
variable is not enabled.
The process_cpulink(3) function is inherited
across process fork(2) or sproc(2). For this
reason, when using mixed SHMEM/OpenMP
applications, it is recommended either that this
variable not be set, or that _DSM_MUSTRUN also be
set (see p_environ(5)).
SMA_DSM_OFF (IRIX systems only)
When set to any value, deactivates
processor-memory affinity control. When set,
SHMEM processes run on any available processor,
regardless of whether it is near the memory
associated with that process.
SMA_DSM_PPM (IRIX systems only)
When set to an integer value, specifies the number
of processors to be mapped to every memory. The
default is 2 on Origin 2000 systems. The default
is 4 on Origin 3000 systems.
SMA_DSM_TOPOLOGY (IRIX systems only)
Specifies the shape of the set of hardware nodes
on which the PE memories are allocated. Set this
variable to one of the following values:
Value Action
cube A group of memory nodes that form a
perfect hypercube. NPES/SMA_DSM_PPM
must be a power of 2. If a perfect
hypercube is unavailable, a less
restrictive placement will be used.
cube_fixed A group of memory nodes that form a
perfect hypercube.
NPES/SMA_DSM_PPM must be a power of
2. If a perfect hypercube is
unavailable, the placement will
fail, disabling NUMA placement.
cpucluster Any group of memory nodes. The
operating system attempts to place
the group members close to one
another, taking into account nodes
with disabled processors. (Default
for IRIX 6.5.11 and higher).
free Any group of memory nodes. The
operating system attempts to place
the group members close to one
another. (Default for IRIX 6.5.10
and earlier releases).
SMA_DSM_VERBOSE (IRIX systems only)
When set to any value, writes information about
process and memory placement to stderr.
Using a CPU List
On IRIX systems you can manually select CPUs to use for a SHMEM
application by setting the SMA_DSM_CPULIST shell variable. This is
treated as a comma- and/or hyphen-delimited ordered list, specifying a
mapping of SHMEM processes to CPUs. The shepherd process is not
included in this list.
Examples:
Value CPU Assignment
8,16,32 Place three SHMEM processes on CPUs 8, 16,
and 32.
32,16,8 Place SHMEM process rank zero on CPU 32,
rank one on CPU 16, and rank two on CPU 8.
8-15,32-39 Place the SHMEM processes 0 through 7 on CPUs
8 to 15. Place the SHMEM processes 8 through
15 on CPUs 32 to 39.
39-32,8-15 Place the SHMEM processes 0 through 7 on CPUs
39 to 32. Place the SHMEM processes 8
through 15 on CPUs 8 to 15.
Note that the process rank is the value returned by _my_pe(3I). CPUs
are associated with the cpunum values given in the hardware
graph (see hwgraph(4)).
The number of processors specified must equal the number of SHMEM
processes (excluding the shepherd process) that will be used. If an
error occurs in processing the CPU list, the default placement policy
is used.
Using dplace(1) on IRIX Systems
The environment variables described previously allow you to map SHMEM
processes and memories with hardware processors and nodes. The
dplace(1) command, which is available on Origin series systems, can
give you additional control over application placement.
Perform the following steps to use the dplace(1) command with SHMEM
programs:
* Create file placefile with these contents:
threads $NPES + 1
memories ($NPES +1)/2 in topology cube
distribute threads 1:$NPES across memories
* Execute your program with NPES set to the number of PEs. For
example, to run with 4 PEs, invoke your program this way:
env NPES=4 dplace -place placefile a.out
NOTES
Installing SHMEM
The SHMEM software is packaged with the Message Passing Toolkit (MPT)
software product. Installation instructions are in the release notes
(relnotes) that accompany the toolkit. On IRIX systems, type relnotes
mpt to view them. On SGI Altix 3000 systems, see the README.relnotes
file, which can be found by typing rpm -ql sgi-mpt | grep
README.relnotes.
Compiling SHMEM Programs
The SHMEM routines reside in libsma.so.
The following sample command lines compile programs that include SHMEM
routines:
* IRIX systems:
cc -64 c_program.c -lsma
CC -64 cplusplus_program.c -lsma
f90 -64 -LANG:recursive=on fortran_program.f -lsma
f77 -64 -LANG:recursive=on fortran_program.f -lsma
* IRIX systems with Fortran 90 version 7.2.1 available:
f90 -64 -LANG:recursive=on -auto_use shmem_interface
fortran_program.f -lsma
* SGI Altix 3000 systems:
cc c_program.c -lsma
f77 fortran_program.f -lsma
efc fortran_program.f -lsma
The shmem_interface module is intended for use only with the -auto_use
option. This module provides compile-time checking of interfaces.
The keyword=arg actual argument format is not supported for SHMEM
subroutines defined in the shmem_interface procedure interface module.
The IRIX N32 ABI, selected by the -n32 compiler option, is also
supported by SHMEM, but is recommended only for small process counts
and program memory sizes, due to the limitation in the size of virtual
addresses imposed by the N32 ABI. The use of the N64 ABI, selected by
the -64 compiler option, is recommended for most SHMEM programs.
Running SHMEM Programs
On IRIX systems, SHMEM programs are run with the NPES environment
variable set to the number of processes desired, as in the following
example:
env NPES=32 ./a.out
On SGI Altix 3000 systems, SHMEM is layered on MPI infrastructure.
Programs are started with an mpirun command, as in the following
examples:
mpirun -np 32 ./a.out
mpirun hostA, hostB -np 16 ./a.out
SHMEM Support for SGI Altix 3000
On Altix systems, the SHMEM API is supported for SHMEM programs that
run on a single host, as well as SHMEM programs that span multiple
partitions connected via NUMAlink. SHMEM functions can be used to
communicate with processes running on the same or different
partitions.
On Altix, SHMEM support is layered on MPI infrastructure. The MPI
memory mapping feature, which is enabled by default, is required for
SHMEM support on Altix. In addition, the xpmem kernel module must be
installed on the system to support SHMEM. The xpmem module is
released with the OS.
SHMEM programs on Altix are started with an mpirun command, which
determines the number of processing elements (PEs) to launch. The
SHMEM program can still call the start_pes routine to initialize the
PEs, but the actual number of PEs created is determined by the -np
option on the mpirun command line.
MPI Interoperability
SHMEM routines can be used in conjunction with MPI message passing
routines in the same application. Programs that use both MPI and
SHMEM should call MPI_Init and MPI_Finalize but omit the call to the
start_pes routine. SHMEM PE numbers are equal to the MPI rank within
the MPI_COMM_WORLD communicator. Note that this precludes the use of
SHMEM routines between processes in different MPI_COMM_WORLDs. MPI
processes started using the MPI_Comm_spawn function, for example,
cannot use SHMEM routines to communicate with their parent MPI
processes.
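A small mixed MPI/SHMEM sketch (the variable names are illustrative,
and at least two ranks are assumed) might look like this:
    #include <stdio.h>
    #include <mpi.h>
    #include <mpp/shmem.h>

    static long flag;    /* symmetric target */

    int main(int argc, char *argv[])
    {
        int rank, np;

        MPI_Init(&argc, &argv);   /* no start_pes() call in a mixed MPI/SHMEM program */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* The SHMEM PE number equals the MPI rank in MPI_COMM_WORLD. */
        if (rank == 0 && np > 1)
            shmem_long_p(&flag, 1L, 1);   /* put one long into PE (rank) 1 */

        shmem_barrier_all();
        if (rank == 1)
            printf("flag on rank %d is %ld\n", rank, flag);

        MPI_Finalize();
        return 0;
    }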
On IRIX clustered systems, or when an MPI job involves more than one
executable file, you can use SHMEM to communicate only with processes
running on the same host and running the same executable file. Use
the shmem_pe_accessible function to determine if a remote PE is
accessible via SHMEM communication from the local PE.
On Altix partitioned systems, when running with a single executable
file, you can use SHMEM to communicate with processes running on the
same or different partitions. Use the shmem_pe_accessible function to
determine if a remote PE is accessible via SHMEM communication from
the local PE.
When running an MPI application involving multiple executable files on
Altix, one can use SHMEM to communicate with processes running from
the same or different executable files, provided that the
communication is limited to symmetric data objects. It is important
to note that static memory, such as a Fortran common block or C global
variable, is symmetric between processes running from the same
executable file, but is not symmetric between processes running from
different executable files. Data allocated from the symmetric heap
(shmalloc or shpalloc) is symmetric across the same or different
executable files. Use the shmem_addr_accessible function to determine
if a local address is accessible via SHMEM communication from a remote
PE.
Note that on Altix, the shmem_pe_accessible function returns TRUE only
if the remote PE is a process running from the same executable file as
the local PE, indicating that full SHMEM support (static memory and
symmetric heap) is available.
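The two query routines can be combined as in the following sketch (the
data names and sizes are illustrative only):
    #include <mpp/shmem.h>

    static double data[64];   /* static data: symmetric only across PEs
                                 running the same executable file      */

    int main(void)
    {
        int pe;
        double *heap;

        start_pes(0);
        heap = (double *) shmalloc(64 * sizeof(double));  /* symmetric heap */

        for (pe = 0; pe < _num_pes(); pe++) {
            if (!shmem_pe_accessible(pe))
                continue;                    /* PE not reachable via SHMEM */

            /* Only target the static array if its address is usable in
               SHMEM operations involving this remote PE. */
            if (shmem_addr_accessible(data, pe))
                shmem_double_put(data, heap, 64, pe);
        }
        shmem_barrier_all();
        shfree(heap);
        return 0;
    }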
When using SHMEM within MPI, use the MPI memory placement environment
variables if non-default memory placement options are required.
SHMEM and Thread Safety
None of the SHMEM communication routines, including shmem_ptr, should
be considered thread safe. When used in a multithreaded
environment, the programmer should take steps to ensure that multiple
threads in a PE cannot simultaneously invoke SHMEM communication
routines.
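One simple way to do this is to serialize all SHMEM calls made within a
PE behind a single lock, as in the sketch below (a POSIX threads mutex
is used for illustration; any mutual exclusion mechanism would do):
    #include <pthread.h>
    #include <mpp/shmem.h>

    static long target;       /* symmetric */
    static pthread_mutex_t shmem_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Serialize SHMEM communication calls made by threads of one PE. */
    static void locked_long_put(long *dst, long *src, size_t n, int pe)
    {
        pthread_mutex_lock(&shmem_mtx);
        shmem_long_put(dst, src, n, pe);
        pthread_mutex_unlock(&shmem_mtx);
    }

    int main(void)
    {
        long value = 42;

        start_pes(0);
        if (_my_pe() == 0 && _num_pes() > 1)
            locked_long_put(&target, &value, 1, 1);
        shmem_barrier_all();
        return 0;
    }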
SHMEM and Cache Coherency
The SHMEM library was originally developed for systems whose memory
architectures provided only limited cache coherency. On those architectures,
it is at times necessary to handle cache coherency within the
application. This is not required on IRIX or Altix systems because
cache coherency is handled by the hardware.
The SHMEM cache management functions were retained for ease in porting
from these legacy platforms. However, their use is no longer
required.
Note that cache coherency does not imply memory ordering, particularly
with respect to put operations. In cases in which the ordering of put
operations is important, one must use either the memory ordering
functions shmem_fence or shmem_quiet, or one of the various barrier
functions.
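The classic case is a payload put followed by a flag put to the same
PE, as in this sketch (illustrative names; at least two PEs assumed):
    #include <mpp/shmem.h>

    static long data[128];    /* symmetric payload   */
    static long ready;        /* symmetric flag word */

    int main(void)
    {
        long payload[128];
        long one = 1;
        int  i;

        start_pes(0);
        for (i = 0; i < 128; i++)
            payload[i] = i;

        if (_my_pe() == 0 && _num_pes() > 1) {
            shmem_long_put(data, payload, 128, 1); /* deliver the payload        */
            shmem_fence();                         /* order the two puts to PE 1 */
            shmem_long_put(&ready, &one, 1, 1);    /* only then raise the flag   */
        }
        if (_my_pe() == 1)
            shmem_long_wait(&ready, 0);            /* wait until ready changes from 0 */
        shmem_barrier_all();
        return 0;
    }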
SHMEM Program Start-up and Memory Usage
On IRIX, starting with SHMEM 4.0 (distributed as part of MPT 1.7)
substantial changes have been made to the procedures for making static
memory remotely accessible (static cross mapping) and for managing the
symmetric heap. These changes impact the way in which SHMEM
applications interact with IRIX.
To reduce application startup times, certain procedures that were
previously done at job startup time are now deferred until required by
the application's SHMEM communication requests. Thus, the first
invocation of a communication request for a particular PE might be
relatively slow compared to subsequent requests associated with this
PE. If this behavior is undesirable, the SMA_STATIC_PREATTACH
environment variable can be set. If the symmetric heap is employed by
the application, the SMA_SYMMETRIC_PREATTACH environment variable can
also be set.
A second consequence of delaying these procedures until required by
the application is the apparent resident set size of a PE. This is
related to the manner in which IRIX accounts for memory usage
associated with system V shared memory objects. When page table
sharing is enabled for the object, a process attaching to the object
is charged only for a fraction of the memory actually associated with
the object. However, if page tables are not shared, the process'
resident set size increases by the number of pages currently
associated with the shared memory object, regardless of whether this
process has accessed these pages. If no pages have yet been faulted
in at the time of attachment, there is no significant increase in the
resident set size of the attaching process.
As a consequence of this accounting procedure, a PE might appear to
have a very large resident set size when the following conditions
occur:
* Data in the static region of the application is used as the target
of a shmem communication routine.
* The target PE has initialized a substantial portion of its static
region.
* Because of size or alignment constraints (for example, the static
region is smaller than 32 MB), the static region cannot use shared
page tables.
* A many-to-many or all-to-all communication pattern is used.
Although the resident set size for each PE can grow to be very large
when all four of these conditions are met, this should generally not
be a problem. If for some reason, this apparent large resident set
size is undesirable, the SMA_STATIC_PREATTACH environment variable can
be set. However, this might substantially increase job startup time.
Older versions of SHMEM used large mapped files to render static memory
remotely accessible on IRIX systems. This is also the case for this
version of SHMEM if the SMA_STATIC_METHOD environment variable is set
to mmap. The result is that enough file space must be available in
/var/tmp to accommodate a file of size npes * staticsz, where npes is
the number of PEs and staticsz is the size of the program's static
data area. Static data includes Fortran common blocks and C/C++
static data.
If a SHMEM program's memory requirements exceed available file space
in /var/tmp, a SHMEM run-time error message is generated. You can use
the TMPDIR environment variable to select a directory in a file system
with sufficient file space.
To minimize SHMEM program start-up time, use symmetric memory
allocated by the SHPALLOC(3F) or shmalloc(3C) routines instead of
static memory. Memory allocated by these routines does not require a
corresponding file space allocation in /var/tmp. This avoids problems
when file space is low. Starting with IRIX 6.5.18, using symmetric
heap memory is also preferred as shared page tables can be used. This
reduces kernel memory requirements and avoids some of the potential
problems with PE resident set size discussed above.
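For example, replacing a large static communication buffer with a
symmetric heap allocation might look like this (a sketch only; the
buffer size is arbitrary):
    #include <mpp/shmem.h>

    int main(void)
    {
        long *buf;

        start_pes(0);

        /* A symmetric heap allocation is remotely accessible but needs
           no /var/tmp file space or static cross mapping at startup. */
        buf = (long *) shmalloc(1024 * sizeof(long));

        /* ... use buf as the source or target of SHMEM transfers ... */

        shfree(buf);
        return 0;
    }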
An additional consequence of techniques used by SHMEM to render static
memory remotely accessible relates to logical swap requirements for
SHMEM jobs. With the current static cross mapping procedure, the
logical swap reservation required for an NPES SHMEM job is
approximately 2 * npes * staticsz where staticsz is the size of the
program's static data area. Additional logical swap reservation might
be required for the symmetric heap.
EXAMPLES
Example 1. The following Fortran SHMEM program directs all PEs to sum
simultaneously the numbers in the VALUES variable across all PEs:
      PROGRAM REDUCTION
      REAL VALUES, SUM
      COMMON /C/ VALUES
      REAL WORK
      INTEGER MY_PE, NUM_PES      ! PE query functions return INTEGER

      CALL START_PES(0)
      VALUES = MY_PE()
      CALL SHMEM_BARRIER_ALL      ! Synchronize all PEs
      SUM = 0.0
      DO I = 0, NUM_PES()-1
        CALL SHMEM_REAL_GET(WORK, VALUES, 1, I)   ! Get next value
        SUM = SUM + WORK                          ! Sum it
      ENDDO
      PRINT *, 'PE ', MY_PE(), ' COMPUTED SUM=', SUM
      CALL SHMEM_BARRIER_ALL
      END
Example 2. The following C SHMEM program transfers an array of 10
longs from PE 0 to PE 1:
#include <stdio.h>
#include <mpp/shmem.h>

int main(void)
{
    long source[10] = { 1, 2, 3, 4, 5,
                        6, 7, 8, 9, 10 };
    static long target[10];

    start_pes(0);
    if (_my_pe() == 0) {
        /* put 10 elements into target on PE 1 */
        shmem_long_put(target, source, 10, 1);
    }
    shmem_barrier_all();    /* sync sender and receiver */
    if (_my_pe() == 1)
        printf("target[0] on PE %d is %ld\n", _my_pe(), target[0]);
    return 0;
}
SEE ALSO
dplace(1)
The following man pages also contain information on SHMEM routines.
See the specific man pages for implementation information.
shmem_add(3), shmem_and(3), shmem_barrier(3), shmem_barrier_all(3),
shmem_broadcast(3), shmem_cache(3), shmem_collect(3), shmem_cswap(3),
shmem_fadd(3), shmem_fence(3), shmem_finc(3), shmem_get(3),
shmem_iget(3), shmem_inc(3), shmem_iput(3), shmem_lock(3),
shmem_max(3), shmem_min(3), shmem_my_pe(3), shmem_or(3),
shmem_prod(3), shmem_put(3), shmem_quiet(3), shmem_short_g(3),
shmem_short_p(3), shmem_sum(3), shmem_swap(3), shmem_wait(3),
shmem_xor(3), shmem_pe_accessible(3), shmem_addr_accessible(3),
start_pes(3), shmalloc(3C), shpalloc(3F), MY_PE(3I), NUM_PES(3I)
For information on using SHMEM routines with message passing routines,
see the Message Passing Toolkit: MPI Programmer's Manual.