#include <sys/types.h>
#include <sys/procfs.h>
#include <sys/fault.h>
#include <sys/syscall.h>
#include <signal.h>
#include <fcntl.h>
Standard system call interfaces are used to access /proc files: open(2), close(2), read(2), and write(2). Most files describe process state and can only be opened for reading. ctl (control) and lwpctl files permit manipulation of process state and can only be opened for writing. as (address space) files contain the image of the running process and can be opened for both reading and writing. An open for writing allows process control; a read-only open allows inspection, but not control. (For purposes of this discussion we refer to the process as open for reading or writing if any of its associated /proc files is open respectively for reading or writing.)
In general more than one process can open the same /proc file at the same time. Exclusive open is an advisory mechanism provided to allow cooperating controlling processes to avoid collisions. A process can obtain exclusive control of a target process, with respect to other cooperating processes, if it successfully opens any /proc file in the target process for writing (the as or ctl files, or the lwpctl file of any LWP) while specifying O_EXCL in the open. Such an open will fail if the target process is already open for writing (that is, if a ctl, as, or lwpctl file is open for writing). There can be any number of concurrent read-only opens; O_EXCL is ignored on opens for reading.
Data may be transferred from or to any locations in the address space of the traced process by applying lseek(2) to position the as file at the virtual address of interest followed by read or write. The address-map file /proc/pid/map can be examined to determine the accessible areas (mappings) of the address space. I/O transfers may span contiguous mappings. An I/O request extending into an unmapped area is truncated at the boundary. A write request beginning at an unmapped virtual address fails with errno set to EIO; a read request beginning at an unmapped virtual address returns zero (that is, an end-of-file indication).
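The lseek-then-read sequence described above can be wrapped in a small helper. This is a minimal sketch: as_read and as_peek_long are hypothetical names, and it assumes the virtual address of interest can be represented as an off_t.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: position the as file at virtual address addr and
 * read up to len bytes.  Returns the byte count (short if the request
 * runs into an unmapped area), 0 if addr itself is unmapped, or -1 on
 * error.
 */
ssize_t
as_read(int asfd, off_t addr, void *buf, size_t len)
{
    assert(buf != NULL);
    if (lseek(asfd, addr, SEEK_SET) == (off_t)-1)
        return -1;
    return read(asfd, buf, len);
}

/*
 * Hypothetical helper: open the as file named by as_path read-only and
 * fetch one long stored at addr.  Returns 0 on success, -1 otherwise.
 */
int
as_peek_long(const char *as_path, off_t addr, long *out)
{
    int fd;
    ssize_t n;

    assert(as_path != NULL && out != NULL);
    fd = open(as_path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = as_read(fd, addr, out, sizeof *out);
    close(fd);
    return (n == (ssize_t)sizeof *out) ? 0 : -1;
}
```

A caller that needs a whole range should check the returned count, since a transfer that reaches an unmapped area comes back short.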
Information and control operations are provided through additional files. sys/procfs.h contains definitions of data structures and message formats used with these files. Some of these definitions involve the use of sets of flags. The set types sigset_t, fltset_t, and sysset_t correspond, respectively, to signal, fault, and system call enumerations defined in sys/signal.h, sys/fault.h, and sys/syscall.h. Each set type is large enough to hold flags for its own enumeration. Although they are of different sizes, they have a common structure and can be manipulated by these macros:
prfillset(&set);              /* turn on all flags in set */
premptyset(&set);             /* turn off all flags in set */
praddset(&set, flag);         /* turn on the specified flag */
prdelset(&set, flag);         /* turn off the specified flag */
r = prismember(&set, flag);   /* != 0 iff flag is turned on */
One of prfillset or premptyset must be used to initialize set before it is used in any other operation. flag must be a member of the enumeration corresponding to set.
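To illustrate how one family of macros can serve set types of different sizes, here is a self-contained model. It is not the definition from sys/procfs.h: the demo_ names, the word layout, and the assumption that flags are numbered from 1 are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical set types of two different sizes (not the real ones). */
typedef struct { uint32_t word[2]; } demo_sigset_t;  /* flags 1..64  */
typedef struct { uint32_t word[8]; } demo_sysset_t;  /* flags 1..256 */

/* Each macro derives the capacity from the size of the set it is given. */
#define DEMO_NFLAGS(sp)    (sizeof((sp)->word) * 8)
#define DEMO_WORD(sp, f)   ((sp)->word[((f) - 1) / 32])
#define DEMO_BIT(f)        ((uint32_t)1 << (((f) - 1) % 32))
#define DEMO_VALID(sp, f)  ((f) >= 1 && (size_t)(f) <= DEMO_NFLAGS(sp))

#define demo_fillset(sp)   ((void)memset((sp)->word, 0xFF, sizeof((sp)->word)))
#define demo_emptyset(sp)  ((void)memset((sp)->word, 0, sizeof((sp)->word)))
#define demo_addset(sp, f) \
    (assert(DEMO_VALID(sp, f)), (void)(DEMO_WORD(sp, f) |= DEMO_BIT(f)))
#define demo_delset(sp, f) \
    (assert(DEMO_VALID(sp, f)), (void)(DEMO_WORD(sp, f) &= ~DEMO_BIT(f)))
#define demo_ismember(sp, f) \
    (DEMO_VALID(sp, f) && (DEMO_WORD(sp, f) & DEMO_BIT(f)) != 0)
```

As with the real macros, a set in this model must be filled or emptied before any other operation is applied to it.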
Every active process contains at least one Light Weight Process (LWP). Each LWP represents a flow of execution that is independently scheduled by the operating system. See intro(2) for an explanation of LWPs and their relationship to the threads library. All LWPs in a process share address space as well as many other attributes. Using ctl files, described below, it is possible to affect individual LWPs in a process or to affect all of them at once (depending on the operation).
Although process state and consequently the contents of /proc files can change from instant to instant, a single read(2) of a /proc file is guaranteed to return a ``sane'' representation of state, that is, the read will be an atomic snapshot of the state of the process. No such guarantee applies to successive reads applied to a /proc file for a running process. In addition, atomicity is specifically not guaranteed for any I/O applied to the as (address-space) file; the contents of any process's address space might be concurrently modified by an LWP of that process or any other process in the system.
Multiple structure definitions are used to describe the files. Unless explicitly stated otherwise, these definitions may be incomplete: the file may contain additional information. More specifically, these structures may grow between releases of the system and programs should not assume that they will not.
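One way to tolerate growth is to read only the leading part of a file that the program was compiled to understand and to ignore anything that follows. The sketch below uses a hypothetical helper, read_known_prefix, and issues a single read(2) so that the result is one snapshot.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly `want` bytes from the start of the
 * file at `path` into buf with one read(2).  Bytes beyond `want` (for
 * example members added by a later release) are ignored.  Returns 0 on
 * success, -1 if the file is shorter than expected or on error.
 */
int
read_known_prefix(const char *path, void *buf, size_t want)
{
    int fd, saved;
    ssize_t n;

    assert(path != NULL && buf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, buf, want);
    saved = errno;          /* keep the read error across close() */
    close(fd);
    errno = saved;
    return (n == (ssize_t)want) ? 0 : -1;
}
```

A caller would pass the size of the structure it knows, and should not treat a longer file as an error.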
long         pr_flags;     /* Flags */
ushort_t     pr_nlwp;      /* Number of lwps in the process */
sigset_t     pr_sigpend;   /* Set of process pending signals */
vaddr_t      pr_brkbase;   /* Address of the process heap */
ulong_t      pr_brksize;   /* Size of the process heap, in bytes */
vaddr_t      pr_stkbase;   /* Address of the process stack */
ulong_t      pr_stksize;   /* Size of the process stack, in bytes */
pid_t        pr_pid;       /* Process id */
pid_t        pr_ppid;      /* Parent process id */
pid_t        pr_pgid;      /* Process group id */
pid_t        pr_sid;       /* Session id */
timestruc_t  pr_utime;     /* Process user cpu time */
timestruc_t  pr_stime;     /* Process system cpu time */
timestruc_t  pr_cutime;    /* Sum of children's user times */
timestruc_t  pr_cstime;    /* Sum of children's system times */
sigset_t     pr_sigtrace;  /* Mask of traced signals */
fltset_t     pr_flttrace;  /* Mask of traced faults */
sysset_t     pr_sysentry;  /* Mask of system calls traced on entry */
sysset_t     pr_sysexit;   /* Mask of system calls traced on exit */
lwpstatus_t  pr_lwp;       /* "representative" LWP */
pr_flags is a bit-mask holding these flags:
pr_nlwp is the total number of LWPs in the process.
pr_brkbase is the virtual address of the process heap and pr_brksize is its size in bytes. The address formed by the sum of these values is the process break (see brk(2)). pr_stkbase and pr_stksize are, respectively, the virtual address of the process stack and its size in bytes. (Each LWP runs on a separate stack; the process stack is distinguished in that the operating system will grow it as necessary.)
pr_pid, pr_ppid, pr_pgid, and pr_sid are, respectively, the process ID, parent process ID, process group ID, and session ID of the process.
pr_utime, pr_stime, pr_cutime, and pr_cstime are, respectively, the user CPU and system CPU time consumed by the process, and the cumulative user CPU and system CPU time consumed by the process's children, in seconds and nanoseconds.
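A program that wants a single figure, such as total CPU time, has to combine the seconds and nanoseconds parts. The sketch below uses struct timespec as a stand-in and assumes that timestruc_t carries the same tv_sec and tv_nsec members; the helper names are hypothetical.

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Convert a seconds/nanoseconds pair to fractional seconds. */
double
ts_seconds(struct timespec ts)
{
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / (double)NSEC_PER_SEC;
}

/* Add two normalized times, carrying nanoseconds into seconds. */
struct timespec
ts_add(struct timespec a, struct timespec b)
{
    struct timespec sum;

    sum.tv_sec = a.tv_sec + b.tv_sec;
    sum.tv_nsec = a.tv_nsec + b.tv_nsec;
    if (sum.tv_nsec >= NSEC_PER_SEC) {
        sum.tv_sec += 1;
        sum.tv_nsec -= NSEC_PER_SEC;
    }
    return sum;
}
```

With these helpers, the CPU time consumed by the process itself would be ts_add applied to pr_utime and pr_stime.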
pr_sigtrace and pr_flttrace contain, respectively, the set of signals and the set of hardware faults that are being traced (see PCSTRACE and PCSFAULT).
pr_sysentry and pr_sysexit contain, respectively, the sets of system calls being traced on entry and exit (see PCSENTRY and PCSEXIT).
If the process is not a zombie, pr_lwp contains an lwpstatus_t structure describing a representative LWP. The contents of this structure have the same meaning as if it were read from an lwpstatus file (see below).
When the process has more than one LWP, its representative LWP is chosen by the /proc implementation. The chosen LWP is a stopped LWP only if all the process's LWPs are stopped, is stopped on an event of interest only if all the LWPs are so stopped, or is in a PR_REQUESTED stop only if there are no other events of interest to be found. The chosen LWP remains fixed as long as all the LWPs are stopped on events of interest and PCRUN is not applied to any of them.
When applied to the process control file, every /proc control operation that must act on an LWP uses the same algorithm to choose which LWP to act on. Together with synchronous stopping (see PCSET), this enables an application to control a multiple-LWP process using only the process-level status and control files if it so chooses. More fine-grained control can be achieved using the LWP-specific files.
ulong_t          pr_flag;              /* process flags */
ulong_t          pr_nlwp;              /* number of LWPs in process */
uid_t            pr_uid;               /* real user id */
gid_t            pr_gid;               /* real group id */
pid_t            pr_pid;               /* unique process id */
pid_t            pr_ppid;              /* process id of parent */
pid_t            pr_pgid;              /* pid of process group leader */
pid_t            pr_sid;               /* session id */
caddr_t          pr_addr;              /* internal address of process */
long             pr_size;              /* size of process image in pages */
long             pr_rssize;            /* resident set size in pages */
timestruc_t      pr_start;             /* process start time, time since epoch */
timestruc_t      pr_time;              /* usr+sys cpu time for this process */
dev_t            pr_ttydev;            /* controlling tty device (or PRNODEV) */
char             pr_fname[PRFNSZ];     /* last component of exec()ed pathname */
char             pr_psargs[PRARGSZ];   /* initial characters of arg list */
struct lwpsinfo  pr_lwp;               /* "representative" LWP */
Some of the entries in psinfo, such as pr_flag and pr_addr, refer to internal kernel data structures and should not be expected to retain their meanings across different versions of the operating system. They have no meaning to a program and are only useful for manual interpretation by a user aware of the implementation details.
psinfo is still accessible even after a process becomes a zombie.
pr_lwp describes the representative LWP chosen as described under the pstatus file above. If the process is a zombie, pr_nlwp and pr_lwp.pr_lwpid are zero and the other fields of pr_lwp are undefined.
caddr_t  pr_vaddr;        /* Virtual address */
ulong_t  pr_size;         /* Size of mapping in bytes */
char     pr_mapname[32];  /* Name in /proc/pid/object */
off_t    pr_off;          /* Offset into mapped object, if any */
long     pr_mflags;       /* Protection and attribute flags */
long     pr_filler[9];    /* For future use */
pr_vaddr is the virtual address of the mapping within the traced process and pr_size is its size in bytes. If pr_mapname does not contain an empty string then it holds the name of a file in the object directory that can be opened for reading to yield a file descriptor for the object to which the virtual address is mapped. pr_off is the offset within the mapped object (if any) to which the virtual address is mapped. pr_mflags is a bit-mask of protection and attribute flags:
A contiguous area of the address space having the same underlying mapped object may appear as multiple mappings because of varying read, write, execute, and shared attributes. The underlying mapped object does not change over the range of a single mapping. An I/O operation to a mapping marked MA_SHARED fails if applied at a virtual address not corresponding to a valid page in the underlying mapped object. Reads and writes to private mappings always succeed. Reads and writes to unmapped addresses always fail.
uid_t   pr_euid;       /* Effective user id */
uid_t   pr_ruid;       /* Real user id */
uid_t   pr_suid;       /* Saved user id (from exec) */
gid_t   pr_egid;       /* Effective group id */
gid_t   pr_rgid;       /* Real group id */
gid_t   pr_sgid;       /* Saved group id (from exec) */
uint_t  pr_ngroups;    /* Number of supplementary groups */
gid_t   pr_groups[1];  /* Array of supplementary groups */
The list of associated supplementary groups in pr_groups is of variable length; pr_ngroups specifies the number of groups.
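Because pr_groups is declared with one element but may hold more, a buffer for the whole structure has to be sized from pr_ngroups. The sketch below works on a local mirror of the member list (demo_prcred_t, with uint_t written as unsigned int) rather than on the real header, and its helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Local mirror of the member list above, for illustration only. */
typedef struct demo_prcred {
    uid_t        pr_euid;
    uid_t        pr_ruid;
    uid_t        pr_suid;
    gid_t        pr_egid;
    gid_t        pr_rgid;
    gid_t        pr_sgid;
    unsigned int pr_ngroups;
    gid_t        pr_groups[1];   /* really pr_ngroups entries long */
} demo_prcred_t;

/* Bytes needed for a structure carrying ngroups supplementary groups. */
size_t
demo_prcred_size(unsigned int ngroups)
{
    size_t need = offsetof(demo_prcred_t, pr_groups) +
        (size_t)ngroups * sizeof(gid_t);

    return (need > sizeof(demo_prcred_t)) ? need : sizeof(demo_prcred_t);
}

/* Allocate a zeroed structure with room for ngroups groups. */
demo_prcred_t *
demo_prcred_alloc(unsigned int ngroups)
{
    demo_prcred_t *p = calloc(1, demo_prcred_size(ngroups));

    if (p != NULL)
        p->pr_ngroups = ngroups;
    return p;
}

/* Bounds-checked access to the i-th supplementary group. */
gid_t
demo_prcred_group(const demo_prcred_t *p, unsigned int i)
{
    assert(p != NULL && i < p->pr_ngroups);
    return p->pr_groups[i];
}
```

The size never drops below sizeof(demo_prcred_t), so a process with no supplementary groups still gets a complete fixed part.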
The object directory makes it possible for a controlling process to get access to the object file and any shared libraries (and consequently the symbol tables)--in general, any mapped files--without having to know the specific path names of those files.
long              pr_flags;              /* Flags */
short             pr_why;                /* Reason for stop (if stopped) */
short             pr_what;               /* More detailed reason */
lwpid_t           pr_lwpid;              /* Specific LWP identifier */
short             pr_cursig;             /* Current signal */
siginfo_t         pr_info;               /* Info associated with signal or fault */
struct sigaction  pr_action;             /* Signal action for current signal */
sigset_t          pr_lwppend;            /* Set of LWP pending signals */
stack_t           pr_altstack;           /* Alternate signal stack info */
short             pr_syscall;            /* System call number (if in syscall) */
short             pr_nsysarg;            /* Number of arguments to this syscall */
long              pr_sysarg[PRSYSARGS];  /* Arguments to this syscall */
char              pr_clname[PRCLSZ];     /* Scheduling class name */
ucontext_t        pr_context;            /* LWP context */
pfamily_t         pr_family;             /* Processor family-specific information */
pr_flags is a bit-mask holding these flags:
pr_why and pr_what together describe, for a stopped LWP, the reason for the stop. Possible values of pr_why:
PR_REQUESTED: pr_what is unused in this case.
PR_SIGNALLED: pr_what holds the signal number that caused the stop (for a newly-stopped LWP, the same value is in pr_cursig).
PR_FAULTED: pr_what holds the fault number that caused the stop.
PR_SYSENTRY and PR_SYSEXIT: pr_what holds the system call number.
PR_JOBCONTROL: pr_what holds the stopping signal number.
pr_lwpid names the specific LWP described.
pr_cursig names the current signal, that is, the next signal to be delivered to the LWP.
pr_info, when the LWP is in a PR_SIGNALLED or PR_FAULTED stop, contains additional information pertinent to the particular signal or fault (see sys/siginfo.h).
pr_action contains the signal action information about the current signal (see sigaction(2)); it is undefined if pr_cursig is zero.
pr_lwppend identifies any synchronously-generated or LWP-directed signals pending for the LWP. It does not include signals pending at the process level.
pr_altstack contains the alternate signal stack information for the LWP (see sigaltstack(2)).
pr_syscall is the number of the system call, if any, being executed by the LWP; it is non-zero if and only if the LWP is stopped on PR_SYSENTRY or PR_SYSEXIT, or is asleep within a system call (PR_ASLEEP is set). If pr_syscall is non-zero, pr_nsysarg is the number of arguments to the system call and the pr_sysarg array contains the arguments.
pr_clname contains the name of the scheduling class of the LWP.
pr_context contains the user context of the LWP, as if it had called getcontext(2). If the LWP is not stopped, all context values are undefined.
pr_family contains the CPU-family specific information about the LWP. Use of this field is not portable across different architectures.
ulong_t        pr_flag;           /* LWP flags */
lwpid_t        pr_lwpid;          /* LWP id */
caddr_t        pr_addr;           /* internal address of LWP */
caddr_t        pr_wchan;          /* wait addr for sleeping LWP */
uchar_t        pr_stype;          /* synchronization event type */
uchar_t        pr_state;          /* numeric scheduling state */
char           pr_sname;          /* printable character representing pr_state */
uchar_t        pr_nice;           /* nice for cpu usage */
int            pr_pri;            /* priority, high value = high priority */
timestruc_t    pr_time;           /* usr+sys cpu time for this LWP */
char           pr_clname[8];      /* Scheduling class name */
char           pr_name[PRFNSZ];   /* name of system LWP */
processorid_t  pr_onpro;          /* processor on which LWP is running */
processorid_t  pr_bindpro;        /* processor to which LWP is bound */
processorid_t  pr_exbindpro;      /* processor to which LWP is exbound */
Some of the entries in lwpsinfo, such as pr_flag, pr_addr, pr_state, pr_stype, pr_wchan, and pr_name, refer to internal kernel data structures and should not be expected to retain their meanings across different versions of the operating system. They have no meaning to a program and are only useful for manual interpretation by a user aware of the implementation details.
Descriptions of the allowable control messages follow. Note that writing a message to a control file for a process or LWP that has exited elicits the error ENOENT.
An event of interest is either a PR_REQUESTED stop or a stop that has been specified in the process's tracing flags (set by PCSTRACE, PCSFAULT, PCSENTRY, and PCSEXIT). A PR_JOBCONTROL stop is specifically not an event of interest. (An LWP may stop twice because of a stop signal; first showing PR_SIGNALLED if the signal is traced and again showing PR_JOBCONTROL if the LWP is set running without clearing the signal.) If PCSTOP or PCDSTOP is applied to an LWP that is stopped, but not on an event of interest, the stop directive takes effect when the LWP is restarted by the competing mechanism; at that time the LWP enters a PR_REQUESTED stop before executing any user-level code.
A write of a control message that blocks is interruptible by a signal so that, for example, an alarm(2) can be set to avoid waiting forever for a process or LWP that may never stop on an event of interest. If PCSTOP is interrupted, the LWP stop directives remain in effect even though the write returns an error.
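The alarm(2) technique mentioned above might look like the following sketch. The helper name ctl_write_timeout is hypothetical, and the control message is treated as an opaque buffer already built by the caller, since message layouts are not shown in this text. The handler is installed without SA_RESTART so that a blocked write returns with EINTR instead of being restarted.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to interrupt the blocked write. */
static void
on_alarm(int sig)
{
    (void)sig;
}

/*
 * Hypothetical helper: write a prepared control message, giving up after
 * `secs` seconds.  Returns the write(2) result; on timeout that is -1
 * with errno set to EINTR.
 */
ssize_t
ctl_write_timeout(int ctlfd, const void *msg, size_t len, unsigned int secs)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved;

    assert(msg != NULL && secs > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) != 0)
        return -1;

    alarm(secs);
    n = write(ctlfd, msg, len);
    saved = errno;
    alarm(0);                        /* cancel a still-pending alarm */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous action */
    errno = saved;
    return n;
}
```

Since an interrupted PCSTOP leaves its stop directives in effect, a caller that times out may still find the target stopped later.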
A system process (indicated by the PR_ISSYS flag) never executes at user level, has no user-level address space visible through /proc, and cannot be stopped. Applying PCSTOP, PCDSTOP, or PCWSTOP to a system process or any of its LWPs elicits the error EBUSY.
When applied to an LWP control file PCRUN makes the specific LWP runnable. The operation fails (EBUSY) if the specific LWP is not stopped on an event of interest.
When applied to the process control file an LWP is chosen for the operation as described for /proc/pid/status. The operation fails (EBUSY) if the chosen LWP is not stopped on an event of interest. If PRSTEP or PRSTOP were requested, the chosen LWP is made runnable; otherwise, the chosen LWP is marked PR_REQUESTED. If as a result all LWPs are in the PR_REQUESTED stop state, they are all made runnable.
Once an LWP has been made runnable by PCRUN, it is no longer stopped on an event of interest even if, because of a competing mechanism, it remains stopped.
If a signal that is included in a held signal set of an LWP is sent to the LWP, the signal is not received and does not cause a stop until it is removed from the held signal set, either by the LWP itself or by setting the held signal set with PCSHOLD or the PRSHOLD option of PCRUN.
When not traced, a fault normally results in the posting of a signal to the LWP that incurred the fault. If an LWP stops on a fault, the signal is posted to the LWP when execution is resumed unless the fault is cleared by PCCFAULT or by the PRCFAULT option of PCRUN. FLTPAGE is an exception; no signal is posted. There may be additional processor-specific faults like this. The pr_info field in /proc/pid/status or in /proc/pid/lwp/lwpid/lwpstatus identifies the signal to be sent and contains machine-specific information about the fault.
When entry to a system call is being traced, an LWP stops after having begun the call to the system but before the system call arguments have been fetched from the LWP. When exit from a system call is being traced, an LWP stops on completion of the system call just before checking for signals and returning to user level. At this point all return values have been stored into the LWP's registers.
If an LWP is stopped on entry to a system call (PR_SYSENTRY) or when sleeping in an interruptible system call (PR_ASLEEP is set), it may be instructed to go directly to system call exit by specifying the PRSABORT flag in a PCRUN control message. Unless exit from the system call is being traced the LWP returns to user level showing error EINTR.
It is an error (EINVAL) to specify flags other than those described above or to apply these operations to a system process. The current modes are reported in the pr_flags field of /proc/pid/status.
/proc                            directory (list of processes)
/proc/nnnnn                      directory for process nnnnn
/proc/nnnnn/status               status of process nnnnn
/proc/nnnnn/ctl                  control file for process nnnnn
/proc/nnnnn/psinfo               ps info for process nnnnn
/proc/nnnnn/as                   address space of process nnnnn
/proc/nnnnn/map                  as map info for process nnnnn
/proc/nnnnn/object               directory for objects for process nnnnn
/proc/nnnnn/sigact               signal actions for process nnnnn
/proc/nnnnn/lwp/lll              directory for LWP lll
/proc/nnnnn/lwp/lll/lwpstatus    status of LWP lll
/proc/nnnnn/lwp/lll/lwpctl       control file for LWP lll
/proc/nnnnn/lwp/lll/lwpsinfo     ps info for LWP lll
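Path names for the files listed above can be assembled with snprintf. In this sketch the helper names are hypothetical, the LWP identifier is passed as a long because lwpid_t is not portable, and it is assumed that the plain decimal process ID is accepted as the directory name.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Build "/proc/<pid>/<file>" in buf.  Returns 0 on success, -1 if the
 * result would not fit in len bytes.
 */
int
proc_path(char *buf, size_t len, pid_t pid, const char *file)
{
    int n;

    assert(buf != NULL && file != NULL);
    n = snprintf(buf, len, "/proc/%ld/%s", (long)pid, file);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Build "/proc/<pid>/lwp/<lwpid>/<file>" in buf.  Returns 0 on success,
 * -1 if the result would not fit in len bytes.
 */
int
proc_lwp_path(char *buf, size_t len, pid_t pid, long lwpid, const char *file)
{
    int n;

    assert(buf != NULL && file != NULL);
    n = snprintf(buf, len, "/proc/%ld/lwp/%ld/%s", (long)pid, lwpid, file);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

For example, proc_path(buf, sizeof buf, pid, "ctl") yields the name of the process control file.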
To wait for any of a set of processes or LWPs to stop, /proc file descriptors can be used in a poll(2) system call instead of writing a PCWSTOP message. Descriptors for /proc/pid/ctl and /proc/pid/lwp/lwpid/lwpctl can be used for this purpose. When requested and returned, the polling event POLLWRNORM shows that the process or LWP stopped on an event of interest. Although they cannot be requested, the polling events POLLHUP, POLLERR and POLLNVAL may be returned. POLLHUP shows that the process or LWP has exited. POLLERR shows that the file descriptor has become invalid. POLLNVAL is returned immediately if POLLWRNORM is requested on a file descriptor referring to a system process (see PCSTOP).
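Waiting on several control descriptors at once could be sketched as follows. The helper name wait_any_stop and the MAX_WATCH limit are illustrative choices; the caller is expected to examine the returned event bits to tell a stop (POLLWRNORM) from POLLHUP, POLLERR, or POLLNVAL.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes POLLWRNORM on some systems */
#endif
#include <assert.h>
#include <poll.h>
#include <stddef.h>

#define MAX_WATCH 16        /* arbitrary limit for this sketch */

/*
 * Hypothetical helper: poll nfds ctl or lwpctl descriptors for up to
 * timeout_ms milliseconds.  Returns the index of the first descriptor
 * that reported any event and stores its event bits in *revents_out,
 * or returns -1 on timeout or error.
 */
int
wait_any_stop(const int *ctlfds, int nfds, int timeout_ms, short *revents_out)
{
    struct pollfd pfd[MAX_WATCH];
    int i, n;

    assert(ctlfds != NULL && nfds > 0 && nfds <= MAX_WATCH);
    for (i = 0; i < nfds; i++) {
        pfd[i].fd = ctlfds[i];
        pfd[i].events = POLLWRNORM;   /* stop on an event of interest */
        pfd[i].revents = 0;
    }

    n = poll(pfd, (nfds_t)nfds, timeout_ms);
    if (n <= 0)
        return -1;

    for (i = 0; i < nfds; i++) {
        if (pfd[i].revents != 0) {
            if (revents_out != NULL)
                *revents_out = pfd[i].revents;
            return i;
        }
    }
    return -1;
}
```

A negative timeout_ms makes poll wait indefinitely, which matches the behavior of a blocking PCWSTOP more closely.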
For security reasons, except for the privileged user, an open of a /proc file fails unless both the user-ID and group-ID of the caller match those of the traced process and the process's object file is readable by the caller. Files corresponding to setuid and setgid processes can be opened only by the privileged user. Even if held by the privileged user, an open process or LWP file descriptor becomes invalid if the traced process performs an exec of a setuid/setgid object file or an object file that it cannot read. Any operation performed on an invalid file descriptor, except close(2), fails with EBADF. In this case, if any tracing flags are set and the process or any LWP file is open for writing, the process will have been directed to stop and its run-on-last-close flag will have been set (see PCSET). This enables a controlling process (if it has permission) to reopen the process file to get new valid file descriptors, close the invalid file descriptors, and proceed. Just closing the invalid file descriptors causes the traced process to resume execution with no tracing flags set. Any process not currently open for writing by /proc that has left-over tracing flags from a previous open and that execs a setuid/setgid or unreadable object file will not be stopped but will have all its tracing flags cleared.
For reasons of symmetry and efficiency there are more control operations than strictly necessary.
The lwpstatus structure has been enhanced to handle floating point context changes on Pentium III processors; see ``Pentium III extended floating point support'' in New features for more information.