PERLGUTS(1)
NAME
perlguts - Perl's Internal Functions
DESCRIPTION
This document attempts to describe some of the internal functions of
the Perl executable. It is far from complete and probably contains
many errors. Please refer any questions or comments to the author
below.
Datatypes
Perl has three typedefs that handle Perl's three main data types:
SV Scalar Value
AV Array Value
HV Hash Value
Each typedef has specific routines that manipulate the various data
types.
What is an "IV"?
Perl uses a special typedef IV which is large enough to hold either an
integer or a pointer.
Perl also uses two special typedefs, I32 and I16, which will always be
at least 32 bits and 16 bits long, respectively.
Working with SV's
An SV can be created and loaded with one command. There are four types
of values that can be loaded: an integer value (IV), a double (NV), a
string (PV), and another scalar (SV).
The four routines are:
SV* newSViv(IV);
SV* newSVnv(double);
SV* newSVpv(char*, int);
SV* newSVsv(SV*);
To change the value of an *already-existing* scalar, there are five
routines:
void sv_setiv(SV*, IV);
void sv_setnv(SV*, double);
void sv_setpvn(SV*, char*, int);
void sv_setpv(SV*, char*);
void sv_setsv(SV*, SV*);
Notice that you can choose to specify the length of the string to be
assigned by using sv_setpvn or newSVpv, or you may allow Perl to
calculate the length by using sv_setpv or specifying 0 as the second
argument to newSVpv. Be warned, though, that Perl will determine the
string's length by using strlen, which depends on the string
terminating with a NUL character.
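The effect of letting strlen choose the length can be seen without any
Perl headers. The following is a minimal plain-C sketch; the names
bytes_lost_to_strlen, sample_data, and sample_len are hypothetical and
are not part of Perl. It shows that strlen stops at the first NUL, so a
buffer with embedded NULs needs an explicit length of the kind passed to
sv_setpvn.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Perl): reports how many bytes would be
 * dropped if the length were taken from strlen() instead of being given
 * explicitly by the caller. */
static size_t bytes_lost_to_strlen(const char *buf, size_t known_len)
{
    size_t seen = strlen(buf);      /* stops at the first NUL byte */
    return known_len > seen ? known_len - seen : 0;
}

/* A 5-byte payload with an embedded NUL. The compiler appends a
 * terminating NUL, so calling strlen() on it is still well defined. */
static const char sample_data[] = "ab\0cd";
static const size_t sample_len = sizeof(sample_data) - 1;
```

With this data, a length taken from strlen would cover only the bytes
before the embedded NUL, while the explicit length covers the whole
payload.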
To access the actual value that an SV points to, you can use the
macros:
SvIV(SV*)
SvNV(SV*)
SvPV(SV*, STRLEN len)
which will automatically coerce the actual scalar type into an IV,
double, or string.
In the SvPV macro, the length of the string returned is placed into the
variable len (this is a macro, so you do not use &len). If you do not
care what the length of the data is, use the global variable na.
Remember, however, that Perl allows arbitrary strings of data that may
both contain NUL's and not be terminated by a NUL.
If you simply want to know if the scalar value is TRUE, you can use:
SvTRUE(SV*)
Although Perl will automatically grow strings for you, if you need to
force Perl to allocate more memory for your SV, you can use the macro
SvGROW(SV*, STRLEN newlen)
which will determine if more memory needs to be allocated. If so, it
will call the function sv_grow. Note that SvGROW can only increase,
not decrease, the allocated memory of an SV.
If you have an SV and want to know what kind of data Perl thinks is
stored in it, you can use the following macros to check the type of SV
you have.
SvIOK(SV*)
SvNOK(SV*)
SvPOK(SV*)
You can get and set the current length of the string stored in an SV
with the following macros:
SvCUR(SV*)
SvCUR_set(SV*, I32 val)
But note that these are valid only if SvPOK() is true.
If you want to append something to the end of the string stored in an SV*,
you can use the following functions:
void sv_catpv(SV*, char*);
void sv_catpvn(SV*, char*, int);
void sv_catsv(SV*, SV*);
The first function calculates the length of the string to be appended
by using strlen. In the second, you specify the length of the string
yourself. The third function extends the string stored in the first SV
with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.
If you know the name of a scalar variable, you can get a pointer to its
SV by using the following:
SV* perl_get_sv("varname", FALSE);
This returns NULL if the variable does not exist.
If you want to know if this variable (or any other SV) is actually
defined, you can call:
SvOK(SV*)
The scalar undef value is stored in an SV instance called sv_undef.
Its address can be used whenever an SV* is needed.
There are also the two values sv_yes and sv_no, which contain Boolean
TRUE and FALSE values, respectively. Like sv_undef, their addresses
can be used whenever an SV* is needed.
Do not be fooled into thinking that (SV *) 0 is the same as &sv_undef.
Take this code:
SV* sv = (SV*) 0;
if (I-am-to-return-a-real-value) {
    sv = sv_2mortal(newSViv(42));
}
sv_setsv(ST(0), sv);
This code tries to return a new SV (which contains the value 42) if it
should return a real value, or undef otherwise. Instead it has
returned a null pointer which, somewhere down the line, will cause a
segmentation violation, or just weird results. Change the zero to
&sv_undef in the first line and all will be well.
To free an SV that you've created, call SvREFCNT_dec(SV*). Normally
this call is not necessary. See the section on MORTALITY.
What's Really Stored in an SV?
Recall that the usual method of determining the type of scalar you have
is to use the Sv*OK macros. Since a scalar can be both a number and a
string, these macros will usually return TRUE, and calling the
Sv*V macros will do the appropriate conversion of string to
integer/double or integer/double to string.
If you really need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:
SvIOKp(SV*)
SvNOKp(SV*)
SvPOKp(SV*)
These will tell you if you truly have an integer, double, or string
pointer stored in your SV. The "p" stands for private.
In general, though, it's best to just use the Sv*V macros.
Working with AV's
There are two ways to create and load an AV. The first method just
creates an empty AV:
AV* newAV();
The second method both creates the AV and initially populates it with
SV's:
AV* av_make(I32 num, SV **ptr);
The second argument points to an array containing num SV*'s. Once the
AV has been created, the SV's can be destroyed, if so desired.
Once the AV has been created, the following operations are possible on
AV's:
void av_push(AV*, SV*);
SV* av_pop(AV*);
SV* av_shift(AV*);
void av_unshift(AV*, I32 num);
These should be familiar operations, with the exception of av_unshift.
This routine adds num elements at the front of the array with the undef
value. You must then use av_store (described below) to assign values
to these new elements.
Here are some other functions:
I32 av_len(AV*); /* Returns highest index value in array */
SV** av_fetch(AV*, I32 key, I32 lval);
/* Fetches value at key offset, but it stores an undef value
at the offset if lval is non-zero */
SV** av_store(AV*, I32 key, SV* val);
/* Stores val at offset key */
Take note that these two functions return SV**'s, not SV*'s.
void av_clear(AV*);
/* Clear out all elements, but leave the array */
void av_undef(AV*);
/* Undefines the array, removing all elements */
If you know the name of an array variable, you can get a pointer to its
AV by using the following:
AV* perl_get_av("varname", FALSE);
This returns NULL if the variable does not exist.
Working with HV's
To create an HV, you use the following routine:
HV* newHV();
Once the HV has been created, the following operations are possible on
HV's:
SV** hv_store(HV*, char* key, U32 klen, SV* val, U32 hash);
SV** hv_fetch(HV*, char* key, U32 klen, I32 lval);
The klen parameter is the length of the key being passed in. The val
argument contains the SV pointer to the scalar being stored, and hash
is the pre-computed hash value (zero if you want hv_store to calculate
it for you). The lval parameter indicates whether this fetch is
actually a part of a store operation.
Remember that hv_store and hv_fetch return SV**'s and not just SV*. In
order to access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.
These two functions check whether a hash table entry exists and delete
an entry, respectively.
bool hv_exists(HV*, char* key, U32 klen);
SV* hv_delete(HV*, char* key, U32 klen, I32 flags);
And more miscellaneous functions:
void hv_clear(HV*);
/* Clears all entries in hash table */
void hv_undef(HV*);
/* Undefines the hash table */
Perl keeps the actual data in a linked list of structures with a typedef
of HE. These contain the actual key and value pointers (plus extra
administrative overhead). The key is a string pointer; the value is an
SV*. However, once you have an HE*, to get the actual key and value,
use the routines specified below.
I32 hv_iterinit(HV*);
/* Prepares starting point to traverse hash table */
HE* hv_iternext(HV*);
/* Get the next entry, and return a pointer to a
structure that has both the key and value */
char* hv_iterkey(HE* entry, I32* retlen);
/* Get the key from an HE structure and also return
the length of the key string */
SV* hv_iterval(HV*, HE* entry);
/* Return a SV pointer to the value of the HE
structure */
SV* hv_iternextsv(HV*, char** key, I32* retlen);
/* This convenience routine combines hv_iternext,
hv_iterkey, and hv_iterval. The key and retlen
arguments are return values for the key and its
length. The value is the SV* that the routine returns */
If you know the name of a hash variable, you can get a pointer to its
HV by using the following:
HV* perl_get_hv("varname", FALSE);
This returns NULL if the variable does not exist.
The hash algorithm, for those who are interested, is:
i = klen;
hash = 0;
s = key;
while (i--)
hash = hash * 33 + *s++;
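The loop above can be wrapped in a small standalone function to try it
out. This is only a sketch: the name hash_key_sketch is hypothetical, and
both the cast to unsigned char and the masking of the result to 32 bits
are assumptions made here so that the result does not depend on the
platform's char signedness or long width.

```c
#include <stddef.h>

/* Sketch of the hash loop quoted above. The function name, the unsigned
 * char cast, and the 32-bit mask are assumptions made for illustration. */
static unsigned long hash_key_sketch(const char *key, size_t klen)
{
    unsigned long hash = 0;
    const char *s = key;
    size_t i = klen;

    while (i--)
        hash = (hash * 33 + (unsigned char)*s++) & 0xFFFFFFFFUL;
    return hash;
}
```

Because the length is passed in explicitly, keys containing NUL bytes
are hashed over their full length, which matches the klen parameter that
hv_store and hv_fetch take.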
Creating New Variables
To create a new Perl variable, which can be accessed from your Perl
script, use the following routines, depending on the variable type.
SV* perl_get_sv("varname", TRUE);
AV* perl_get_av("varname", TRUE);
HV* perl_get_hv("varname", TRUE);
Notice the use of TRUE as the second parameter. The new variable can
now be set, using the routines appropriate to the data type.
There are additional bits that may be OR'ed with the TRUE argument to
enable certain extra features. Those bits are:
0x02 Marks the variable as multiply defined, thus preventing the
"Identifier <varname> used only once: possible typo" warning.
0x04 Issues a "Had to create <varname> unexpectedly" warning if
the variable didn't actually exist. This is useful if
you expected the variable to already exist and want to propagate
this warning back to the user.
If the varname argument does not contain a package specifier, it is
created in the current package.
References
References are a special type of scalar that point to other data types
(including references).
To create a reference, use the following command:
SV* newRV((SV*) thing);
The thing argument can be any of an SV*, AV*, or HV*. Once you have a
reference, you can use the following macro to dereference the
reference:
SvRV(SV*)
then call the appropriate routines, casting the returned SV* to either
an AV* or HV*, if required.
To determine if an SV is a reference, you can use the following macro:
SvROK(SV*)
To actually discover what the reference refers to, you must use the
following macro and then check the value returned.
SvTYPE(SvRV(SV*))
The most useful types that will be returned are:
SVt_IV Scalar
SVt_NV Scalar
SVt_PV Scalar
SVt_PVAV Array
SVt_PVHV Hash
SVt_PVCV Code
SVt_PVMG Blessed Scalar
XSUB's and the Argument Stack
The XSUB mechanism is a simple way for Perl programs to access C
subroutines. An XSUB routine will have a stack that contains the
arguments from the Perl program, and a way to map from the Perl data
structures to a C equivalent.
The stack arguments are accessible through the ST(n) macro, which
returns the n'th stack argument. Argument 0 is the first argument
passed in the Perl subroutine call. These arguments are SV*, and can
be used anywhere an SV* is used.
Most of the time, output from the C routine can be handled through use
of the RETVAL and OUTPUT directives. However, there are some cases
where the argument stack is not already long enough to handle all the
return values. An example is the POSIX tzname() call, which takes no
arguments, but returns two, the local timezone's standard and summer
time abbreviations.
To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:
EXTEND(sp, num);
where sp is the stack pointer, and num is the number of elements the
stack should be extended by.
Now that there is room on the stack, values can be pushed on it using
the macros to push IV's, doubles, strings, and SV pointers
respectively:
PUSHi(IV)
PUSHn(double)
PUSHp(char*, I32)
PUSHs(SV*)
Now, in the Perl program calling tzname, the two values will be
assigned as in:
($standard_abbrev, $summer_abbrev) = POSIX::tzname;
An alternate (and possibly simpler) method for pushing values on the
stack is to use the macros:
XPUSHi(IV)
XPUSHn(double)
XPUSHp(char*, I32)
XPUSHs(SV*)
These macros automatically adjust the stack for you, if needed.
For more information, consult the perlapi manpage.
Mortality
In Perl, values are normally "immortal" -- that is, they are not freed
unless explicitly done so (via the Perl undef call or other routines in
Perl itself).
Add cruft about reference counts.
In the above example with tzname, we needed to create two new SV's to
push onto the argument stack, that being the two strings. However, we
don't want these new SV's to stick around forever because they will
eventually be copied into the SV's that hold the two scalar variables.
An SV (or AV or HV) that is "mortal" acts in all ways as a normal
"immortal" SV, AV, or HV, but is only valid in the "current context".
When the Perl interpreter leaves the current context, the mortal SV,
AV, or HV is automatically freed. Generally the "current context"
means a single Perl statement.
To create a mortal variable, use the functions:
SV* sv_newmortal()
SV* sv_2mortal(SV*)
SV* sv_mortalcopy(SV*)
The first call creates a mortal SV, the second converts an existing SV
to a mortal SV, the third creates a mortal copy of an existing SV.
The mortal routines are not just for SV's -- AV's and HV's can be made
mortal by passing their address (and casting them to SV*) to the
sv_2mortal or sv_mortalcopy routines.
From Ilya: Beware that the sv_2mortal() call is eventually equivalent
to SvREFCNT_dec(). A value can happily be mortal in two different
contexts, and it will be SvREFCNT_dec()ed twice, once on exit from
each of these contexts. It can also be mortal twice in the same context.
This means that you should be very careful to make a value mortal
exactly as many times as it is needed. Values that go onto the Perl
stack should be mortal.
You should be careful about creating mortal variables. It is possible
for strange things to happen should you make the same value mortal
within multiple contexts.
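The hazard described above can be illustrated with a toy model in plain
C. Everything here is a hypothetical stand-in (mort_sv, mort_2mortal,
mort_free_tmps, and the fixed-size list); it is not how Perl stores its
temporaries. It only assumes what the text states: making a value mortal
schedules one deferred reference count decrement, which is carried out
when the current context is left.

```c
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not Perl's real SV or its
 * list of temporaries. */
typedef struct { int refcnt; int freed; } mort_sv;

#define MORT_TMPS_MAX 16
static mort_sv *mort_tmps[MORT_TMPS_MAX];
static size_t mort_tmps_count = 0;

/* Drop one reference; mark the value freed when none remain. */
static void mort_refcnt_dec(mort_sv *sv)
{
    if (--sv->refcnt <= 0)
        sv->freed = 1;   /* a real interpreter would release memory here */
}

/* Marking a value mortal schedules exactly one deferred decrement. */
static mort_sv *mort_2mortal(mort_sv *sv)
{
    if (mort_tmps_count < MORT_TMPS_MAX)
        mort_tmps[mort_tmps_count++] = sv;
    return sv;
}

/* Leaving the "current context" performs every scheduled decrement. */
static void mort_free_tmps(void)
{
    while (mort_tmps_count > 0)
        mort_refcnt_dec(mort_tmps[--mort_tmps_count]);
}
```

In this model, a value with a reference count of 2 that is made mortal
once survives the end of the context, while the same value made mortal
twice is freed. That is why a value should be made mortal exactly as
many times as needed.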
Stashes and Objects
A stash is a hash table (associative array) that contains all of the
different objects that are contained within a package. Each key of the
stash is a symbol name (shared by all the different types of objects
that have the same name), and each value in the hash table is called a
GV (for Glob Value). This GV in turn contains references to the
various objects of that name, including (but not limited to) the
following:
Scalar Value
Array Value
Hash Value
File Handle
Directory Handle
Format
Subroutine
Perl stores various stashes in a separate GV structure (for global
variable) but represents them with an HV structure. The keys in this
larger GV are the various package names; the values are the GV*'s which
are stashes. It may help to think of a stash purely as an HV, and that
the term "GV" means the global variable hash.
To get the stash pointer for a particular package, use one of these
functions:
HV* gv_stashpv(char* name, I32 create)
HV* gv_stashsv(SV*, I32 create)
The first function takes a literal string, the second uses the string
stored in the SV. Remember that a stash is just a hash table, so you
get back an HV*.
The name that gv_stash*v wants is the name of the package whose symbol
table you want. The default package is called main. If you have
multiply nested packages, pass their names to gv_stash*v, separated by
:: as in the Perl language itself.
Alternately, if you have an SV that is a blessed reference, you can
find out the stash pointer by using:
HV* SvSTASH(SvRV(SV*));
then use the following to get the package name itself:
char* HvNAME(HV* stash);
If you need to return a blessed value to your Perl script, you can use
the following function:
SV* sv_bless(SV*, HV* stash)
where the first argument, an SV*, must be a reference, and the second
argument is a stash. The returned SV* can now be used in the same way
as any other SV.
For more information on references and blessings, consult the perlref
manpage.
Magic
[This section still under construction. Ignore everything here. Post
no bills. Everything not permitted is forbidden.]
# Version 6, 1995/1/27
Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of struct magic's, typedef'ed to MAGIC.
struct magic {
MAGIC* mg_moremagic;
MGVTBL* mg_virtual;
U16 mg_private;
char mg_type;
U8 mg_flags;
SV* mg_obj;
char* mg_ptr;
I32 mg_len;
};
Note this is current as of patchlevel 0, and could change at any time.
Assigning Magic
Perl adds magic to an SV using the sv_magic function:
void sv_magic(SV* sv, SV* obj, int how, char* name, I32 namlen);
The sv argument is a pointer to the SV that is to acquire a new magical
feature.
If sv is not already magical, Perl uses the SvUPGRADE macro to set the
SVt_PVMG flag for the sv. Perl then continues by adding it to the
beginning of the linked list of magical features. Any prior entry of
the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with
an SV.
The name and namlen arguments are used to associate a string with the
magic, typically the name of a variable. namlen is stored in the mg_len
field and if name is non-null and namlen >= 0 a malloc'd copy of the
name is stored in the mg_ptr field.
The sv_magic function uses how to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the mg_virtual field. See
the "Magic Virtual Table" section below.
The obj argument is stored in the mg_obj field of the MAGIC structure.
If it is not the same as the sv argument, the reference count of the
obj object is incremented. If it is the same, or if the how argument
is "#", or if it is a null pointer, then obj is merely stored, without
the reference count being incremented.
Magic Virtual Tables
The mg_virtual field in the MAGIC structure is a pointer to a MGVTBL
("Magic Virtual Table"), which is a structure of function pointers
that handle the various operations that might be applied to that
variable.
The MGVTBL has five pointers to the following routine types:
int (*svt_get)(SV* sv, MAGIC* mg);
int (*svt_set)(SV* sv, MAGIC* mg);
U32 (*svt_len)(SV* sv, MAGIC* mg);
int (*svt_clear)(SV* sv, MAGIC* mg);
int (*svt_free)(SV* sv, MAGIC* mg);
This MGVTBL structure is set at compile-time in perl.h and there are
currently 19 types (or 21 with overloading turned on). These different
structures contain pointers to various routines that perform additional
actions depending on which function is being called.
Function pointer    Action taken
----------------    ------------
svt_get             Do something after the value of the SV is retrieved.
svt_set             Do something after the SV is assigned a value.
svt_len             Report on the SV's length.
svt_clear           Clear something the SV represents.
svt_free            Free any extra storage associated with the SV.
For instance, the MGVTBL structure called vtbl_sv (which corresponds to
an mg_type of '\0') contains:
{ magic_get, magic_set, magic_len, 0, 0 }
Thus, when an SV is determined to be magical and of type '\0', if a get
operation is being performed, the routine magic_get is called. All the
various routines for the various magical types begin with magic_.
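The dispatch through a table of function pointers can be sketched in
plain C. The types and names below (vt_sv, vt_magic, vt_table,
vt_magic_get, vt_dispatch_get) are simplified, hypothetical stand-ins;
the real SV, MAGIC, and MGVTBL definitions carry more fields and more
slots. The sketch only shows the idea that a slot holding 0 means
"nothing to do" and a non-null slot is called.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration; not Perl's real definitions. */
typedef struct vt_sv    vt_sv;
typedef struct vt_magic vt_magic;

typedef struct {
    int (*svt_get)(vt_sv *sv, vt_magic *mg);
    int (*svt_set)(vt_sv *sv, vt_magic *mg);
    int (*svt_clear)(vt_sv *sv, vt_magic *mg);
} vt_table;

struct vt_magic { const vt_table *mg_virtual; char mg_type; };
struct vt_sv    { int value; int get_calls; vt_magic *magic; };

/* Hypothetical "get" handler: it only counts how often it runs. */
static int vt_magic_get(vt_sv *sv, vt_magic *mg)
{
    (void)mg;
    sv->get_calls++;
    return 0;
}

/* Loosely modeled on the vtbl_sv example: unused slots are 0. */
static const vt_table vt_table_sv = { vt_magic_get, 0, 0 };

/* Dispatch a "get": call the handler only if the slot is non-null. */
static int vt_dispatch_get(vt_sv *sv)
{
    vt_magic *mg = sv->magic;

    if (mg && mg->mg_virtual && mg->mg_virtual->svt_get)
        return mg->mg_virtual->svt_get(sv, mg);
    return 0;
}
```

A value with no magic attached passes through vt_dispatch_get without
any handler being called, which mirrors how a non-magical SV behaves
like an ordinary one.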
The current kinds of Magic Virtual Tables are:
mg_type  MGVTBL                   Type of magicalness
-------  ------                   -------------------
\0       vtbl_sv                  Regexp???
A        vtbl_amagic              Operator Overloading
a        vtbl_amagicelem          Operator Overloading
c        0                        Used in Operator Overloading
B        vtbl_bm                  Boyer-Moore???
E        vtbl_env                 %ENV hash
e        vtbl_envelem             %ENV hash element
g        vtbl_mglob               Regexp /g flag???
I        vtbl_isa                 @ISA array
i        vtbl_isaelem             @ISA array element
L        0 (but sets RMAGICAL)    Perl Module/Debugger???
l        vtbl_dbline              Debugger?
P        vtbl_pack                Tied Array or Hash
p        vtbl_packelem            Tied Array or Hash element
q        vtbl_packelem            Tied Scalar or Handle
S        vtbl_sig                 Signal Hash
s        vtbl_sigelem             Signal Hash element
t        vtbl_taint               Taintedness
U        vtbl_uvar                ???
v        vtbl_vec                 Vector
x        vtbl_substr              Substring???
*        vtbl_glob                GV???
#        vtbl_arylen              Array Length
.        vtbl_pos                 $. scalar variable
~        Reserved for extensions, but multiple extensions may clash
When an upper-case and lower-case letter both exist in the table, then
the upper-case letter is used to represent some kind of composite type
(a list or a hash), and the lower-case letter is used to represent an
element of that composite type.
Finding Magic
MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */
This routine returns a pointer to the MAGIC structure stored in the SV.
If the SV does not have that magical feature, NULL is returned. Also,
if the SV is not of type SVt_PVMG, Perl may core-dump.
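A lookup of this kind amounts to walking the linked list through
mg_moremagic and comparing mg_type. The following plain-C sketch uses a
cut-down, hypothetical structure (ml_magic) and hypothetical functions
(ml_find, ml_prepend); it is not Perl's implementation and it takes the
list head directly rather than an SV.

```c
#include <stddef.h>

/* Cut-down stand-in for struct magic; only the two fields needed here. */
typedef struct ml_magic {
    struct ml_magic *mg_moremagic;   /* next entry in the list */
    char             mg_type;        /* for example 'P', 'E', or '~' */
} ml_magic;

/* Sketch of a lookup by type: walk the list, return the first match,
 * or NULL if no entry of that type is present. */
static ml_magic *ml_find(ml_magic *head, int type)
{
    ml_magic *mg;

    for (mg = head; mg != NULL; mg = mg->mg_moremagic)
        if (mg->mg_type == type)
            return mg;
    return NULL;
}

/* Sketch of adding an entry at the beginning of the list, which is
 * where the text says new magic is placed. Returns the new head. */
static ml_magic *ml_prepend(ml_magic *head, ml_magic *entry, char type)
{
    entry->mg_type = type;
    entry->mg_moremagic = head;
    return entry;
}
```

Because new entries go to the front, the most recently added magic of a
given type is the first one a lookup like this would find.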
int mg_copy(SV* sv, SV* nsv, char* key, STRLEN klen);
This routine checks to see what types of magic sv has. If the mg_type
field is an upper-case letter, then the mg_obj is copied to nsv, but
the mg_type field is changed to be the lower-case letter.
Double-Typed SV's
Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.
Some scalar variables contain more than one type of scalar data. For
example, the variable $! contains either the numeric value of errno or
its string equivalent from either strerror or sys_errlist[].
To force multiple data values into an SV, you must do two things: use
the sv_set*v routines to add the additional scalar type, then set a
flag so that Perl will believe it contains more than one type of data.
The four macros to set the flags are:
SvIOK_on
SvNOK_on
SvPOK_on
SvROK_on
The particular macro you must use depends on which sv_set*v routine you
called first. This is because every sv_set*v routine turns on only the
bit for the particular type of data being set, and turns off all the
rest.
For example, to create a new Perl variable called "dberror" that
contains both the numeric and descriptive string error values, you
could use the following code:
extern int dberror;
extern char *dberror_list[];
SV* sv = perl_get_sv("dberror", TRUE);
sv_setiv(sv, (IV) dberror);
sv_setpv(sv, dberror_list[dberror]);
SvIOK_on(sv);
If the order of sv_setiv and sv_setpv had been reversed, then the macro
SvPOK_on would need to be called instead of SvIOK_on.
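Why the extra SvIOK_on call is needed can be shown with a toy model of
the flag handling. The structure, flag names, and functions below
(dt_sv, DT_IOK, dt_setiv, and so on) are hypothetical and far simpler
than a real SV; they only encode the rule stated above, that each setter
turns on its own flag and turns off the rest.

```c
#include <stddef.h>

/* Toy model: flag names and layout are hypothetical, chosen to mirror
 * the behaviour described in the text, not Perl's real SV layout. */
enum { DT_IOK = 1, DT_NOK = 2, DT_POK = 4 };

typedef struct {
    unsigned    flags;
    long        iv;
    const char *pv;
} dt_sv;

/* Each setter turns on only its own flag and turns the others off. */
static void dt_setiv(dt_sv *sv, long iv)
{
    sv->iv = iv;
    sv->flags = DT_IOK;
}

static void dt_setpv(dt_sv *sv, const char *pv)
{
    sv->pv = pv;
    sv->flags = DT_POK;
}

/* Re-assert the integer flag afterwards, in the role SvIOK_on plays. */
static void dt_iok_on(dt_sv *sv)
{
    sv->flags |= DT_IOK;
}

/* Build a value carrying both an error number and its message. */
static dt_sv dt_make_error(long code, const char *message)
{
    dt_sv sv = { 0, 0, NULL };

    dt_setiv(&sv, code);
    dt_setpv(&sv, message);   /* this clears DT_IOK ... */
    dt_iok_on(&sv);           /* ... so it is switched back on here */
    return sv;
}
```

In the model the integer is still stored after dt_setpv runs; only the
flag saying it is valid has been cleared, which is why turning the flag
back on is enough.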
Calling Perl Routines from within C Programs
There are four routines that can be used to call a Perl subroutine from
within a C program. These four are:
I32 perl_call_sv(SV*, I32);
I32 perl_call_pv(char*, I32);
I32 perl_call_method(char*, I32);
I32 perl_call_argv(char*, I32, register char**);
The routine most often used is perl_call_sv. The SV* argument contains
either the name of the Perl subroutine to be called, or a reference to
the subroutine. The second argument consists of flags that control the
context in which the subroutine is called, whether or not the
subroutine is being passed arguments, how errors should be trapped, and
how to treat return values.
All four routines return the number of values that the subroutine
returned on the Perl stack.
When using any of these routines (except perl_call_argv), the
programmer must manipulate the Perl stack. This is done with the
following macros and functions:
dSP
PUSHMARK()
PUTBACK
SPAGAIN
ENTER
SAVETMPS
FREETMPS
LEAVE
XPUSH*()
For more information, consult the perlcall manpage.
Memory Allocation
It is strongly suggested that you use the version of malloc that is
distributed with Perl. It keeps pools of various sizes of unallocated
memory in order to more quickly satisfy allocation requests. However,
on some platforms, it may cause spurious malloc or free errors.
New(x, pointer, number, type);
Newc(x, pointer, number, type, cast);
Newz(x, pointer, number, type);
These three macros are used to initially allocate memory. The first
argument x was a "magic cookie" that was used to keep track of who
called the macro, to help when debugging memory problems. However, the
current code makes no use of this feature (Larry has switched to using
a run-time memory checker), so this argument can be any number.
The second argument pointer will point to the newly allocated memory.
The third and fourth arguments number and type specify how many of the
specified type of data structure should be allocated. The argument
type is passed to sizeof. The final argument to Newc, cast, should be
used if the pointer argument is different from the type argument.
Unlike the New and Newc macros, the Newz macro calls memzero to zero
out all the newly allocated memory.
Renew(pointer, number, type);
Renewc(pointer, number, type, cast);
Safefree(pointer)
These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed. The arguments to Renew and Renewc
match those of New and Newc with the exception of not needing the
"magic cookie" argument.
Move(source, dest, number, type);
Copy(source, dest, number, type);
Zero(dest, number, type);
These three macros are used to move, copy, or zero out previously
allocated memory. The source and dest arguments point to the source
and destination starting points. Perl will move, copy, or zero out
number instances of the size of the type data structure (using the
sizeof function).
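The argument order of these macros can be tried out with stand-ins built
on the C library. The SK_* macros and shift_demo below are hypothetical
and are not Perl's definitions; in particular, mapping the "move"
operation to memmove (safe for overlapping regions) and the "copy"
operation to memcpy is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins that mimic the argument order described above.
 * Perl's real macros live in its own headers and may differ. The x
 * argument is accepted and ignored, like the unused "magic cookie". */
#define SK_NEW(x, ptr, n, type)    ((ptr) = (type *)malloc((n) * sizeof(type)))
#define SK_NEWZ(x, ptr, n, type)   ((ptr) = (type *)calloc((n), sizeof(type)))
#define SK_COPY(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))
#define SK_MOVE(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type))
#define SK_ZERO(dst, n, type)      memset((dst), 0, (n) * sizeof(type))
#define SK_SAFEFREE(ptr)           free(ptr)

/* Allocate a zeroed array of four ints, fill three slots, shift them
 * right by one inside the same buffer, clear the first slot, and return
 * the sum of all four slots (or -1 if allocation fails). */
static int shift_demo(void)
{
    int *buf;
    int result;

    SK_NEWZ(0, buf, 4, int);
    if (buf == NULL)
        return -1;
    buf[0] = 10; buf[1] = 20; buf[2] = 30;
    SK_MOVE(buf, buf + 1, 3, int);   /* buf is now 10 10 20 30 */
    SK_ZERO(buf, 1, int);            /* buf is now  0 10 20 30 */
    result = buf[0] + buf[1] + buf[2] + buf[3];
    SK_SAFEFREE(buf);
    return result;
}
```

Note that the count is given in elements, not bytes; the macros multiply
by sizeof(type) themselves, which is the convention the text describes.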
AUTHOR
Jeff Okamoto <okamoto@corp.hp.com>
With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, and Spider Boardman.
DATE
Version 19: 1995/4/26