DIS(6)
NAME
dis - Dis object file
DESCRIPTION
A Dis object file contains the executable form of a single module, and
conventionally uses the file suffix .dis.
The following names are used in the description of the file encoding.
byte An unsigned 8-bit byte.
word A 32-bit integer value represented in exactly 4 bytes.
long A 64-bit integer value represented in exactly 8 bytes.
operand An integer stored in a compact variable-length encoding
selected by the two most significant bits as follows:
0x signed 7 bits, 1 byte
10 signed 14 bits, 2 bytes
11 signed 30 bits, 4 bytes
string A variable length sequence of bytes terminated by a zero
byte. Names thus represented are in utf(6) format.
All integers are encoded in two's complement format, most significant
byte first.
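The operand encoding above can be sketched in a few lines. This is a minimal illustration, not the loader's actual code; the function name decode_operand and its (value, next position) return convention are assumptions made for the example.

```python
def decode_operand(buf, pos=0):
    """Decode one operand starting at buf[pos]; return (value, next_pos).

    Hypothetical helper: the name and return shape are illustrative only.
    """
    first = buf[pos]
    if (first & 0x80) == 0:
        nbytes, bits = 1, 7      # selector 0x: signed 7 bits
    elif (first & 0x40) == 0:
        nbytes, bits = 2, 14     # selector 10: signed 14 bits
    else:
        nbytes, bits = 4, 30     # selector 11: signed 30 bits
    if pos + nbytes > len(buf):
        raise ValueError("truncated operand")
    # Most significant byte first, then drop the selector bits.
    value = int.from_bytes(buf[pos:pos + nbytes], "big") & ((1 << bits) - 1)
    # Two's complement: the top remaining bit is the sign.
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value, pos + nbytes
```

For instance, the four bytes C0 0C 80 30 decode to 819248 under this sketch, and the single byte 7F decodes to -1.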
Every object file has a header followed by five sections containing
code, data, and several sorts of descriptors:
header code-section type-section data-section module-name link-section
Each section is described in turn below.
Header
The header contains a magic number, a digital signature, a flag word,
sizes of the other sections, and a description of the entry point. It
has the following format:
header:
magic signature(opt) runflag stack-extent
code-size data-size type-size link-size entry-pc entry-type
magic, runflag:
operand
stack-extent, code-size, data-size, type-size, link-size:
operand
entry-pc, entry-type:
operand
The magic number is defined as 819248 (symbolically XMAGIC), for mod‐
ules that have not been signed cryptographically, and 923426 (symboli‐
cally SMAGIC), for modules that contain a signature. The symbolic
names XMAGIC and SMAGIC are defined by the C include file
/include/isa.h and by the Limbo module dis(2).
The signature is present only if the magic number is SMAGIC. It has
the form:
signature:
length signature-data
length:
operand
signature-data:
byte ...
A digital signature is defined by a length, followed by an array of
bytes of that length that contain the signature in some unspecified
format. Data within the signature should identify the signing author‐
ity, algorithm, and data to be signed.
Runflag is a bit mask that selects various execution options for a Dis
module. The flags currently defined are:
MUSTCOMPILE (1<<0)
The module must be compiled into native instructions for
execution (using a just-in-time compiler); if the
implementation cannot do that, the load instruction should
give an error.
DONTCOMPILE (1<<1)
The module should not be compiled into native instruc‐
tions, when that is the default for the runtime environ‐
ment, but should be interpreted. This flag may be set to
allow debugging or to save memory.
SHAREMP (1<<2)
All instances of the module should share the same module
data. There is no implicit synchronisation between threads
using the shared data.
HASLDT (1<<4)
The dis file contains a separate import section. On older
versions of the system, this section was within the data
section.
HASEXCEPT (1<<5)
The dis file contains an exception handler section.
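As an illustration of how the runflag mask might be inspected, the sketch below maps a runflag value to the names listed above. The table RUNFLAGS and the function runflag_names are hypothetical; only the flag names and bit positions come from this page.

```python
# Flag names and bit positions as listed above; bit 3 is not defined here.
RUNFLAGS = {
    "MUSTCOMPILE": 1 << 0,
    "DONTCOMPILE": 1 << 1,
    "SHAREMP": 1 << 2,
    "HASLDT": 1 << 4,
    "HASEXCEPT": 1 << 5,
}


def runflag_names(runflag):
    """Return the names of the defined flags that are set in runflag."""
    return [name for name, bit in RUNFLAGS.items() if runflag & bit]
```

A runflag of 0x31, for example, has bits 0, 4 and 5 set, so the sketch reports MUSTCOMPILE, HASLDT and HASEXCEPT.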
Stack-extent, if non-zero, gives the number of bytes by which the
thread stack of this module should be extended in the event that proce‐
dure calls exhaust the allocated stack. While stack extension is
transparent to programs, increasing this value may improve the effi‐
ciency of execution at the expense of using more memory.
Code-size, type-size and link-size give the number of entries (instruc‐
tions, type descriptors, linkage directives) in the corresponding sec‐
tions.
Data-size is the size in bytes of the module's global data area (not
the number of items in data-section).
Entry-pc is an integer index into the instruction stream that is the
default entry point for this module. It should point to the first
instruction of a function. Instructions are numbered from a program
counter value of zero.
Entry-type is the index of the type descriptor in the type section that
corresponds to the function entry point set by entry-pc.
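The header layout described above could be read as follows. This is a sketch under stated assumptions: the helper operand, the function parse_header and the dictionary keys are invented for the example, and error handling is minimal.

```python
XMAGIC = 819248  # module without a signature
SMAGIC = 923426  # module carrying a signature


def operand(buf, pos):
    """Decode one variable-length operand; return (value, next_pos)."""
    first = buf[pos]
    nbytes, bits = (1, 7) if not first & 0x80 else (2, 14) if not first & 0x40 else (4, 30)
    value = int.from_bytes(buf[pos:pos + nbytes], "big") & ((1 << bits) - 1)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value, pos + nbytes


def parse_header(buf):
    """Parse the header; return (fields, offset of the code section)."""
    hdr = {}
    hdr["magic"], pos = operand(buf, 0)
    if hdr["magic"] not in (XMAGIC, SMAGIC):
        raise ValueError("not a Dis object file")
    if hdr["magic"] == SMAGIC:
        # The signature is present only for SMAGIC: length, then raw bytes.
        length, pos = operand(buf, pos)
        hdr["signature"] = bytes(buf[pos:pos + length])
        pos += length
    for name in ("runflag", "stack_extent", "code_size", "data_size",
                 "type_size", "link_size", "entry_pc", "entry_type"):
        hdr[name], pos = operand(buf, pos)
    return hdr, pos
```

The returned position is where the code section would begin, since the header is followed directly by the code section.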
Code Section
The code section describes a sequence of instructions for the virtual
machine. There are code-size instructions. An instruction is encoded
as follows:
instruction:
opcode address-mode middle-data(opt) source-data(opt) dest-data(opt)
opcode, address-mode:
byte
middle-data:
operand
source-data, dest-data:
operand operand(opt)
The one byte opcode specifies the instruction to execute; opcodes are
defined by the virtual machine specification.
The address-mode byte specifies the addressing mode of each of the
three operands: middle, source and destination. The source and destina‐
tion operands are encoded by three bits and the middle operand by two
bits. The bits are packed as follows:
bit 7 6 5 4 3 2 1 0
m1 m0 s2 s1 s0 d2 d1 d0
The following definitions are used in the description of addressing
modes:
OP 30 bit integer operand
SO 16 bit unsigned small offset from register
SI 16 bit signed immediate value
LO 30 bit signed large offset from register
The middle operand is encoded as follows:
00 none
no middle operand
01 $SI
small immediate
10 SO(FP)
small offset indirect from FP
11 SO(MP)
small offset indirect from MP
The middle-data field is present only if the middle operand specifier
of the address-mode is not `none'. If the field is present it is
encoded as an operand.
The source and destination operands are encoded as follows:
000 LO(MP)
offset indirect from MP
001 LO(FP)
offset indirect from FP
010 $OP
30 bit immediate
011 none
no operand
100 SO(SO(MP))
double indirect from MP
101 SO(SO(FP))
double indirect from FP
110 reserved
111 reserved
The source-data and dest-data fields are present only when the corre‐
sponding address-mode field is not `none'. For offset indirect and
immediate modes the field contains a single operand value. For double
indirect modes the values are encoded as two operands: the first is the
register indirect offset, and the second is the final indirect offset.
The offsets for double indirect addressing cannot be larger than 16
bits.
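To make the bit packing concrete, the sketch below splits an address-mode byte into its three fields and reports how many operands follow for each. The names split_address_mode and operand_count are hypothetical, chosen only for this example.

```python
MIDDLE_MODES = ["none", "$SI", "SO(FP)", "SO(MP)"]
SRC_DST_MODES = ["LO(MP)", "LO(FP)", "$OP", "none",
                 "SO(SO(MP))", "SO(SO(FP))", "reserved", "reserved"]


def split_address_mode(mode_byte):
    """Return (middle, source, dest) mode names for an address-mode byte."""
    middle = MIDDLE_MODES[(mode_byte >> 6) & 0x3]   # bits 7-6
    source = SRC_DST_MODES[(mode_byte >> 3) & 0x7]  # bits 5-3
    dest = SRC_DST_MODES[mode_byte & 0x7]           # bits 2-0
    return middle, source, dest


def operand_count(mode):
    """Number of encoded operands that follow for one mode name."""
    if mode == "reserved":
        raise ValueError("reserved addressing mode")
    if mode == "none":
        return 0
    # Double indirect modes carry two operands; all others carry one.
    return 2 if mode.startswith("SO(SO(") else 1
```

With the byte 0x48 (binary 01 001 000), the sketch yields a small immediate middle operand, a source addressed from FP and a destination addressed from MP, each followed by one operand.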
Type Section
The type section contains type-size type descriptors describing the
layout of pointers within data types. The format of each descriptor is:
type-descriptor:
desc-number memsize mapsize map
desc-number, memsize, mapsize:
operand
map:
byte ...
The desc-number is a small integer index used to identify the descrip‐
tor to instructions such as new. Memsize is the size in bytes of the
memory described by this type.
The mapsize field gives the size in bytes of the following map array.
Map is an array of bytes representing a bit map where each bit corre‐
sponds to a word in memory. The most significant bit corresponds to
the lowest address. For each bit in the map, the word at the corre‐
sponding offset in the type is a pointer iff the bit is set to 1.
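The relation between map bits and pointer locations can be illustrated as follows. The function pointer_offsets is hypothetical, and the default word size of 4 bytes is an assumption taken from the definition of word at the top of this page; an implementation with a different memory word size would pass a different value.

```python
def pointer_offsets(type_map, word_size=4):
    """Return byte offsets of the words that the map marks as pointers.

    word_size=4 is an assumption for this sketch, not a statement
    about any particular implementation.
    """
    offsets = []
    for index, byte in enumerate(type_map):
        for bit in range(8):
            # The most significant bit describes the lowest address.
            if byte & (0x80 >> bit):
                offsets.append((index * 8 + bit) * word_size)
    return offsets
```

A one-byte map of 0xA0 (binary 10100000) marks the first and third words, giving byte offsets 0 and 8 with 4-byte words.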
Data Section
The data section encodes the contents of the data segment for the mod‐
ule, addressed by MP at run-time. The section contains a sequence of
items of the following form:
data-item:
code count(opt) offset data-value ...
code:
byte
count, offset:
operand
Each item contains an offset into the section, followed by one or more
data values in a machine-independent encoding. As each value is placed
in the data segment, the offset is incremented by the size of the
datum.
The code byte has two 4-bit fields. The bottom 4 bits of code give
the number of data-values if there are between 1 and 15; if there are
more than 15, the low-order field is zero, and a following operand
gives the count.
The top 4 bits of code encode the type of each data-value in the item,
which determines its encoding. The defined values are:
0001 8 bit bytes
0010 32 bit integers, one word each
0011 string value encoded by utf(6) in count bytes
0100 real values in IEEE754 canonical representation, 8 bytes
each
0101 Array, represented by two words giving type and length
0110 Set base for data items: one word giving an array index
0111 Restore base for data items: no operands
1000 64 bit big, 8 bytes each
The loader maintains a current base address and a stack of addresses.
Each item's value is stored at the address formed by adding the current
offset to the current base address. That address initially is the base
of the module's data segment. The `set base' operation immediately
follows an `array' data-item. It stacks the current base address and
sets the current base address to the address of the array element
selected by its operand. The `restore base' operation sets the current
base address to the address on the top of the stack, and pops the
stack.
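The leading part of a data item (code byte, optional count, offset) might be decoded as in the sketch below. It follows only the grammar given above and does not decode the data values or maintain the base-address stack; the names operand, parse_item_header and DATA_TYPES are assumptions for the example.

```python
DATA_TYPES = {1: "byte", 2: "word", 3: "string", 4: "real",
              5: "array", 6: "set base", 7: "restore base", 8: "big"}


def operand(buf, pos):
    """Decode one variable-length operand; return (value, next_pos)."""
    first = buf[pos]
    nbytes, bits = (1, 7) if not first & 0x80 else (2, 14) if not first & 0x40 else (4, 30)
    value = int.from_bytes(buf[pos:pos + nbytes], "big") & ((1 << bits) - 1)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value, pos + nbytes


def parse_item_header(buf, pos=0):
    """Return (type name, count, offset, next_pos) for one data item."""
    code = buf[pos]
    pos += 1
    kind = code >> 4        # top 4 bits: type of each data-value
    count = code & 0x0F     # bottom 4 bits: count, when 1 to 15
    if count == 0:
        # More than 15 values: the count follows as an operand.
        count, pos = operand(buf, pos)
    offset, pos = operand(buf, pos)
    return DATA_TYPES.get(kind, "unknown"), count, offset, pos
```

For example, the bytes 13 04 describe three 8-bit values to be placed starting at offset 4, while 20 14 08 describes twenty 32-bit integers at offset 8.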
Module name
The module name immediately follows the data section. It contains the
name of the module implemented by the object file, as a sequence of
bytes in UTF encoding, terminated by a zero byte.
Link Section
The link section contains an array of link-size external linkage items,
listing the functions exported by this module. Each variable-length
item contains the following:
link-item:
pc desc sig fn-name
pc, desc:
operand
sig:
word
fn-name:
string
Fn-name is the name of an exported function. Adt member functions
appear with their full names: the member name qualified by the adt
name, in the form adt-name.member-name, for instance Iobuf.gets.
Pc is the instruction number of its entry point. Desc is an index
value that selects a type descriptor in the type section, which gives
the type of the function's stack frame. Sig is an integer hash of the
type signature of the function, used in type checking.
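A link item could be read along the following lines. This is a sketch only: the names operand and parse_link_item are hypothetical, and presenting sig as an unsigned integer is a choice made for the example (the value is a hash, so only its bit pattern matters).

```python
def operand(buf, pos):
    """Decode one variable-length operand; return (value, next_pos)."""
    first = buf[pos]
    nbytes, bits = (1, 7) if not first & 0x80 else (2, 14) if not first & 0x40 else (4, 30)
    value = int.from_bytes(buf[pos:pos + nbytes], "big") & ((1 << bits) - 1)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value, pos + nbytes


def parse_link_item(buf, pos=0):
    """Return (item, next_pos) for one link item."""
    pc, pos = operand(buf, pos)
    desc, pos = operand(buf, pos)
    # sig is a word: exactly 4 bytes, most significant byte first.
    sig = int.from_bytes(buf[pos:pos + 4], "big")
    pos += 4
    # fn-name is a string: bytes up to a terminating zero byte.
    end = buf.index(0, pos)
    name = buf[pos:end].decode("utf-8")
    return {"pc": pc, "desc": desc, "sig": sig, "name": name}, end + 1
```

Calling parse_link_item repeatedly, link-size times, with each returned position as the next starting point would walk the whole section.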
Import Section
The optional import section lists all those functions imported from
other modules. This allows type checking at load time. The size of the
section in bytes is given at the start in operand form. For each module
imported there is a list of functions imported from that module. For
each function, its type signature (a word) is followed by a 0 termi‐
nated list of bytes representing its name.
Handler Section
The final optional section lists all exception handlers declared in the
module. The number of such handlers is given at the start of the sec‐
tion in operand form. For each one, its format is:
handler:
offset pc1 pc2 desc nlab exc-tab
offset, pc1, pc2, desc, nlab:
operand
exc-tab:
exc-name pc ... exc-name pc pc
exc-name:
string
pc:
operand
Each handler specifies the frame offset of its exception structure, the
range of pc values it covers (from pc1 up to but not including pc2),
the type descriptor of any memory that needs destroying by the handler
(or -1 if none), the number of exceptions in the handler and then the
exception table itself. The latter consists of a list of exception
names and the corresponding pc to jump to when this exception is
raised. This is then followed by the pc to jump to in any wildcard (*)
case or -1 if this is not applicable.
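The handler format above might be decoded as in this sketch, which follows the grammar exactly as written on this page, treating nlab as the number of entries in the exception table. The names operand, cstring and parse_handler, and the dictionary keys, are assumptions for the example.

```python
def operand(buf, pos):
    """Decode one variable-length operand; return (value, next_pos)."""
    first = buf[pos]
    nbytes, bits = (1, 7) if not first & 0x80 else (2, 14) if not first & 0x40 else (4, 30)
    value = int.from_bytes(buf[pos:pos + nbytes], "big") & ((1 << bits) - 1)
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value, pos + nbytes


def cstring(buf, pos):
    """Read a zero-terminated UTF-8 string; return (text, next_pos)."""
    end = buf.index(0, pos)
    return buf[pos:end].decode("utf-8"), end + 1


def parse_handler(buf, pos=0):
    """Return (handler, next_pos) for one handler entry."""
    handler = {}
    for name in ("offset", "pc1", "pc2", "desc", "nlab"):
        handler[name], pos = operand(buf, pos)
    table = []
    for _ in range(handler["nlab"]):
        exc_name, pos = cstring(buf, pos)
        target_pc, pos = operand(buf, pos)
        table.append((exc_name, target_pc))
    handler["exc_tab"] = table
    # The final pc is the wildcard (*) target, or -1 if there is none.
    handler["wildcard_pc"], pos = operand(buf, pos)
    return handler, pos
```

In the result, a desc of -1 means no memory needs destroying, and a wildcard_pc of -1 means the handler has no wildcard case.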
SEE ALSO
asm(1), dis(2), sbl(6)
``The Dis Virtual Machine Specification'', Volume 2