MDOC(3) BSD Library Functions Manual MDOC(3)
NAME
mdoc, mdoc_alloc, mdoc_endparse, mdoc_free, mdoc_meta, mdoc_node,
mdoc_parseln, mdoc_reset — mdoc macro compiler library
SYNOPSIS
#include <mandoc.h>
#include <mdoc.h>
extern const char * const * mdoc_macronames;
extern const char * const * mdoc_argnames;
int
mdoc_addspan(struct mdoc *mdoc, const struct tbl_span *span);
struct mdoc *
mdoc_alloc(struct regset *regs, void *data, mandocmsg msgs);
int
mdoc_endparse(struct mdoc *mdoc);
void
mdoc_free(struct mdoc *mdoc);
const struct mdoc_meta *
mdoc_meta(const struct mdoc *mdoc);
const struct mdoc_node *
mdoc_node(const struct mdoc *mdoc);
int
mdoc_parseln(struct mdoc *mdoc, int line, char *buf);
int
mdoc_reset(struct mdoc *mdoc);
DESCRIPTION
The mdoc library parses lines of mdoc(7) input into an abstract syntax
tree (AST).
In general, applications initiate a parsing sequence with mdoc_alloc(),
parse each line in a document with mdoc_parseln(), close the parsing
session with mdoc_endparse(), operate over the syntax tree returned by
mdoc_node() and mdoc_meta(), then free all allocated memory with
mdoc_free(). The mdoc_reset() function may be used in order to reset the
parser for another input sequence.
Types
struct mdoc
An opaque type. Its values are only used privately within the library.
struct mdoc_node
A parsed node. See Abstract Syntax Tree for details.
Functions
If mdoc_addspan(), mdoc_parseln(), or mdoc_endparse() returns 0, calls to
any function but mdoc_reset() or mdoc_free() will raise an assertion.
mdoc_addspan()
Add a table span to the parsing stream. Returns 0 on failure, 1 on
success.
mdoc_alloc()
Allocates a parsing structure. The data pointer is passed to msgs.
Always returns a valid pointer. The pointer must be freed with
mdoc_free().
mdoc_reset()
Reset the parser for another parse routine. After its use,
mdoc_parseln() behaves as if invoked for the first time. If it returns
0, memory could not be allocated.
mdoc_free()
Free all resources of a parser. The pointer is no longer valid after
invocation.
mdoc_parseln()
Parse a NUL-terminated line of input. This line should not contain the
trailing newline. Returns 0 on failure, 1 on success. The input buffer
buf is modified by this function.
mdoc_endparse()
Signals that the parse is complete. Returns 0 on failure, 1 on success.
mdoc_node()
Returns the first node of the parse.
mdoc_meta()
Returns the document's parsed meta-data.
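Because mdoc_parseln() expects a line without its trailing newline, a
caller reading whole lines usually strips it first. The helper below is a
minimal sketch of that step; chomp() is a hypothetical name and is not
provided by the mdoc library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the mdoc library): strip one
 * trailing newline from a NUL-terminated buffer in place and return
 * the resulting length.
 */
static size_t
chomp(char *buf)
{
	size_t len;

	assert(buf != NULL);
	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';
	return len;
}
```

The buffer is modified in place, which matches the expectation that the
input buffer handed to mdoc_parseln() is writable.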
Variables
mdoc_macronames
An array of string-ified token names.
mdoc_argnames
An array of string-ified token argument names.
Abstract Syntax Tree
The mdoc functions produce an abstract syntax tree (AST) describing input
in a regular form. It may be reviewed at any time with mdoc_node();
however, if called before mdoc_endparse(), or after mdoc_endparse() or
mdoc_parseln() fail, it may be incomplete.
This AST is governed by the ontological rules dictated in mdoc(7) and
derives its terminology accordingly. "In-line" elements described in
mdoc(7) are described simply as "elements".
The AST is composed of struct mdoc_node nodes with block, head, body,
element, root and text types as declared by the type field. Each node
also provides its parse point (the line, sec, and pos fields), its
position in the tree (the parent, child, nchild, next and prev fields) and
some type-specific data, in particular, for nodes generated from macros,
the generating macro in the tok field.
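The tree-position fields lend themselves to a depth-first walk over child
and next. The sketch below illustrates this on a simplified stand-in:
struct tnode and enum tkind are hypothetical and model only the link
fields named above, not the real struct mdoc_node declared in <mdoc.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in node types, named after the types listed above. */
enum tkind { T_ROOT, T_BLOCK, T_HEAD, T_BODY, T_ELEM, T_TEXT };

/*
 * Simplified stand-in for struct mdoc_node: only the tree-position
 * fields named in the text are modelled here.
 */
struct tnode {
	enum tkind	 type;
	struct tnode	*parent;
	struct tnode	*child;	/* first child */
	struct tnode	*next;	/* next sibling */
	struct tnode	*prev;	/* previous sibling */
	int		 nchild;
};

/* Append child c to parent p, keeping all link fields consistent. */
static void
tnode_append(struct tnode *p, struct tnode *c)
{
	struct tnode *last;

	assert(p != NULL && c != NULL);
	c->parent = p;
	c->next = NULL;
	if (p->child == NULL) {
		c->prev = NULL;
		p->child = c;
	} else {
		for (last = p->child; last->next != NULL; last = last->next)
			continue;
		last->next = c;
		c->prev = last;
	}
	p->nchild++;
}

/* Count the nodes of a given type in the subtree rooted at n. */
static int
tnode_count(const struct tnode *n, enum tkind type)
{
	const struct tnode *c;
	int total;

	if (n == NULL)
		return 0;
	total = (n->type == type) ? 1 : 0;
	for (c = n->child; c != NULL; c = c->next)
		total += tnode_count(c, type);
	return total;
}
```

The same recursion pattern should carry over to a tree obtained from
mdoc_node(), with the callback of choice in place of the counter.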
The tree itself is arranged according to the following normal form, where
capitalised non-terminals represent nodes.
ROOT ← mnode+
mnode ← BLOCK | ELEMENT | TEXT
BLOCK ← HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
ELEMENT ← TEXT*
HEAD ← mnode*
BODY ← mnode* [ENDBODY mnode*]
TAIL ← mnode*
TEXT ← [[:printable:],0x1e]*
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of the
BLOCK production: these refer to punctuation marks. Furthermore,
although a TEXT node will generally have a non-zero-length string, in the
specific case of ‘.Bd -literal’, an empty line will produce a zero-length
string. Multiple body parts are only found in invocations of ‘Bl
-column’, where a new body introduces a new phrase.
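The BLOCK production can be checked mechanically against the sequence of
a block's direct children. The following sketch does so for an array of
child types; enum ckind and block_children_ok() are hypothetical names
used for illustration and do not exist in the mdoc library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child types relevant to the BLOCK production. */
enum ckind { C_HEAD, C_BODY, C_TAIL, C_TEXT, C_OTHER };

/*
 * Return 1 if the sequence of child types matches
 *   HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
 * and 0 otherwise.
 */
static int
block_children_ok(const enum ckind *t, size_t n)
{
	size_t i = 0, bodies = 0;

	assert(t != NULL || n == 0);
	if (i >= n || t[i] != C_HEAD)
		return 0;
	i++;
	if (i < n && t[i] == C_TEXT)
		i++;
	/* One or more bodies, each optionally followed by TEXT. */
	while (i < n && t[i] == C_BODY) {
		i++;
		bodies++;
		if (i < n && t[i] == C_TEXT)
			i++;
	}
	if (bodies == 0)
		return 0;
	if (i < n && t[i] == C_TAIL) {
		i++;
		if (i < n && t[i] == C_TEXT)
			i++;
	}
	/* Anything left over violates the production. */
	return i == n;
}
```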
Badly-nested Blocks
The ENDBODY node is available to end the formatting associated with a
given block before the physical end of that block. It has a non-null end
field, is of the BODY type, has the same tok as the BLOCK it is ending,
and has a pending field pointing to that BLOCK's BODY node. It is an
indirect child of that BODY node and has no children of its own.
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, like in the following example:
.Ao ao
.Bo bo ac
.Ac bc
.Bc end
This example results in the following block structure:
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
Here, the formatting of the ‘Ao’ block extends from TEXT ao to TEXT ac,
while the formatting of the ‘Bo’ block extends from TEXT bo to TEXT bc.
It renders as follows in -Tascii mode:
<ao [bo ac> bc] end
Support for badly-nested blocks is only provided for backward
compatibility with some older mdoc(7) implementations. Using badly-nested
blocks is strongly discouraged: the -Thtml and -Txhtml front-ends are
unable to render them in any meaningful way. Furthermore, behaviour when
encountering badly-nested blocks is not consistent across troff
implementations, especially when using multiple levels of badly-nested
blocks.
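Since badly-nested blocks are discouraged, a document checker may wish to
detect them. The sketch below counts closing macros that end a block
while a block opened later is still open. It works on a flat list of
open and close events rather than on the library's AST; struct bev and
count_bad_nesting() are hypothetical names and are not part of the mdoc
library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXOPEN 32

/* Hypothetical event: one block being opened or closed. */
struct bev {
	int		 open;	/* 1 = opening macro, 0 = closing macro */
	const char	*blk;	/* name of the block's opening macro */
};

/*
 * Return the number of closing events that end a block while a block
 * opened later is still open, or -1 on a close without a matching
 * open or when more than MAXOPEN blocks are open at once.
 */
static int
count_bad_nesting(const struct bev *e, size_t n)
{
	const char *stack[MAXOPEN];
	size_t i, j, k, nopen = 0;
	int bad = 0;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (e[i].open) {
			if (nopen == MAXOPEN)
				return -1;
			stack[nopen++] = e[i].blk;
			continue;
		}
		/* Find the innermost open block with this name. */
		for (j = nopen; j > 0; j--)
			if (strcmp(stack[j - 1], e[i].blk) == 0)
				break;
		if (j == 0)
			return -1;
		if (j != nopen)
			bad++;	/* a later block is still open */
		/* Remove entry j - 1, keeping later blocks open. */
		for (k = j; k < nopen; k++)
			stack[k - 1] = stack[k];
		nopen--;
	}
	return bad;
}
```

For the Ao/Bo example above, the events are open Ao, open Bo, close Ao,
close Bo, and the closing of Ao is counted once.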
EXAMPLES
The following example reads lines from stdin and parses them, operating
on the finished parse tree with parsed(). This example does not error-
check nor free memory upon failure.
struct regset regs;
struct mdoc *mdoc;
const struct mdoc_node *node;
char *buf;
size_t alloc_len;
ssize_t len;
int line;

bzero(&regs, sizeof(struct regset));
line = 1;
mdoc = mdoc_alloc(&regs, NULL, NULL);
buf = NULL;
alloc_len = 0;

/* getline() returns -1 at end of input or on error. */
while ((len = getline(&buf, &alloc_len, stdin)) >= 0) {
	/* mdoc_parseln() expects the line without its newline. */
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
	if ( ! mdoc_parseln(mdoc, line, buf))
		errx(1, "mdoc_parseln");
	line++;
}
free(buf);

if ( ! mdoc_endparse(mdoc))
	errx(1, "mdoc_endparse");
if (NULL == (node = mdoc_node(mdoc)))
	errx(1, "mdoc_node");

parsed(mdoc, node);
mdoc_free(mdoc);
To compile this, execute
% cc main.c libmdoc.a libmandoc.a
where main.c is the example file.
SEE ALSO
mandoc(1), mdoc(7)
AUTHORS
The mdoc library was written by Kristaps Dzonsons ⟨kristaps@bsd.lv⟩.
BSD January 7, 2011 BSD