cfg(3) Configuration Parsing cfg(3)
NAME
OSSP cfg - Configuration Parsing
VERSION
OSSP cfg 0.9.11 (10-Aug-2006)
SYNOPSIS
API Header:
cfg.h
API Types:
cfg_t, cfg_rc_t, cfg_node_type_t, cfg_node_t, cfg_node_attr_t,
cfg_fmt_t, cfg_data_t, cfg_data_ctrl_t, cfg_data_cb_t,
cfg_data_attr_t
API Functions:
cfg_create, cfg_destroy, cfg_error, cfg_version, cfg_import,
cfg_export, cfg_node_create, cfg_node_destroy, cfg_node_clone,
cfg_node_set, cfg_node_get, cfg_node_root, cfg_node_select,
cfg_node_find, cfg_node_apply, cfg_node_cmp, cfg_node_link,
cfg_node_unlink, cfg_data_set, cfg_data_get, cfg_data_ctrl
DESCRIPTION
OSSP cfg is an ISO C library for parsing arbitrary C/C++-style
configuration files. A configuration is a sequence of directives. Each
directive consists of zero or more tokens. Each token can be either a
string or itself a complete sequence. The configuration syntax is
therefore recursive, which allows configurations with arbitrarily
nested sections.
Additionally, the configuration syntax provides complex
single/double/balanced quoting of tokens, hexadecimal/octal/decimal
character encodings, character escaping, C/C++ and Shell-style
comments, etc. The library API allows importing a configuration text
into an Abstract Syntax Tree (AST), traversing the AST and optionally
exporting the AST again as a configuration text.
CONFIGURATION SYNTAX
The configuration syntax is described by the following context-free
(Chomsky-2) grammar:
sequence ::= empty
⎪ directive
⎪ directive SEP sequence
directive ::= token
⎪ token directive
token ::= OPEN sequence CLOSE
⎪ string
string ::= DQ_STRING # double quoted string
⎪ SQ_STRING # single quoted string
⎪ FQ_STRING # flexible quoted string
⎪ PT_STRING # plain text string
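To illustrate the recursive structure, the following sketch recognizes
a simplified subset of this grammar (plain-text strings, SEP, OPEN and
CLOSE, separated by white-space) and reports the deepest nesting level.
It is not the parser used by OSSP cfg: the sketch_* names are
hypothetical, quoted strings and comments are not handled, and empty
directives are accepted for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Skip blanks, tabs and newlines (comments are ignored in this sketch). */
static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    return p;
}

static const char *parse_sequence(const char *p, int depth, int *max_depth);

/* token ::= OPEN sequence CLOSE | plain-text string.
   Returns the position after the token, or NULL on a syntax error. */
static const char *parse_token(const char *p, int depth, int *max_depth)
{
    const char *start = p;

    if (*p == '{') {
        if (depth + 1 > *max_depth)
            *max_depth = depth + 1;
        p = parse_sequence(p + 1, depth + 1, max_depth);
        if (p == NULL)
            return NULL;
        p = skip_ws(p);
        return (*p == '}') ? p + 1 : NULL;
    }
    /* plain-text string: one or more non-special characters */
    while (*p != '\0' && strchr(" \t\n;{}\"'", *p) == NULL)
        p++;
    return (p > start) ? p : NULL;
}

/* sequence ::= directives separated by ';'.
   Stops at '}' or at the end of the text and returns that position. */
static const char *parse_sequence(const char *p, int depth, int *max_depth)
{
    for (;;) {
        p = skip_ws(p);
        /* directive: tokens up to the next ';', '}' or end of text */
        while (*p != '\0' && *p != ';' && *p != '}') {
            p = parse_token(p, depth, max_depth);
            if (p == NULL)
                return NULL;
            p = skip_ws(p);
        }
        if (*p != ';')
            return p;
        p++; /* consume SEP and continue with the next directive */
    }
}

/* Returns the maximum nesting depth, or -1 if the text is malformed. */
int sketch_nesting_depth(const char *text)
{
    int max_depth = 0;
    const char *end;

    assert(text != NULL);
    end = parse_sequence(text, 0, &max_depth);
    return (end != NULL && *end == '\0') ? max_depth : -1;
}
```

For example, sketch_nesting_depth("foo { bar; baz } quux;") yields 1,
while an unbalanced input such as "foo { bar" yields -1. Each grammar
rule maps to one function, so the recursion in the grammar shows up
directly as recursion in the code.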
The remaining terminal symbols are themselves defined by the following
set of grammar productions (regular sub-grammars for character
sequences, given as Perl-style regular expressions "/regex/"):
SEP ::= /;/
OPEN ::= /{/
CLOSE ::= /}/
DQ_STRING ::= /"/ DQ_CHARS /"/
DQ_CHARS ::= empty
⎪ DQ_CHAR DQ_CHARS
DQ_CHAR ::= /\\"/ # escaped quote
⎪ /\\x\{[0-9a-fA-F]+\}/ # hex-char group
⎪ /\\x[0-9a-fA-F]{2}/ # hex-char
⎪ /\\[0-7]{1,3}/ # octal character
⎪ /\\[nrtbfae]/ # special character
⎪ /\\\n[ \t]*/ # line continuation
⎪ /\\\\/ # escaped escape
⎪ /./ # any other char
SQ_STRING ::= /'/ SQ_CHARS /'/
SQ_CHARS ::= empty
⎪ SQ_CHAR SQ_CHARS
SQ_CHAR ::= /\\'/ # escaped quote
⎪ /\\\n[ \t]*/ # line continuation
⎪ /\\\\/ # escaped escape
⎪ /./ # any other char
FQ_STRING ::= /q/ FQ_OPEN FQ_CHARS FQ_CLOSE
FQ_CHARS ::= empty
⎪ FQ_CHAR FQ_CHARS
FQ_CHAR ::= /\\/ FQ_OPEN # escaped open
⎪ /\\/ FQ_CLOSE # escaped close
⎪ /\\\n[ \t]*/ # line continuation
⎪ /./ # any other char
FQ_OPEN ::= /[!"#$%&'()*+,-./:;<=>?@\[\\\]^_`{|}~]/
FQ_CLOSE ::= << FQ_OPEN or corresponding closing char
('}])>') if FQ_OPEN is a char of '{[(<' >>
PT_STRING ::= PT_CHAR PT_CHARS
PT_CHARS ::= empty
⎪ PT_CHAR PT_CHARS
PT_CHAR ::= /[^ \t\n;{}"']/ # none of specials
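The FQ_CLOSE rule is stated informally, so a small sketch may help. The
hypothetical helper sketch_fq_close (not part of the cfg API) maps an
opening delimiter to its closing delimiter as described above, and
returns '\0' for characters that are not in the FQ_OPEN class.

```c
#include <string.h>

/* Return the closing delimiter for a flexible-quoted string that was
   opened with 'open', or '\0' if 'open' is not a valid FQ_OPEN char. */
char sketch_fq_close(char open)
{
    /* the characters listed in the FQ_OPEN character class */
    static const char fq_open[] = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    if (open == '\0' || strchr(fq_open, open) == NULL)
        return '\0';
    switch (open) {
        case '{': return '}';
        case '[': return ']';
        case '(': return ')';
        case '<': return '>';
        default:  return open; /* same character closes the string */
    }
}
```

With this mapping, q{foo bar} is closed by '}', whereas q/foo bar/ is
closed by a second '/'.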
Additionally, white-space WS and comment CO tokens are allowed at any
position in the above productions of the previous grammar part.
WS ::= /[ \t\n]+/
CO ::= CO_C # style of C
⎪ CO_CXX # style of C++
⎪ CO_SH # style of /bin/sh
CO_C ::= /\/\*([^*]|\*(?!\/))*\*\//
CO_CXX ::= /\/\/[^\n]*/
CO_SH ::= /#[^\n]*/
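The WS and CO productions can be sketched as a single skipping routine
that a scanner would call between tokens. The function name
sketch_skip_ws_co is hypothetical and this is not the library's
scanner; it only mirrors the three comment styles listed above. An
unterminated C-style comment is left in place for the caller to report.

```c
#include <assert.h>
#include <string.h>

/* Advance past any run of white-space and comments and return a
   pointer to the first character that belongs to neither. */
const char *sketch_skip_ws_co(const char *p)
{
    assert(p != NULL);
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\n') {
            p++;                                  /* WS */
        } else if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            if (end == NULL)
                return p;                         /* unterminated CO_C */
            p = end + 2;                          /* CO_C */
        } else if (*p == '#' || (p[0] == '/' && p[1] == '/')) {
            while (*p != '\0' && *p != '\n')
                p++;                              /* CO_SH or CO_CXX */
        } else {
            return p;
        }
    }
}
```

Because the routine is only meant to run between tokens, it does not
need to know about quoting; a '#' inside a quoted string never reaches
it in such a design.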
Finally, any configuration line can have a trailing backslash character
(\) just before the newline character for simple line continuation.
The backslash, the newline and (optionally) the leading whitespace on
the following line are silently absorbed, so that the first line
continues with the contents of the second line.
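The effect of line continuation can be sketched as a pre-processing
pass over the text. The helper sketch_join_lines is hypothetical and
deliberately simple: it treats every backslash directly followed by a
newline as a continuation and does not special-case an escaped
backslash, which a real scanner would have to consider.

```c
#include <assert.h>
#include <stddef.h>

/* Copy 'in' to 'out', dropping each backslash-newline pair together
   with the blanks and tabs that lead the following line. 'out' must
   provide room for strlen(in) + 1 bytes. Returns the output length. */
size_t sketch_join_lines(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '\n') {
            in += 2;                     /* backslash and newline */
            while (*in == ' ' || *in == '\t')
                in++;                    /* leading whitespace */
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Applied to the two lines "foo \" and "   bar;", the result is the
single line "foo bar;".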
CONFIGURATION EXAMPLE
A more intuitive description of the configuration syntax is perhaps
given by the following example, which shows the main features at once:
/* single word */
foo;
/* multi word */
foo bar quux;
/* nested structure */
foo { bar; baz } quux;
/* quoted strings */
'foo bar'
"foo\x0a\t\n\
bar"
APPLICATION PROGRAMMING INTERFACE (API)
...
NODE SELECTION SPECIFICATION
The cfg_node_select function takes a node selection specification
string select for locating the intended nodes. This specification is
defined as:
select ::= empty
⎪ select-step select
select-step ::= select-direction
select-pattern
select-filter
select-direction ::= "./" # current node
⎪ "../" # parent node
⎪ "..../" # ancestor nodes
⎪ "-/" # previous sibling node
⎪ "--/" # preceding sibling nodes
⎪ "+/" # next sibling node
⎪ "++/" # following sibling nodes
⎪ "/" # child nodes
⎪ "//" # descendant nodes
select-pattern ::= /</ regex />/
⎪ token
select-filter ::= empty
⎪ /\[/ filter-range /\]/
filter-range ::= num # short for: num,num
⎪ num /,/ # short for: num,-1
⎪ /,/ num # short for: 1,num
⎪ num /,/ num
num ::= /^[+-]?[0-9]+/
regex ::= << Regular Expression (PCRE-based) >>
token ::= << Plain-Text Token String >>
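The short forms of filter-range can be made concrete with a small
sketch. The hypothetical function sketch_parse_range (not part of the
cfg API) takes the text between the brackets and expands it into an
explicit pair of bounds, following the comments in the grammar above.
It relies on strtol, which is slightly more lenient than the num
pattern because it also skips leading white-space.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "num", "num,", ",num" or "num,num" into *lo and *hi.
   Returns 0 on success and -1 on a malformed range. */
int sketch_parse_range(const char *s, long *lo, long *hi)
{
    char *end;
    int have_lo = 0;

    assert(s != NULL && lo != NULL && hi != NULL);
    if (*s != ',') {
        *lo = strtol(s, &end, 10);
        if (end == s)
            return -1;               /* no digits found */
        have_lo = 1;
        s = end;
        if (*s == '\0') {            /* "num" is short for "num,num" */
            *hi = *lo;
            return 0;
        }
        if (*s != ',')
            return -1;
    } else {
        *lo = 1;                     /* ",num" is short for "1,num" */
    }
    s++;                             /* skip the ',' */
    if (*s == '\0') {
        if (!have_lo)
            return -1;               /* a lone "," is not a range */
        *hi = -1;                    /* "num," is short for "num,-1" */
        return 0;
    }
    *hi = strtol(s, &end, 10);
    return (end != s && *end == '\0') ? 0 : -1;
}
```

For instance, the filter "[2,]" expands to the bounds 2 and -1, and
"[,5]" expands to 1 and 5.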
IMPLEMENTATION ISSUES
Goal: non-hardcoded syntax tokens, only hard-coded syntax structure
Goal: time-efficient parsing
Goal: space-efficient storage
Goal: representation of configuration as AST
Goal: manipulation (annotation, etc) of AST via API
Goal: dynamic syntax verification
HISTORY
OSSP cfg was implemented in many small steps over a very long time.
The first ideas date back to 1995, when Ralf S. Engelschall
attended his first compiler construction lessons at university. It
was first completed in summer 2002 by him for use in the OSSP
project.
AUTHOR
Ralf S. Engelschall
rse@engelschall.com
www.engelschall.com
10-Aug-2006 OSSP cfg 0.9.11 cfg(3)