gmond.conf(5) Ganglia Monitoring System gmond.conf(5)
NAME
gmond.conf - configuration file for ganglia monitoring daemon (gmond)
DESCRIPTION
The gmond.conf file is used to configure the ganglia monitoring daemon
(gmond) which is part of the Ganglia Distributed Monitoring System.
SECTIONS AND ATTRIBUTES
All sections and attributes are case-insensitive. For example, name or
NAME or Name or NaMe are all equivalent.
Some sections can be included in the configuration file multiple times
and some sections are singular. For example, you can have only one
cluster section to define the attributes of the cluster being
monitored; however, you can have multiple udp_recv_channel sections to
allow gmond to receive messages on multiple UDP channels.
cluster
There should only be one cluster section defined. This section
controls how gmond reports the attributes of the cluster that it is
part of.
The cluster section has four attributes: name, owner, latlong and url.
For example,
cluster {
name = "Millennium Cluster"
owner = "UC Berkeley CS Dept."
latlong = "N37.37 W122.23"
url = "http://www.millennium.berkeley.edu/"
}
The name attribute specifies the name of the cluster of machines. The
owner attribute specifies the administrators of the cluster. The pair
name/owner should be unique among all clusters in the world.
The latlong attribute is the latitude and longitude GPS coordinates of
this cluster on earth. It is specified in decimal with two decimal
places per axis, which gives roughly 1 mile accuracy.
The url attribute gives a URL for more information on the cluster. It is
intended to give purpose, owner, administration, and account details for
this cluster.
These directives directly control the XML output of gmond. For
example, the cluster configuration example above would translate into
the following XML.
<CLUSTER NAME="Millennium Cluster" OWNER="UC Berkeley CS Dept."
LATLONG="N37.37 W122.23" URL="http://www.millennium.berkeley.edu/">
...
</CLUSTER>
host
The host section provides information about the host running this
instance of gmond. Currently only the location string attribute is
supported. Example:
host {
location = "1,2,3"
}
The numbers represent Rack, Rank and Plane respectively.
globals
The globals section controls general characteristics of gmond such as
whether it should daemonize, what user it should run as, whether it
should send/receive data and such. The globals section has the
following attributes: daemonize, setuid, user, debug_level, mute, deaf,
allow_extra_data, host_dmax, host_tmax, cleanup_threshold, gexec,
send_metadata_interval and module_dir.
For example,
globals {
daemonize = true
setuid = true
user = ganglia
host_dmax = 3600
host_tmax = 40
}
The daemonize attribute is a boolean. When true, gmond will daemonize.
When false, gmond will run in the foreground.
The setuid attribute is a boolean. When true, gmond will set its
effective UID to the uid of the user specified by the user attribute.
When false, gmond will not change its effective user.
The debug_level is an integer value. When set to zero (0), gmond will
run normally. A debug_level greater than zero will result in gmond
running in the foreground and outputting debugging information. The
higher the debug_level the more verbose the output.
The mute attribute is a boolean. When true, gmond will not send data
regardless of any other configuration directives.
The deaf attribute is a boolean. When true, gmond will not receive
data regardless of any other configuration directives.
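As a minimal sketch of the deaf attribute, the fragment below configures a node that keeps sending its own metrics but ignores data from other hosts. It assumes the rest of the configuration (send channels and collection groups) is defined elsewhere in the file.

```
globals {
  deaf = true    /* do not receive data, regardless of other directives */
  mute = false   /* still send this host's own data */
}
```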
The allow_extra_data attribute is a boolean. When false, gmond will
not send out the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This
might be useful if you are using your own frontend to the metric data
and would like to save some bandwidth.
The host_dmax value is an integer with units in seconds. When set to
zero (0), gmond will never delete a host from its list even when a
remote host has stopped reporting. If host_dmax is set to a positive
number then gmond will flush a host after it has not heard from it for
host_dmax seconds. By the way, dmax means "delete max".
The host_tmax value is an integer with units in seconds. This value
represents the maximum amount of time that gmond should wait between
updates from a host. As messages may get lost in the network, gmond
will consider the host as being down if it has not received any
messages from it after 4 times this value. For example, if host_tmax is
set to 20, the host will appear as down after 80 seconds with no
messages from it. By the way, tmax means "timeout max".
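To make the two timers concrete, the following sketch reuses the numbers from the text above. The values are illustrative only and are not recommended defaults.

```
globals {
  host_tmax = 20     /* host appears down after 4 * 20 = 80 seconds of silence */
  host_dmax = 3600   /* host is flushed after 3600 seconds of silence */
}
```

With these settings a silent host would first be shown as down and only later be removed from the list.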
The cleanup_threshold is the minimum amount of time before gmond will
clean up any hosts or metrics where tn > dmax (that is, expired data).
The gexec boolean allows you to specify whether gmond will announce the
host's availability to run gexec jobs. Note: this requires that gexecd
is running on the host and the proper keys have been installed.
The send_metadata_interval establishes an interval in which gmond will
send or resend the metadata packets that describe each enabled metric.
This directive by default is set to 0 which means that gmond will only
send the metadata packets at startup and upon request from other gmond
nodes running remotely. If a new machine running gmond is added to a
cluster, it needs to announce itself and inform all other nodes of the
metrics that it currently supports. In multicast mode, this isn't a
problem because any node can request the metadata of all other nodes in
the cluster. However in unicast mode, a resend interval must be
established. The interval value is the minimum number of seconds
between resends.
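A unicast setup along the lines described above might look like the following sketch. The host name collector.example.com and the interval of 30 seconds are assumptions chosen for illustration.

```
globals {
  send_metadata_interval = 30    /* wait at least 30 seconds between metadata resends */
}
udp_send_channel {
  host = collector.example.com   /* hypothetical unicast receiver */
  port = 8649
}
```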
The override_hostname and override_ip parameters allow an arbitrary
hostname and/or IP to be used when identifying metrics coming from this
host (the hostname can optionally be specified without an IP).
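The sketch below shows how these two parameters could be set. It assumes that they belong in the globals section, since that is where they are described here. The name and address are hypothetical placeholders.

```
globals {
  override_hostname = "web01.example.com"   /* hypothetical name to report */
  override_ip = 192.0.2.10                  /* hypothetical address to report */
}
```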
The module_dir is an optional parameter indicating the directory where
the DSO modules are to be located. If absent, the value used is the one
set at configure time with the --with-moduledir option. If that option
is omitted, it defaults to a subdirectory named "ganglia" in the
directory where libganglia will be installed.
For example, on a 32-bit Intel compatible Linux host that is usually:
/usr/lib/ganglia
udp_send_channel
You can define as many udp_send_channel sections as you like within the
limitations of memory and file descriptors. If gmond is configured as
mute, this section will be ignored.
The udp_send_channel has a total of seven attributes: mcast_join,
mcast_if, host, port, ttl, bind and bind_hostname. bind and
bind_hostname are mutually exclusive.
For example, the 2.5.x version gmond would send on the following single
channel by default...
udp_send_channel {
mcast_join = 239.2.11.71
port = 8649
}
The mcast_join and mcast_if attributes are optional. When specified,
gmond will create the UDP socket, join the mcast_join multicast
group and send data out the interface specified by mcast_if.
You can use the bind attribute to bind to a particular local address to
be used as the source for the multicast packets sent or let gmond
resolve the default hostname if bind_hostname = yes.
If only a host and port are specified then gmond will send unicast UDP
messages to the hosts specified.
You could specify multiple unicast hosts for redundancy as gmond will
send UDP messages to all UDP channels.
Be careful though not to mix multicast and unicast attributes in the
same udp_send_channel definition.
For example...
udp_send_channel {
host = host.foo.com
port = 2389
}
udp_send_channel {
host = 192.168.3.4
port = 2344
}
would configure gmond to send messages to two hosts. The host
specification can be an IPv4/IPv6 address or a resolvable hostname.
The ttl attribute lets you modify the Time-To-Live (TTL) of outgoing
messages (unicast or multicast).
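As an illustration of the ttl attribute, the sketch below extends the default multicast channel shown earlier. The value 1 is an assumption chosen for the example. For IP multicast, a TTL of 1 normally keeps packets on the local subnet.

```
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1    /* assumed value; limits how far outgoing messages travel */
}
```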
udp_recv_channel
You can specify as many udp_recv_channel sections as you like within
the limits of memory and file descriptors. If gmond is configured deaf,
these sections will be ignored.
The udp_recv_channel section has the following attributes: mcast_join,
bind, port, mcast_if, family, retry_bind and buffer. The
udp_recv_channel can also have an acl definition (see ACCESS CONTROL
below).
For example, the 2.5.x gmond ran with a single udp receive channel...
udp_recv_channel {
mcast_join = 239.2.11.71
bind = 239.2.11.71
port = 8649
}
The mcast_join and mcast_if attributes should only be used if you want
this UDP channel to receive multicast packets from the multicast group
mcast_join on interface mcast_if. If you do not specify multicast
attributes then gmond will simply create a UDP server on the specified
port.
You can use the bind attribute to bind to a particular local address.
The family attribute is set to inet4 by default. If you want to bind the
port to an inet6 port, you need to specify that in the family
attribute. Ganglia will not allow IPV6=>IPV4 mapping (for portability
and security reasons). If you want to listen on both inet4 and inet6
for a particular port, explicitly state it with the following:
udp_recv_channel {
port = 8666
family = inet4
}
udp_recv_channel {
port = 8666
family = inet6
}
If you specify a bind address, the family of that address takes
precedence. If your IPv6 stack doesn't support IPV6_V6ONLY, a warning
will be issued but gmond will continue working (this should rarely
happen).
Multicast Note: for multicast, specifying a bind address with the same
value used for mcast_join will prevent unicast UDP messages to the same
port from being processed.
The sFlow protocol (see http://www.sflow.org) can be used to collect a
standard set of performance metrics from servers. For servers that
don't include embedded sFlow agents, an open source sFlow agent is
available on SourceForge (see http://host-sflow.sourceforge.net).
To configure gmond to receive sFlow datagrams, simply add a
udp_recv_channel with the port set to 6343 (the IANA registered port
for sFlow):
udp_recv_channel {
port = 6343
}
Note: sFlow is a unicast protocol, so don't include mcast_join.
Note: To use some other port for sFlow, set it here and then specify
the port in an sflow section (see below).
gmond will fail to run if it can't bind to all defined
udp_recv_channels. Sometimes, on machines configured by DHCP, for
example, the gmond daemon starts before a network address is assigned
to the interface. Consequently, the bind fails and the gmond daemon
does not run. To assist in this situation, the boolean parameter
retry_bind can be set to true. The daemon will then not abort on
failure; instead it will enter a loop and repeat the bind attempt
every 60 seconds:
udp_recv_channel {
port = 6343
retry_bind = true
}
If you have a large system with lots of metrics, you might experience
UDP drops. This happens when gmond is not able to process UDP packets fast
enough from the network. In this case you might consider changing your
setup into a more distributed setup using aggregator gmond hosts.
Alternatively you can choose to create a bigger receive buffer:
udp_recv_channel {
port = 6343
buffer = 10485760
}
buffer is specified in bytes, e.g. 10485760 will allow 10 MB of UDP data
to be buffered in memory.
Note: increasing the buffer size will increase memory usage by gmond.
tcp_accept_channel
You can specify as many tcp_accept_channel sections as you like within
the limitations of memory and file descriptors. If gmond is configured
to be mute, then these sections are ignored.
The tcp_accept_channel has the following attributes: bind, port,
interface, family and timeout. A tcp_accept_channel may also have an
acl section specified (see ACCESS CONTROL below).
For example, 2.5.x gmond would accept connections on a single TCP
channel.
tcp_accept_channel {
port = 8649
}
The bind address is optional and allows you to specify which local
address gmond will bind to for this channel.
The port is an integer that specifies which port to answer requests for
data.
The family attribute is set to inet4 by default. If you want to bind the
port to an inet6 port, you need to specify that in the family
attribute. Ganglia will not allow IPV6=>IPV4 mapping (for portability
and security reasons). If you want to listen on both inet4 and inet6
for a particular port, explicitly state it with the following:
tcp_accept_channel {
port = 8666
family = inet4
}
tcp_accept_channel {
port = 8666
family = inet6
}
If you specify a bind address, the family of that address takes
precedence. If your IPv6 stack doesn't support IPV6_V6ONLY, a warning
will be issued but gmond will continue working (this should rarely
happen).
The timeout attribute allows you to specify how many microseconds to
block before closing a connection to a client. The default is set to
-1 (blocking IO) and will never abort a connection regardless of how
slow the client is in fetching the report data.
The interface attribute is not implemented at this time (use bind instead).
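As a sketch of the timeout attribute, the fragment below would close a connection to a slow client after about one second. The value is an assumption for illustration. Since timeout is given in microseconds, 1000000 corresponds to one second.

```
tcp_accept_channel {
  port = 8649
  timeout = 1000000   /* microseconds; the default of -1 never times out */
}
```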
collection_group
You can specify as many collection_group sections as you like within the
limitations of memory. A collection_group has the following
attributes: collect_once, collect_every and time_threshold. A
collection_group must also contain one or more metric sections.
The metric section has the following attributes: name or name_match
(name_match is only permitted if pcre support is compiled in),
value_threshold and title. For a list of available metric names,
run the following command:
% gmond -m
Here is an example of a collection group for a static metric...
collection_group {
collect_once = yes
time_threshold = 1800
metric {
name = "cpu_num"
title = "Number of CPUs"
}
}
This collection_group entry would cause gmond to collect the cpu_num
metric once at startup (since the number of CPUs will not change
between reboots). The metric cpu_num would be sent every half hour
(1800 seconds). The default value for the time_threshold is 3600
seconds if no time_threshold is specified.
The time_threshold is the maximum amount of time that can pass before
gmond sends all metrics specified in the collection_group to all
configured udp_send_channels. A metric may be sent before this
time_threshold is met if during collection the value surpasses the
value_threshold (explained below).
Here is an example of a collection group for a volatile metric...
collection_group {
collect_every = 60
time_threshold = 300
metric {
name = "cpu_user"
value_threshold = 5.0
title = "CPU User"
}
metric {
name = "cpu_idle"
value_threshold = 10.0
title = "CPU Idle"
}
}
This collection group would collect the cpu_user and cpu_idle metrics
every 60 seconds (specified in collect_every). If cpu_user varies by
5.0% or cpu_idle varies by 10.0%, then the entire collection_group is
sent. If no value_threshold is triggered within time_threshold seconds
(in this case 300), the entire collection_group is sent.
Each time the metric value is collected, the new value is compared with
the previously collected value. If the difference between the last value
and the current value is greater than the value_threshold, the entire
collection group is sent to the udp_send_channels defined.
It's important to note that all metrics in a collection group are sent
even when only a single value_threshold is surpassed.
In addition, a user-friendly title can be substituted for the metric
name by including a title within the metric section.
By using the name_match parameter instead of name, it is possible to
use a single definition to configure multiple metrics that match a
regular expression. The Perl-compatible regular expression (pcre)
syntax is used. This approach is particularly useful for a series of
metrics that may vary in number between reboots (e.g. metric names that
are generated for each individual NIC or CPU core).
Here is an example of using the name_match directive to enable the
multicpu metrics:
metric {
name_match = "multicpu_([a-z]+)([0-9]+)"
value_threshold = 1.0
title = "CPU-\\2 \\1"
}
Note that in the example above, there are two matches: the alphabetical
match matches the variations of the metric name (e.g. idle, system)
while the numeric match matches the CPU core number. The second thing
to note is the use of substitutions within the argument to title.
If both name and name_match are specified, then name is ignored.
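Since a metric section only takes effect inside a collection_group, a complete definition using name_match might look like the sketch below. The collect_every and time_threshold values are assumptions chosen for illustration, and the fragment only works if pcre support is compiled in.

```
collection_group {
  collect_every = 10     /* assumed collection interval in seconds */
  time_threshold = 50    /* assumed maximum time between sends in seconds */
  metric {
    name_match = "multicpu_([a-z]+)([0-9]+)"
    value_threshold = 1.0
    title = "CPU-\\2 \\1"
  }
}
```

Under these assumptions, a metric named multicpu_idle0 would match with "idle" as the first group and "0" as the second, so its title would presumably be rendered as "CPU-0 idle".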
Modules
A modules section contains the parameters that are necessary to load a
metric module. A metric module is a dynamically loadable module that
extends the available metrics that gmond is able to collect. Each
modules section contains at least one module section. Within a module
section are the directives name, language, enabled, path and params.
The module name is the name of the module as determined by the module
structure if the module was developed in C/C++. Alternatively, the
name can be the name of the source file if the module has been
implemented in an interpreted language such as Python.
A language designation should be specified as a string value for each
module. The language directive must correspond to the source code
language in which the module was implemented (e.g. language =
"python"). If a language directive does not exist for the module, the
assumed language will be "C/C++".
The enabled directive allows a metric module to be easily enabled or
disabled through the configuration file. If the enabled directive is
not included in the module configuration, the enabled state will
default to "yes". Note that if a module has been disabled yet a metric
which that module implements is still listed as part of a collection
group, gmond will produce a warning message. However, gmond will
continue to function normally by simply ignoring the metric.
The path is the path from which gmond is expected to load the module
(C/C++ compiled dynamically loadable module only).
The params directive can be used to pass a single string parameter
directly to the module initialization function (C/C++ module only).
Multiple parameters can be passed to the module's initialization
function by including one or more param sections. Each param section
must be named and contain a value directive.
Once a module has been loaded, the additional metrics can be discovered
by invoking gmond -m.
modules {
module {
name = "example_module"
language = "C/C++"
enabled = yes
path = "modexample.so"
params = "An extra raw parameter"
param RandomMax {
value = 75
}
param ConstantValue {
value = 25
}
}
}
sFlow
The sflow group is optional and has the following optional attributes:
udp_port, accept_vm_metrics, accept_http_metrics,
accept_memcache_metrics, accept_jvm_metrics,
multiple_http_instances, multiple_memcache_instances and
multiple_jvm_instances. By default, a udp_recv_channel on port 6343
(the IANA registered port for sFlow) is all that is required to accept
and process sFlow datagrams. To receive sFlow on some other port
requires both a udp_recv_channel for the other port and a udp_port
setting here. For example:
udp_recv_channel {
port = 7343
}
sflow {
udp_port = 7343
}
An sFlow agent running on a hypervisor may also be sending metrics for
its local virtual machines. By default these metrics are ignored, but
the accept_vm_metrics flag can be used to accept those metrics too,
and prefix them with an identifier for each virtual machine.
sflow {
accept_vm_metrics = yes
}
The sFlow feed may also contain metrics sent from HTTP or memcached
servers, or from Java VMs. Extra options can be used to ignore or
accept these metrics, and to indicate that there may be multiple
instances per host. For example:
sflow {
accept_http_metrics = yes
multiple_http_instances = yes
}
will allow the HTTP metrics, and also mark them with a distinguishing
identifier so that each instance can be trended separately. (If
multiple instances are reporting and this flag is not set, the results
are likely to be garbled.)
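By analogy with the HTTP example, the same pattern could be applied to Java VM metrics using the attributes listed at the start of this section. This is a sketch which assumes that the jvm flags behave like their http counterparts.

```
sflow {
  accept_jvm_metrics = yes
  multiple_jvm_instances = yes   /* mark each JVM instance separately */
}
```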
Include
This directive allows the user to include additional configuration
files rather than having to add all gmond configuration directives to
the gmond.conf file. The following example includes any file with the
extension of .conf contained in the directory conf.d as if the contents
of the included configuration files were part of the original
gmond.conf file. This allows the user to modularize their configuration
file. One usage example might be to load individual metric modules by
including module specific .conf files.
include ('/etc/ganglia/conf.d/*.conf')
ACCESS CONTROL
The udp_recv_channel and tcp_accept_channel directives can contain an
Access Control List (ACL). This ACL allows you to specify exactly
which hosts gmond processes data from.
An example of an acl entry looks like:
acl {
default = "deny"
access {
ip = 192.168.0.4
mask = 32
action = "allow"
}
}
This ACL will by default reject all traffic that is not specifically
from host 192.168.0.4 (the mask size for an IPv4 address is 32, the
mask size for an IPv6 address is 128 to represent a single host).
Here is another example:
acl {
default = "allow"
access {
ip = 192.168.0.0
mask = 24
action = "deny"
}
access {
ip = ::ff:1.2.3.0
mask = 120
action = "deny"
}
}
This ACL will by default allow all traffic unless it comes from the two
subnets specified with action = "deny".
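The examples above show the acl section on its own. Since an ACL is placed inside a channel section, a complete receive channel might look like the following sketch. The port and subnet are illustrative values taken from earlier examples.

```
udp_recv_channel {
  port = 8649
  acl {
    default = "deny"
    access {
      ip = 192.168.0.0
      mask = 24
      action = "allow"   /* only accept data from 192.168.0.0/24 */
    }
  }
}
```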
EXAMPLE
The default behavior for a 2.5.x gmond would be specified as...
udp_recv_channel {
mcast_join = 239.2.11.71
bind = 239.2.11.71
port = 8649
}
udp_send_channel {
mcast_join = 239.2.11.71
port = 8649
}
tcp_accept_channel {
port = 8649
}
To see the complete default configuration for gmond simply run:
% gmond -t
gmond will print out its default behavior in a configuration file and
then exit. Capturing this output to a file can serve as a useful
starting point for creating your own custom configuration.
% gmond -t > custom.conf
Edit custom.conf to taste and then run:
% gmond -c ./custom.conf
SEE ALSO
gmond(1).
NOTES
The ganglia web site is at http://ganglia.info/.
COPYRIGHT
Copyright (c) 2005 The University of California, Berkeley
ganglia/3.6.0 2013-05-07 gmond.conf(5)