The platform uses both the systemd and monit daemons to monitor all essential services. Since Sipwise C5 runs in an active/standby mode, not all services are always running on both nodes: some of them only run on the active node and are stopped on the standby node. The following commands show the most critical services on the platform:
- ngcp-service summary: to get the list of services and their current status,
- systemctl status: to get a tree of the services running,
- systemctl list-units: to get a list of the service states,
- monit summary: to get the list of services known to monit and their current status,
- monit status: to get the list of services known to monit with detailed status.
important | |
When you perform a stop/start/monitor/unmonitor operation on a service, monit also applies it to the services that depend on that service. Hence, if you stop or unmonitor a service, all services that depend on it will be stopped or unmonitored as well. |
For example, the monit stop mysql operation will stop kamailio, sbc, asterisk, prosody and some other services. However, the recommended way to operate on services is via the ngcp-service wrapper, which takes care of abstracting the underlying process monitoring implementation.
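The cascading effect described above can be pictured as a walk over a dependency graph. The following is a minimal sketch of that idea, not monit's actual implementation. Only the mysql dependents come from the text above; the `example-frontend` service and the `affected_by_stop` helper are hypothetical names used for illustration.

```python
from collections import deque

# Hypothetical dependency map: service -> services it depends on.
# The mysql dependents are taken from the text; "example-frontend" is made up
# to show that the effect is transitive.
DEPENDS_ON = {
    "kamailio": ["mysql"],
    "sbc": ["mysql"],
    "asterisk": ["mysql"],
    "prosody": ["mysql"],
    "example-frontend": ["kamailio"],
}


def affected_by_stop(service, depends_on=DEPENDS_ON):
    """Return every service that directly or indirectly depends on `service`."""
    # Invert the map so that each service points to its direct dependents.
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(svc)

    # Breadth-first walk over the dependents.
    seen = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for svc in dependents.get(current, []):
            if svc not in seen:
                seen.add(svc)
                queue.append(svc)
    return sorted(seen)
```

Calling `affected_by_stop("mysql")` on this sample graph returns all five other services, while `affected_by_stop("prosody")` returns an empty list because nothing depends on it here.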
If any service fails for whatever reason, either the systemd or the monit daemon will quickly restart it. When that happens, the daemon will send a notification email to the address specified in the config.yml file under the general.adminmail key. It will also send warning emails to this address under certain abnormal conditions, such as high memory consumption (more than 75% used) or high CPU load.
important | |
In order for monit to be able to send emails to the specified address, the local MTA (exim4) must be configured correctly. The CE edition’s handbook contains more information about this in the Installation chapter. |
The platform uses the Prometheus monitoring backend on new installations and on upgraded systems that have been migrated. On older systems the monitoring backend was InfluxDB, which is now deprecated.
The platform uses various monitoring backend services to monitor many aspects of the system, including CPU, memory, swap, disk, filesystem, network, processes, NTP, Nginx, Redis and MySQL.
The gathered information is stored in VictoriaMetrics, which is a long-term storage backend for Prometheus. NOTE: Both VictoriaMetrics and Prometheus can act as the Prometheus server implementation, but they are mutually exclusive: only one of them runs at a time. On systems still using InfluxDB, the information is stored in the telegraf database.
The platform uses the internal ngcp-witnessd service to monitor Sipwise C5 specific metrics or system metrics currently not tracked by the monitoring backend (either Prometheus exporters or the telegraf service when using the deprecated InfluxDB), including HA status, MTA, Kamailio, SIP and MySQL.
The gathered information is stored in VictoriaMetrics in the ngcp namespace, or in InfluxDB in the ngcp database.
tip | |
Some of the data gathering can be disabled (most are enabled by default)
through the |
The platform uses VictoriaMetrics as a long-term Prometheus time series database to store most of the metrics collected in the system. On systems still using InfluxDB, the time series database role is filled by InfluxDB itself.
On a Sipwise C5 system, each node stores its own metrics and those of its peer node; in addition, on CARRIER systems the management nodes store the metrics for all the nodes in the cluster. On new and migrated installations this is done with Prometheus instances on each peer, and a VictoriaMetrics instance on the management node, which uses its Prometheus federation and scraping support. On older installations this is done with influxdb-relay, which listens for InfluxDB writes and multiplexes them to the local node and any other node necessary.
The monitoring data is used by various components of the platform, including ngcp-collective-check, ngcp-snmp-agent and by the statistics dashboard powered by Grafana.
The monitoring data can also be accessed directly by various means. On new installations, it can be accessed with the promtool command-line tool, through the HTTP API with curl (or other HTTP fetchers), or with the NGCP::Prometheus::HTTP Perl module. On old installations, it can be accessed with the influx command-line tool in CLI or TUI modes; with the ngcp-influxdb-extract wrapper, which provides two convenience commands to run arbitrary queries or to fetch the last value for a measurement’s field; through the HTTP API with curl (or other HTTP fetchers); or with the NGCP::InfluxDB::HTTP Perl module.
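As an illustration of the HTTP API access path on Prometheus-based installations, the following sketch builds an instant-query URL for the standard Prometheus `/api/v1/query` endpoint and unpacks the JSON reply. The base URL is an assumption that must be adapted to the actual system, and the helper names (`build_query_url`, `parse_instant_vector`, `run_query`) are hypothetical; the platform's own NGCP::Prometheus::HTTP module is not modelled here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Map the label set of each returned series to its sample value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    results = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        # Each instant-vector sample is a [timestamp, "value"] pair.
        _timestamp, value = series["value"]
        results[labels] = float(value)
    return results


def run_query(base_url, promql):
    """Send the query and return the parsed result (needs a reachable server)."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as response:
        return parse_instant_vector(json.load(response))
```

For example, `run_query("http://localhost:9090", "up")` would return the value of the built-in `up` metric per scrape target, assuming a Prometheus-compatible server is listening at that (assumed) address.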
See Section 4, “Prometheus monitoring metrics” for detailed information about the list of ngcp namespaced metrics stored in the Prometheus monitoring database.
See Section 5, “InfluxDB monitoring keys” for detailed information about the list of data stored in the InfluxDB ngcp monitoring database.
See https://prometheus.io/docs/prometheus/latest/querying/basics/ for information about PromQL, the query language used by Prometheus.
tip | |
To get the list of all metrics for a specific namespace the following
query can be used |
See https://docs.influxdata.com/influxdb/v1.1/query_language/spec/ for information about InfluxQL, the query language used by InfluxDB.
tip | |
To get the list of all measurements for a specific database the following
query can be used |
The platform’s administration interface (described in Section 5, “VoIP Service Configuration Scenario”) provides a graphical overview based on Grafana of the most important system health indicators, such as memory usage, load averages and disk usage. VoIP statistics, such as the number of concurrent active calls and the number of provisioned and registered subscribers, are also present.
The Sipwise C5 exports a variety of cluster health data and statistics over the standard SNMP interface. By default, the SNMP interface can only be accessed locally. To provide the SNMP data to an external system, the config.yml file needs to be edited and the list of allowed community names and allowed hosts/IP ranges must be populated. This list can be found under the snmpd.communities key and it consists of one or more hashes of name and sources key/values. The community name is the allowed community name, while sources is a list of IP addresses or IP blocks from which requests are allowed.
The SNMP notifications (or traps) can be configured in a similar way to be sent to an external system, by populating the snmpd.trap_communities key with name and targets key/values. The community trap name is the value that will be used when sending the trap, while targets is a list of IP addresses to send the trap to.
The public communities with the localhost source and target are used for local testing of SNMP functionality. It is recommended that you leave these entries in place. Other legal sources can be formed as single IP addresses or IP blocks in IP/prefix notation, for example 192.168.115.0/24. Other targets can be formed as single IP addresses.
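The rules for sources and targets can be checked before editing config.yml. The sketch below assumes the snmpd section has already been loaded into a Python dictionary shaped as described above (communities with name and sources, trap_communities with name and targets). The helper names, the `example-*` community names and the acceptance of the literal string `localhost` are assumptions made for illustration.

```python
import ipaddress


def is_valid_source(value):
    """True for "localhost", a single IP address, or a block in IP/prefix notation."""
    if value == "localhost":
        return True
    try:
        # A bare address parses as a single-host network. A block with host
        # bits set, such as 192.168.115.7/24, is rejected by the default mode.
        ipaddress.ip_network(value)
        return True
    except ValueError:
        return False


def is_valid_target(value):
    """True for "localhost" or a single IP address (no prefix allowed)."""
    if value == "localhost":
        return True
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def check_snmpd(snmpd):
    """Return a list of human-readable problems found in the snmpd section."""
    problems = []
    for entry in snmpd.get("communities", []):
        for source in entry.get("sources", []):
            if not is_valid_source(source):
                problems.append(f"community {entry.get('name')!r}: bad source {source!r}")
    for entry in snmpd.get("trap_communities", []):
        for target in entry.get("targets", []):
            if not is_valid_target(target):
                problems.append(f"trap community {entry.get('name')!r}: bad target {target!r}")
    return problems


# Illustrative data only; the "example-*" entries are hypothetical.
EXAMPLE_SNMPD = {
    "communities": [
        {"name": "public", "sources": ["localhost"]},
        {"name": "example-monitoring", "sources": ["192.168.115.0/24", "192.168.116.10"]},
    ],
    "trap_communities": [
        {"name": "public", "targets": ["localhost"]},
        {"name": "example-traps", "targets": ["192.168.115.0/24"]},  # a block, not a single IP
    ],
}
```

Running `check_snmpd(EXAMPLE_SNMPD)` reports only the trap target, because an IP block is accepted as a source but not as a target.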
The origin of the SNMP notifications for the SIPWISE MIB can also be configured with snmpagent.traps_origin. The supported modes are:
mgmt
mode.
tip | |
To locally check if SNMP is working correctly, execute the command
|
tip | |
To locally check if SNMP notifications (or traps) are working correctly,
install the snmptrapd package, which will be configured by default to
catch the traps sent by the localhost SNMP agent. The traps will show up on
|
info | |
SNMP version 1 and version 2c are supported. |
There are two kinds of information that can be retrieved from SNMP OIDs (Object Identifiers). The first one is the native Sipwise C5 cluster overview from Sipwise C5 MIBs (Management Information Bases), which is available from the management nodes. The second is from the stock snmpd implementing the UCD (University of California, Davis) MIBs, which requires querying each individual node.
The entire Sipwise C5 cluster can be monitored from the management nodes by using the SIPWISE-NGCP-MIB and SIPWISE-NGCP-MONITOR-MIB (SIPWISE-NGCP-STATS-MIB is deprecated and should not be used anymore). These OIDs are rooted at the Sipwise C5 slot .1.3.6.1.4.1.34274.1.*.
The MIBs are self-documented, and can be found as part of the ngcp-snmp-mibs package (running dpkg -S SIPWISE*MIB will list their pathnames). The Sipwise C5 SNMP Agent is a part of the ngcp-snmp-agent package, which is installed by default and works out-of-the-box as long as the snmpd has been properly configured.
The SIPWISE-NGCP-MIB acts as the root MIB and provides information about the cluster licensing and layout (which is mostly static data about each node, such as node name, its IP address, its roles, etc.) and information required to access the OIDs from the other MIBs.
The SIPWISE-NGCP-MONITOR-MIB provides current monitoring information, global health conditions, the number of provisioned and registered subscribers and devices. It also provides per node information (independently of the number of nodes or their names) on their filesystem, processes, databases, system load, memory, HA status, MTA queues, etc.
The SIPWISE-NGCP-STATS-MIB is deprecated and has been superseded by the SIPWISE-NGCP-MONITOR-MIB.
info | |
OIDs under the following trees are not yet implemented: ngcpMonitorFraud, ngcpMonitorPerformance.sipStatsTable.sipCallAttemptsPerSecond. Deprecated OIDs are currently implemented but will eventually be obsoleted. Obsolete OIDs are not implemented and won’t be in the future. |
info | |
The Sipwise C5 SNMP Agent uses Redis and Prometheus or InfluxDB as data sources. This data is essential for accurate and complete monitoring data in the SNMP OID tree. In addition, the Redis database must be available on a shared IP address, so that ngcp-witnessd can always write to it. |
All basic system health variables (such as memory, disk, swap, CPU usage, network statistics, process lists, etc.) for every node can also be found in standard OID slots from standard MIBs from each node. For example, memory statistics can be found through the UCD-SNMP-MIB in OIDs such as memTotalSwap.0, memAvailSwap.0, memTotalReal.0, memAvailReal.0, etc., which translate to numeric OIDs .1.3.6.1.4.1.2021.4.*. In fact, UCD-SNMP-MIB is a useful MIB for overall non-centralized system health checks.
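Since the Sipwise C5 MIBs are served by the management nodes while the UCD MIBs must be queried on each node, a monitoring script may want to decide where to send a request based on the numeric OID. The following is a small sketch of such a check, using the two roots mentioned in this chapter. The function names and the returned labels are hypothetical.

```python
SIPWISE_ROOT = ".1.3.6.1.4.1.34274.1"  # Sipwise C5 slot
UCD_ROOT = ".1.3.6.1.4.1.2021"         # UCD-SNMP-MIB enterprise subtree


def _components(oid):
    """Split a numeric OID such as ".1.3.6" into a tuple of integers."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def is_under(oid, root):
    """True if `oid` lies in the subtree rooted at `root`.

    The comparison is done per component, so .1.3.6.1.4.1.20211 is not
    mistaken for a child of .1.3.6.1.4.1.2021 as a string prefix test would.
    """
    oid_parts, root_parts = _components(oid), _components(root)
    return oid_parts[:len(root_parts)] == root_parts


def where_to_query(oid):
    """Tell which kind of node serves the given numeric OID."""
    if is_under(oid, SIPWISE_ROOT):
        return "management node"
    if is_under(oid, UCD_ROOT):
        return "each individual node"
    return "unknown"
```

For instance, `where_to_query(".1.3.6.1.4.1.2021.4.5.0")` points to the individual nodes, while any OID below `.1.3.6.1.4.1.34274.1` points to the management nodes.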
Additionally, there is a list of specially monitored processes, also found through the UCD-SNMP-MIB. UCD-SNMP-MIB::prNames (.1.3.6.1.4.1.2021.2.1.2) gives the list of monitored processes, prCount (.1.3.6.1.4.1.2021.2.1.5) is how many of each process are running and prErrorFlag (.1.3.6.1.4.1.2021.2.1.100) gives a 0/1 error indication (with prErrMessage (.1.3.6.1.4.1.2021.2.1.101) providing an explanation of any error).
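To show how the four process table columns fit together, the sketch below parses numeric snmpwalk-style output into one record per monitored process and lists the processes whose error flag is raised. The sample output is illustrative and was not captured from a real system; the exact value formatting depends on the SNMP tool and its options, so the parser is an assumption-laden starting point rather than a complete one.

```python
import re

PR_TABLE = ".1.3.6.1.4.1.2021.2.1"
# Column numbers as listed in the text above.
COLUMNS = {"2": "name", "5": "count", "100": "error_flag", "101": "error_message"}

LINE_RE = re.compile(r"^(?P<oid>\.[0-9.]+) = (?:(?P<type>[A-Za-z0-9-]+): ?)?(?P<value>.*)$")


def parse_process_table(snmpwalk_output):
    """Group the process table columns by row index."""
    rows = {}
    for line in snmpwalk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or not match["oid"].startswith(PR_TABLE + "."):
            continue
        column, _, index = match["oid"][len(PR_TABLE) + 1:].partition(".")
        field = COLUMNS.get(column)
        if field is None or not index.isdigit():
            continue
        value = match["value"].strip().strip('"')
        if match["type"] == "INTEGER":
            # Accept both "1" and enumerated forms such as "error(1)".
            number = re.search(r"(\d+)\)?$", value)
            value = int(number.group(1)) if number else value
        rows.setdefault(int(index), {})[field] = value
    return rows


def failing_processes(rows):
    """Return (name, message) pairs for rows whose error flag is 1."""
    return [(row.get("name"), row.get("error_message", ""))
            for _, row in sorted(rows.items())
            if row.get("error_flag") == 1]


# Illustrative sample only; the process names and message are made up.
SAMPLE_OUTPUT = """\
.1.3.6.1.4.1.2021.2.1.2.1 = STRING: "kamailio"
.1.3.6.1.4.1.2021.2.1.2.2 = STRING: "asterisk"
.1.3.6.1.4.1.2021.2.1.5.1 = INTEGER: 1
.1.3.6.1.4.1.2021.2.1.5.2 = INTEGER: 0
.1.3.6.1.4.1.2021.2.1.100.1 = INTEGER: 0
.1.3.6.1.4.1.2021.2.1.100.2 = INTEGER: 1
.1.3.6.1.4.1.2021.2.1.101.1 = STRING: ""
.1.3.6.1.4.1.2021.2.1.101.2 = STRING: "No asterisk process running"
"""
```

With this sample, `failing_processes(parse_process_table(SAMPLE_OUTPUT))` returns a single entry for the second row.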
tip | |
Some of these processes are not supposed to be running on the standby node, so you will see the error flag raised there. A possible solution is to run these SNMP checks against the shared service IP of the cluster. See Section 2.7, “High Availability and Fail-Over” below for more information. |
important | |
Furthermore, Sipwise C5 used to provide platform specific
information via the |
UCD OID name | UCD check name | SIPWISE-NGCP OID name |
---|---|---|
UCD-SNMP-MIB::extNames.1 | collective_check | SIPWISE-NGCP-MONITOR-MIB::ngcpCollectiveCheckResult and SIPWISE-NGCP-MONITOR-MIB::ngcpCollectiveCheckOutput |
UCD-SNMP-MIB::extNames.2 | sip_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::sipResponsiveness.* |
UCD-SNMP-MIB::extNames.3 | sip_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::sipResponsiveness.* |
UCD-SNMP-MIB::extNames.4 | mysql_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::dbQueryRate.* |
UCD-SNMP-MIB::extNames.5 | mysql_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::dbQueryRate.* |
UCD-SNMP-MIB::extNames.6 | mysql_replication_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::dbReplDelay.* |
UCD-SNMP-MIB::extNames.7 | mysql_replication_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::dbReplDelay.* |
UCD-SNMP-MIB::extNames.8 | mpt_check_sp1 | Obsolete |
UCD-SNMP-MIB::extNames.9 | mpt_check_sp2 | Obsolete |
UCD-SNMP-MIB::extNames.10 | exim_queue_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::mailQueue.* |
UCD-SNMP-MIB::extNames.11 | exim_queue_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::mailQueue.* |
UCD-SNMP-MIB::extNames.12 | provisioned_subscribers_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterProvSubs |
UCD-SNMP-MIB::extNames.13 | provisioned_subscribers_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterProvSubs |
UCD-SNMP-MIB::extNames.14 | kam_dialog_active_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::sipDialogActive.* |
UCD-SNMP-MIB::extNames.15 | kam_dialog_active_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::sipDialogActive.* |
UCD-SNMP-MIB::extNames.16 | kam_dialog_early_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::sipEarlyMedia.* |
UCD-SNMP-MIB::extNames.17 | kam_dialog_early_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::sipEarlyMedia.* |
UCD-SNMP-MIB::extNames.18 | kam_dialog_type_local_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::sipDialogLocal.* |
UCD-SNMP-MIB::extNames.19 | kam_dialog_type_local_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::sipDialogLocal.* |
UCD-SNMP-MIB::extNames.20 | kam_dialog_type_relay_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::sipDialogRelay.* |
UCD-SNMP-MIB::extNames.21 | kam_dialog_type_relay_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::sipDialogRelay.* |
UCD-SNMP-MIB::extNames.22 | kam_dialog_type_incoming_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::sipDialogIncoming.* |
UCD-SNMP-MIB::extNames.23 | kam_dialog_type_incoming_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::sipDialogIncoming.* |
UCD-SNMP-MIB::extNames.24 | kam_dialog_type_outgoing_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::sipDialogOutgoing.* |
UCD-SNMP-MIB::extNames.25 | kam_dialog_type_outgoing_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::sipDialogOutgoing.* |
UCD-SNMP-MIB::extNames.26 | kam_usrloc_regusers_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterRegSubs |
UCD-SNMP-MIB::extNames.27 | kam_usrloc_regusers_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterRegSubs |
UCD-SNMP-MIB::extNames.28 | kam_usrloc_regdevices_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterRegDevs |
UCD-SNMP-MIB::extNames.29 | kam_usrloc_regdevices_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterRegDevs |
UCD-SNMP-MIB::extNames.30 | mysql_replication_discrepancies_check_sp1 | SIPWISE-NGCP-MONITOR-MIB::dbReplDiff.* |
UCD-SNMP-MIB::extNames.31 | mysql_replication_discrepancies_check_sp2 | SIPWISE-NGCP-MONITOR-MIB::dbReplDiff.* |
UCD-SNMP-MIB::extNames.32 | sip_check_self | SIPWISE-NGCP-MONITOR-MIB::sipResponsiveness.* |
UCD-SNMP-MIB::extNames.33 | mysql_check_self | SIPWISE-NGCP-MONITOR-MIB::dbQueryRate.* |
UCD-SNMP-MIB::extNames.34 | mysql_replication_check_self | SIPWISE-NGCP-MONITOR-MIB::dbReplDelay.* |
UCD-SNMP-MIB::extNames.35 | mpt_check_self | Obsolete |
UCD-SNMP-MIB::extNames.36 | exim_queue_check_self | SIPWISE-NGCP-MONITOR-MIB::mailQueue.* |
UCD-SNMP-MIB::extNames.37 | provisioned_subscribers_check_self | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterProvSubs |
UCD-SNMP-MIB::extNames.38 | kam_dialog_active_check_self | SIPWISE-NGCP-MONITOR-MIB::sipDialogActive.* |
UCD-SNMP-MIB::extNames.39 | kam_dialog_early_check_self | SIPWISE-NGCP-MONITOR-MIB::sipEarlyMedia.* |
UCD-SNMP-MIB::extNames.40 | kam_dialog_type_local_check_self | SIPWISE-NGCP-MONITOR-MIB::sipDialogLocal.* |
UCD-SNMP-MIB::extNames.41 | kam_dialog_type_relay_check_self | SIPWISE-NGCP-MONITOR-MIB::sipDialogRelay.* |
UCD-SNMP-MIB::extNames.42 | kam_dialog_type_incoming_check_self | SIPWISE-NGCP-MONITOR-MIB::sipDialogIncoming.* |
UCD-SNMP-MIB::extNames.43 | kam_dialog_type_outgoing_check_self | SIPWISE-NGCP-MONITOR-MIB::sipDialogOutgoing.* |
UCD-SNMP-MIB::extNames.44 | kam_usrloc_regusers_check_self | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterRegSubs |
UCD-SNMP-MIB::extNames.45 | kam_usrloc_regdevices_check_self | SIPWISE-NGCP-MONITOR-MIB::ngcpClusterRegDevs |
UCD-SNMP-MIB::extNames.46 | mysql_replication_discrepancies_check_self | SIPWISE-NGCP-MONITOR-MIB::dbReplDiff.* |
UCD-SNMP-MIB::extNames.47 | kam_dialog_type_local_check_prx0X | SIPWISE-NGCP-MONITOR-MIB::sipDialogLocal.* |
UCD-SNMP-MIB::extNames.48 | kam_dialog_type_relay_check_prx0X | SIPWISE-NGCP-MONITOR-MIB::sipDialogRelay.* |
UCD-SNMP-MIB::extNames.49 | kam_dialog_type_incoming_check_prx0X | SIPWISE-NGCP-MONITOR-MIB::sipDialogIncoming.* |
UCD-SNMP-MIB::extNames.50 | kam_dialog_type_outgoing_check_prx0X | SIPWISE-NGCP-MONITOR-MIB::sipDialogOutgoing.* |
UCD-SNMP-MIB::extNames.51 | kam_dialog_active_check_prx0X | SIPWISE-NGCP-MONITOR-MIB::sipDialogActive.* |
UCD-SNMP-MIB::extNames.52 | kam_dialog_early_check_prx0X | SIPWISE-NGCP-MONITOR-MIB::sipEarlyMedia.* |
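
When migrating existing monitoring checks, the table above can be condensed into a lookup keyed by the check name without its node suffix. The sketch below is one way to do that; the suffix pattern (`_sp1`, `_sp2`, `_self`, `_prx...`) is inferred from the check names in the table, and the function and variable names are hypothetical.

```python
import re

MIB = "SIPWISE-NGCP-MONITOR-MIB::"

# Legacy UCD check (without node suffix) -> replacement OID names.
# An empty list marks a check that the table lists as obsolete.
REPLACEMENTS = {
    "collective_check": [MIB + "ngcpCollectiveCheckResult", MIB + "ngcpCollectiveCheckOutput"],
    "sip_check": [MIB + "sipResponsiveness.*"],
    "mysql_check": [MIB + "dbQueryRate.*"],
    "mysql_replication_check": [MIB + "dbReplDelay.*"],
    "mysql_replication_discrepancies_check": [MIB + "dbReplDiff.*"],
    "mpt_check": [],
    "exim_queue_check": [MIB + "mailQueue.*"],
    "provisioned_subscribers_check": [MIB + "ngcpClusterProvSubs"],
    "kam_dialog_active_check": [MIB + "sipDialogActive.*"],
    "kam_dialog_early_check": [MIB + "sipEarlyMedia.*"],
    "kam_dialog_type_local_check": [MIB + "sipDialogLocal.*"],
    "kam_dialog_type_relay_check": [MIB + "sipDialogRelay.*"],
    "kam_dialog_type_incoming_check": [MIB + "sipDialogIncoming.*"],
    "kam_dialog_type_outgoing_check": [MIB + "sipDialogOutgoing.*"],
    "kam_usrloc_regusers_check": [MIB + "ngcpClusterRegSubs"],
    "kam_usrloc_regdevices_check": [MIB + "ngcpClusterRegDevs"],
}

# Node suffixes seen in the table: _sp1, _sp2, _self and _prx0X (assumed to
# stand for a proxy node name).
SUFFIX_RE = re.compile(r"_(?:sp\d+|self|prx\w+)$")


def replacement_for(check_name):
    """Return the replacement OID names for a legacy UCD check name."""
    base = SUFFIX_RE.sub("", check_name)
    if base not in REPLACEMENTS:
        raise KeyError(f"unknown legacy check: {check_name}")
    return REPLACEMENTS[base]
```

For example, `replacement_for("mysql_check_sp2")` yields the dbQueryRate entry, and `replacement_for("mpt_check_self")` yields an empty list because that check has no successor.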