segunda-feira, 7 de dezembro de 2009

PostgreSQL monitoring on ZABBIX

I've recently started to get to know ZABBIX [1] a little deeper, especially regarding PostgreSQL database servers monitoring. At first sight I thought it an incredible monitoring and notification system as it fulfills most of the requirements I wondered to have in Cedrus [2].

After setting up some basic OS-related items to be monitored, I was searching for PostgreSQL specific configurations and then I found a wiki [3] on ZABBIX UserParameters. It is indeed very simple! You just need to create SQL statements and then invoke them with psql. If successful, you could append the corresponding lines into ZABBIX agent configuration file and restart it. Then, in the front-end application, what is left to do is to properly set these PostgreSQL parameters to a given host.

I've created some additional parameters I always use in PostgreSQL instances in order to monitor the server health. In this case, I previously created a user called "zabbix" and a database with same name on PostgreSQL.

Here are the included lines on /etc/zabbix/zabbix_agentd.conf:

# PostgreSQL custom parameters

# instance version
UserParameter=pgsql.version,psql -U zabbix zabbix -Atc 'select version()'

# instance databases summary
UserParameter=pgsql.db.summary,psql -c "select a.datname, pg_size_pretty(pg_database_size(a.datid)) as size, cast(blks_hit/(blks_read+blks_hit+0.000001)*100.0 as numeric(5,2)) as cache, cast(xact_commit/(xact_rollback+xact_commit+0.000001)*100.0 as numeric(5,2)) as success from pg_stat_database a order by a.datname"

# total databases size
UserParameter=pgsql.db.totalsize,psql -Atc "select sum(pg_database_size(datid)) as total_size from pg_stat_database"

# specific database size (in bytes)
UserParameter=pgsql.db.size[*],psql -Atc "select pg_database_size('$1') as size"

# database cache hit ratio (percentage)
UserParameter=pgsql.db.cache[*],psql -Atc "select cast(blks_hit/(blks_read+blks_hit+0.000001)*100.0 as numeric(5,2)) as cache from pg_stat_database where datname = '$1'"

# database success rate (percentage)
UserParameter=pgsql.db.success[*],psql -Atc "select cast(xact_commit/(xact_rollback+xact_commit+0.000001)*100.0 as numeric(5,2)) as success from pg_stat_database where datname = '$1'"

After restarting ZABBIX Agent, you can check whether the added parameters are valid by issuing zabbix_get command in a shell. For instance, to query PostgreSQL instance version, type this:

$ zabbix_get -s localhost -k pgsql.version
PostgreSQL 8.3.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)

This other script returns an overview of existing databases in the instance, highlighting their names, size in disk, cache hit ratio and percentage of successful transactions:

$ zabbix_get -s localhost -k pgsql.db.summary
datname | size | cache | success
auction5 | 4400 kB | 0.00 | 0.00
auditing | 4512 kB | 0.00 | 0.00
escola | 4656 kB | 0.00 | 0.00
postgres | 4223 kB | 99.81 | 100.00
rodrigo | 4144 kB | 0.00 | 0.00
template0 | 4144 kB | 0.00 | 0.00
template1 | 4144 kB | 0.00 | 0.00
zabbix | 10 MB | 99.99 | 100.00
zahle | 86 MB | 0.00 | 0.00
(9 registros)

In order to measure total space in disk occupied by the entire instance, this script should be used:

$ zabbix_get -s localhost -k pgsql.db.totalsize

On the other hand, to retrieve the space (in bytes) occupied by a single database, you just need to specify its name in the parameter key, as exemplified below:

$ zabbix_get -s localhost -k pgsql.db.size[auction5]

Likewise, to recover cache hit ratio (in percentage) of a single database, use this key:

$ zabbix_get -s localhost -k pgsql.db.cache[zabbix]

The percentage of successful transactions in relation to all attempts is given by this parameter:

$ zabbix_get -s localhost -k pgsql.db.success[postgres]

After testing these parameters, it is time to set them up onto ZABBIX Frontend as illustrated below:

It is very interesting to further add some traps and actions based on the configured items. For example, to send an email to the DBA every time a given database grows up faster than expected or whether its cache ratio starts to lower significantly.

At Joe Uhl's blog there is a post [4] concerning using ZABBIX to monitor PostgreSQL TPS (Transactions per Second). It is an interesting source as it explains how to configure deltas and graphs in ZABBIX.


[1] ZABBIX Monitoring Solution
[2] Cedrus: PostgreSQL Manager
[3] ZABBIX Wiki - PostgreSQL UserParameters
[4] Monitoring PostgreSQL TPS with Zabbix