A European manufacturing company needed a reliable way to monitor their mirrored SQL Server systems.
This manufacturer took the necessary steps to make their various business-critical production systems highly available
by mirroring their SQL Servers.
They then discovered a distinct lack of suitable tools to monitor the status of the
mirroring and to raise the necessary alerts when it either failed or was not operating within desirable parameters.
The requirement was for a sophisticated yet simple monitoring and alerting system. It had to require no IT
infrastructure support, be simple to install and be extremely lightweight.
Develop a batch-based, single-executable monitoring system with e-mail alerting on problems and a heartbeat system to
ensure it is always running.
- .NET Framework
- Visual Studio
- SentiLAN Heartbeat Monitor
24x7 monitoring of the mirrored SQL Servers to ensure the transactions are being mirrored and the system
performance is within acceptable parameters.
This client recognised the need to make the database servers for their manufacturing systems highly available through SQL Mirroring.
The servers are housed in geographically separate locations and connected by a high-speed leased line. SQL Server Mirroring allows
the transactions to be replicated in real time from the primary to the mirror server. In the event of a link failure, these transactions
are queued until the link is reinstated. The queued transactions are then applied to the mirror server, bringing it back in step with the primary.
This process is presided over by a third server - the witness server. This server is responsible for identifying the failure of the primary server and via a quorum
vote, instructing the mirror server to take over live operations. While it does not participate in the transactional replication, its role is still important in the
overall running of the system.
This type of system has to be monitored to ensure all the components are functioning properly and efficiently. There is no point in having the primary
server fail, only to discover that the backup had failed some time ago and nobody had noticed. Equally disastrous would be a backup server that was
failing to keep up to date with the transactions being replicated from the primary. In the event of a failover, there would then be a discrepancy in the
data. Finally, a successful failover from primary to backup which goes unnoticed can leave the company at risk if no one knows the primary requires attention.
However, they subsequently discovered a considerable lack of monitoring tools for this type of solution.
Their requirement was to have a suitable monitoring tool developed quickly. It needed to place no demands on their internal IT team or infrastructure
and had to run in the existing SQL Server environment. This meant it had to be extremely lightweight, easy to install and simple to configure.
The monitoring tool had to monitor not only the health of the transactional mirror, but also the general availability of the three servers, the
performance of the replication and a few other general health indicators such as disk space.
We elected to develop a simple console utility which could be triggered by the built-in operating system scheduler every few minutes. It performs the following functions:
- Ping all servers for basic availability. Alert if the ping does not respond, or latency is above acceptable limits.
- Check the disk space on all servers. Alert if space is below acceptable limits.
- Make a database connection to each server. Alert if the connection fails.
- Check the mirror status of all mirrored databases. Alert if the mirror is degraded or failed over.
- Check the size of the replication queues and the rate at which they are being processed. Alert if outside acceptable limits.
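
As a rough illustration of the mirror status check in the list above, the following Python sketch evaluates rows read from SQL Server's `sys.database_mirroring` catalog view. It is not the utility's actual code (that was built on .NET). The function name `evaluate_mirror_rows`, the alert wording and the rule that any state other than `SYNCHRONIZED` counts as degraded are assumptions made for the example.

```python
# Illustrative only: the original utility was a .NET console application.
# The query reads SQL Server's sys.database_mirroring catalog view; running it
# needs a database driver, which is left out of this sketch.
MIRROR_STATUS_QUERY = """
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL
"""


def evaluate_mirror_rows(rows, expected_role="PRINCIPAL"):
    """Return alert strings for rows fetched from the server expected to be primary.

    Each row is a dict keyed by the column names in MIRROR_STATUS_QUERY.
    """
    alerts = []
    for row in rows:
        name = row["database_name"]
        state = row["mirroring_state_desc"]
        role = row["mirroring_role_desc"]
        # Assumption: anything other than SYNCHRONIZED is treated as degraded.
        if state != "SYNCHRONIZED":
            alerts.append(f"{name}: mirror degraded (state {state})")
        # A role change on the expected primary indicates a failover.
        if role != expected_role:
            alerts.append(f"{name}: failed over (role is now {role})")
    return alerts
```

Keeping the evaluation separate from the database call means the alert rules can be exercised with hand-written rows, without a mirrored server to hand.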
The application maintains state information between executions in order to calculate the replication rates.
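
One way such state could be kept is a small file holding the previous queue size and timestamp, from which each run derives a rate. The Python sketch below is a hypothetical illustration: the JSON file format, the field names and the function `queue_drain_rate` are assumptions, not details of the actual utility.

```python
import json
import time
from pathlib import Path


def queue_drain_rate(state_path, queue_kb, now=None):
    """Return the KB/second by which the queue shrank since the last run.

    Returns None on the first run (no previous sample). A negative result
    means the queue grew between runs.
    """
    now = time.time() if now is None else now
    path = Path(state_path)
    previous = json.loads(path.read_text()) if path.exists() else None
    # Save the current sample for the next scheduled execution.
    path.write_text(json.dumps({"queue_kb": queue_kb, "timestamp": now}))
    if previous is None:
        return None
    elapsed = now - previous["timestamp"]
    if elapsed <= 0:
        return None  # clock went backwards or two runs coincided
    return (previous["queue_kb"] - queue_kb) / elapsed
```

Because the utility is started afresh by the scheduler each time, the file is the only memory it has between runs. A rate that stays negative over several runs would suggest the mirror is falling behind.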
If a problem is discovered, a detailed e-mail is dispatched to key support and business support personnel, as well as to the Exmos support team and developers.
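
A minimal sketch of composing such an alert, using Python's standard `email` package, might look like the following. The function name `build_alert_email`, the subject wording and the addresses in the usage note are placeholders invented for the example.

```python
from email.message import EmailMessage


def build_alert_email(problems, sender, recipients):
    """Build one e-mail listing every problem found during a single run."""
    msg = EmailMessage()
    msg["Subject"] = f"SQL mirror monitor: {len(problems)} problem(s) detected"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # One line per problem so the detail is readable at a glance.
    msg.set_content("\n".join(f"- {problem}" for problem in problems))
    return msg
```

The message could then be handed to `smtplib.SMTP(...).send_message(msg)`, for example after calling `build_alert_email(["Stock: mirror degraded"], "monitor@example.com", ["support@example.com"])`. Collecting all problems into one message keeps a single bad run from producing several e-mails.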
One thing that was not desirable was an e-mail every few minutes indicating everything was OK. At the same time, it was important not to assume that
no news was good news. If the server on which the monitoring was being performed were to fail, the utility would no longer be checking the status.
We therefore integrated the utility into our SentiLAN Heartbeat system. This allows us to receive heartbeat transmissions by a variety of mechanisms to
ensure that various systems and tools are functioning. In this case, the utility sends a very small e-mail each time it runs. This is received via
a custom SMTP Server at Exmos and fed into our heartbeat system. Should the utility fail to run for any reason, the Exmos support and development staff
are notified via a dashboard application which runs on all desktops.
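
The receiving side of a heartbeat scheme like this comes down to noticing silence. The Python sketch below shows the general idea only. It does not describe how SentiLAN works internally, and the function name `overdue_sources`, the 15-minute threshold and the source names in the usage note are assumptions.

```python
from datetime import datetime, timedelta


def overdue_sources(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the names of sources whose last heartbeat is older than max_silence.

    last_seen maps a source name to the datetime of its most recent heartbeat.
    """
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )
```

For example, with `now` set to 12:00, a source last seen at 11:58 is considered healthy while one last seen at 11:30 is reported as overdue. The threshold would need to be a comfortable multiple of the scheduler interval, so that one delayed run does not raise a false alarm.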
The customer has two separate SQL Mirror systems (one in the UK and one in Europe) and is comfortable knowing that both are monitored 24x7. Although they
have not had a real-world disaster that would cause a failover, they have run a number of disaster recovery tests and the utility has alerted exactly as
expected, recognising both the failover and the subsequent return to normality once the failback was completed.
It has, however, been triggered by unplanned system maintenance and has also identified occasional latency issues between the locations.
Finally, it requires no sophisticated IT infrastructure to support it.