Monitor Oracle Standby Databases In OCI With Custom Metrics

Monitoring Oracle standby databases (Data Guard) has always been a tricky task. By their very nature (the instance is not open in read-write mode), it is hard to gather information about them. Even specialized tools like Oracle Enterprise Manager require SYSDBA credentials to monitor them effectively. But what about when they run on OCI?

The Oracle OCI Monitoring service allows us to monitor cloud resources using metrics and alarms.

Oracle OCI Monitoring Service Architecture

In this post I want to show you how you can create a custom metric to monitor the “Apply Lag” on your Oracle Standby database, so you can create an alarm if it crosses a threshold.

I’m going to follow most of the steps detailed by Liu-Wei in this post: https://qiita.com/liu-wei/items/5e8e04f1e58cc6406ca9

Step 1 – Prerequisites

Add an API Key

First of all, you will have to designate an OCI user with the proper permissions to access the Monitoring service and post custom metric data. This could be your own account or a service account. Once you have designated this user, log in to the OCI console and choose the region where the standby DB resides. Then click the Profile icon and click the account name.

Once there, scroll down and click API Keys in the left menu.

Then click the Add API Key button.

Then generate the API key pair, save both keys in a secure place, and click Add.

This allows the script to authenticate to the Monitoring service in order to post custom metric data.
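If the fingerprint recorded for the key does not match the key you saved, every API call fails authentication, so it can help to confirm the match early. Below is a minimal stdlib-only sketch, assuming the downloaded public key is a standard "BEGIN PUBLIC KEY" PEM file; the function name oci_key_fingerprint is hypothetical. It follows the same recipe as the OpenSSL command in the OCI documentation: an MD5 digest of the DER-encoded public key, printed as colon-separated hex pairs.

```python
import base64
import hashlib


def oci_key_fingerprint(public_key_pem):
    """Return the colon-separated MD5 fingerprint of a PEM public key.

    Assumes a "BEGIN PUBLIC KEY" block, whose base64 body is the
    DER-encoded public key.
    """
    # Drop the BEGIN/END marker lines and join the base64 body.
    lines = [line.strip() for line in public_key_pem.splitlines()]
    body = "".join(line for line in lines if line and not line.startswith("-----"))
    der = base64.b64decode(body)
    digest = hashlib.md5(der).hexdigest()
    # Format as aa:bb:cc:..., the style shown next to the key in the console.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

For example, oci_key_fingerprint(open("/home/oracle/.oci/mykey_public.pem").read()) should return the same value as the fingerprint line in the Configuration File Preview (the public key filename here is hypothetical).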

Create an OCI configuration file

For this exercise we will use the API key we just generated and create a config file, as the oracle user, on the host where the standby DB is running.

I used the location /home/oracle/.oci in order to store the OCI config file and the private key. You may use another location depending on your internal standards.

From the Configuration File Preview, copy the contents and save them to the configuration file we are creating on the DB host.

The preview already has the correct settings for user, fingerprint, tenancy and region. However, you must amend the key_file setting, which is the path where your private key file is stored.

For this exercise it will be:

key_file=/home/oracle/.oci/mykey_private.pem

At the end of this, you should have two files on the DB host: the config file and the private key file.

[oracle]$ ls
config mykey_private.pem
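Before installing anything, you can sanity-check the config file. This is a stdlib-only sketch (check_oci_config is a hypothetical name) that reads the INI-style file with configparser, confirms that the five settings from the Configuration File Preview are present, and checks that key_file points at an existing file. It assumes the settings live in the DEFAULT profile, as they do when pasted straight from the preview.

```python
import configparser
import os

# Settings that the Configuration File Preview provides (key_file amended by hand).
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def check_oci_config(path="~/.oci/config", profile="DEFAULT"):
    """Return a list of problems found in an OCI config profile (empty if none)."""
    path = os.path.expanduser(path)
    # interpolation=None so that a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        return ["config file not found: " + path]
    if profile != "DEFAULT" and not parser.has_section(profile):
        return ["profile not found: " + profile]
    section = parser[profile]
    problems = ["missing setting: " + key for key in REQUIRED_KEYS if not section.get(key)]
    key_file = section.get("key_file")
    if key_file and not os.path.isfile(os.path.expanduser(key_file)):
        problems.append("key_file does not exist: " + key_file)
    return problems
```

Running print(check_oci_config("/home/oracle/.oci/config")) as the oracle user should print an empty list when both files are in place.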

Setup Python on DB Host

For this exercise we are going to use Python, through the OCI Python SDK, to call the Monitoring REST API and post the metric data.

Verify the Python installation on the DB host as the oracle user. Python 3 was already installed on this host.

[oracle]$ which python3
/bin/python3

However, we need to install the oci Python module, and before that we need to upgrade pip.

For this, log out of the oracle account and use the opc account. Execute the command below:

[opc]$ sudo pip3 install --upgrade pip

Log in again as the oracle user and execute:

[oracle]$ pip3 install -U oci

This should install the oci module correctly.
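Because pip3 runs as the oracle user without sudo, the module ends up in that user's own site-packages, so it is worth checking with the same python3 interpreter and user that will later run the script. A minimal sketch, assuming the two third-party modules the script imports are oci and pytz (pip normally pulls pytz in as a dependency of oci); missing_modules is a hypothetical helper name.

```python
import importlib.util
import sys

# Third-party modules imported by post_lag_value.py.
REQUIRED_MODULES = ("oci", "pytz")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report which interpreter was checked and what, if anything, is missing.
print(sys.executable, "missing:", missing_modules() or "none")
```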

Step 2 – Create the Python script

In this step we are going to create the Python script that connects to the Standby DB, gathers the Apply Lag and posts the data to the Monitoring service.

Copy the code below and paste it into a file named post_lag_value.py. I saved it as /home/oracle/post_lag_value.py, the path used later by the cron job.

#!/usr/bin/python3

# This is a sample Python script that posts a custom metric (lag_value) to OCI Monitoring.
# Run this script on the standby DB host that you want to monitor.
# Command: python3 post_lag_value.py

import oci,subprocess,os,datetime
from pytz import timezone

# using default configuration file (~/.oci/config)
from oci.config import from_file
config = from_file()

# Initialize the service client with the default config file.
# PostMetricData must be sent to the region's telemetry-ingestion endpoint.
monitoring_client = oci.monitoring.MonitoringClient(config,service_endpoint="https://telemetry-ingestion.us-ashburn-1.oraclecloud.com")

os.environ['ORACLE_HOME'] = "<YOUR ORACLE HOME>"
os.environ['ORACLE_SID'] = "<YOUR SID>"

def run_sqlplus(sqlplus_script):

    """
    Run a sql command or group of commands against
    a database using sqlplus.
    """

    # sqlplus lives in $ORACLE_HOME/bin
    sqlplus_path = os.path.join(os.environ['ORACLE_HOME'], 'bin', 'sqlplus')
    p = subprocess.Popen([sqlplus_path,'-s','/nolog'],stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,stderr=subprocess.PIPE)
    (stdout,stderr) = p.communicate(sqlplus_script.encode('utf-8'))
    stdout_lines = stdout.decode('utf-8').split("\n")

    return stdout_lines

sqlplus_script="""
connect / as sysdba
set heading off
SELECT extract(day from p.val) *1440 + extract(hour from p.val)*60 +
extract(minute from p.val) + extract(second from p.val)/60 lag_minutes
from (SELECT name,to_dsinterval(value) val from v$dataguard_stats where name ='apply lag') p;
exit
"""

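sqlplus_output = run_sqlplus(sqlplus_script)

# Take the numeric line from the sqlplus output; skip blank lines and any
# non-numeric text such as ORA- errors.
lag_value = None
for line in sqlplus_output:
    try:
        lag_value = float(line.strip())
    except ValueError:
        continue

# No numeric value means the query failed or no apply lag is reported yet.
if lag_value is None:
    raise SystemExit("Could not read apply lag:\n" + "\n".join(sqlplus_output))

print(lag_value)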

times_stamp = datetime.datetime.now(timezone('UTC'))

# post custom metric to oci monitoring
# Replace the <YOUR ...> placeholders below with your own values
post_metric_data_response = monitoring_client.post_metric_data(
    post_metric_data_details=oci.monitoring.models.PostMetricDataDetails(
        metric_data=[
            oci.monitoring.models.MetricDataDetails(
                namespace="<YOUR CUSTOM NAMESPACE>",
                compartment_id="<YOUR COMPARTMENT ID>",
                name="<YOUR METRIC NAME>",
                dimensions={'server_id': '<YOUR SERVER ID>'},
                datapoints=[
                    oci.monitoring.models.Datapoint(
                        timestamp=datetime.datetime.strftime(
                            times_stamp,"%Y-%m-%dT%H:%M:%S.%fZ"),
                        value=lag_value)]
                )]
    )
)

# Get the data from response
print(post_metric_data_response.data)

Replace the following placeholders with the values for your DB and OCI configuration:

  • <YOUR ORACLE HOME>
  • <YOUR SID>
  • <YOUR CUSTOM NAMESPACE>
  • <YOUR COMPARTMENT ID>
  • <YOUR METRIC NAME>
  • <YOUR SERVER ID>
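The SQL inside post_lag_value.py turns the apply lag interval from v$dataguard_stats into minutes as days × 1440 + hours × 60 + minutes + seconds / 60. The same arithmetic in Python can help sanity-check the numbers you see in Metrics Explorer. This sketch assumes the VALUE column uses the "+DD HH:MI:SS" day-to-second format that to_dsinterval parses; interval_to_minutes is a hypothetical name.

```python
import re

# Matches a day-to-second interval such as "+00 01:30:15" or "+01 00:00:30.5".
_INTERVAL_RE = re.compile(r"^([+-]?)(\d+) (\d{1,2}):(\d{1,2}):(\d{1,2}(?:\.\d+)?)$")


def interval_to_minutes(value):
    """Convert a '+DD HH:MI:SS' interval string to minutes, like the script's SQL.

    Returns None for an empty value (no apply lag reported).
    """
    value = (value or "").strip()
    if not value:
        return None
    match = _INTERVAL_RE.match(value)
    if match is None:
        raise ValueError("unexpected interval format: " + repr(value))
    sign, days, hours, minutes, seconds = match.groups()
    total = int(days) * 1440 + int(hours) * 60 + int(minutes) + float(seconds) / 60
    return -total if sign == "-" else total
```

For example, "+00 01:30:00" converts to 90.0 minutes and "+01 00:00:30" to 1440.5 minutes.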

One important thing to mention is the ingestion service endpoint, which is region-specific. I’m using Ashburn (us-ashburn-1) as my region, so my ingestion endpoint is “https://telemetry-ingestion.us-ashburn-1.oraclecloud.com”. Yours will differ depending on your region; the endpoints are listed in the Monitoring API reference:

https://docs.oracle.com/en-us/iaas/api/#/en/monitoring/20180401/
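Rather than hard-coding Ashburn, the endpoint can be derived from the region already stored in the OCI config file. As I understand the SDK, the client's default region-derived endpoint is the telemetry (query) endpoint rather than the ingestion one, which is why the script sets service_endpoint explicitly. A minimal sketch, assuming a commercial-realm region whose domain is oraclecloud.com; ingestion_endpoint is a hypothetical helper.

```python
def ingestion_endpoint(region):
    """Build the Monitoring ingestion endpoint for a region identifier.

    Assumes a commercial-realm region (domain oraclecloud.com).
    """
    return "https://telemetry-ingestion.{}.oraclecloud.com".format(region)
```

In post_lag_value.py the client could then be created with service_endpoint=ingestion_endpoint(config["region"]), using the config dictionary returned by from_file().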

Next, let’s make the post_lag_value.py file executable.

[oracle]$ chmod +x post_lag_value.py

Let’s try our Python script.

[oracle]$ ./post_lag_value.py
/home/oracle/.local/lib/python3.6/site-packages/oci/_vendor/httpsig_cffi/sign.py:10: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography (40.0) will be the last to support Python 3.6.
  from cryptography.hazmat.backends import default_backend  # noqa: F401
0.0
{
  "failed_metrics": [],
  "failed_metrics_count": 0
}

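As you can see from the output of the script, the current lag is “0.0” minutes and failed_metrics_count is “0”, which means the data was successfully posted to the Monitoring service. The CryptographyDeprecationWarning only reports that the cryptography package is deprecating Python 3.6 support; it does not affect the post.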
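Once the script runs unattended, nobody reads the printed response, so it can be useful to turn rejected metrics into a non-zero exit. A minimal sketch, assuming the response data object exposes failed_metrics_count and failed_metrics, the two fields shown in the output above; check_post_result is a hypothetical name.

```python
def check_post_result(response_data):
    """Raise RuntimeError if a PostMetricData response reports rejected metrics.

    `response_data` is the `.data` attribute of the post_metric_data() response.
    """
    if response_data.failed_metrics_count:
        raise RuntimeError(
            "{} metric(s) rejected: {}".format(
                response_data.failed_metrics_count, response_data.failed_metrics
            )
        )
```

At the end of post_lag_value.py this would be called as check_post_result(post_metric_data_response.data), so a failure shows up as a traceback in the job's log.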

Let’s now find out if our custom metric is visible from the OCI console.

Open the navigation (hamburger) menu, go to “Observability & Management” and, under Monitoring, click Metrics Explorer.

In Metrics Explorer, choose the compartment, metric namespace and metric name you set in the Python script, and verify that data points appear in the graph.

The script is now posting Apply Lag data to the monitoring service.

Step 3 – Schedule

Now we need to run the Python script every “x” minutes. I’m using a cron job for this; to enable cron, follow the instructions in the MOS note How To Use Crontab In OCI DBCS? (Doc ID 2639985.1).

My Cron looks as follows:

[opc]$ sudo cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed


# System should configure AIDE for Periodic Execution 
05 4 * * * root /usr/sbin/aide --check

*/5 * * * * oracle /home/oracle/post_lag_value.py >> post_lag.log 2>&1

This cron job runs every 5 minutes. You may adjust it to your desired frequency.
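Cron starts a new run every 5 minutes whether or not the previous one has finished, so a sqlplus session that hangs (for example, when the instance stops responding) would leave processes piling up. One way to bound each run is a timeout. This is a hypothetical replacement for run_sqlplus() in the script, named run_with_timeout, built on subprocess.run(), which kills the child and raises subprocess.TimeoutExpired when the timeout elapses; the 60-second default is an arbitrary assumption.

```python
import subprocess


def run_with_timeout(cmd, script, timeout_seconds=60):
    """Feed `script` to `cmd` on stdin and return its stdout as a list of lines.

    Raises subprocess.TimeoutExpired (after killing the child) if the
    command runs longer than `timeout_seconds`.
    """
    result = subprocess.run(
        cmd,
        input=script.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout_seconds,
    )
    return result.stdout.decode("utf-8").split("\n")
```

In post_lag_value.py it could be called as run_with_timeout([os.path.join(os.environ['ORACLE_HOME'], 'bin', 'sqlplus'), '-s', '/nolog'], sqlplus_script). An uncaught TimeoutExpired ends the run with a traceback in post_lag.log, and no data point is posted for that interval.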

Step 4 – Create an Alarm

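Go to the Monitoring service and create an alarm from the Alarm Definitions page.

Once the alarm is defined on our custom metric with a trigger of greater than 60 and a Notifications topic as its destination, we will be notified whenever the Apply Lag on our standby DB exceeds 60 minutes.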
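The alarm editor also lets you enter the Monitoring Query Language (MQL) query directly (its advanced mode). Below is a small, hypothetical helper that builds such a query for the custom metric, assuming the metric name and the single server_id dimension used in the script; apply_lag and stby01 in the example are placeholders. Using max() makes each window report its worst lag, and an interval at least as long as the 5-minute cron schedule means each window should contain at least one data point.

```python
def lag_alarm_query(metric_name, threshold_minutes=60, interval="5m",
                    dimension=None, value=None):
    """Build an MQL alarm query that fires when the max apply lag exceeds a threshold.

    Optionally filters on a single dimension, e.g. server_id.
    """
    query = "{}[{}]".format(metric_name, interval)
    if dimension is not None:
        # Dimension values are quoted strings in MQL filters.
        query += '{{{} = "{}"}}'.format(dimension, value)
    return "{}.max() > {}".format(query, threshold_minutes)
```

For example, lag_alarm_query("apply_lag") returns apply_lag[5m].max() > 60, and lag_alarm_query("apply_lag", 30, "10m", "server_id", "stby01") returns apply_lag[10m]{server_id = "stby01"}.max() > 30.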

This concludes this small exercise of monitoring the Apply Lag for a Standby Oracle Database using the OCI Monitoring service.

Hope this helps,
Alfredo