Monitoring

Services

The HYPR environment runs the following processes:

  • Nginx for reverse proxy and SSL termination
  • JVM for HYPR servers (embedded Tomcat)
  • Vault for configuration
  • Redis for in-memory caching
  • MySQL for persistent storage

HYPR Server health checks

Adding health checks to the HYPR services helps ensure that the services are available. A health checker can poll the HTTP API endpoints at regular intervals. If a service is running normally, it responds to the health check request.

For a conceptual overview of health checks see health check concepts
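
The polling described above can be sketched in a few lines of Python. This is a minimal illustration, not a HYPR-provided tool: the function names, the 30-second interval, the timeout, and the injectable `opener`, `check`, and `sleep` parameters are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def check_health(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a GET on `url` answers with HTTP 200, else False."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections, and timeouts.
        return False


def poll(url, interval_seconds=30.0, rounds=3,
         check=check_health, sleep=time.sleep):
    """Run `rounds` health checks, pausing `interval_seconds` between them."""
    results = []
    for i in range(rounds):
        results.append(check(url))
        if i < rounds - 1:
            sleep(interval_seconds)
    return results


# Example call (HOSTNAME is a placeholder, as elsewhere in this document):
# print(poll("http://HOSTNAME:8009/health"))
```

A real health checker would normally run continuously and raise an alert on failure; the fixed number of rounds here only keeps the sketch easy to run.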

Servers behind a Load Balancer

The load balancer can make the health check requests at regular intervals. Servers which do not respond can be marked as unhealthy and taken out of service. Once the problem has been addressed and the servers are responding normally, they can be marked as healthy and return to service.

Server                | Endpoint                    | Expected response
----------------------|-----------------------------|------------------
Control Center server | http://HOSTNAME:8009/health | HTTP 200 OK
Nginx web server      | https://HOSTNAME/health     | HTTP 200 OK
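
How a load balancer moves a server out of and back into service can be pictured with a small state tracker. The sketch below is illustrative only: the class name and both thresholds (three consecutive failures to remove a server, two consecutive successes to restore it) are assumptions. Real load balancers expose their own settings for this.

```python
from dataclasses import dataclass


@dataclass
class ServerHealth:
    """Tracks whether one backend server should receive traffic."""
    unhealthy_after: int = 3  # assumed: consecutive failures before removal
    healthy_after: int = 2    # assumed: consecutive successes before return
    in_service: bool = True
    _failures: int = 0
    _successes: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the in-service flag."""
        if check_passed:
            self._successes += 1
            self._failures = 0
            if not self.in_service and self._successes >= self.healthy_after:
                self.in_service = True
        else:
            self._failures += 1
            self._successes = 0
            if self.in_service and self._failures >= self.unhealthy_after:
                self.in_service = False
        return self.in_service
```

Requiring several consecutive results before changing state avoids removing a server because of a single slow or dropped request.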

Since 3.8.0

The /health endpoints include a check of all dependencies. Redis, Vault, and DB connection liveness is checked via a test write.

Response code

When all dependencies are healthy, the API call returns a response code of 200. If any dependency check fails, a non-200 error code is returned.

Sample response for /health

{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "components": {
        "eventDataSource": {
          "status": "UP",
          "details": {
            "database": "MySQL",
            "validationQuery": "isValid()"
          }
        },
        "fido2DataSource": {
          "status": "UP",
          "details": {
            "database": "MySQL",
            "validationQuery": "isValid()"
          }
        },
        "rpDataSource": {
          "status": "UP",
          "details": {
            "database": "MySQL",
            "validationQuery": "isValid()"
          }
        },
        "uafDataSource": {
          "status": "UP",
          "details": {
            "database": "MySQL",
            "validationQuery": "isValid()"
          }
        }
      }
    },
    "discoveryComposite": {
      "description": "Discovery Client not initialized",
      "status": "UNKNOWN",
      "components": {
        "discoveryClient": {
          "description": "Discovery Client not initialized",
          "status": "UNKNOWN"
        }
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 107361579008,
        "free": 19792601088,
        "threshold": 10485760,
        "exists": true
      }
    },
    "ping": {
      "status": "UP"
    },
    "reactiveDiscoveryClients": {
      "description": "Discovery Client not initialized",
      "status": "UNKNOWN",
      "components": {
        "Simple Reactive Discovery Client": {
          "description": "Discovery Client not initialized",
          "status": "UNKNOWN"
        }
      }
    },
    "redis": {
      "status": "UP",
      "details": {
        "version": "4.0.13"
      }
    },
    "refreshScope": {
      "status": "UP"
    },
    "vault": {
      "status": "UP",
      "details": {
        "version": "0.10.3"
      }
    },
    "vaultReactive": {
      "status": "UP",
      "details": {
        "version": "0.10.3"
      }
    }
  }
}
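
When a check fails, the response body helps locate the failing dependency. The following Python sketch walks the nested "components" structure shown above and lists every entry whose status is outside an accepted set. The function names are hypothetical. Treating "UNKNOWN" as acceptable is an assumption, based on the sample response, where the discovery client components report "UNKNOWN" while the overall status is "UP".

```python
import json


def flatten_statuses(node, path="health"):
    """Yield (path, status) for a health node and all nested components."""
    yield path, node.get("status")
    for name, child in node.get("components", {}).items():
        yield from flatten_statuses(child, f"{path}.{name}")


def failing_components(health_json, accepted=("UP", "UNKNOWN")):
    """Return the paths of all entries whose status is not in `accepted`."""
    health = json.loads(health_json)
    return [path for path, status in flatten_statuses(health)
            if status not in accepted]
```

An empty list means that no component reported a status outside the accepted set.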

OS Level Monitoring

Resource   | Threshold             | Level    | Notes
-----------|-----------------------|----------|------
CPU        | Sustained usage > 90% | Critical | A significant slowdown in server response time might cause timeouts.
Memory     | Free memory < 10%     | Critical | HYPR services and dependencies have a max memory limit defined. Variance is likely to come from cache utilization.
Disk usage | Free space < 10%      | Critical | Logs can cause disks to fill up. Logs are zipped up at regular intervals. The zipped log files can be moved or deleted if needed.
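
The thresholds above can be evaluated with a short script. This Python sketch covers the disk check and the "sustained" CPU condition. The function names are hypothetical, and reading "sustained" as "every recent sample exceeds the threshold" is an assumption. Collecting the CPU samples is left to whichever monitoring agent is in use.

```python
import shutil


def free_percent(total_bytes, free_bytes):
    """Return free space as a percentage of the total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def disk_is_critical(path="/", threshold_percent=10.0):
    """Return True if free space on `path` is below the threshold."""
    usage = shutil.disk_usage(path)
    return free_percent(usage.total, usage.free) < threshold_percent


def sustained_above(samples, threshold=90.0):
    """Return True if there is at least one sample and all exceed `threshold`."""
    return len(samples) > 0 and all(s > threshold for s in samples)
```

For example, the diskSpace figures in the sample /health response (total 107361579008, free 19792601088) correspond to roughly 18% free space, which is above the 10% critical threshold.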

HYPR Service Dependencies

The following dependencies can be monitored with port checks and, where a unit name is listed, with systemd service checks:

Resource | Port check                         | Level    | Notes | Systemd service check
---------|------------------------------------|----------|-------|----------------------
Redis    | 6379 (server), 26379 (sentinel)    | Medium   | HYPR setup can detect and survive any 1 Redis node failure. | hypr-redis
Vault    | 8200                               | Medium   | Non-critical process, only needed during app startup. | hypr-vault
MySQL DB | 3306                               | Critical | Although not generally necessary, you may use a MySQL-specific tool like MyTop to view more details on MySQL performance metrics. | This is typically hosted as an external service and should be monitored accordingly.
Nginx    | 443                                | Critical | Nginx failure will stop HTTP requests being proxied to the HYPR process. | hypr-nginx
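
A port check and a systemd service check can each be done in a few lines. The Python sketch below is a minimal illustration with hypothetical function names. It assumes the script runs on the host where the services are installed and that `systemctl` is available. The port numbers and unit names are taken from the table above.

```python
import socket
import subprocess

# Ports and unit names as listed in the table above.
PORTS = {"redis-server": 6379, "redis-sentinel": 26379,
         "vault": 8200, "mysql": 3306, "nginx": 443}
UNITS = ["hypr-redis", "hypr-vault", "hypr-nginx"]


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def systemd_unit_active(unit):
    """Return True if `systemctl is-active --quiet UNIT` exits with 0."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", unit], check=False)
    except FileNotFoundError:
        # systemctl is not installed on this host.
        return False
    return result.returncode == 0


# Example calls (not executed here):
# for name, port in PORTS.items():
#     print(name, port_open("localhost", port))
# for unit in UNITS:
#     print(unit, systemd_unit_active(unit))
```

A successful TCP connection only shows that something is listening on the port. The /health endpoint remains the better signal that the dependency is actually usable.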

Integration with a Third Party APM

Third party monitoring agents may be attached to the Java process for further insight. The exact command line depends upon the tool being used. Sample config is listed below.

Datadog

/opt/hypr/ServerInstaller-6.11.0/jre/jdk-11.0.10+9-jre/bin/java \
    -javaagent:/opt/hypr/dd-java-agent.jar \
    -Ddd.service.name=cc \
    -Ddd.env=PROD1

AppDynamics

/opt/hypr/ServerInstaller-6.11.0/jre/jdk-11.0.10+9-jre/bin/java \
    -javaagent:/opt/hypr/javaagent.jar \
    -Dappdynamics.agent.tierName=hypr-cc \
    -Dappdynamics.agent.nodeName=PROD1

Log monitoring

HYPR and third party logs are detailed here

To feed logs into external systems, use a log aggregator such as Fluentd or Logstash. This decouples the log handling and consumption from the core HYPR services.

You can also use Splunk to monitor and collect these logs.