...

  • the server (usually a linux box running RHEL; whether it's hardware or virtual is immaterial)
  • apache (plain and/or SSL), front-ending for one or more instances of...
  • apache-tomcat, our servlet container of choice
  • the applications themselves

Monitoring Requirements:

Web-based applications deployed for ISDA monitoring services can expect a stable, monitored servlet container (currently tomcat) for their use.  Each servlet container must contain a special web application designed to monitor all other applications deployed to that container.  This special web application is referred to as the MONITOR for the remainder of this document.  The MONITOR must serve a standard page retrievable at the URL path /<container_id>-monitor, where <container_id> is the Tomcat container identifier.  The response page must satisfy the following criteria:

  • if the MONITOR detects that an application has failed, the text of the response page must contain the word FAILURE.  It is also recommended (but not required) that the response page contain brief text as to the nature of the failure.
  • retrieval completes within 30 seconds
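
As an illustration only, the two criteria above could be evaluated on the monitoring side roughly as follows. This is a minimal sketch, not the actual Nagios check: the function names, the "OK"/"CRITICAL" labels, and the base URL argument are assumptions made for the example. Only the /<container_id>-monitor path, the word FAILURE, and the 30-second window come from the requirements.

```python
import time
import urllib.request

TIMEOUT_SECONDS = 30  # retrieval window stated in the requirements


def evaluate_monitor_page(body, elapsed_seconds):
    """Classify a MONITOR response as "OK" or "CRITICAL" (labels are hypothetical)."""
    if elapsed_seconds > TIMEOUT_SECONDS:
        return "CRITICAL"  # page arrived, but outside the allowed window
    if "FAILURE" in body:
        return "CRITICAL"  # MONITOR reported a failed application
    return "OK"


def check_monitor(base_url, container_id):
    """Fetch <base_url>/<container_id>-monitor and classify the result."""
    url = "%s/%s-monitor" % (base_url.rstrip("/"), container_id)
    started = time.monotonic()
    try:
        # The urlopen timeout applies per socket operation, so the total
        # elapsed time is measured separately and checked afterwards.
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        return "CRITICAL"  # unreachable, HTTP error, or timed out
    return evaluate_monitor_page(body, time.monotonic() - started)
```

Separating the classification from the fetch keeps the rule itself easy to test without a running container.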

Note that, if you only want the operations staff to monitor the availability of the servlet container, the MONITOR application only has to return a static page containing nothing more than "SUCCESS".  However, you are encouraged to check application state and connectivity to any backend services you use, so long as the response time of the MONITOR stays within the expected response window.

In any case, note that while verbose error messages (example: "FAILURE: backend data server lotsobits.mit.edu is offline") can be helpful for debugging purposes, operational staff are only committed to ensuring a functional and healthy servlet container, restarting the application, and notifying the application developer.  Any response beyond that is on a discretionary and workload-permitting basis.

Critical Monitoring: ("Urgent Response")

...

  • per-server, basic: ICMP ping
  • per-server, apache health monitoring: a simple Nagios HTTP fetch test of a non-tomcat-served URL, /ping.html [optional if server is SSL-only]
  • per-server, apache SSL health monitoring: a simple Nagios HTTPS fetch test of a non-tomcat-served URL, /ping.html [optional for non-SSL servers]
  • per-application state monitoring: Nagios HTTP[S] fetch of <application-root>/ping.jsp or /<container_id>-monitor (see the monitoring and application requirements in this document)
  • per-replicated-service state monitoring: as above (ping.jsp), but accessed via an application root URL which is handled by the f5 load balancer

Notes:

  • tomcat and the applications served from it are fairly inseparable, rely on the same JVM, and thus share a test, but each server will generally have multiple separate containers for different application groups
  • Ideally, this critical reporting functionality will remain unused; see "Proactive Status Monitoring" below.

Application Requirements for Deployment:

Web-based applications deployed for ISDA monitoring services can enjoy the expectation of a stable and monitored servlet container (currently tomcat) for their use.  All applications must provide at least minimal self-reporting, however.  All applications must support service of a standard page retrievable from <application-root>/ping.jsp.  This page's content must satisfy the following criteria:

  • page text contains exactly one of "SUCCESS" or "FAILURE"
  • retrieval completes within 30 seconds
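
The content rule above can be read as "the page mentions SUCCESS or FAILURE, but not both". A minimal sketch of that reading follows; the function name and the "INVALID" label are hypothetical and used only for illustration.

```python
def classify_ping_page(text):
    """Return "SUCCESS", "FAILURE", or "INVALID" for a ping.jsp page body.

    Assumes "exactly one of" means the page contains one of the two
    keywords and not the other.
    """
    has_success = "SUCCESS" in text
    has_failure = "FAILURE" in text
    if has_success == has_failure:
        return "INVALID"  # neither keyword present, or both present
    return "SUCCESS" if has_success else "FAILURE"
```

Under this reading, a failure message should avoid repeating the word SUCCESS in its detail text, since that would make the page ambiguous.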

Note that, if you just want operations staff to notice that your application has completely shut down, you as an application developer are welcome to make this page static content containing nothing more than "SUCCESS".  You are also welcome to check application state and connectivity to any backend services you use, so long as such checks leave ping.jsp retrievable well within the expected window.  If your test process may take an extended period that could exceed the ping.jsp retrieval window without representing an error condition, it must not delay ping.jsp; your application must instead run the required checks asynchronously, with ping.jsp used only to report on the results of those checks.
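
The asynchronous pattern described above might look roughly like the following. This is a language-neutral sketch written in Python rather than JSP; the class name, the 60-second interval, and the choice to report FAILURE until the first check completes are all assumptions for the example, not requirements from this document.

```python
import threading


class CachedHealthCheck:
    """Runs a slow check in the background; report() never waits on it."""

    def __init__(self, check, interval_seconds=60.0):
        self._check = check  # callable returning (ok, detail)
        self._interval = interval_seconds
        self._lock = threading.Lock()
        # Assumption: report FAILURE until a check has actually finished.
        self._status = "FAILURE: no check has completed yet"
        self._stop = threading.Event()

    def run_once(self):
        """Run the check and cache its outcome as page text."""
        try:
            ok, detail = self._check()
        except Exception as exc:
            ok, detail = False, "check raised %r" % (exc,)
        status = "SUCCESS" if ok else "FAILURE: %s" % detail
        with self._lock:
            self._status = status

    def _loop(self):
        while not self._stop.is_set():
            self.run_once()
            self._stop.wait(self._interval)  # sleep, but wake early on stop()

    def start(self):
        """Start the background thread that refreshes the cached status."""
        thread = threading.Thread(target=self._loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()

    def report(self):
        """What the ping page would return: the last cached status."""
        with self._lock:
            return self._status
```

The ping page then only calls report(), so its response time is independent of how long the backend checks take. This sketch does not detect a check that hangs forever; a real implementation would probably also record when the status was last refreshed.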

...


Proactive Status Monitoring: ("Impending Doom")

...