...

Web-based applications deployed for ISDA monitoring services can expect a stable and monitored servlet container (currently Tomcat) for their use.  Each servlet container must contain a special web application that is specifically designed to monitor all other applications deployed to the container.  This special web application will be referred to as the MONITOR for the remainder of this document.  The MONITOR must serve a standard page retrievable by the URL path: /<container_id>-monitor, where <container_id> is the Tomcat servlet container identifier.  The content of the response page must satisfy the following criteria:

  • if the MONITOR detects that an application has failed, the text of the response page must contain the word FAILURE.  It is also recommended (but not required) that the response page contain brief text as to the nature of the failure.
  • retrieval completes within the 30-second response window.

Note that, if you only want the operations staff to monitor the availability of the servlet container, the MONITOR application only has to return a static page containing nothing more than "SUCCESS".  However, you are encouraged to check application state and connectivity to any backend services you use, so long as the response time of the MONITOR stays within the expected response window.
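
The page logic described above can be sketched as follows. This is only an illustration in Python, not the MONITOR itself (which would be a web application deployed in the servlet container). The function name `build_monitor_page`, the convention that each check returns None or a short failure description, and the 20-second budget are all assumptions made for the example.

```python
import time

RESPONSE_WINDOW_SECONDS = 30  # response window stated in the requirements


def build_monitor_page(checks, budget_seconds=20.0, clock=time.monotonic):
    """Return the text of the monitor page.

    `checks` maps an application name to a callable that returns None on
    success or a short failure description (hypothetical convention).
    `budget_seconds` is an assumed safety margin below the 30-second window.
    """
    deadline = clock() + budget_seconds
    failures = []
    for name, check in checks.items():
        if clock() >= deadline:
            # Assumed policy: an application we had no time to check is
            # reported as a failure rather than silently skipped.
            failures.append("%s: not checked, time budget exhausted" % name)
            continue
        try:
            problem = check()
        except Exception as exc:  # a crashing check counts as a failure
            problem = "check raised %s" % exc.__class__.__name__
        if problem:
            failures.append("%s: %s" % (name, problem))
    if failures:
        # The word FAILURE is required; the brief reasons are recommended.
        return "FAILURE\n" + "\n".join(failures)
    return "SUCCESS"
```

The budget is only tested between checks, so a single check that blocks is not interrupted; each check would need its own timeout for the page to stay inside the response window.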

...

  • per-server, basic: ICMP ping
  • per-server, Apache health monitoring: a simple Nagios HTTP fetch test of a non-Tomcat-served URL, /ping.html [optional if server is SSL-only]
  • per-server, Apache SSL health monitoring: a simple Nagios HTTPS fetch test of a non-Tomcat-served URL, /ping.html [optional for non-SSL servers]
  • per-application state monitoring: Nagios HTTP[S] fetch of /<container_id>-monitor (see Monitoring Requirements above)

Notes:

  • Tomcat and the applications served from it are fairly inseparable, rely on the same JVM, and thus share a test, but each server will generally have multiple separate containers for different application groups
  • Ideally, this critical reporting functionality will remain unused (see "Proactive Status Monitoring" below).
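
The per-application check in the list above might look like the following when written as a small script. This is a sketch under assumptions, not the check actually configured in Nagios: the function names `classify` and `check_monitor` and the `base_url` parameter are hypothetical. The exit codes 0 (OK) and 2 (CRITICAL) follow the standard Nagios plugin convention.

```python
import time
import urllib.request

RESPONSE_WINDOW_SECONDS = 30  # response window from the monitoring requirements
OK, CRITICAL = 0, 2           # standard Nagios plugin exit codes


def classify(body, elapsed_seconds, window=RESPONSE_WINDOW_SECONDS):
    """Map a retrieved monitor page to a (status, message) pair."""
    if elapsed_seconds > window:
        return CRITICAL, "retrieval took %.1fs (window is %ds)" % (elapsed_seconds, window)
    if "FAILURE" in body:
        # Pass along the first line so any failure detail reaches the operator.
        lines = body.strip().splitlines()
        return CRITICAL, lines[0] if lines else "FAILURE"
    return OK, "monitor page retrieved, no FAILURE reported"


def check_monitor(base_url, container_id):
    """Fetch /<container_id>-monitor from base_url and classify the result."""
    url = "%s/%s-monitor" % (base_url.rstrip("/"), container_id)
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=RESPONSE_WINDOW_SECONDS) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # covers URLError, HTTPError and socket timeouts
        return CRITICAL, "could not retrieve %s: %s" % (url, exc)
    return classify(body, time.monotonic() - start)
```

A wrapper would print the message and exit with the returned status, which is how Nagios plugins report their result.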

Proactive Status Monitoring: ("Impending Doom")

Ideally, critical reporting functionality will remain unused, because nothing will ever crash.  Much of our infrastructure is reliable, but Tomcat is a notable exception, and we've found a number of ways for it to go south.  There are also some system parameters we'd like to track.  So, more monitoring:

  • Tomcat: resident size for each running JVM
  • system memory statistics
  • system disk usage statistics
  • Tomcat crash detection and restart (no, this isn't just status monitoring)

Much of this will probably be packaged with scripts of our own devising, with daily reporting via email, and perhaps more frequent reporting for alarm states.  We're investigating tools from SourceLabs for Tomcat instance management.
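
As an illustration of the kind of script this could involve, the sketch below collects two of the figures listed above. It assumes Linux hosts, where /proc/<pid>/status exposes a VmRSS line in kB; the source does not state the operating system, and the function names are hypothetical.

```python
import shutil


def parse_vm_rss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # no VmRSS line found


def jvm_resident_kb(pid):
    """Resident size in kB of the process with this pid (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_vm_rss_kb(handle.read())


def disk_usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total
```

How the JVM process IDs are discovered and how the figures reach the daily email report are left open here.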

Since Tomcat crashes due to transient problems or resource exhaustion have been all too common, we plan to have a crashed Tomcat automatically and immediately restarted once any crash state has been saved, though not ad nauseam.
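
One way to keep automatic restarts from repeating ad nauseam is to cap how many are allowed within a sliding time window. The sketch below shows that idea only; the class name `RestartLimiter` and the defaults of three restarts per hour are assumptions, not decided policy.

```python
import time
from collections import deque


class RestartLimiter:
    """Allow at most `max_restarts` restarts in any `window_seconds` span."""

    def __init__(self, max_restarts=3, window_seconds=3600.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock
        self._restarts = deque()  # timestamps of recent restarts, oldest first

    def allow_restart(self):
        """Record and permit a restart, or return False if the cap is reached."""
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._restarts and now - self._restarts[0] >= self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False
        self._restarts.append(now)
        return True
```

A crash handler would save the crash state first, then call `allow_restart()`, restarting the container only when it returns True and raising an alarm otherwise.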

...

IPS cannot set up end-to-end testing for every application we host or support.  We have constrained this requirement to web services for which we have assumed primary support.

...

  • thalia: "GET /libraries" from a valid domain; needs both Alfresco and the database running to return a library list.
  • mitid: isda-console1 currently has end-to-end testing implemented for this service.
  • geocodes: isda-console1 currently has end-to-end testing implemented for this service.
  • UA: isda-console1 currently has end-to-end testing implemented for this service.
  • roles: TBD