Maintaining the terracotta cluster on the IdPs

This page is under construction

The Shibboleth IdP uses terracotta for clustering (IdP session state, etc.). Unfortunately, the terracotta software is not robust in all situations. It can have problems recovering if both nodes are restarted at the same time (e.g. following a power outage). Also, if the server loses contact with a client (e.g. because the client takes too long over a garbage collection), the server can declare the client dead and then refuse its subsequent reconnection attempts.

Restarting the cluster

Determine which node is active by running the /usr/local/terracotta/bin/server-stat.sh script on each node to check its server state. The active server node will be in the ACTIVE-COORDINATOR state, e.g.:

# /usr/local/terracotta/bin/server-stat.sh
localhost.health: OK
localhost.role: ACTIVE
localhost.state: ACTIVE-COORDINATOR
localhost.jmxport: 9520

A passive node should be in the PASSIVE-STANDBY state:

# /usr/local/terracotta/bin/server-stat.sh
localhost.health: OK
localhost.role: PASSIVE
localhost.state: PASSIVE-STANDBY
localhost.jmxport: 9520
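
When checking nodes from a script, it can help to pull out just the state value. The following is a minimal sketch, assuming the output format is exactly as shown in the examples above; the function name tc_state is hypothetical and is not part of terracotta.

```shell
#!/bin/sh
# Hypothetical helper: read server-stat.sh style output on standard input
# and print the value of the first "<host>.state:" line.
tc_state() {
    sed -n 's/^[^:]*\.state:[[:space:]]*//p' | head -n 1
}

# Usage (path as used on this page):
#   /usr/local/terracotta/bin/server-stat.sh | tc_state
```

On the active node this should print ACTIVE-COORDINATOR, and on a healthy passive node PASSIVE-STANDBY. It prints nothing if no state line is found.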

The active node should be restarted first. The passive node should detect this and take over the active role. Wait for the restarted node to enter the PASSIVE-STANDBY state before proceeding.

If it has a problem recovering state, the likely cause is corrupted data; check the server log for errors. In this case it will not enter the PASSIVE-STANDBY state, and manual intervention will be needed. First, try removing the terracotta server's "dirty" saved object data, e.g.:

# /etc/init.d/terracotta stop
# rm -rf /usr/local/shibboleth-idp/cluster/server/data/dirty-objectdb-backup/*
# /etc/init.d/terracotta start

If it still fails to reach the PASSIVE-STANDBY state, stop the server again and remove the object data, e.g.:

# /etc/init.d/terracotta stop
# rm -rf /usr/local/shibboleth-idp/cluster/server/data/objectdb/*
# /etc/init.d/terracotta start

Once this node reaches the PASSIVE-STANDBY state, you can proceed to restart the other (now active) node.
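
The wait for the PASSIVE-STANDBY state can be scripted rather than checked by hand. The following is a sketch only, assuming the server-stat.sh output format shown earlier on this page; the function name wait_for_state and its arguments are hypothetical, and the attempt count and delay are arbitrary choices.

```shell
#!/bin/sh
# Hypothetical helper: poll a status command until it reports the wanted state.
#   $1 = wanted state (e.g. PASSIVE-STANDBY)
#   $2 = maximum number of attempts
#   $3 = seconds to sleep between attempts
#   remaining arguments = command that prints server-stat.sh style output
# Returns 0 once the state matches, 1 if it never does.
wait_for_state() {
    want=$1; tries=$2; delay=$3; shift 3
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Extract the value of the first "<host>.state:" line, if any.
        state=$("$@" 2>/dev/null | sed -n 's/^[^:]*\.state:[[:space:]]*//p' | head -n 1)
        if [ "$state" = "$want" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (path as used on this page):
#   wait_for_state PASSIVE-STANDBY 60 5 /usr/local/terracotta/bin/server-stat.sh \
#       || echo "node did not reach PASSIVE-STANDBY; check the server log"
```

If the function returns non-zero, treat it as the failure case described above and check the server log before restarting the other node.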

If both nodes are restarted at the same time, e.g. after a power failure, manual intervention will likely be required to clean up the data directories.
