(Stonebranch) Automating Monitoring & Reviving Agent
Stonebranch agents, as of version 6.4 or so, do not handle losing their connection to the controller well: a disconnected agent does not stop, try to reconnect, or restart. This can be a problem because, from the agent's side, everything may appear to be working at first glance even though the controller cannot reach it.
The mitigation is straightforward. A script running on the same machine as your agent queries the Stonebranch controller API for the environment you connect to and checks whether the controller reports your agent as actively connected. If the response does not read active, the script tries to restart your agent to re-initiate the connection.
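The check-and-restart logic described above could look roughly like the shell sketch below. It is not the contents of `checkagent.sh`. The controller URL, the `resources/agent?agentname=` REST path, the `Active` status string, the `ubrokerd` service name, and every variable and function name are assumptions to verify against your controller's API documentation and your agent install.

```shell
#!/usr/bin/env bash
# Sketch only: every value below is a placeholder, not taken from checkagent.sh.
CONTROLLER_URL="${CONTROLLER_URL:-https://controller.example.org/uc}"  # hypothetical
AGENT_NAME="${AGENT_NAME:-my-agent}"                                   # hypothetical
API_USER="${API_USER:-svc_account}"                                    # hypothetical
API_PASS="${API_PASS:-changeme}"                                       # hypothetical
AGENT_SERVICE="${AGENT_SERVICE:-ubrokerd}"  # assumed systemd unit for the agent

# Read a JSON object describing one agent on stdin and print its "status" value.
# A real script would be better served by a proper JSON parser such as jq.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed (exit 0) only when the reported status is the assumed "Active" string.
is_active() {
  [ "$1" = "Active" ]
}

# Ask the controller for this agent's record (assumed endpoint and parameter).
# Prints nothing if the request fails.
fetch_agent_json() {
  curl --silent --fail --max-time 30 \
       --user "${API_USER}:${API_PASS}" \
       --header 'Accept: application/json' \
       "${CONTROLLER_URL}/resources/agent?agentname=${AGENT_NAME}"
}

# Restart the agent service unless the controller reports the agent as Active.
check_and_revive() {
  local status
  status="$(fetch_agent_json | extract_status)"
  if is_active "$status"; then
    echo "agent ${AGENT_NAME} is Active; nothing to do"
  else
    echo "agent ${AGENT_NAME} status is '${status:-unknown}'; restarting ${AGENT_SERVICE}"
    systemctl restart "${AGENT_SERVICE}"
  fi
}

# Only act when explicitly asked, so the file can be sourced or tested safely.
if [ "${1:-}" = "--run" ]; then
  check_and_revive
fi
```

Run as root with `--run` to perform one check. Note that in this sketch an empty or failed API response also counts as "not active" and triggers a restart; you may prefer to skip the restart when the controller itself cannot be reached.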
A very simple monitor and revive setup for RHEL 7 boxes:
- Request a service account for Stonebranch at: firstname.lastname@example.org
- Download https://github.austin.utexas.edu/eis1-mca/stonebranch/blob/master/playbooks/files/checkagent.sh
The script is set up for three agents on a single box. If you have only one, remove the environments that aren't applicable. Fill in the agent names and your service account name and password in the script.
- Save the file on the machine running your agent as '/usr/local/bin/checkagent.sh' with executable permissions (700).
- Run `sudo crontab -e` and add the following to root's crontab: https://github.austin.utexas.edu/eis1-mca/stonebranch/blob/master/playbooks/files/cronaddition
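The last two steps could be done from a shell roughly as follows. This is a sketch: the helper names, the five-minute schedule, and the log path are illustrative assumptions, and the linked `cronaddition` file remains the authoritative crontab entry.

```shell
#!/usr/bin/env bash
# Copy a script into a directory as checkagent.sh with owner-only
# read/write/execute permissions (mode 700).
# Usage: install_checker <source-file> <destination-dir>
install_checker() {
  install -m 700 "$1" "$2/checkagent.sh"
}

# Print a crontab line that runs the checker from the given directory.
# The every-five-minutes schedule and the log file are assumptions,
# not the contents of the repository's cronaddition file.
cron_line() {
  echo "*/5 * * * * $1/checkagent.sh >> /var/log/checkagent.log 2>&1"
}

# Example (run as root, after filling in the script):
#   install_checker ./checkagent.sh /usr/local/bin
#   cron_line /usr/local/bin    # paste the output into `sudo crontab -e`
```

Mode 700 keeps the file readable only by its owner, which matters here because the script holds the service account password.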