Overview
New Prezi Diagram
There are a lot of moving parts. This poorly constructed swimlane is my best attempt at explaining the inter-related use of the APIs and Tasks.
Details
The Build & Release pipeline for the actual agent config (JVM settings, monit rules, build capabilities, etc.) includes canary agents and smoke testing.
It uses a dedicated canary agent in the build pipeline to validate the latest release before publishing to all agents.
While the canary agent is offline performing maintenance, the plan waits and then resumes with the smoke tests.
Configure an optional Deployment Task to set the latest release version in the Chaperone API.
Remote agents then see the new version the next time they poll, and initiate the upgrade process. Chaperone rules ensure no more than n agents are offline at once.
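To make the "no more than n agents offline" rule concrete, here is a minimal sketch of the decision as a shell function. This is not the actual Chaperone implementation: `chaperone_decide` and its two arguments are hypothetical names, and the real rules may weigh other factors. The only parts taken from this page are the response codes YES_CHILD and WAIT_FOR_SIBLINGS, which are the ones the sample script handles.

```shell
# Hypothetical sketch of the server-side rule, NOT the real Chaperone code.
# $1 = number of agents currently offline for maintenance
# $2 = maximum number allowed offline at once (the "n" in the rule)
chaperone_decide() {
    local offline_now="$1"
    local max_offline="$2"
    if [ "$offline_now" -lt "$max_offline" ]; then
        # there is still room for one more agent to go offline
        echo "YES_CHILD"
    else
        # limit reached, the caller should check back later
        echo "WAIT_FOR_SIBLINGS"
    fi
}
```

For example, with a limit of 1, `chaperone_decide 0 1` lets the agent proceed, while `chaperone_decide 1 1` asks it to wait for its siblings.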
Sample script to use this process
Output from Sample script
Shows that the process waited while another agent was already undergoing maintenance, and then waited again while a running build completed.
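Both waits follow the same pattern: poll, sleep, and try again. A minimal sketch of that pattern with an upper bound on attempts is below. `poll_until` is a hypothetical helper and is not part of the script on this page, which bounds the sibling wait with SIBLING_PATIENCE_COUNT but lets the busy wait run without a limit.

```shell
# Hypothetical helper: poll_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on success, 1 if every attempt failed.
poll_until() {
    local poll_max="$1"
    local poll_delay="$2"
    shift 2
    local poll_attempt=1
    while [ "$poll_attempt" -le "$poll_max" ]; do
        if "$@"; then
            return 0
        fi
        poll_attempt=$((poll_attempt + 1))
        # only sleep when another attempt will follow
        if [ "$poll_attempt" -le "$poll_max" ]; then
            sleep "$poll_delay"
        fi
    done
    return 1
}
```

A call such as `poll_until 15 60 agent_is_idle` would check up to 15 times with a minute between checks, where `agent_is_idle` stands in for whatever check you write (for example, a function that fetches the agent state and tests BUSY).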
The Script
#!/bin/bash
# Agent ID can be hard coded, but is easily pulled from the running system
agentId=`cat ~/bamboo-agent-home/bamboo-agent.cfg.xml | grep -oPm1 "(?<=<id>)[^<]+"`
#check for required libraries/tools (optional)
command -v curl >/dev/null 2>&1 || { echo "Required tool 'curl' not found" ;exit 2; }
# grab the uuid for this environment from a local path (again, this can be hard coded depending on our use case)
uuid=`cat /opt/bamboo/management/token.uuid`
# make a tmp dir for cookie jar and other random files we'll make
TMPDIRD=`mktemp -d /tmp/agentMaintenance.XXXXXX` || exit 1
#
# EDIT THESE
#
bambooUrl="http://10.0.2.2:6990/bamboo" # domain, port and context without environment prefixes (currently mocked for testing)
# Waiting rules if master reports too many other agents offline currently.
SIBLING_PATIENCE_TIME=60 # will wait 1 minute between subsequent checks
SIBLING_PATIENCE_COUNT=15 # will check back 15 times before giving up.
# Bamboo api is quirky and even for "anonymous" access you need a cookie with the site info. Just hit the homepage first.
# This has to run after bambooUrl is set above.
# WARNING any other API calls fail without a valid cookie !!!
curl -k -c "$TMPDIRD/cookies" "$bambooUrl" > /dev/null 2>&1
#
# Edit this to do whatever is considered a local upgrade. It's called if server gives us permission
#
upgradeLocalAgent(){
    # save taskID to a file we can use after restart
    cp "$TMPDIRD/state.txt" ~/finishTaskOnStartup.txt
    # Use version to download from repo
    echo "downloading Agent version ${PVERSION}"
    #curl "https://some.repo.you.have/agent-install.extension?v=${PVERSION}" # e.g. nexus/artifactory/fileshare rpm/tarball, etc.
    # shutdown local process
    echo "killing local bamboo agent"
    pkill -9 -f bamboo-agent-home
    # clear any previously set capabilities
    echo "purging old capabilities"
    curl -X DELETE -k -b "$TMPDIRD/cookies" "$bambooUrl/rest/agents/latest/$agentId/capabilities?uuid=${uuid}" -o "$TMPDIRD/state.txt" 2>/dev/null
    # call commands, chef, puppet, docker, REAL WORK, etc here.
    echo "beep bop, upgrading to version ${PVERSION} and saving task ID to file."
    ## this should include re-defining capabilities via bamboo-capabilities.properties
    ## clear any temporary space, old artifacts, etc.
    ## completely rebuild from an image in source (docker, etc)
    echo "Install complete, rebooting server"
}
#
# DON'T CHANGE THIS STUFF
# (unless you know why :) )
#
## we use function recursion to track # of attempts before giving up
attempts=1
checkBambooMaster(){
    curl -X POST -k -b "$TMPDIRD/cookies" "$bambooUrl/rest/agents/latest/$agentId/maintenance?uuid=${uuid}" -o "$TMPDIRD/state.txt" 2>/dev/null
    # Monitor status until the agent is idle
    source "$TMPDIRD/state.txt"
    if [ "$PCODE" == "YES_CHILD" ]
    then
        # server says we can upgrade, make sure we are idle.
        echo "Master server says I can upgrade to version ${PVERSION}"
        if [ "$BUSY" == "true" ]; then
            printf "\tAgent is still running a job, waiting ..\n"
            # keep polling (and sleeping) while the agent is still running a job
            running=1
            while [ $running -eq 1 ]
            do
                sleep 60
                curl -k -b "$TMPDIRD/cookies" "$bambooUrl/rest/agents/latest/$agentId/state.text" -o "$TMPDIRD/state.txt" 2>/dev/null
                source "$TMPDIRD/state.txt"
                if [ "$BUSY" == "false" ]; then
                    printf "\tYay, agent is now idle!\n"
                    break
                else
                    printf "\tstill busy..\n"
                fi
            done
        fi
    elif [ "$PCODE" == "WAIT_FOR_SIBLINGS" ]
    then
        # allowed to upgrade, but too many others are working right now, check back in a few
        echo "Master server wants me to wait, this is attempt number $attempts."
        echo "${PCODE}: ${PMESSAGE}"
        if [ "$attempts" -gt "$SIBLING_PATIENCE_COUNT" ]
        then
            echo "Siblings have exhausted my patience. Increase wait times, offset cycles, or increase concurrency"
            exit 9
        fi
        attempts=$((attempts + 1))
        sleep "$SIBLING_PATIENCE_TIME"
        checkBambooMaster # will recurse back into this function
    elif [ "$PCODE" == "NO_CHILD" ]
    then
        echo "Master server says I can not upgrade now."
        echo "$PCODE: $PMESSAGE"
        exit 0
    elif [ "$PCODE" == "UH_OH" ]
    then
        echo "ERROR: Master server is reporting an issue."
        echo "$PCODE: $PMESSAGE , existing Task ID: $TASK"
        exit 0
    else
        echo "ERROR: I don't understand server response!"
        cat "$TMPDIRD/state.txt"
        exit 9
    fi
}
# disable agent in bamboo if allowed
echo "Requesting maintenance window from master server ${bambooUrl}"
checkBambooMaster
# assume success as function above exits on NOs. So call out heavy lifting function for maintenance.
echo "Agent was given permission, and is now disabled and idle, starting upgrade"
upgradeLocalAgent
##
##
## some time passes as bits are chewed and copied
##
##
##
## Server is now restarted either by reboot or call to java -jar atlassian-agent....
##
echo "starting java wrapper.."
# bambooUrl already ends with the /bamboo context, so only agentServer/ is appended
java -jar atlassian-bamboo-agent-installer-5.7.1.jar "${bambooUrl}/agentServer/"
# Report maintenance complete to master server
# grab task ID from task file
# SAMPLE:
# PCODE=YES_CHILD
# PMESSAGE="you may upgrade once idle"
# PVERSION=1.2
# ENABLED=false
# BUSY=false
# TASK=1
source ~/finishTaskOnStartup.txt
echo "Reporting Task : ${TASK} complete to ${bambooUrl}"
curl -X PUT -k -b "$TMPDIRD/cookies" "$bambooUrl/rest/agents/latest/$agentId/maintenance/${TASK}/finish?uuid=${uuid}" 2>/dev/null
# all done. brand new or refreshed agent is back online, and others may now take their turn.
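One thing to be aware of in the script above: it reads the server response with `source`, which executes whatever the server sends back as shell code. If that is a concern, the KEY=value lines can be parsed instead. A minimal sketch is below. `state_value` is a hypothetical helper, not part of the original script, and it assumes the response uses the simple one-pair-per-line format shown in the SAMPLE comment above.

```shell
# Hypothetical helper: print the value for KEY from a KEY=value state file
# without executing the file. Surrounding double quotes are stripped.
# $1 = key name (e.g. PCODE), $2 = path to the state file
state_value() {
    local key="$1"
    local file="$2"
    # print only lines starting with "KEY=", minus the prefix, and keep the first match
    sed -n "s/^${key}=//p" "$file" | head -n 1 | sed -e 's/^"//' -e 's/"$//'
}
```

With this in place, `source $TMPDIRD/state.txt` could be swapped for assignments such as `PCODE=$(state_value PCODE "$TMPDIRD/state.txt")`, repeated for each field the script uses.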
Triggering maintenance with monit
You can use the maintenance APIs outside of full rebuilds. One example is using monit to simply clean up temporary file systems.
Monit is a small Open Source utility for managing and monitoring Unix systems. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.
# ensure Bamboo agent has enough disk space, checked every 10 cycles (10 minutes if monit's poll interval is 60 seconds)
check filesystem bamboo with path /bamboo
EVERY 10 CYCLES
if space usage > 80% then
exec "/bamboo/scripts/clearAgentDiskSpace.sh"
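The contents of clearAgentDiskSpace.sh are not shown on this page. As a minimal sketch of what its cleanup step might look like, the function below deletes files older than a given number of days from one directory. `purge_stale_files`, the directory and the age threshold are all hypothetical. In a real script you would likely wrap the call in the same maintenance request and finish calls used by the sample script, so the agent is idle while files are removed.

```shell
# Hypothetical cleanup step for a script like clearAgentDiskSpace.sh.
# $1 = directory to clean, $2 = age in days; files last modified more than
# that many days ago are deleted. Directories are left in place.
purge_stale_files() {
    local dir="$1"
    local days="$2"
    if [ ! -d "$dir" ]; then
        echo "purge_stale_files: no such directory: $dir" >&2
        return 1
    fi
    find "$dir" -mindepth 1 -type f -mtime +"$days" -delete
}
```

For example, `purge_stale_files /bamboo/tmp 7` would remove week-old files from a temp directory; the path here is only a placeholder for wherever your agent keeps its scratch space.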