sysadvent: logger

Cron is just about everywhere. Its configuration and behavior are pretty similar across platforms:

every <scheduled time>, it runs your command as some user
output gets emailed to MAILTO= or $USER

Cron doesn't do everything I want by default. Here's what I want:

prevent the same job from overlapping with itself.
email output only on failures.
log all output somewhere.
time out jobs that run too long.
randomize startup time to avoid resource contention.

It's easiest to first discuss each of these features individually.

For the rest of this article, we'll show various improvements to the following cron job that does a twice-daily backup of mysql.

0 0,12 * * * backupmysql.sh

The contents of our backupmysql.sh are:

#!/bin/sh

mysqldump ...

For simplicity, we omit the mysqldump arguments. Let's get on to addressing individual problems.

Overlapping jobs - Locks

Overlapping jobs can be prevented using locking. Last year, we covered lock file practices which can be applied to solve this. Simply pick a unique lockfile for each cronjob and wrap your cron job with flock(1) (or lockf(1) on FreeBSD).

Let's prevent two backups from running simultaneously. Additionally, we want to abort if we can't grab the lock. flock(1) defaults to waiting indefinitely, so let's set the wait time to 0 and use "/tmp/cron.backupmysql" as the lockfile:

#!/bin/sh

lockfile="/tmp/cron.backupmysql"
flock -w 0 "$lockfile" mysqldump ...
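The same idea can be wrapped in a small function so any command can be run under a lock. This is a minimal sketch that assumes the util-linux flock(1); with_lock is a hypothetical helper name, and -n is flock's non-blocking flag, which should behave the same as a zero wait time. When the lock is already held, flock exits nonzero without running the command.

```sh
#!/bin/sh
# with_lock is a hypothetical helper name, not part of flock(1).
# Usage: with_lock LOCKFILE COMMAND [ARGS...]
with_lock() {
  lockfile=$1
  shift
  # -n: fail immediately instead of waiting if another process holds the lock.
  # flock runs the command while holding the lock and returns its exit status.
  flock -n "$lockfile" "$@"
}
```

In backupmysql.sh this would be called as: with_lock /tmp/cron.backupmysql mysqldump ...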

Emailed output only on failures

You don't necessarily need an email every time your job runs and succeeds. Personally, I only want to be contacted if there's a failure. Cron emails whatever the job prints, so we want to capture the output somewhere and only emit it if the job exits with a nonzero status.

#!/bin/sh

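# Capture stdout and stderr in a temporary file.
output=$(mktemp)
mysqldump ... > "$output" 2>&1

code=$?
# Print the captured output only on failure; cron emails whatever is printed.
if [ "$code" -ne 0 ] ; then
  echo "mysqldump exited with nonzero status: $code"
  cat "$output"
  rm "$output"
  exit $code
fi
rm "$output"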
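The capture-then-report pattern above is not specific to mysqldump. As a rough sketch, it can be turned into a function that wraps any command; run_quiet is a hypothetical name chosen for this example.

```sh
#!/bin/sh
# run_quiet is a hypothetical helper name used for illustration.
# Usage: run_quiet COMMAND [ARGS...]
# Prints the command's combined stdout/stderr only when it exits nonzero,
# and returns the command's exit status.
run_quiet() {
  output=$(mktemp) || return 1
  "$@" > "$output" 2>&1
  code=$?
  if [ "$code" -ne 0 ]; then
    echo "$1 exited with nonzero status: $code"
    cat "$output"
  fi
  rm -f "$output"
  return "$code"
}
```

A successful run prints nothing, so cron has nothing to email. A failing run prints the status line followed by everything the command wrote.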

All output should be logged somewhere

Regardless of exit status, I always want the output of the job to be logged so we can audit it later. This is easily done with the logger(1) command.

#!/bin/sh

# pipe all output to syslog with tag 'backupmysql'
mysqldump ...  2>&1 | logger -t "backupmysql"
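One caveat with the pipe: in plain sh, the exit status of a pipeline is that of its last command, which here is logger, so a mysqldump failure is hidden. A minimal sketch that keeps the job's own exit status is to capture the output first and feed it to logger afterwards (logger reads standard input when no message is given). run_logged is a hypothetical helper name.

```sh
#!/bin/sh
# run_logged is a hypothetical helper name used for illustration.
# Usage: run_logged TAG COMMAND [ARGS...]
# Logs the command's combined output to syslog under TAG and returns
# the command's exit status rather than logger's.
run_logged() {
  tag=$1
  shift
  output=$(mktemp) || return 1
  "$@" > "$output" 2>&1
  code=$?
  # With no message argument, logger logs each line read from stdin.
  logger -t "$tag" < "$output"
  rm -f "$output"
  return "$code"
}
```

The trade-off is that log lines reach syslog only after the job finishes instead of streaming as they are written.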

Some jobs need timeouts

Runaway cron jobs are bad. If you use locking as above to prevent overlaps, a stuck or frozen job can prevent any future jobs from running unless something causes the stuck or very long job to die. For this, we'll need a tool to interrupt execution of a program after a timeout. I don't know if there's a canonical tool for this, so I wrote one for this article.

Download alarm.rb.

You'll need ruby for alarm.rb. We can now apply this to our backup script:

#!/bin/sh

alarm.rb 28800 mysqldump ...

This will abort if the mysqldump runtime exceeds 8 hours (28800 seconds). My alarm.rb will exit nonzero on timeouts, so if we use the email-on-error tip from above, we'll get notified on job timeouts.
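If Ruby isn't available, GNU coreutils ships a timeout(1) command that covers the same need. To be clear, this is a different tool from alarm.rb, not a drop-in copy of it: GNU timeout exits with status 124 when the time limit is hit. The sketch below assumes GNU timeout is installed; run_with_timeout is a hypothetical helper name.

```sh
#!/bin/sh
# Assumes GNU coreutils timeout(1). This is an alternative to alarm.rb,
# not the same tool. run_with_timeout is a hypothetical helper name.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
  seconds=$1
  shift
  # Send SIGTERM after $seconds. If the command is still running
  # 10 seconds later, -k 10 follows up with SIGKILL.
  timeout -k 10 "$seconds" "$@"
}
```

For the backup job this would be: run_with_timeout 28800 mysqldump ... Since the status is nonzero on timeout, the email-on-error approach still notifies us.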

Randomized startup

If you have lots of hosts all doing backups at the same time, your backup server may get overloaded. You can hand-schedule all your similar jobs to not run simultaneously on multiple hosts, or you can take a shortcut and randomize the startup time.

To do this in a shell script, you'll need something to generate random numbers for you. Doing this explicitly in shell requires a shell that can generate random numbers: bash, Solaris ksh, and zsh support the magic variable $RANDOM, which evaluates to a random integer between 0 and 32767. You'll also need something to map your random value across your sleep duration. We'll use bc(1) and bash(1) here (zsh's $(( )) math operations support floats, but bash seems more common).

#!/bin/bash

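maxsleep=3600
# Scale $RANDOM (0..32767) to 0..maxsleep-1. bc's default scale of 0 gives
# integer division, so the result is always a whole number of seconds.
sleeptime=$(echo "($RANDOM * $maxsleep) / 32768" | bc)
echo "Sleeping for $sleeptime seconds before starting backupmysql."
sleep "$sleeptime"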

mysqldump ...
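Since bash is already required for $RANDOM, its built-in integer arithmetic can also do the scaling on its own. Here is a small sketch of that variant; random_delay is a hypothetical helper name.

```sh
#!/bin/bash
# random_delay is a hypothetical helper name used for illustration.
# Usage: random_delay MAXSLEEP
# Prints an integer number of seconds in the range 0 .. MAXSLEEP-1.
# Relies on $RANDOM, so it needs bash (or another shell that provides it).
random_delay() {
  maxsleep=$1
  # $RANDOM is 0..32767, so RANDOM/32768 is always below 1.
  # Multiply before dividing because shell arithmetic is integer-only.
  echo $(( RANDOM * maxsleep / 32768 ))
}
```

Usage would look like: sleep "$(random_delay 3600)"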

Combining everything

Now let's combine all of the above into one super script. Doing all of the above cleanly and safely in bash is not the most trivial thing. Here is the result:

cronhelper.sh

Using cronhelper.sh is simple. It takes options as environment variables. Here's an example:

% TIMEOUT=5 JOBNAME=helloworld cronhelper.sh sh -c "echo hello world; sleep 10"
Job failed with status 254 (command: sh -c echo hello world; sleep 10)
hello world
/home/jls/bin/alarm.rb: Execution expired (timeout == 5.0)

# and in /var/log/messages:
Dec  8 02:58:02 snack helloworld[19565]: hello world
Dec  8 02:58:07 snack helloworld[19565]: /home/jls/bin/alarm.rb: Execution expired (timeout == 5.0)
Dec  8 02:58:07 snack helloworld[19573]: Job failed with status 254 (command: sh -c echo hello world; sleep 10)

Now armed with cronhelper.sh and alarm.rb, we can modify our cron job. Let us choose an 8 hour timeout and a 1 hour random startup delay:

0 0,12 * * * JOBNAME="backupmysql" SLEEPYSTART=3600 TIMEOUT=28800 cronhelper.sh backupmysql.sh

The new cron entry is now:

logging any output to syslog
only outputting to stdout when there's been a failure (and thus only emailing us on failures)
staggering startup across an hour
aborting after 8 hours if not finished
locking so overlapping runs are impossible
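To give a rough idea of how the pieces fit together, here is a condensed sketch. It is not the cronhelper.sh linked above: it swaps alarm.rb for GNU timeout(1), assumes the util-linux flock(1), and uses a hypothetical function name, cron_wrap. It reads the same JOBNAME, SLEEPYSTART and TIMEOUT environment variables; the default values are my own assumptions.

```sh
#!/bin/bash
# A sketch only: this is NOT the cronhelper.sh from the article.
# Assumes util-linux flock(1), GNU coreutils timeout(1) and logger(1).
# cron_wrap is a hypothetical name; the defaults below are assumptions.
# Usage: JOBNAME=name SLEEPYSTART=secs TIMEOUT=secs cron_wrap COMMAND [ARGS...]
cron_wrap() {
  local jobname=${JOBNAME:-cronjob}
  local lockfile="/tmp/cron.$jobname"
  local output code

  # Stagger startup by a random delay of up to SLEEPYSTART seconds.
  if [ "${SLEEPYSTART:-0}" -gt 0 ]; then
    sleep $(( RANDOM * SLEEPYSTART / 32768 ))
  fi

  output=$(mktemp) || return 1
  # flock -n gives up at once if a previous run still holds the lock.
  # timeout ends the command after TIMEOUT seconds (exit status 124).
  flock -n "$lockfile" timeout "${TIMEOUT:-28800}" "$@" > "$output" 2>&1
  code=$?

  # Always log the output; print it (so cron emails it) only on failure.
  logger -t "$jobname" < "$output"
  if [ "$code" -ne 0 ]; then
    echo "Job failed with status $code (command: $*)"
    cat "$output"
  fi
  rm -f "$output"
  return "$code"
}
```

A run that cannot get the lock also counts as a failure here, so an overlapping job produces an email rather than being skipped silently.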

Using the tools above should help you build more reliable and less noisy cron jobs, which makes your systems more reliable and your pager quieter.

Downloads:

December 8, 2009

Day 8 - Cron Practices
