I have been running a bash script on a CentOS 5.5 machine that checks replication on a remote MySQL server. The script creates a temporary lock file and is scheduled in crontab to run every minute. Once in a while, though, cron goes out of sync: it pauses or delays the job for two minutes and then tries to run three jobs at once. This creates false alarms that flood my mailbox with "Lock file exists! Possible conflict!" messages.
Here is the part of the script that might be of interest:
#!/bin/bash
lock_file=/tmp/slave_alert.lck
finished=0

# Alert function: mail the contents of the log file
function mail_alert () {
    mail -s "Replication check errors!" mail@mail.com < /var/log/replication_check.log
}

# Check if lock file exists
if [ -f "$lock_file" ]; then
    echo "Lock file exists! Possible conflict!" > /var/log/replication_check.log 2>&1
    mail_alert
    exit 1
else
    touch "$lock_file"
fi

finished=1
while [ "$finished" -ne 0 ]
do
    # Placeholder for the real test: replication is not configured or you do
    # not have the required access to MySQL (replication_unavailable is a
    # stand-in name, not part of the original script)
    if replication_unavailable; then
        rm "$lock_file"
        exit 1
    fi
    # Do some logic
    sleep 1
done

rm "$lock_file"
echo "Script complete!"
exit 0
Here is the Cronjob:
* 0-1,4-23 * * * path/check_replication.sh > /dev/null 2>&1
Here is part of the log file /var/log/cron. We can see that the 14:35 and 14:36 runs are delayed and fire together with the 14:37 run.
Let us name this string CRONJOB = (root) CMD (path/check_replication.sh > /dev/null 2>&1)
Sep 23 14:30:01 remote-host crond[3959]: CRONJOB
Sep 23 14:31:01 remote-host crond[4025]: CRONJOB
Sep 23 14:32:01 remote-host crond[4054]: CRONJOB
Sep 23 14:33:01 remote-host crond[4102]: CRONJOB
Sep 23 14:34:01 remote-host crond[4129]: CRONJOB
Sep 23 14:37:00 remote-host crond[4276]: CRONJOB
Sep 23 14:37:01 remote-host crond[4308]: CRONJOB
Sep 23 14:37:02 remote-host crond[4365]: CRONJOB
Sep 23 14:38:01 remote-host crond[4129]: CRONJOB
Sep 23 14:39:01 remote-host crond[4129]: CRONJOB
Answer
Cron does not actually guarantee execution times. It works on a "best effort" model: it will try as hard as it can to run your job once a minute, but every once in a while (for various resource-related reasons) it may decide that it just can't get to it at that moment and push it to its next cycle.
In addition, Cron won't even guarantee that it will run items in the exact order they show up in your crontab. Although it's a bit more rare to see, you can definitely get jobs running out of order, or doubling up with other jobs (which may very well be what you are seeing here).
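If runs can bunch up like this, one option is to make the lock tolerant of brief overlaps. The sketch below is not from the original script: it takes the lock with mkdir, which succeeds for exactly one caller, and only reports a conflict when the lock is older than a threshold. The names lock_dir, max_lock_age, acquire_lock and release_lock, and the 300-second threshold, are assumptions chosen for illustration, and stat -c %Y assumes GNU coreutils.

```shell
#!/bin/bash
# Hypothetical names and values, not part of the original script.
lock_dir=${lock_dir:-/tmp/slave_alert.lock.d}
max_lock_age=${max_lock_age:-300}   # seconds before a lock counts as stale

# Try to take the lock. mkdir either creates the directory or fails,
# so only one of several simultaneous callers can succeed.
# Returns 0 if acquired, 1 if another run holds a recent lock,
# 2 if the existing lock is older than max_lock_age.
acquire_lock () {
    if mkdir "$lock_dir" 2>/dev/null; then
        return 0
    fi
    local now mtime
    now=$(date +%s)
    # If the lock vanished in the meantime, treat it as a brief overlap.
    mtime=$(stat -c %Y "$lock_dir" 2>/dev/null) || return 1
    if [ $(( now - mtime )) -gt "$max_lock_age" ]; then
        return 2
    fi
    return 1
}

# Remove the lock; harmless if it is already gone.
release_lock () {
    rmdir "$lock_dir" 2>/dev/null
}

# Possible use at the top of the check script:
#   acquire_lock
#   case $? in
#       0) trap release_lock EXIT ;;   # we own the lock
#       1) exit 0 ;;                   # brief overlap: skip quietly
#       2) mail_alert; exit 1 ;;       # lock looks stale: alert
#   esac
```

With this arrangement, three jobs fired within a couple of seconds would result in one run and two quiet exits, and mail would only go out if a lock had been sitting around for longer than the threshold.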
You probably don't really care if your replication check is going to be a minute or two lagged (although that will really be your call). In the high-traffic environment that I manage, we only perform replication checks every 30 minutes via cron.
That said, if you absolutely have to have minute accuracy to fire these jobs, you will probably want to look into a different solution than cron.
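As one possible shape for such a solution (an assumption on my part, not a specific tool recommendation), a small long-running wrapper can call the check itself and sleep until the next minute boundary. Runs can then never overlap, although a check that takes longer than a minute will simply start late. The function names seconds_to_next_minute and run_every_minute are made up for this sketch.

```shell
#!/bin/bash
# Hypothetical helper names; a sketch, not a drop-in replacement for cron.

# Print the number of seconds from the given epoch timestamp
# to the next full minute (always between 1 and 60).
seconds_to_next_minute () {
    local now=$1
    echo $(( 60 - now % 60 ))
}

# Run the given command, then sleep until the next minute boundary,
# forever. Only one run is ever active at a time.
run_every_minute () {
    while true; do
        "$@"
        sleep "$(seconds_to_next_minute "$(date +%s)")"
    done
}

# Possible use, started once from an init script or similar:
#   run_every_minute path/check_replication.sh
```

Because the loop is a single process, the lock file is no longer needed to protect against overlapping runs, only against someone starting the wrapper twice.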