Wednesday, November 5, 2014

cron - Cronjob for a bash script pauses for 2 minutes randomly



I have been running a bash script on a CentOS 5.5 machine that checks replication on a remote MySQL server. The script creates a temporary lock file and is scheduled in crontab to run every minute. Once in a while, though, cron goes out of sync: it delays the jobs for two minutes and then runs three of them at almost the same time. This creates false alarms and floods my mailbox with "Lock file exists! Possible conflict" messages.



Here is the part of the script that might be of interest:



#!/bin/bash


lock_file=/tmp/slave_alert.lck
finished=0

# Alert function
mail_alert () {
    mail -s "Replication check errors!" mail@mail.com < /var/log/replication_check.log
}

# Check if lock file exists
if [ -f "$lock_file" ]; then
    echo "Lock file exists! Possible conflict!" > /var/log/replication_check.log 2>&1
    mail_alert
    exit 1
else
    touch "$lock_file"
fi

finished=1



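while [ "$finished" -ne 0 ]
do
    # Pseudocode condition: replication is not configured, or you do not
    # have the required access to MySQL. replication_unavailable is a
    # placeholder name for that test, which is not shown here.
    if replication_unavailable; then
        rm "$lock_file"
        exit 1
    fi

    # Do some logic

    sleep 1
done

rm "$lock_file"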

echo "Script complete!"
exit 0


Here is the Cronjob:




* 0-1,4-23 * * * path/check_replication.sh > /dev/null 2>&1


Here is part of the log file /var/log/cron. We can see that the 14:35 and 14:36 runs are delayed until 14:37.



To keep the listing short, CRONJOB below stands for the string: (root) CMD (path/check_replication.sh > /dev/null 2>&1)



Sep 23 14:30:01 remote-host crond[3959]: CRONJOB
Sep 23 14:31:01 remote-host crond[4025]: CRONJOB

Sep 23 14:32:01 remote-host crond[4054]: CRONJOB
Sep 23 14:33:01 remote-host crond[4102]: CRONJOB
Sep 23 14:34:01 remote-host crond[4129]: CRONJOB
Sep 23 14:37:00 remote-host crond[4276]: CRONJOB
Sep 23 14:37:01 remote-host crond[4308]: CRONJOB
Sep 23 14:37:02 remote-host crond[4365]: CRONJOB
Sep 23 14:38:01 remote-host crond[4129]: CRONJOB
Sep 23 14:39:01 remote-host crond[4129]: CRONJOB
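
Gaps like the one between 14:34:01 and 14:37:00 can be picked out of a longer log mechanically. The following is only a sketch: find_cron_gaps is a made-up helper name, the 90-second default threshold is an arbitrary choice, and it assumes the syslog-style "Mon DD HH:MM:SS" prefix shown above with no midnight rollover inside the input.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original script): reads cron log
# lines on stdin and reports every pair of consecutive entries that are
# more than $1 seconds apart (default 90).
find_cron_gaps () {
    local threshold=${1:-90}
    awk -v limit="$threshold" '
        NF < 3 { next }    # skip blank or malformed lines
        {
            # Field 3 is HH:MM:SS; convert it to seconds since midnight.
            split($3, t, ":")
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && secs - prev > limit)
                printf "gap of %d s before %s %s %s\n", secs - prev, $1, $2, $3
            prev = secs
            seen = 1
        }'
}
```

It could be fed with something like: grep check_replication.sh /var/log/cron | find_cron_gaps 90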

Answer




Cron does not actually guarantee execution times; it works on a "best effort" model. It will try as hard as it can to run your job once a minute, but every once in a while (for various resource-related reasons) it may decide that it just can't get to the job at that moment and push it to its next cycle.



In addition, cron won't even guarantee that it runs items in the exact order they appear in your crontab. It is rarer, but jobs can definitely run out of order or double up with other jobs (which may very well be what you are seeing here).
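
If jobs can double up, the lock check in the script could be made tolerant of a brief overlap instead of mailing immediately. Here is a minimal sketch, assuming it is acceptable to skip a run while another one holds the lock and to alert only when the lock looks stale. The names lock_dir, max_age, acquire_lock, lock_age and release_lock are all hypothetical, and mkdir is used because creating a directory either succeeds or fails atomically.

```bash
#!/bin/bash
# Sketch only: these names and defaults are assumptions, not taken from
# the original script.
lock_dir=${lock_dir:-/tmp/slave_alert.lock.d}
max_age=${max_age:-300}    # seconds after which a held lock deserves an alert

# Take the lock. Returns 0 on success, 1 if another run already holds it.
acquire_lock () {
    if mkdir "$lock_dir" 2>/dev/null; then
        date +%s > "$lock_dir/started"
        return 0
    fi
    return 1
}

# Print how many seconds the current lock has been held
# (0 if the start time cannot be read).
lock_age () {
    local started now
    started=$(cat "$lock_dir/started" 2>/dev/null) || started=
    now=$(date +%s)
    echo $(( now - ${started:-$now} ))
}

# Drop the lock again.
release_lock () {
    rm -f "$lock_dir/started"
    rmdir "$lock_dir" 2>/dev/null
}
```

A run would then call acquire_lock, set trap release_lock EXIT on success so the lock is removed on every exit path, and on failure send the alert only when the value printed by lock_age is greater than max_age. Three jobs starting within two seconds of each other would then result in two silent skips rather than two emails.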



You probably don't really care whether your replication check lags by a minute or two (although that is really your call). In the high-traffic environment that I manage, we only perform replication checks every 30 minutes via cron.



That said, if you absolutely must have to-the-minute accuracy when firing these jobs, you will probably want to look into a different solution than cron.
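
One possible shape for such a solution, offered as a rough sketch rather than a recommendation, is a single long-running loop that sleeps until the next minute boundary and runs the check itself. Because the runs are sequential, they can never overlap; a slow check just makes the next one late. The function names seconds_until_next_minute and run_every_minute are invented for this example, and something else (an init script or a process supervisor) would still have to keep the loop alive.

```bash
#!/bin/bash
# Hypothetical helpers, not taken from the original post.

# Print the number of seconds from the given epoch time (default: now)
# to the next whole minute. The result is always between 1 and 60.
seconds_until_next_minute () {
    local now=${1:-$(date +%s)}
    echo $(( 60 - now % 60 ))
}

# Run the given command, wait for the next minute boundary, repeat forever.
run_every_minute () {
    while true; do
        "$@"
        sleep "$(seconds_until_next_minute)"
    done
}
```

It would be started once, for example as: run_every_minute path/check_replication.sh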

