I'm asking this question because I couldn't find the answer in:
"Why is my crontab not working, and how can I troubleshoot it?"
Context
We have several servers running Debian wheezy.
One backup task requires deactivating a specific user's crontab while the backup runs, so we have a daily script that roughly does the following:
# the user is legec
# save the crontab to a file
crontab -u legec -l > /home/legec/.backup/crontab
# empty the crontab
echo "" | crontab -u legec
backup ...
# reload the saved crontab
cat /home/legec/.backup/crontab | crontab -u legec
This works as expected the vast majority of the time.
The task runs on ~80 servers; depending on the server, the backup takes anywhere from 1 minute to 2 hours.
Bug
Once in a while, cron does not detect the final crontab replacement (the restore step), and then does not execute any of the jobs listed in the crontab.
The file /var/spool/cron/crontabs/legec has the expected content and modification date:
$ ls -lh /var/spool/cron/crontabs/legec
-rw------- 1 legec crontab 6.7K Sep 22 04:03 /var/spool/cron/crontabs/legec
but the cron logs indicate that cron did not detect the last change:
$ grep -E "LIST|RELOAD|REPLACE" /var/log/cron.log
...
# yesterday's backup : all went fine
Sep 21 04:00:06 lgserver crontab[6670]: (root) LIST (legec)
Sep 21 04:00:06 lgserver crontab[6671]: (root) LIST (legec)
Sep 21 04:00:06 lgserver crontab[6673]: (root) REPLACE (legec)
Sep 21 04:01:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)
Sep 21 04:03:01 lgserver crontab[7071]: (root) REPLACE (legec)
Sep 21 04:03:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)
# today's backup : no final RELOAD event
Sep 22 04:00:07 lgserver crontab[24163]: (root) LIST (legec)
Sep 22 04:00:07 lgserver crontab[24164]: (root) LIST (legec)
Sep 22 04:00:07 lgserver crontab[24166]: (root) REPLACE (legec)
Sep 22 04:01:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)
Sep 22 04:03:01 lgserver crontab[24458]: (root) REPLACE (legec)
# no RELOAD line here
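The pattern in the log excerpt above (a REPLACE with no later RELOAD for the same user) can be checked mechanically, which may help spot affected servers sooner. This is only a minimal sketch: check_cron_reload is a hypothetical helper name, and it assumes the log format shown above with lines in chronological order. Since cron only logged RELOAD at the start of the next minute in the excerpt, the check should probably run at least a minute after the restore.

```shell
# Hypothetical helper (not part of cron): prints "stale" if the last
# REPLACE logged for a user has no later RELOAD, and "ok" otherwise.
# Assumes the log format shown above, in chronological order.
check_cron_reload() {
    user=$1
    logfile=$2
    awk -v user="$user" '
        # crontab(1) logs e.g. "(root) REPLACE (legec)": target user is the last field
        /REPLACE/ && $NF == "(" user ")" { pending = 1 }
        # cron logs e.g. "(legec) RELOAD (crontabs/legec)" when it picks up the change
        /RELOAD/ && $NF == "(crontabs/" user ")" { pending = 0 }
        END { print (pending ? "stale" : "ok") }
    ' "$logfile"
}
```

It would be called as: check_cron_reload legec /var/log/cron.log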
"Once in a while" means there is no regular pattern: we see this bug maybe once a month, on one random server out of the ~80 that are running.
Question
Does anyone have a lead on where to look?