Tuesday, February 19, 2019

debian - crontab is sometimes not reloaded by the cron daemon

I'm asking this question because I couldn't find the answer in:
"Why is my crontab not working, and how can I troubleshoot it?"



Context



We have several servers running Debian wheezy.




One backup task requires deactivating a specific user's crontab for the duration of the backup, so we have a script, run daily, which roughly does the following:



# the user is legec

# save the crontab to a file
crontab -u legec -l > /home/legec/.backup/crontab
# empty the crontab ("-" makes crontab read from stdin)
echo "" | crontab -u legec -

backup ...

# restore the saved crontab
crontab -u legec - < /home/legec/.backup/crontab


This works as expected the vast majority of the time.



This task runs on ~80 servers; depending on the server, the backup takes anywhere from 1 minute to 2 hours.




Bug



Once in a while, cron does not detect the final replacement (the step that restores the crontab) and stops executing the jobs listed in the crontab.



The file /var/spool/cron/crontabs/legec has the expected content and modification date:



$ ls -lh /var/spool/cron/crontabs/legec
-rw------- 1 legec crontab 6.7K Sep 22 04:03 /var/spool/cron/crontabs/legec



but the cron logs indicate that cron did not detect the last change:



$ cat /var/log/cron.log | grep -E "LIST|RELOAD|REPLACE"
...
# yesterday's backup : all went fine
Sep 21 04:00:06 lgserver crontab[6670]: (root) LIST (legec)
Sep 21 04:00:06 lgserver crontab[6671]: (root) LIST (legec)
Sep 21 04:00:06 lgserver crontab[6673]: (root) REPLACE (legec)
Sep 21 04:01:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)
Sep 21 04:03:01 lgserver crontab[7071]: (root) REPLACE (legec)

Sep 21 04:03:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)

# today's backup : no final RELOAD event
Sep 22 04:00:07 lgserver crontab[24163]: (root) LIST (legec)
Sep 22 04:00:07 lgserver crontab[24164]: (root) LIST (legec)
Sep 22 04:00:07 lgserver crontab[24166]: (root) REPLACE (legec)
Sep 22 04:01:01 lgserver /usr/sbin/cron[2025]: (legec) RELOAD (crontabs/legec)
Sep 22 04:03:01 lgserver crontab[24458]: (root) REPLACE (legec)
# no RELOAD line here
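
To spot this situation without reading the log by eye, the pattern above (a REPLACE for the user with no later RELOAD) can be checked mechanically. The following is only a sketch: the function name unreloaded_replace is made up for illustration, and it assumes the log lines have exactly the format shown in the excerpt above and that the user name contains no characters special to awk.

```shell
#!/bin/sh
# Hypothetical helper (the name is not part of cron): read a cron log on
# stdin and report the last REPLACE for a user that has no later RELOAD.
# Assumes the log format shown in the excerpt above.
unreloaded_replace() {
    awk -v user="$1" '
        # crontab logs "REPLACE (user)" when the spool file is rewritten
        index($0, "REPLACE (" user ")") { pending = $0 }
        # the cron daemon logs "(user) RELOAD (crontabs/user)" once it notices
        index($0, "(" user ") RELOAD (crontabs/" user ")") { pending = "" }
        END {
            # print the orphaned REPLACE line and signal it via the exit status
            if (pending != "") { print pending; exit 1 }
        }
    '
}
```

It could be run as `unreloaded_replace legec < /var/log/cron.log`, with a non-zero exit status meaning that the latest REPLACE has not been followed by a RELOAD. Judging from the timestamps above, cron picks up changes when it wakes up at the start of a minute, so a REPLACE logged within the last minute may be reported even when nothing is wrong.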



"Once in a while" means there is no regular pattern: we see this bug maybe once a month, on one random server out of the ~80 running the task.



Question



Does anyone have a lead on where to look?
