# Jobs maintenance
MRQ can provide strong guarantees that no job will be lost in the middle of a worker restart, a database disconnect, etc.

To do that, you should add these recurring scheduled jobs to your `mrq-config.py`:
```python
SCHEDULER_TASKS = [

    # This will requeue jobs in the 'retry' status, until they reach their max_retries.
    {
        "path": "mrq.basetasks.cleaning.RequeueRetryJobs",
        "params": {},
        "interval": 60
    },

    # This will requeue jobs marked as interrupted, for instance when a worker received SIGTERM.
    {
        "path": "mrq.basetasks.cleaning.RequeueInterruptedJobs",
        "params": {},
        "interval": 5 * 60
    },

    # This will requeue jobs marked as started for a long time (more than their own timeout).
    # They can exist if a worker was killed with SIGKILL and not given any time to mark
    # its current jobs as interrupted.
    {
        "path": "mrq.basetasks.cleaning.RequeueStartedJobs",
        "params": {},
        "interval": 3600
    },

    # This will requeue jobs 'lost' between redis.blpop() and mongo.update(status=started).
    # This can happen only when the worker is killed brutally in the middle of dequeue_jobs().
    {
        "path": "mrq.basetasks.cleaning.RequeueLostJobs",
        "params": {},
        "interval": 24 * 3600
    },

    # This will clean the list of known queues in Redis. It will mostly remove empty queues
    # so that they are not displayed in the dashboard anymore.
    {
        "path": "mrq.basetasks.cleaning.CleanKnownQueues",
        "params": {},
        "interval": 24 * 3600
    }
]
```
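These are regular MRQ tasks, so if you ever need to trigger one outside of the scheduler (for instance after an incident), you should be able to run it once by hand with MRQ's `mrq-run` command-line helper, e.g. `mrq-run mrq.basetasks.cleaning.RequeueLostJobs`.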
Obviously, this implies that all your jobs should be idempotent: running them multiple times, possibly partially, must not break your app. This is a very good design to enforce across your whole task queue. If a critical section really must run only once, you can still manage locks yourself in your code.
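As an illustration, here is a minimal sketch of such a lock, assuming MRQ's `mrq.context.connections` Redis handle is available (any Redis client would work the same way). The task name, the `invoice_id` parameter and the `send_invoice` helper are hypothetical:

```python
from mrq.task import Task
from mrq.context import connections


def send_invoice(invoice_id):
    # Stand-in for your real, non-idempotent side effect (hypothetical).
    print("Sending invoice %s" % invoice_id)


class SendInvoice(Task):
    # Hypothetical task whose critical block must not run twice, even if the
    # job is requeued by one of the cleaning tasks above.

    def run(self, params):
        # Derive a lock key from the job's own parameters (hypothetical naming).
        lock_key = "lock:send-invoice:%s" % params["invoice_id"]

        # SET with nx=True and an expiry is atomic: only one copy of the job
        # acquires the lock, and it expires on its own if the worker dies mid-task.
        if not connections.redis.set(lock_key, "1", nx=True, ex=300):
            # Another copy already did (or is still doing) the work.
            return False

        send_invoice(params["invoice_id"])

        # The lock is deliberately kept until it expires, so a requeued copy
        # within the TTL window will not repeat the side effect.
        return True
```

Note the trade-off in the sketch: keeping the lock instead of releasing it in a `finally` block gives an at-most-once guarantee inside the TTL window, at the cost of delaying a legitimate retry after a failure. Choose whichever failure mode suits the task.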