# Hunting Memory Leaks
Memory leaks can be a big issue with gevent workers because several tasks share the same Python process.
Thankfully, MRQ provides tools to track down such issues. Each worker's memory usage is graphed in the dashboard, which makes it easy to see whether memory is leaking.
When a worker's memory usage grows steadily, here are the steps to find the leak:

- Check which jobs are running on this worker and try to isolate which of them is leaking, and on which queue.
- Start a dedicated worker with `--trace_memory --greenlets 1` on the same queue (a full command line is sketched after this list). This starts a worker doing one job at a time with memory profiling enabled; after each job you should see a report of leaked object types.
- Find the most unique type in the list (usually not `list` or `dict`) and restart the worker with `--trace_memory --greenlets 1 --trace_memory_type=XXX --trace_memory_output_dir=memdbg` (after creating the `memdbg` directory).
- In that directory you will find, for each task, a graph generated by objgraph, which is incredibly helpful to track down the leak.
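For example, the two worker invocations could look like this (a minimal sketch; `myqueue` and `LeakedType` are placeholders for your actual queue name and the suspicious type reported by the first run):

$ mrq-worker myqueue --trace_memory --greenlets 1

$ mkdir memdbg
$ mrq-worker myqueue --trace_memory --greenlets 1 --trace_memory_type=LeakedType --trace_memory_output_dir=memdbg

The first command profiles jobs one at a time and prints leaked object types after each job; the second dumps an objgraph graph for the chosen type into `memdbg` for every executed job.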
## Using guppy
If you want an interactive debugging session to investigate high memory usage, you can use guppy. Here is how:
First, initialize a REPL with MRQ configured and guppy loaded:
$ pip install guppy
$ python
>>> from mrq.context import setup_context, run_task
>>> setup_context()
>>> from guppy import hpy
>>> hp = hpy()
Then, wrap your memory-intensive task or code with guppy calls:
>>> hp.setrelheap() # Used as reference point for memory usage
>>> run_task("tasks.your.MemoryHungryTask", {"a": 1, "b": 2})
>>> h = hp.heap()
At this point `h` should contain all the information you need. You can view an extended debugging session here.
>>> h
Partition of a set of 300643 objects. Total size = 41626536 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 130043 43 15682088 38 15682088 38 str
1 76123 25 6978416 17 22660504 54 tuple
2 1015 0 2794024 7 25454528 61 dict of module
3 20181 7 2583168 6 28037696 67 types.CodeType
4 20610 7 2473200 6 30510896 73 function
5 2321 1 2095216 5 32606112 78 type
6 2319 1 2045160 5 34651272 83 dict of type
7 1277 0 1162808 3 35814080 86 dict (no owner)
8 2890 1 918352 2 36732432 88 unicode
9 494 0 440912 1 37173344 89 dict of class
So our task added 41.6 MB of RAM to the current process. Let's see where it comes from, starting with these 15 MB of strings:
>>> h[0].byvia
Partition of a set of 130043 objects. Total size = 15682088 bytes.
Index Count % Size % Cumulative % Referred Via:
0 8208 6 3890208 25 3890208 25 '.func_doc', '[0]'
1 20065 15 3239664 21 7129872 45 '.co_code'
2 16673 13 1706224 11 8836096 56 '.co_filename'
3 2398 2 1606864 10 10442960 67 "['__doc__']"
4 19810 15 1109640 7 11552600 74 '.co_lnotab'
5 419 0 308392 2 11860992 76 '.func_doc'
6 4311 3 285232 2 12146224 77 '[1]'
7 2788 2 167616 1 12313840 79 '[2]'
8 2153 2 129136 1 12442976 79 '[3]'
9 1006 1 109560 1 12552536 80 "['__file__']"
<21212 more rows. Type e.g. '_.more' to view.>
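If the `.byvia` breakdown (references by attribute or index) is not conclusive, guppy can also partition the same set by the kind of object referring to it. A minimal sketch, continuing the session above:

>>> h[0].byrcs

This prints a similar table grouped by referrer class, which often points directly at the module or container keeping the strings alive.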
Surprisingly, it seems that most of the strings here are actually docstrings. This can happen when you work with large Python modules like scipy or boto. One might consider stripping them manually or with Python's optimized mode.
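As a concrete illustration of optimized mode: running the interpreter with `-OO`, or setting `PYTHONOPTIMIZE=2` in the worker's environment, discards docstrings at import time (the queue name `myqueue` below is just a placeholder):

$ python -OO your_script.py
$ PYTHONOPTIMIZE=2 mrq-worker myqueue

Keep in mind that `-OO` also removes `assert` statements and sets `__debug__` to False, so make sure your code and its dependencies do not rely on them.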