How I Made the Django Admin Scale for Stik.com

Craig's avatar

How I Made the Django Admin Scale for Stik.com

At Stik.com, we're using Django to supply a RESTful interface to our database. Our support staff is also using its admin for simple ticket items. Fortunately for the business, but unfortunately for the Django admin, we have lots of tables with hundreds of millions of rows, and a few in the billions. We really didn't want to spend the time writing our own admin section for support staff, so the Django admin was a godsend... until we rolled it out to production and saw admin pages never respond ever. Read on to see what our problem was and how we fixed it.

The problem:

Large tables in InnoDB may not be able to even render their change_list view due to (possibly 2) expensive COUNT(*) queries. With tables of 100M+ records, this can take 15+ minutes to complete, even on fast servers.

The implementation around search_fields is woefully under thoughtout for big tables. The searching-on-every-field-or-not-searching-at-all system breaks very quickly for big tables.

The solution:

Read the rest of this post ;)

Problem 1: Prohibitive COUNT queries

This one is pretty simple, so I'll start here. Counts on InnoDB tables are prohibitively expensive, especially if you get into the hundreds of millions of records department. To fix this, create this custom QuerySet and Manager:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
from django.db import models, connections
from django.db.models import Count
from django.db.models.query import QuerySet

class MyQuerySet(QuerySet):
    def count(self):
        '''
        Override entire table count queries only. Any WHERE or other altering
        statements will default back to an actual COUNT query.
        '''
        if self._result_cache is not None and not self._iter:
            return len(self._result_cache)

        is_mysql = 'mysql' in connections[self.db].client.executable_name.lower()

        query = self.query
        if (is_mysql and not query.where and
                query.high_mark is None and
                query.low_mark == 0 and
                not query.select and
                not query.group_by and
                not query.having and
                not query.distinct):
            # If query has no constraints, we would be simply doing
            # "SELECT COUNT(*) FROM foo". Monkey patch so the we
            # get an approximation instead.
            cursor = connections[self.db].cursor()
            cursor.execute("SHOW TABLE STATUS LIKE %s",
                    (self.model._meta.db_table,))
            return cursor.fetchall()[0][4]
        else:
            return self.query.get_count(using=self.db)


class NoCountManager(models.Manager):
        def get_query_set(self):
            return MyQuerySet(self.model, using=self._db)

And then in any tables you expect to reach prohibitive counting size, simply override the existing manager like you would normally do:

1
2
3
class MyModel(models.Model):
    ...
    objects = NoCountManager()

You can now load this model's admin page without suffering the long delays involved with full InnoDB table counts.

Problem 2: Broken Search

Consider this model and admin model:

1
2
3
4
5
6
7
8
9
# models.py
class MyModel(models.Model):
    name = models.CharField(...)
    email = models.EmailField(db_index=True)
    weight = models.IntegerField(help_text='In pounds')

# admin.py
class MyModelAdmin(admin.ModelAdmin):
    search_fields = ['name', 'email', 'weight']

Both seem pretty reasonable to me. You don't expect to access MyModel by its name field in code, so you only put an index on email. Then you unwittingly enter fred@domain.com into the Django admin's handy little search box. And you press enter.

And you wait.

And you wait some more.

Dafuq is this?

It turns out, BOTH of these queries are running behind the scenes:

1
2
3
4
5
6
7
SELECT COUNT(*)
FROM `module_my_model`
WHERE (
    `module_my_model`.`name` LIKE '%fred@domain.com%' OR
    `module_my_model`.`email` LIKE '%fred@domain.com%' OR
    `module_my_model`.`weight` LIKE '%fred@domain.com%' OR
)

AND

1
2
3
4
5
6
7
8
9
SELECT [all fields]
FROM `module_my_field`
WHERE (
    `module_my_model`.`name` LIKE '%fred@domain.com%' OR
    `module_my_model`.`email` LIKE '%fred@domain.com%' OR
    `module_my_model`.`weight` LIKE '%fred@domain.com%' OR

)
ORDER BY [model ordering]

Umm, what? Those double wildcard LIKE statements require FULL TABLE SCANS, which even ignore the indicies. This is fine when you're in dev and have 10 records, but in production with hundreds of millions of rows, you may literally have to wait hours before this admin page loads.

Also, what if you just want to find people whose weight is 10, or who have a specific email? Why does the search have to suck so badly by hitting every field every time?

My first thought was, "Come on Django -- you're better than this!"

My second thought was, "Wait, Django has a brilliant system in place for overriding functionality. I can do this!"

Step 1:

In your templates directory create admin/your_module/change_list.html and get ready to get your hands dirty. Put this at the top:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
{% extends 'admin/change_list.html' %}
{% load i18n admin_static admin_list %}
{% load url from future %}
{% load admin_urls %}

{# We are overriding search, SO IT CAN GO EFF ITSELF #}
{% block search %}{% endblock %}

{% block result_list %}
    Stuff coming here in Step 4
{% endblock %}

Step 2:

First, create a parent admin model that each model you want to adjust will inherit from. Mine looked like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
class CoreModelAdmin(admin.ModelAdmin):
    def has_search(self):
        '''
        High level function templates will use to see if they need to
        provide the new search functionality.
        '''
        return len(self.search_fields) > 0 or hasattr(self, 'multi_search_fields')

    def get_searchable_fields(self):
        '''
        Return a template-friendly list of searchable fields for a dropdown.
        '''
        obj = []
        if hasattr(self, 'search_fields'):
            for searchable_field in self.search_fields:
                field = self.model._meta.get_field(searchable_field)
                obj.append({
                    'verbose_name': field.verbose_name,
                    'real_name': field.name
                })
        return obj

    def has_multi_search_fields(self):
        return hasattr(self, 'multi_search_fields')

    def get_multi_search_fields(self):
        return self.multi_search_fields

Step 3:

Here we add some additional annotation to our existing models. Single field searches will natively use the existing search_fields attribute you're used to using, but what if your table has multi-column indecies? Searching by only 1 of those columns may not be able to use the full index, especially if you know the value of the other indexed column as well.

Consider a table with an index on email and status, where status is a simple integer field. You may want to search for "fred@domain.com" where status=1, and anything less specific will be prohibitively slow.

For this situation, annotate your AdminModel like so:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
class MyModelAdmin(CoreModelAdmin):
    list_display = ['normal', 'fields', 'here']
    search_fields = ['use', 'this', 'like', 'usual']
    multi_search_fields = [
        {
            'label': 'Email',
            'fields': [
                {
                    'display_name': 'Email Address',
                    'field_name': 'email'
                },
                {
                    'display_name': 'Status',
                    'field_name': 'status',
                    'help_text': '1 is for Facebook, 2 is for LinkedIn',
                    'defaultValue': 1
                }
            ]
        }
    ]

JavaScript we're about to add will manage these multi-field searches to keep your admin honoring two-column indecies.

Step 4:

We're now ready to override the result_list block in our custom change_list.html file. There are a few components, so hang tight.

vanity styling so your admin's users don't kill themselves

1
2
3
4
5
6
<style>
    p.help_text { font-size:0.9em; color:#666; }
    div.search { padding:10px; }
    div.search section { float:left; margin-right:10px; }
    div.multi-search-field { padding:3px; background-color:#fafafa; margin-bottom:3px; border:1px solid #e8e8e8; }
</style>

html at the top of the block

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
<h2>Search by. . .</h2>
<div class="search">
    <section>
        <select id="searchFieldName">
        {% for searchable_field in cl.model_admin.get_searchable_fields %}
            <option value="{{ searchable_field.real_name }}">{{ searchable_field.verbose_name }}</option>
        {% endfor %}
        </select>
    </section>
    <section class='inputs-wrapper'>
        <input id="searchValue" type="text" name="search_value" placeholder="Value"/>
    </section>
    <section>
        <button type="button" id="customSearch">Search</button>
    </section>
    <div style="clear:both;"></div>
</div> <!-- End of .search -->

<script>
    {# Add bootstrapped data for multi-search fields #}
    {% if cl.model_admin.has_multi_search_fields %}
        {% autoescape off %}var multi_search_fields = {{ cl.model_admin.get_multi_search_fields }};{% endautoescape %}
    {% else %}
        var multi_search_fields = false;
    {% endif %}
</script>

Add jQuery, of course.

1
<script src="http://code.jquery.com/jquery-latest.min.js"></script>

And finally, use this functional-ish JavaScript to power our new form. (Within <script> tags, of course).

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
$(function(){
    /**
     * Quick-and-dirty HTML builder for multi-field searches.
     */
    window.buildMultiSearchFieldBlock = function(field) {
        default_value = (field.defaultValue) ? "value='"+ field.defaultValue +"'" : "";
        help_text = (field.help_text) ? "<p class='help_text'>"+ field.help_text +"</p>" : "";
        var html = [
            '<div class="multi-search-field">',
                '<label for="'+ field.field_name +'">'+field.display_name+'</label> <br />',
                '<input type="text" name="'+field.field_name+'" placeholder="'+field.display_name+'" '+default_value+'/><br/>',
                help_text,
            '</div>'
        ];
        return html.join("");
    };

    /**
     * Serializer to the kind of querystring we need.
     */
    window.serializeToQueryString = function($domEl) {
        var queryString = [],
            valid = true;
        $domEl.find("input").each(function(){
            if ($(this).hasClass("ignore")) { return; }
            if ($(this).val() === "") {
                valid = false;
            }
            $that = $(this);
            if (!$that.hasClass("ignore")) {
                queryString.push($(this).prop("name") + "=" + $(this).val());
            }
        });
        if (valid) {
            return "?" + queryString.join("&");
        } else {
            return false;
        }
    };

    /**
     * Append any multi-field searches to the existing `select`.
     */
    if (multi_search_fields !== false) {
        // Add the multiple search fields
        $.each(multi_search_fields, function(key, value){
            $("#searchFieldName").append('<option value="'+ key +'" data-ismultiple="true">'+ value.label +'</option>')
        });

        // Add a listener to the select's change
        $("#searchFieldName").on("change", function(){
            var $select = $(this),
                $option = $("option[value='"+ $select.val() +"']"),
                $inputsWrapper = $(".inputs-wrapper");

            if ($option.data("ismultiple") !== undefined) {
                // We selected a multiple!
                $("div.search").addClass("multiple");

                var fieldset = multi_search_fields[$option.prop("value")];
                $inputsWrapper.find("#searchValue").hide().addClass("ignore");

                $.each(fieldset.fields, function(index, field){
                    $inputsWrapper.append(buildMultiSearchFieldBlock(field));
                });
            } else {
                // We selected a regular one. Clear anything done in the IF above
                $("div.search").removeClass("multiple");

                $inputsWrapper.find("#searchValue").show().removeClass("ignore");
                $inputsWrapper.find(".multi-search-field").remove();
            }
        });
    }

    /**
     * Bind our button
     */
    $("#customSearch").on("click", function(){
        if (!$("div.search").hasClass("multiple")) {
            window.location = window.location.pathname + "?" + $("#searchFieldName").val() + "=" + $("#searchValue").val();
        } else {
            queryString = serializeToQueryString($(".inputs-wrapper"));
            if (queryString !== false) {
                window.location = window.location.pathname + queryString;
            } else {
                alert("You must provide a value for all fields.");
            }
        }
    });

    /**
     * Enter listeners!
     */
    $(".inputs-wrapper").on("keypress", "input", function(e){
        if (e.keyCode === 13) {
            $("#customSearch").click();
            return false;
        }
    });
});

You're now ready to load the page and see a brand new search bar.

Conclusion

This should get you up and running with a faster Django Admin that will scale to production-sized tables without a problem. I hope this helps someone one day not have to write a custom admin and instead just use the pretty-damn-good baked in one that Django supplies.

If you have any other tips or tricks, or are running into unrelated scaling issues with the Django admin, leave a note in the comments.


comments powered by Disqus