Transcript: Performance
Hello, and welcome to another episode of Django Chat, a weekly podcast on the Django web framework.
I'm Will Vincent, joined as always by Carlton Gibson. Hi, Carlton.
Hello, Will.
And this week, we're going to talk about performance. Performance matters because it's probably the most important part of the user experience. Google penalizes slow sites in its search rankings these days, and Amazon has done e-commerce studies showing that just a 100 millisecond slowdown can cost about 1 percent of sales. And what's 100 milliseconds? That's like the blink of an eye. Actually, as we get into it, what's the rule of thumb these days? Is it that under 300 milliseconds a user can't tell the difference, but after that every 100 milliseconds counts?
Well, I don't know, but there was something about the iPhone back in the day. There was a 300 millisecond delay when clicking in a web view, and it used to drive users mad. People would be building web apps, not native apps, but using web views, and there'd be this noticeable delay.
Yeah, yeah.
Versus a native button, which just went off straight away. I think the delay was 300 milliseconds, and people asked, can we get rid of it? No, we can't get rid of it. It was like a glitch in the system.
Yeah, and it just gets super frustrating. I think 300 milliseconds is definitely noticeable.
Yes, yes.
So we're going to go through a whole set of tools and approaches. Go ahead.
Well, no, one web performance metric they talked about was less than one second to glass. If your full load time to interactive on the glass was less than a second, users considered that fast, and that's the gold standard. That's your HTML delivered, your JavaScript on there, your CSS in place, so at least your first layout is done, even if you're still pulling in images and things like that, but also your page is clickable. So it's rendered and responsive in under a second, and that's considered quick.
And you're talking about mobile here, right? I think over time people have come to expect better.
Of course, but I think that's a pretty good target. Even at your desktop with a decent connection, type any site that loads any kind of JavaScript into your browser, and like half of them will be slower than a second. So that one second to glass idea is kind of a benchmark. And if you think about how long it takes for all the assets to arrive, you've got network latency, then the loading time, then the rendering time. That doesn't give your Django web application very much time to respond.
Right, exactly. And one last point before we get into all this: I do want to mention Donald Knuth. Is that how you say his name?
Knuth, yeah.
That's what Wikipedia says. I always thought it was "Newth"; I was going with "Newth" for years, and then I looked it up, and it's Knuth. He's a Stanford professor who famously gave up email and answers his mail in batches every few months, thinks all these deep thoughts, and wrote The Art of Computer Programming, this incredible series on computer science. Anyway, here's his quote on performance before we get into it: "The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming." You've probably heard the second half of that, but I think in context it makes a lot of sense as we get into all these things. Basically, think about what you're doing. Don't just blindly whack every performance mole that pops up, because those will be infinite.
But also, it turns out, just as a matter of fact about how programs behave in the wild, that most of your performance issues will come from a very small number of places, and you won't be able to predict where they are in advance. So what you need to do is build it as simply and as sanely as you can, without spending time optimizing it at all. Then profile it, find the three bottlenecks that are taking 70 percent of the time, and optimize those. For a fraction of the effort of micro-optimizing everything as you went along, you've got a more performant web application.
Yes, but it is so tempting to try to do it locally.
Well, it's interesting, isn't it? "If I use exists() versus count(), do I get four milliseconds back?"
Yeah. Okay, well, let's get into it, one step at a time.
So the very first thing I would say is you have to have Django Debug Toolbar, which is a third-party package.
It gives you configurable panels, so you can see the request/response cycle of a page. Basically, it shows you how many queries there are and how long the page takes to load locally. So this isn't a proxy for production, but it gives you a quick look at it. And the two big things you're going to want to look at for queries are select_related and prefetch_related. Do you want to take a stab at those, Carlton?
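For reference, a development-only setup for Django Debug Toolbar looks roughly like the settings fragment below. It follows the package's documented install steps (add the app, add its middleware, list your address in `INTERNAL_IPS`, and include `debug_toolbar.urls` in your URLconf); the surrounding apps and middleware are just a typical default project, and details can change between toolbar releases, so treat this as a sketch and check the installation docs for your version.

```python
# settings.py fragment for local development only; never enable this in production.
DEBUG = True

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",  # the toolbar needs staticfiles for its own CSS/JS
    "debug_toolbar",
]

MIDDLEWARE = [
    # Put the toolbar middleware as early as possible, but after any
    # middleware that encodes the response body (e.g. GZipMiddleware).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
]

# With the default settings, the toolbar only renders for requests from these addresses.
INTERNAL_IPS = ["127.0.0.1"]

# urls.py also needs the toolbar's URLs, for example:
#     from django.urls import include, path
#     urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]
```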
Okay. So as well as Django Debug Toolbar, which you'd use locally, there are things called APM, application performance monitoring. There used to be one called Opbeat, which got bought by Elastic, and there are others out there. New Relic was a big one for a while.
I always think Sentry should have one, but I don't think they do.
Yeah, that's error tracking, like Rollbar.
And I think Datadog has one.
Right, okay.
But anyway, there are loads of these, and they just wrap something around your application
that monitors how long execution takes.
And what you're going to find, if you've got any normal application,
is that your biggest hit is the database.
Yes.
The time to fetch data from the database.
That's most of your response time from your Django application.
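Hosted APM tools do far more than this, but the core idea of "wrapping something around your application" that times each request can be sketched as a tiny WSGI middleware using only the standard library. This is a toy illustration, not how any particular APM product works; `TimingMiddleware` and `slow_app` are made-up names, and `slow_app` sleeps to stand in for a slow database query.

```python
import time


class TimingMiddleware:
    """Toy 'APM': wraps a WSGI app and records how long each request takes."""

    def __init__(self, app):
        self.app = app
        self.timings = []  # list of (path, elapsed_seconds) tuples

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        result = self.app(environ, start_response)
        try:
            # Consume the body so the time spent generating it is counted too.
            # (This buffers the whole response, which a real APM would avoid.)
            body = list(result)
        finally:
            # WSGI requires calling close() on the result if it has one.
            if hasattr(result, "close"):
                result.close()
            elapsed = time.perf_counter() - start
            self.timings.append((environ.get("PATH_INFO", "/"), elapsed))
        return body


def slow_app(environ, start_response):
    """Hypothetical view; the sleep stands in for a slow database query."""
    time.sleep(0.05)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


if __name__ == "__main__":
    app = TimingMiddleware(slow_app)
    app({"PATH_INFO": "/books/"}, lambda status, headers: None)
    for path, seconds in app.timings:
        print(f"{path}: {seconds * 1000:.1f} ms")
```

Aggregating these timings per URL over real traffic is what points you at the handful of slow views worth optimizing.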
So what select_related does is, when there's a foreign key, you can say, "Hey, can we just join those two tables together with an SQL join and get everything in one database query," rather than one query for the main objects plus one more database hit for each related object.
The other option is prefetch_related, which is for many-to-many or reverse foreign key relationships, where you want to fetch, say, all the authors and all their books together. A book can have many authors, so that's fine. What that will do is fetch the authors and then fetch all the books that are related to those authors. It can't do it in one query because it can't do the join, but it will do it in two database queries rather than potentially hundreds. And in the history of Django, I think select_related was always there and prefetch_related was added later; I'd have to go and look, to be honest. But these are the two hammers you're going to want to use as a first step, generally speaking, when you see a page that's loading slowly. And Django's test tools have a really cool assertion called assertNumQueries: if you write a unit test fetching the data you want...
I haven't used that.
...you can assert that only one query was made when you use select_related to fetch your data.
Carlton, where did you come across that?
Well, I just found it in the Django test suite.
I'm like, what's that?
That's quite exciting.
But yeah, so you can write a unit test fetching your data.
Say you've got a convenience method that gathers all the data you need
for your view and returns it nicely,
so you keep that logic out of the main line of the view.
You can test that method with assertNumQueries and say,
look, I'm expecting this to make two queries because I'm using prefetch_related.
I want one for the authors and one for the books,
and I don't want any more queries.
So when you iterate through your list in your test, it confirms, yes, I did
fetch all of the objects here in two queries, rather than one for the authors and
then one more for each author's books as I traversed the relationship.
Right.
Yeah.
Yeah.
I like that.
Huh.
Okay.
I'm going to have to use that.
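The effect of select_related and prefetch_related can be illustrated without Django at all. The sketch below uses the standard library's `sqlite3` on a made-up author/book schema: the naive loop is the classic "N+1" pattern, the JOIN is roughly what select_related asks the database to do for a forward foreign key, and the second query with `IN (...)` is roughly the strategy prefetch_related uses for the reverse side. `count_queries` is a hypothetical helper, not Django's `assertNumQueries`, though it checks the same kind of thing by hooking `Connection.set_trace_callback`. The SQL Django actually generates will differ.

```python
import sqlite3
from contextlib import contextmanager


def make_db(n_authors=5, books_per_author=3):
    """Tiny in-memory schema: one author has many books (book.author_id is the FK)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,"
        " author_id INTEGER REFERENCES author(id))"
    )
    for a in range(1, n_authors + 1):
        conn.execute("INSERT INTO author (id, name) VALUES (?, ?)", (a, f"Author {a}"))
        for b in range(books_per_author):
            conn.execute(
                "INSERT INTO book (title, author_id) VALUES (?, ?)", (f"Book {a}.{b}", a)
            )
    conn.commit()
    return conn


@contextmanager
def count_queries(conn):
    """Collect every SQL statement executed inside the block."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)


def books_n_plus_one(conn):
    """Naive loop: 1 query for the books, then 1 more per book for its author."""
    rows = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    result = []
    for title, author_id in rows:
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        result.append((title, name))
    return result


def books_joined(conn):
    """select_related-style: follow the forward FK with a single JOIN."""
    return conn.execute(
        "SELECT book.title, author.name FROM book"
        " JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


def authors_with_books(conn):
    """prefetch_related-style: 1 query for authors, 1 IN (...) query for all their books."""
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    ids = [author_id for author_id, _ in authors]
    placeholders = ", ".join("?" for _ in ids)
    books = {author_id: [] for author_id in ids}
    for title, author_id in conn.execute(
        f"SELECT title, author_id FROM book WHERE author_id IN ({placeholders}) ORDER BY id",
        ids,
    ):
        books[author_id].append(title)
    # The "join" happens here in Python, which is what the ORM does for you.
    return {name: books[author_id] for author_id, name in authors}


if __name__ == "__main__":
    conn = make_db()
    for label, fn in [("N+1", books_n_plus_one), ("JOIN", books_joined),
                      ("prefetch", authors_with_books)]:
        with count_queries(conn) as statements:
            fn(conn)
        print(f"{label}: {len(statements)} queries")
```

Run as a script, it prints the query count for each strategy: the naive loop grows with the number of books, while the other two stay constant no matter how much data there is.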
So what else?
So, yeah, reduce the number of queries, right?
That's the first thing.
And then make sure your queries are efficient.
So the second thing I'd say is indexes.
So look at the queries you're making in your views and make sure they're using indexes. You can use the database's EXPLAIN: since Django 2.1, querysets have an explain() method, which is kind of nice.
It saves you having to extract the SQL from the queryset with str(queryset.query) and paste it into your database shell to explain it.
You can just call explain(), and it will send the query off to the database and ask it to explain what it's doing.
You have to do that a few times and learn to read the output.
But it will say, look, I'm doing a full table scan.
And what a full table scan means is I'm going through every single row of the database table to see what the matches are.
And you don't want that; that's where you want an index.
Because then it can just go and look up the matching values in the index, which is a much quicker operation.
So reduce the number of queries and make sure you're using indexes correctly.
That's my big one and two.
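Django's `QuerySet.explain()` passes the query to your database's own EXPLAIN, so the output format depends on the backend. To show the before-and-after effect of an index without a Django project, here is a sketch using SQLite's `EXPLAIN QUERY PLAN` from the standard library; the `book` table, `isbn` column, and `book_isbn_idx` index are made up for illustration. On PostgreSQL you would typically be looking for a `Seq Scan` being replaced by an index scan, though the exact wording varies by database and version.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (the detail column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return "\n".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, isbn TEXT)")
conn.executemany(
    "INSERT INTO book (title, isbn) VALUES (?, ?)",
    [(f"Book {i}", f"isbn-{i:05d}") for i in range(1000)],
)

lookup = "SELECT id, title FROM book WHERE isbn = ?"

before = query_plan(conn, lookup, ("isbn-00042",))
conn.execute("CREATE INDEX book_isbn_idx ON book (isbn)")
after = query_plan(conn, lookup, ("isbn-00042",))

print("before:", before)  # a SCAN: every row is examined
print("after: ", after)   # a SEARCH using book_isbn_idx
```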
And then you were just about to say?
Caching.
Yes.
We have a whole episode on that, and we talked about it briefly before.
Yeah. So, cache all the things?
Well, yeah, but, you know,
if it took a long time to get out of the database and you're fetching it all the
time, cache it. I worked on an API that was serving social media data.
It was a competitor to one of those, I can't
remember, it was so long ago, but anyway it was social media data-mining nonsense. They had clients that were making lots and lots of API requests all the time, and on every request they had to check the API key. You don't want to go fetching API keys from the database on every single request just to check whether the key matched, so we would fetch them once an hour or so and check the API key against the cache rather than against the database, because that's quicker, right? But the key thing is you're doing this on real, live production data. I'll say this again to folks: don't waste time doing this locally. It's so tempting to do.
Yeah.
But the code path has to be the same, right? Django gives you a dummy cache backend, which is great for local development, because it exposes the cache API but doesn't actually do anything. So you can ask, "Is this in the cache?" No, it's not, because the dummy cache never caches anything, and then you go and hit the database. So in development you just use the dummy cache backend. It's a bit like using the console email backend.
Right, right.
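The API-key story is the classic cache-aside pattern, and the dummy-backend point is that one code path serves both development and production. Below is a minimal framework-free sketch, assuming an in-process TTL cache and a hypothetical `db_lookup` callable standing in for the real database query. In a Django project you would use `django.core.cache.cache.get()` and `.set()` (which accepts a `timeout`) and pick the backend in the `CACHES` setting, for example `django.core.cache.backends.dummy.DummyCache` locally and a shared backend such as Memcached or Redis in production, since an in-process cache like this one is not shared between worker processes.

```python
import time


class TTLCache:
    """Minimal in-process cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # stale entry: behave like a miss
            return default
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


class DummyCache:
    """Same interface, but never stores anything (like Django's dummy backend)."""

    def get(self, key, default=None):
        return default

    def set(self, key, value):
        pass


class ApiKeyChecker:
    """Cache-aside: try the cache first, fall back to the database on a miss."""

    def __init__(self, cache, db_lookup):
        self.cache = cache
        self.db_lookup = db_lookup  # hypothetical callable returning all valid keys

    def is_valid(self, api_key):
        valid_keys = self.cache.get("valid_api_keys")
        if valid_keys is None:  # miss: hit the database once, then remember the result
            valid_keys = self.db_lookup()
            self.cache.set("valid_api_keys", valid_keys)
        return api_key in valid_keys


if __name__ == "__main__":
    db_calls = []

    def fake_db_lookup():
        db_calls.append(1)
        return frozenset({"key-abc", "key-xyz"})

    for cache in (TTLCache(ttl=3600), DummyCache()):
        db_calls.clear()
        checker = ApiKeyChecker(cache, fake_db_lookup)
        for key in ["key-abc", "nope", "key-xyz"]:
            checker.is_valid(key)
        print(type(cache).__name__, "database lookups:", len(db_calls))
```

With the TTL cache, the database is consulted once per hour; with the dummy cache, the same code goes to the database on every request, which is exactly the development behavior described above.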
Yeah. On indexes, I want to quickly note one thing I think is cool: starting with Django 1.11, you can declare indexes in the Meta class on your models instead of adding db_index to a field. I personally find doing it through Meta a little more readable, and I can put more things in there. But indexes get abused. What's the downside of just indexing everything?
Right: time and space. If you've got an indexed column in a database table, it will take longer to write a record to the database, because it has to update the index at the same time. So that's time: write performance is impaired if you've got an index in place. But also space. The classic example of an index is a phone book: I want to look up someone's phone number by their name, so instead of going through the list one by one, I can go to the right place in the alphabet and get the number. But phone books are big and fat; they take up a lot of space in the hallway under the telephone. That's the same problem for every index you add.
Right, they're not costless.
Yeah, they're not costless. But to be honest, on balance, are you making actual queries on this data? If so, you probably want an index in place. But before you've built your application, you probably haven't designed it well enough to know which columns you'll actually be querying.
Right, I think that's the key thing: your schema can and will change, and indexing everything in schema 1.0 is the wrong approach. I've done that. You start with your basic schema, you go hog wild with indexes, and then you find out you want to change the schema around because your data is different or you add new features. But you've still got all those indexes, and it's too much. It's premature optimization.
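The space half of that trade-off is easy to see for yourself. A sketch using SQLite from the standard library with a made-up `person` table: it compares the database's page count before and after adding one index. The actual numbers depend on SQLite's page size and version, so none are claimed here. The write-time cost is not measured, but every later insert, or update to `name`, now also has to maintain `person_name_idx`. In Django the same index would be declared with `db_index=True` on the field or, since 1.11, with `Meta.indexes = [models.Index(fields=["name"])]`.

```python
import sqlite3


def page_count(conn):
    """Number of database pages in use; multiply by page_size for bytes."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO person (name, phone) VALUES (?, ?)",
    [(f"Person {i:06d}", f"555-{i:06d}") for i in range(20_000)],
)
conn.commit()

pages_without_index = page_count(conn)
conn.execute("CREATE INDEX person_name_idx ON person (name)")  # the "phone book"
conn.commit()
pages_with_index = page_count(conn)

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
extra_pages = pages_with_index - pages_without_index
print(f"index added {extra_pages} pages (~{extra_pages * page_size / 1024:.0f} KiB)")
```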
Yeah, and this happens in NoSQL land quite a lot. For instance, with CouchDB you have to create these views, which are essentially indexes, and you have to specify them up front. So you think, "Here's a view, I'll create this," and it goes and processes them all, and then you realize that was totally wrong. This happens with Elasticsearch as well: you think, "I'm going to search on this," so you create an index for searching on those fields, and then you realize it isn't right and you have to re-index everything. It's quite difficult.
Well, this is the thing with NoSQL: it needs to be used appropriately, because when you first start using it you're like, "This is amazing, I don't need to worry about anything."
Yeah. But anyway, use Django Debug Toolbar when you've got your application running and you're thinking, "I'm going to deploy this." Go through it locally, see what the actual queries are, use explain() to check whether those queries are sensible, put an index in, and see whether it improved. Whilst local isn't a proxy for production, it kind of is, in that it won't tell you the exact numbers, but it will tell you the relative scale.
Yeah, look at it for sure. And if there's a fourth big area, I would say it's the front-end assets, which as a Django developer you probably don't have as much control over, but you can use tools. For example, you can use Django Compressor, a third-party package that Carlton maintains.
Well, you want to use a CDN too.
You can use a CDN. And I'll add a link to it: Addy Osmani, who's at Google, has a whole free web book on images,
which is really fascinating.
I mean, for example, if you haven't thought about it, you can use easy-thumbnails, which
is a package.
So rather than serving the two-megabyte version of, let's say, a user profile photo that someone uploaded, when it's shown on the screen as a tiny little thing you serve a thumbnail version and keep the full version for when it's needed. These are basic steps that make a real difference to performance. So look at front-end assets in general, especially with Google PageSpeed Insights; there are other tools, and all the major browsers have ways to evaluate site speed. They will help you, especially with the front-end assets, to see things like "your JavaScript is way too big."
Yeah, or is your web server configured to cache these, to send the right caching headers that say, "Look, this CSS file, cache it indefinitely"?
And one thing Django Compressor will give you is a nice concatenated file with a hash in the file name.
So you can cache that forever, because if you change your CSS, the hash is different, and so the file name is different.
And so you can configure your web server to tell browsers and the proxy caches
out there to cache it forever.
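The cache-forever trick works because the file name is derived from the file's content, so changed content always gets a new URL. A minimal sketch of the idea with a hypothetical `hashed_name` helper; Django Compressor and Django's own `ManifestStaticFilesStorage` do this for you with their own naming schemes, and the `Cache-Control` value shown is a common far-future choice (one year), not a requirement.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path, content, length=12):
    """Return e.g. 'css/site.<hash>.css'; the name changes whenever the content does."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Headers a web server could send for hashed files: safe to cache "forever",
# because new content is published under a new name instead of overwriting the old one.
FAR_FUTURE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}


if __name__ == "__main__":
    print(hashed_name("css/site.css", b"body { color: red; }"))
    print(hashed_name("css/site.css", b"body { color: blue; }"))  # a different name
```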
Oh, it's fun to do performance stuff.
It's just never ending.
The last thing I would say, and you can say what you like, Carlton, is that I find the
django-extensions third-party package very helpful, because it's got a whole bunch of
things, specifically shell_plus, which auto-loads your models into the shell
when you need to drop in.
It also has runserver_plus.
It's sort of a Swiss Army knife of tools. I find myself using it all the time, because whenever I go into the shell I want the models loaded. I can't live without django-extensions.
Yeah, I'm a big fan of it. I can't remember the exact command, but it can output a picture of your models as a DOT file, which is a graph description format, and you can either view that in, what's the command-line program, I can't remember, or I drag it into OmniGraffle, and then you've got a nice diagram of all your models and the relationships between them. I love that.
Oh, yeah. Well, the hard thing with all these favorite packages and tips is just figuring out the priority and the curation. This is why, for example, with Awesome Django, a repo of third-party packages I maintain: I appreciate all the PRs and issues people put in there, but I don't want it to be a thousand packages long. I'm trying to keep it curated. But Django Debug Toolbar and django-extensions, I would say almost every Django site should use those.
Yeah, I don't have a problem saying that. They're amongst the packages I'd pip install without really having any concerns.
Yeah, that would actually be a cool thing: what are your top five or top ten third-party must-haves? If we surveyed some talking heads, that might be a fun thing to do.
Yeah, I should do that. Any last things on performance? We've really hit the high points.
Django probably isn't your problem, as long as you're not making 200 database queries on a single page. If you do the basics, Django probably isn't your bottleneck; for most web apps it's your JavaScript and your front-end stuff that will have more of an effect. But if you're pushing it, if your server is doing something intensive and it's your Django application that's driving it, then after select_related, prefetch_related, indexing, and caching, things like serialization can cost time. If you're using Django REST Framework and really pushing it to the limit, serialization, like template rendering, is an expensive process, so there are alternative serialization options you might go for if you were really driving it. Make sure your middlewares are optimized, and so on down the list. But, you know, premature optimization is the root of all evil: chances are you will get the throughput you need by fixing the two or three things that are eating all the time, rather than worrying about whether to spend a week changing your serializer layout.
Yeah, I was going to say, 20 percent of the effort will get you 80 percent of the way there. I feel like I've been saying this a lot recently, but it is so tempting to dive into these small micro-changes that will have an impact and ignore talking to users or changing more important things around. Personally, it feels better to endlessly optimize performance, so I have to watch myself with that.
Well, it's the equivalent of busy work. For some people their busy work is answering email, right? For others it's "I'm doing a little performance optimization. Look, I've got five percent more throughput!" But this thing is called once by a back-end worker, so it's totally pointless.
Anyway, okay, that's the high points. As ever, we are at djangochat.com, or @chatdjango on Twitter, for the dyslexic folks out there.
And we'll see you all next week. Bye-bye.
All right, take care. Bye-bye.