
Transcript: Caching

Hello, and welcome to another episode of Django Chat, a weekly podcast on the Django Web Framework.

I'm Will Vincent, joined as always by Carlton Gibson. Hello, Carlton.

Hello, Will.

And this week, we're going to talk about caching, which is a power tool of all developers, but may not be familiar to folks newer in their career. And Django has some fantastic built-in support. So we're going to get into all things caching. So Carlton, what is caching? Why is it important?

So, oh good. Caching is good if, you know, say you've got some database query that takes quite a long time and you're doing it all the time. Maybe you want to cache that, and that just means keep it hanging around so you don't have to make it again. And ideally your cache is quick at fetching: if you store a result that you got from the database, you can put it in your cache, and ideally it's quicker to get it back from the cache than it is to go to the database. That's the idea. So, performance, right?
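A minimal sketch of that idea in plain Python, assuming a hypothetical `slow_query` function standing in for an expensive database call. `TTLCache` is an illustrative helper written for this example, not Django's cache API, although Django's low-level cache offers comparable get and set operations with a timeout.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after `timeout` seconds."""

    def __init__(self, clock=time.monotonic):
        self._store = {}     # key -> (value, expiry time)
        self._clock = clock

    def get_or_set(self, key, loader, timeout):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]              # cache hit: skip the expensive call
        value = loader()                 # cache miss: do the slow work once
        self._store[key] = (value, now + timeout)
        return value


calls = []


def slow_query():
    """Hypothetical stand-in for a slow database query."""
    calls.append(1)
    return ["post-1", "post-2"]


cache = TTLCache()
first = cache.get_or_set("recent_posts", slow_query, timeout=300)
second = cache.get_or_set("recent_posts", slow_query, timeout=300)
```

The second call is answered from the dictionary, so `slow_query` only runs once within the timeout window.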

Yeah, and that general idea that rather than spinning up the physical disk, if you load it into memory, like the RAM, it's going to be faster. That's the general idea.

Yeah. I mean, sometimes you might cache on the disk, though. For instance, say you're rendering a complicated template: you've gone to the database, you've got some data, you've rendered it into a template, and you've then got some HTML out of that which was computationally expensive. There's no reason why you couldn't cache that on the disk, because then all you've got to do is fetch it from the disk and serve it straight away rather than do all that heavy computation again. But normally we're caching in memory, right?

We'll get into where you can put the cache, but that's the basic idea: you pre-process things so they can be loaded faster. And I think something that's maybe a little confusing is that caching is such a broad idea. If you index a database, if folks have heard of that, that's basically a cache. But then, for example, Django has a built-in caching system, and we'll get to where you store it. There are four options, to give folks a sense of how to think about this. You can do per site, so you can just cache everything. If you have a Django site but it's basically a static site, say a blog that's not changing, you can just do, I think it's basically one line, we'll link to this in the docs, per site, cache everything. The first time it loads, someone will come in, hit the site, and it actually needs to process. But after that, everything is just served from memory.

And actually, on that topic, let's talk about the cache being hot and cold, because I think that's an important thing. There's this idea that the cache has to be run, it has to be hot. So for example, I have a simple blog site and I add a new blog post. The first person who comes and hits that page is not going to get a cached response, even if I have caching on. So either I just say, well, the first person who comes in is going to take that performance hit, or you can... what's the term? You can run the cache in advance. I never know what the terms for these things are.

You can preheat the cache, right? So when you publish your own blog post, you can go and check that it appeared on the site, and in so doing it will load and then your page will be cached. So normally you cache per page, right? Django's got this nice page caching option.

Right, so the four options: per site, per view, template, which I guess is what you mean by per page, I can never remember, and then you can get into the low-level cache API. So this is something you want to play with in reality, in production. You can make predictions all day long on local usage, but you really just want to see how your site actually performs. But I would say, God, we're just butchering this, heating up the cache, warming up the cache, I played around with this a ton, and on a big site it's worth it, maybe. But it's also fine to just say, you know, if I have thousands of visitors, the first one on this page after I make a change, it can be a little slower for them, and they'll live.

Yeah. I mean, what's good about the cache, right: however long it takes to go and get your blog post out of your database, render it into the template, and put it on the page, well, okay, the first time that's a bit slow. And then the index page needs to change as well, you know, the one where it lists the most recent five blog posts. You need to update that, so there are questions about invalidation that we can come back to in a minute. But the first person who loads that is a bit slow. If that's you, brilliant.

Right, well, I did that. Again, this was early days at a startup, I would manually go through and reheat, God, these pages. But in practice, when you're dealing with hundreds or thousands of users, it all evens out.

So would you set this to cache forever?

No, I don't think you want to do that. I mean, you could if you're updating all the time. I recall setting it for a very long period of time, though. Off the top of my head, I can't remember what a long period of time is. I think maybe it was a month.

Oh right, well, that is quite a long time.

I mean, usually it's like a week or a day.

Yeah, or a day or even an hour, right?

Because let's say you've got one person coming

and hitting your Django application once an hour.

It's really not going to kill your Django application, right?

But if you've got 20,000 people all at once, that will kill it.

So if you can cache that blog post even for an hour,

it means that the Django app is only really doing the hard work

once every hour or once every day or once every week.
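As a rough illustration of the timeouts discussed here, a per-site cache with a one-hour timeout might be configured along these lines in a Django settings module. The setting and middleware names are Django's own; the local-memory backend and the one-hour value are assumptions chosen for the example, not something stated in the episode.

```python
# settings.py fragment (sketch): cache every page of the site for one hour.

ONE_HOUR = 60 * 60  # seconds; an assumed value, tune it for your own site

CACHES = {
    "default": {
        # Local-memory backend, assumed here so the sketch needs no extra service.
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "TIMEOUT": ONE_HOUR,
    }
}

MIDDLEWARE = [
    "django.middleware.cache.UpdateCacheMiddleware",     # must come first
    "django.middleware.common.CommonMiddleware",
    # ... the project's other middleware would go here ...
    "django.middleware.cache.FetchFromCacheMiddleware",  # must come last
]

CACHE_MIDDLEWARE_SECONDS = ONE_HOUR  # how long each page stays cached
```

With a shorter value, such as a few minutes, a busy page is still rendered only once per window while edits show up sooner.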

Yeah, and again, I'm thinking of this was very early stage with a startup.

Yeah, I think so. So there's a timeout; there are arguments you can pass into the caching framework built into Django. And the docs give an example of 300 seconds, so five minutes, as, you know, a substantial period of time.

Yeah, I think that what I said was way too long, but whatever. Play around with it. This is why you want logging and other information on your site too, so you can see how fast the page is actually loading. It's a balancing act, basically. But in general, cache everything.

Yeah. I mean, before we go on and talk about the details of Django caching, there's another layer to think about, which is: could you get NGINX, or whatever front-end proxy you've got, to do it instead? Because NGINX will serve files off the file system far more efficiently than any application you can ever write. And so if you've got a blog post which perhaps you update, I don't know, never, why not tell NGINX to cache it on the file system?

And then from NGINX's perspective, it's just like it's serving a static site, and it's not even talking to your backend. And that's really quite easy to configure. You give it a path and you say, look, file cache, and you give it the module, the amount of time you want to cache it for, and it will just do it. So that's worth considering.

What do you make of... I remember using Varnish, which is a proxy cache layer, when I was doing this all on DigitalOcean.

What's your take on NGINX versus Varnish?

I mean, you could use both, right?

Because they do different things.

People do.

So for me, for your...

I mean, I remember Varnish was like a huge speed boost.

Maybe the biggest of all the things I did.

Right.

But Varnish is a dedicated extra layer that you can use and super powerful.

But I always say, don't go to these things until you need it, right?

So what's your base set?

I did not do that, of course.

Right.

But what's your base setup?

Your base setup is, you know, just for example, I mean, you might be using Apache or whatever, but let's just go with one example. You're using NGINX, Gunicorn, with Django. Okay, you've already got NGINX in play, and it's got a first-rate caching module that's really easy to configure. You can use that, and that really will cope with, you know, probably 90% of sites out there. That's perfectly good enough. And then if you are really pushing it to the limit, then you're going to investigate whether or not you need another dedicated caching layer on top.

And I would say this is the type of stuff that is really fun to do, because at the end of the day you can say, oh, I decreased my load time by X amount. It sort of scratches that developer itch. But I'm certainly guilty of spending way too long getting that last five, ten percent when it was totally unwarranted. So I would say be aware that this is fun and feels binary, and so a lot of times you'll neglect things like talking to users, which is a little more gray, you know, marketing or any of that stuff.

Marketing, yeah, all these things. Design.

Okay, so where do you put the cache?

So historically, Memcached was the first big popular caching layer, though these days I think almost everyone would say Redis if you're starting from scratch.

People like Redis. It's got some fancy...

Redis is a little bit simpler, or no, it's not, but it's a little bit faster, for sure.

Is that the case?

I believe so. We'll link to it, there's detailed analysis. I believe in most cases it's actually faster.

Okay. I mean, look, back in the day, and we're talking, you know, early 2000s, Memcached was the option. You'd run Memcached, and it was this amazing idea, right? Because it came out of...

I'm sorry to interrupt, but yeah, I remember it came out of, like, LiveJournal or something in 2005. I think it was the first major caching of that type.

Yeah, and it just did the job, and it did it very well, and there was massive adoption because of that. And it's still brilliant, right? It still works, and there's no reason not to use Memcached unless you're already thinking about using Redis. And again, it's like, how many components do you want in your stack? So if you've got Redis running, why not use Redis as a cache backend?

Right. So I guess, yeah, the general thing is Memcached is a little bit simpler, but if you're going to need Redis for other things anyway, you might as well just use Redis for all of it.

Well, so let's talk about those things. Why would you use Redis?

So, I mean, basically for any queue, like tasks. Emails are one example. What are some other examples that come to mind? I guess I'm confusing two things here: there's caching, and then you'd also use something like Redis for queue-based...

Yeah. So why would you have Redis? Because you want to use a queue. So let's take a good queue package. You know, everyone always talks about Celery, but Celery's overkill for the majority of use cases. So what's a good package? Well, there's one called Django Q, which I love and have fun with. That's nice and simple, and that's got a Redis backend. So you pip install, or, you know, apt install Redis, and then you pip install Django Q into your project, a little bit of settings magic, and you're up and running. What do you put in there? Anything that you want to put out of band. You know, you're rendering a PDF, something that's going to be process-intensive, sending an email, any of these tasks that we talk about all the time. And at that point you've already got Redis in play, so you might as well use it as your Django cache backend, for which you'll need a package. There's a couple of options, right?
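For a sense of what that settings step can look like, here is a sketch that assumes the django-redis package and a Redis server on the local machine. The backend path and client class are the ones django-redis documents; the host, port and database number are placeholder assumptions.

```python
# settings.py fragment (sketch): use Redis as the Django cache backend
# through the third-party django-redis package (assumed to be installed).

CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        # Placeholder location: local Redis, default port, database 1.
        "LOCATION": "redis://127.0.0.1:6379/1",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
```

If a task queue already talks to the same Redis server, pointing the cache at a separate database number keeps the two sets of keys apart.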

There's django-redis and django-redis-cache, and I can never remember what the difference is between these two. Every single time I start a new project, I have to go and search my history: what did I use last time, and is it still as good?

Yeah, well, I was just updating my awesome-django repo, which has a bunch of curated third-party apps, and I was going through the exact same thing, because, you know, there's a Redis section, and I was like, what is the difference? There is a difference, but I can't remember either.

I have no idea. I was looking this up before we started the talk. Last time I did it, I used django-redis-cache, and I've been very happy with that. It turns out I've used that loads of times in the past, but I've also used django-redis loads of times in the past, and I have no idea why. I don't know which one's good.

Just don't peek under the rug.

There was some talk about bringing a Redis cache backend into core.

I think the state of play on that is, yeah, we are keen on that, but it needs a Django Enhancement Proposal, a DEP. It needs someone to step up and write the thing. But in principle, in, you know, two, three, four versions' time, when someone's actually got around to it and written it, there might be a Redis cache backend in Django itself.

Yeah, because on a decent-sized site it really is pretty much a guarantee you're going to have Redis or Memcached, but probably Redis these days.

Yeah, and you do want caching. The one thing we haven't talked about is that it's not just the pages but the template fragments. Sometimes templates are computationally expensive to render. Let's say you're converting user-submitted Markdown to HTML. First of all, you've got to render that as HTML using Markdown, and then you've got to run it through a sanitizer like Bleach, which uses html5lib, which is not necessarily the fastest library in the whole world. If you can cache the output of that rendering, then the next time you have to do it... I mean, you could cache it in the database: say you've got that Markdown stored in a model field, you could have an extra model field for the rendered HTML, and you could do it at save time. But equally, you might do it by caching the template fragment.

Yeah, and I'm thinking this would be a great tutorial to do, because for local development, just so folks can see that this actually works, if you just have Django Debug Toolbar, which in addition to showing queries will show local page load time, which again isn't a proxy for production, but it gives you some sense. If you just flip the switches for per site and per view, you can see how much faster it is. I mean, it is orders of magnitude faster, obviously, to serve from a cache. So I would say that would be the way to play around with it: simply Django Debug Toolbar. And then there are more complex tools to see how fast your pages are in production.

Yeah. Realistically speaking, if you've used one of these APM tools, these live production profilers that monitor your execution time, you will see that the number one place where you're losing time is trips to the database, and the number two place where you're losing time is rendering templates. So yeah, you know, if you can...

Well, after, you know, doing something stupid with the front end. Not stupid, but doing something with front-end assets, like huge images or something.

Oh right, okay.

But okay, so here's the interesting thing with caching, right: there's the actual time, the time your Django application took to serve the response, versus the perceived time that the user had on the endpoint. So, you know, let's say your Django application took 300 milliseconds to go to the database, render the template, serve the response. You know, is that fast? Is that slow? Who knows? But let's say you're loading, you know, two megabytes of JavaScript, which took two and a half seconds to load and be responsive on the client. The client isn't going to notice if you halve your response time from your Django application. They're just not going to notice, because it pales into insignificance. So quite often you'll see the front-end people talk about this a lot.
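The arithmetic behind that point can be worked through with the numbers from the example. This sketch makes the simplifying assumption that the backend response and the JavaScript load happen one after the other, so the two times simply add up.

```python
# Figures from the example above, in milliseconds.
backend_ms = 300      # Django: database, template rendering, response
frontend_ms = 2500    # JavaScript download and start-up on the client

# Assumption: the two phases run back to back, so perceived time is the sum.
total_before = backend_ms + frontend_ms        # 2800 ms
total_after = backend_ms / 2 + frontend_ms     # 2650 ms with the backend halved

saved_ms = total_before - total_after          # 150 ms
improvement = saved_ms / total_before          # fraction of the perceived time

print(f"Perceived load time drops by {improvement:.1%}")  # prints 5.4%
```

Halving the backend time saves 150 ms out of 2800 ms, which is why the front end dominates in this scenario.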

The dominant factor in perceived responsiveness is how fast your page loads for the user. So, how much JavaScript...

Yeah, perceived.

How many images... I mean, the images aren't even the thing. It's JavaScript. How much JavaScript are you loading? How long does that take to pull into the page? Especially if you're doing one of these single-page applications, these client-side rendered things, where it's got to load all the JavaScript, then it's got to pull the data from an API, and that's the bit where your Django app does its thing and it takes 300 milliseconds, and then it's got to render all that into the page before the user says, oh yeah, the page loaded.

Right, and that whole perceived time, I mean, it reminds me of Instagram back when it came out, because I was actually working at Quizlet, like, just next door to them. One of the things, besides filters, that I remember being a wow moment was, and this was still when the cell reception in San Francisco was terrible in a lot of places, what they did is, as soon as you picked an image you wanted to load and started typing in all the information, in the background they started loading it. So it felt really fast. You didn't, you know, press the button and then wait five or ten seconds; it felt instantaneous. And I'm sure some others had done it, but that was one of the first apps I saw that basically said, we're going to blow up your bandwidth, or your cell phone plan, in the background. And now that's a standard process. Anytime, I don't know, Tumblr or something, or Instagram still, when you're loading an image, the first thing you do is you load the image, and then you type in a whole bunch of stuff, and in the background it's already processing, so you can just click the button and go.

Yeah, and I think the reality is, you know, Django gives you these caching tools and you can use them to speed up your Django response, but in a lot of cases the real work is on the front end. Do the basics in Django, use that NGINX caching layer that we talked about, but don't sit there then worrying about micro-optimizations when you've got a front-end app that's likely to offer better return on investment for that optimization time.

Right. And I guess the last major point I would make is it really depends on the type of app that you have. How often is the data updated? Is it personalized for every user? So if you think of Facebook, you and I log in, not that I have Facebook, but if I did, you and I log in and there's different content being loaded there. I'm sure that in the background Facebook is periodically loading those things into cache, so when you log in it's there. But how often does that change? If you have a timeline feed, or Twitter, right, that's updating quite a lot, so that would be a little more challenging than a blog or something that doesn't update as much, where you can be a lot more aggressive with the time limits that you...

Yeah, exactly. It's like, how aggressively can you cache it, like for a static blog post, and do you have a mechanism for invalidating it, right?
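One way to picture such an invalidation mechanism, sketched in plain Python with a dictionary standing in for the cache. The key format and the helper names (`post_cache_key`, `get_post_html`, `save_post`) are hypothetical, invented for this example; with Django's low-level cache the same pattern would use its get, set and delete calls instead of dictionary operations.

```python
cache = {}                # stand-in for a real cache backend
INDEX_KEY = "blog:index"  # hypothetical key for the cached index page


def post_cache_key(post_id):
    """Build a predictable key so the entry can be found again later."""
    return f"blog:post:{post_id}"


def render_post(post):
    """Stand-in for expensive template rendering."""
    return f"<h1>{post['title']}</h1>"


def get_post_html(post):
    key = post_cache_key(post["id"])
    if key not in cache:
        cache[key] = render_post(post)   # miss: render once and remember it
    return cache[key]


def save_post(post, new_title):
    """Save handler: update the post, then drop every cached page it affects."""
    post["title"] = new_title
    cache.pop(post_cache_key(post["id"]), None)  # the post's own page
    cache.pop(INDEX_KEY, None)                   # the index that lists it


post = {"id": 1, "title": "Hello"}
cache[INDEX_KEY] = "<ul><li>Hello</li></ul>"     # pretend the index was cached
before = get_post_html(post)
save_post(post, "Hello again")
after = get_post_html(post)                      # re-rendered after invalidation
```

Because the key is derived from the post's id, the save handler can remove exactly the stale entries without clearing the whole cache.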

So let's say you've cached a blog post in your, you know, Redis, whatever, using the Django backend. Are you able to identify that by key, such that when you update it, in your save handler, wherever you put that save handler, you can say, oh, and invalidate the cache?

So there's two problems in computer science, right? Naming things, cache invalidation, and off-by-one errors.

Yeah. Well, and I guess the last point I would make is the cache is not an infinite supply. It's not the database. So often you find yourself thinking, well, I'll just cache everything all the time, but it's more expensive than database space.

Yeah, but this is where file system caching comes back into its own, right? Because everyone's like, right, let's go straight from RAM. Well, RAM can get expensive, but file system can be cheap. And, you know, there's this idea about the different latencies of different things, L1 cache, blah blah blah, all the way down to reading something off the disk or getting something over the network. It's like, how far up that scale can you move your relevant thing? It's just a question of thinking about your requirements and your performance needs and all the rest. You know, it's like algorithm design all over again.
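On the file system point, Django ships a file-based cache backend, and a settings sketch for it could look like the following. The backend path is Django's own; the directory and the one-day timeout are assumptions picked for illustration.

```python
# settings.py fragment (sketch): keep cached values on disk instead of in RAM.

ONE_DAY = 60 * 60 * 24  # seconds; an assumed timeout for rarely changing pages

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        # Assumed directory; it must be writable by the web server user.
        "LOCATION": "/var/tmp/django_cache",
        "TIMEOUT": ONE_DAY,
    }
}
```

Disk reads are slower than RAM, but for large, rarely changing content the cheaper storage can be a reasonable trade.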

Yeah. And again, for an engineering mind, it's sort of fun because it feels black and white and you can see your progress. One last point I would mention: there's a book that's a couple of years old at this point, but still very relevant, called High Performance Django, by the folks at Lincoln Loop, that talks about caching, but also talks about a lot of these performance topics, because this all comes around to performance. So that's definitely worth a look. We'll put the link for that in the show notes.

So I was just going to say, I remember when I was learning back in the day, there were some really good books on this sort of stuff, you know, from O'Reilly and all the rest of them. I don't know what the latest scaling books are, you know, what the latest high-performance web applications type books are. Web scalability, that...

Well, if I may indulge a slight rant. I was updating my awesome-django repo, where I have a book section, and there are still almost no up-to-date books, up to date being, you know, actually written in the last couple of years, on Django, because it changes all the time. So it's not that the advice, especially around this stuff, is wrong, but I would love to know about more up-to-date things. As far as I know, Tango with Django just released an updated book; that's a classic that's been around. But I still think I'm almost the only one with 2.2 versions of my books. So yeah, if you find those, we'll put them in the show notes. You know, it's nice in a way that that's the stuff that doesn't change as much. I mean, that's the challenge and opportunity for me as a content creator: I have to update things all the time, which can become tiring but also makes me do it better. But it means that I sort of look longingly at these things like algorithms and stuff that are more...

But I would argue that the principles of web application scaling haven't really altered in the 15 years that I've been looking at it. It was true then and it's true now, and, you know, maybe version numbers have changed, but not the actual way you go about it.

The point is, though, since you already know how to do it quite well, that five, ten percent that's changed doesn't throw you off. Whereas, for example, people ask me all the time, what's the difference in the book between 2.0, 2.1, 2.2? It's about 10, 15 percent actually different content. And if you already know Django, it won't throw you off, but if you don't know Django, which is why you bought the book, it will definitely throw you. Those differences are fatal.

Oh, you know, it says 2.2 and I've got 1.8, what's going on here? Yeah, I remember that. I remember being in that exact position, you know, hitting barriers, banging my head on the wall for ages. Anyway, that's a cheerful note to finish on.

Yes.

All right, so caching: it's important. Hopefully this episode helped you all out. We are, as ever, at the djangochat.com website, and we are chatdjango on Twitter. The episodes are also on YouTube, audio only, if you prefer that. I keep putting them up there and there are some subscribers, so if you prefer YouTube, go check it out. We'll see you all next time. Bye.