Transcript: URLs: PKs, Slugs, & UUIDs
Hello, and welcome to another episode of Django Chat. This week, we're going to talk about URLs,
IDs, slugs, UUIDs, and explain how it all works. I'm Will Vincent, and I'm joined as always by
Carlton Gibson. Hi, Carlton. Hello, Will. All right, so let's get into URLs. So I'm asked a
lot of questions around slugs in particular, which is sort of a Django-specific newspaper term,
but we'll get into why URLs are confusing. So basically, when you're building a web application,
and especially if you're new to Django, you will have two major types of views. You're going to
have a list view. So let's take a blog application. The list view will just show all the blog posts
or some subset of the blog posts. And then you will have, if you use a generic class-based view
in Django, a detail view. So that's an individual page for just that post. And this is where the
question comes up of what do you put in that URL. I would say the first place to start... You're making
a... No, no, I was just thinking about, I was just listening to you and
thinking it through and thinking, yeah, I mean, the URLs are like the interface, right? So
you've got it in your address bar, and sometimes browsers now, they don't show the
URL fully. I'm old school, I like to see it. But a nicely designed URL scheme, it means
that you've got, say, posts and then a forward slash and then an ID of some kind. So like, you know,
GitHub has github.com, then the organization or the user, then the repo, then issues, and then the issue number, and
you can get to a different issue number just by changing one little bit of
the URL. And it's like each of these path components is meaningful and you can kind of drive... Yeah, and
you can share it, and it helps with SEO versus, you know, a full JavaScript setup, which is
what you were describing before, which Google's starting to be able to figure out how
to search. But, you know, URLs are like one of the major
components of what the web is, and what the internet is, I guess, but what the web is. And
they're glorious. I love them. I'm just geeking out momentarily. Yeah, well, I think that's a...
I think for me, anyways, until I started building websites, I never thought
about URLs. I think your average, you know, consumer doesn't think twice about it. Right. I mean, now
they don't even have to remember them. They just type the name of the site into whatever search
engine. But they do matter. They matter for developers who know what they're doing, who
want to switch through the architecture, and it matters for SEO, and it just matters as a
cleanliness, structural thing. And one of these huge questions is, so how do you do it, especially when
you have a detail page? So an example would be the GitHub example. So you'd have, you know, GitHub
slash for me, it's WSVincent. That's a list of all my repos. And then what does it look like
for that individual repo for Django X, which is my starter project? You could do that. You could
have it just be Django X. So that's what GitHub does. That's basically like a slug. So it's just
putting the title into the URL. And you would often, if it's multiple names, you would have a
dash, not an underscore. That's a point of confusion. Why not an underscore?
Well, isn't a dash...
You can do both, but I think it's...
I usually see dashes as the preferred method.
Do you have an opinion on that?
Not really.
Yeah, I mean, dash is probably a bit more readable or something.
There's something in the Django docs about...
And I know this is like an ongoing debate among core people around
like when to use underscores, when to use dashes.
And some examples use one and not the other.
I don't know if that's fully been resolved yet in the documentation.
For a Python identifier, a dash isn't a valid...
It has to be an underscore because otherwise it wouldn't...
So if it's like a variable name or a module name
or a package name in Python code,
it has to use an underscore.
That's why Python...
But a URL, I think the default is a dash, not an underscore.
So if you're naming the URL, you would use an underscore.
If you're crafting the URL, maybe you'd have a dash in there.
Normally, I think it's that.
Yeah, if you've got a slug.
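The underscore-versus-dash point can be checked directly in Python. This is a minimal sketch using only the built-in str.isidentifier() method; the two example names are made up for illustration.

```python
# Hypothetical names, chosen only to illustrate the rule.
names = ["blog_post", "blog-post"]

# str.isidentifier() reports whether a string could be used as a Python name
# (variable, function, module). Underscores are allowed; dashes are not.
valid = {name: name.isidentifier() for name in names}
print(valid)
```

In a URL, by contrast, blog-post is a perfectly legal path segment, which is why the two conventions can coexist.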
So what is a slug?
So if you create a blog post model in Django
and you add a slug field, and you can link it to the title, say.
And in the admin, for instance, it will automatically populate the slug field
if you tell it which field to base it on.
So you create a blog post with a title, My Great Post,
and then it will automatically put my-great-post in the slug field.
And that slug field is what will appear in the URL,
is what you later then use in the URL.
Right. Well, and there's a specific slug field in the Django models, and there's also a slugify function that will do that.
Yeah, it takes a text string and it returns a slug-like string.
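To make the title-to-slug step concrete, here is a rough, framework-free approximation of what a slugify helper does. Django ships its own slugify in django.utils.text; the function below, simple_slugify, is a hypothetical stand-in written for this example and may differ from Django's behavior in edge cases.

```python
import re
import unicodedata


def simple_slugify(value):
    """Approximate slug: ASCII-only, lowercase, words joined by single dashes."""
    # Strip accents by decomposing characters and dropping non-ASCII bytes.
    value = unicodedata.normalize("NFKD", str(value))
    value = value.encode("ascii", "ignore").decode("ascii")
    # Drop anything that is not a word character, whitespace, or a dash.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace and dashes into one dash; trim the ends.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


print(simple_slugify("My Great Post"))  # my-great-post
```

Note that the result is lowercased, so the slug for "My Great Post" would be my-great-post rather than a capitalized form.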
Yeah, yeah. So, you know, I think with slugs, too, there's two things
about it. One is that you can do it in the admin. So again, you know, Django was built for
a newspaper. If someone's going in and making it, it will automatically do it for them, or you can
let them manually do it. Actually, if you're doing it automatically, then you need to
get into the save method, right, to override that to automatically do a slug, I believe. Yeah,
I guess. I mean, if the slug is required, then yeah, it needs to be set. I mean, so
if you're going to have the creator do it in the admin, you can just have the
slug field and they'll just type it. But you can also, to your point, what's more common is, based
on the title or whatever, you can auto-populate it, and you do that by overriding the save method,
which is a little black magic-y, but you get used to that.
That's a common thing you'll do in, I don't know,
real-world Django applications, override the save method.
But it's confusing in any case.
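The "override the save method" idea can be sketched without Django at all. Everything below is an assumption made for illustration: BaseModel stands in for an ORM base class, make_slug for a slugify helper, and Post for a blog model. In a real Django model the same pattern would call super().save(*args, **kwargs).

```python
import re


def make_slug(text):
    """Tiny stand-in for a slugify helper: lowercase words joined by dashes."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


class BaseModel:
    """Stand-in for an ORM base class; save() just records the object."""

    saved = []

    def save(self):
        BaseModel.saved.append(self)


class Post(BaseModel):
    def __init__(self, title, slug=""):
        self.title = title
        self.slug = slug

    def save(self):
        # Fill the slug from the title only when the author left it blank,
        # so a hand-written slug is never overwritten.
        if not self.slug:
            self.slug = make_slug(self.title)
        # Hand over to the parent class to do the actual saving.
        super().save()


post = Post(title="My Great Post")
post.save()
print(post.slug)  # my-great-post
```

Generating the slug only when it is empty is one possible design choice: it keeps existing URLs stable if the title is edited later.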
And then when you get to your URL, so now you're in your urls.py file,
now, after Django 2.0, you can use path instead of using regular expressions in there,
and you either will use the primary key or ID.
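As a framework-free illustration of the URL styles discussed in this episode (integer primary key, slug, UUID), the sketch below matches each with a plain regular expression. The /posts/ layout and the classify helper are hypothetical; the character classes are intended to mirror Django's int, slug and uuid path converters, but this is ordinary Python re code, not Django's URL resolver.

```python
import re

# Hypothetical URL layouts under /posts/, one pattern per style.
PATTERNS = [
    ("pk", re.compile(r"^/posts/(?P<value>[0-9]+)/$")),
    ("uuid", re.compile(
        r"^/posts/(?P<value>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}"
        r"-[0-9a-f]{4}-[0-9a-f]{12})/$")),
    ("slug", re.compile(r"^/posts/(?P<value>[-a-zA-Z0-9_]+)/$")),
]


def classify(path):
    """Return (style, captured value) for the first pattern that matches."""
    # Order matters: "42" and a UUID would both also satisfy the slug
    # pattern, so the narrower patterns are tried first.
    for style, pattern in PATTERNS:
        match = pattern.match(path)
        if match:
            return style, match.group("value")
    return None


for path in ["/posts/42/", "/posts/my-great-post/",
             "/posts/123e4567-e89b-12d3-a456-426614174000/"]:
    print(path, "->", classify(path))
```

The overlap between the patterns is one reason a project usually picks a single style per URL prefix rather than mixing them.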
So the Django ORM will automatically add an ID,
an auto-increment, for you under the hood. So your first blog post has a one, the second has a two,
so on and so forth. You can use that as the ID for a generic class-based
detail view. But confusingly, you can also call it a primary key, pk. And I actually
had a nice description of this, but I've forgotten off the top of my head the difference between id
and pk. I guess the difference between id and pk is you should probably use pk, a primary key,
because a primary key can refer to something other than an ID, like a UUID, which we'll get to,
whereas an ID is very specific. Also, id has a specific meaning in a lot of programming languages.
So when you are referencing the ID, you should probably call it a primary key. Yeah, in general, the
pk shortcut will pick out whatever field on the model happens to be the primary key.
Now, if you don't specify a primary key, it will be an auto-increment integer field. Right.
We'll get into changing that primary key. But you might have used a text field, for
instance, if you're just writing a small contacts database just for your friends and
family where, you know, you're not going to have conflicts. Then there's no harm with just
using the text field for name, the CharField for name, as the primary key, because it's
unique. It's just Will and Carlton and, you know, Jessica. And as long as you don't have conflicts...
That's the challenge, right, with your slugs. Like, you know, hello world: you write two
hello world posts. Or, you know, to get back to GitHub, GitHub partially solves
this: it goes github.com slash your username slash your repo name. And so even if I have a repo
called hello world and you have a repo called hello world, they're at different URLs because our
usernames are different, and you yourself can't have two repos with the same name. So that's a way that
you can do it. So it's very rare to have a slug that is just, you know, example.com slash slug. Usually you
want something prepended, not to make it unique, but to make it distinguishable, so you don't have these
conflicts. And if you do have these conflicts, you can do things like automatically add
integers or strings onto the end, but that gets really messy. I mean, so you want either unique
for user or unique for date or unique for some other field, right? And you can add validators
on forms or models to help you enforce these. So Django ships with unique_for_date and
unique_together and other constraints on the model and its fields. Yeah. And it used to be, I don't know the SEO
answer on this. It used to be common that you would have the date in, for example, your blog URLs.
But that's not great for SEO in some ways.
And if you update your blog post, then it gets out of date.
For me, I think if you're creating an actual weblog, an actual blog, like a historical record of what you've been doing that you put on the internet, then great.
Have the date in the URL because on this date, I wrote this post.
But people came to realize that they wanted evergreen content.
You know, particularly for marketing sites where they put out two or three really high quality posts, which are evergreen content, which they're going to hope to drive search traffic to and all these things.
You don't want the date in those posts because
the date is irrelevant. They're evergreen. Right. Or if you write Django tutorials,
like on my personal site, and you update them to the latest version of Django,
you don't want, you know, Django 2.2 on a post from 2016.
It depends. Is it a weblog, or is it evergreen content, which the date isn't relevant for?
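The slug-conflict discussion above (unique per user or per date, versus tacking a number onto the end) can be sketched in a few lines of plain Python. The scope values, the taken set, and both helper names are assumptions made for this example; in Django the enforcement would normally come from model constraints rather than hand-written checks.

```python
def is_available(slug, scope, taken):
    """A slug only has to be unique within its scope (a user, a date, ...)."""
    return (scope, slug) not in taken


def with_suffix(slug, scope, taken):
    """The 'messy' fallback: append -2, -3, ... until the slug is free."""
    candidate, counter = slug, 2
    while (scope, candidate) in taken:
        candidate = f"{slug}-{counter}"
        counter += 1
    return candidate


# Hypothetical existing data: one user already has a hello-world entry.
taken = {("wsvincent", "hello-world")}

print(is_available("hello-world", "carlton", taken))    # True: other scope
print(is_available("hello-world", "wsvincent", taken))  # False: same scope
print(with_suffix("hello-world", "wsvincent", taken))   # hello-world-2
```

Scoping by user mirrors the GitHub layout mentioned earlier; scoping by date suits a dated weblog, while evergreen content would use some other scope or none.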
Yeah. So that brings us, so we talked about, so the default is probably to use a primary key. I
think when I teach detail views to people, I usually start with a primary key and then
I'll discuss slugs. Slugs are a little bit more complicated: you have to deal with the admin or
override the save method. But there is another option that is probably a better choice, which
is a universally unique identifier, a UUID, which Django added some really nice features around
recently. So do you want to explain what a UUID is? A UUID is this long string. You think of it
as a long string that is universally unique, right? So there's UUID1, for instance, which is
a particular algorithm for constructing these unique identifiers, and part of it is
based on the MAC address of the machine that it was generated on, part of it is based on, you know,
the time, and so on. And so the chances of a conflict between these are microscopically slim.
And so, in a way that ID 1 or ID 2 might not be unique, the UUID will be unique. And this is super
good if you're creating model instances in multiple places. So let's say you've got
a Django app with a server, and everybody's using traditional web requests
and they're creating the instances on the server. Well, integer primary keys are no
problem, because the database will ensure that the next one created will get the next primary key,
and you don't have to worry about it. But then let's say you add a mobile client, and that has
offline capabilities, where people are able to create instances on the mobile client and then
sync them to the server later on. All of a sudden, yeah, all of a sudden you've got the danger of
the same, right, the same ID being created both on the mobile client and on the
server at the same time by different requests. And so the way you get around that is to use UUIDs,
because you know, no matter where it was created, there's not going to be a conflict. So later on,
when you go to upload the instance that was created on the client in an offline context to
the server, you know there won't be a conflict of the ID. Yeah, and this is a bit similar, the mobile
example, to when we talked, in our authentication podcast, about why you use tokens rather than session IDs.
Again, this syncing issue crops up. Another reason why a UUID is a good idea is
if you have your ID hardcoded in the URL. Like, let's say, for example, I've got a list of,
you know, clients, and each one is an ID, and someone, you know, creates a new account and sees,
oh, I'm client number 500. Now they know exactly how many clients you have. If you're a banking
site, all these issues, it's just too much information to display publicly. It really
is a security concern in most cases to just put the literal database ID in the URL. I mean, for a
blog it doesn't matter, but if you're building, certainly, anything enterprise or
anything charging money, you know, just from a security standpoint, a UUID is safer. Yeah, I mean,
you don't want people to just be able to guess the URLs, and UUIDs aren't really
guessable either, right? So you couldn't... Yeah. So if I give
you a URL that's got ID 500 in it, and I wonder, what's ID 501? Oh look, does that come up? Yeah.
Whereas if I gave you a UUID, you could type in any random string, and it's likely not to be an entry in the database.
Yeah.
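The offline-sync problem can be shown with a small simulation. This is only a sketch under simplifying assumptions: Device is a made-up class, and both counters start from the same value to keep it short. Python's uuid module offers uuid1 (built from the time and a node ID such as the MAC address) and uuid4 (random); the example uses uuid4.

```python
import uuid


class Device:
    """A server or offline client that hands out IDs without coordination."""

    def __init__(self):
        self.next_int = 1

    def new_int_id(self):
        # Auto-increment: each device counts from its own last value.
        value = self.next_int
        self.next_int += 1
        return value

    @staticmethod
    def new_uuid():
        # uuid4 is random, so no coordination between devices is needed.
        return uuid.uuid4()


server, phone = Device(), Device()

# Integer IDs created independently on both sides overlap.
int_collisions = ({server.new_int_id() for _ in range(3)}
                  & {phone.new_int_id() for _ in range(3)})

# UUIDs created independently do not, for all practical purposes.
uuid_collisions = ({server.new_uuid() for _ in range(3)}
                   & {phone.new_uuid() for _ in range(3)})

print(sorted(int_collisions))  # [1, 2, 3]
print(len(uuid_collisions))
```

The same randomness is what makes a UUID in a URL hard to guess, whereas 500 invites trying 501.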
And then there's a further level.
You actually shared this with me, Carlton, of hash IDs.
So there's a hashids.org site, and there's a django-hashid-field third-party package.
So hash IDs are really nice in that they enable you to have integer primary keys in the database,
but then they create a nice short slug, which is, you know, half a dozen letters long, which isn't
guessable. You can't go from one to two to three by just incrementing it, because the
algorithm that generates the hash ID isn't guessable. But they're much shorter and nicer than
UUIDs. UUIDs are long and ugly and horrible, whereas six, seven characters? Brilliant, that's
nice. So hash IDs are lovely. I like them. They're exposable in a way
that you might worry about with primary keys. You use a salt in them so that
they're not predictable, but they still enable you to use integer primary keys under the
hood.
Yeah, they're nice.
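To show the general idea (integer primary key in the database, short salted token in the URL, reversible on the server), here is a simplified stand-in. It is not the Hashids algorithm and not the API of any Django package: encode, decode, the salt value, and the scrambling scheme are all assumptions made for this sketch, and the result is obfuscation rather than encryption.

```python
import hashlib
import string

ALPHABET = string.ascii_letters + string.digits  # 62 URL-safe symbols
MODULUS = 2 ** 32  # this sketch handles ids in the range [0, 2**32)


def _salt_params(salt):
    """Derive a multiplier and offset from the salt."""
    digest = hashlib.sha256(salt.encode()).digest()
    multiplier = int.from_bytes(digest[:4], "big") | 1  # odd => invertible
    offset = int.from_bytes(digest[4:8], "big")
    return multiplier, offset


def encode(pk, salt):
    """Turn an integer primary key into a short token."""
    if not 0 <= pk < MODULUS:
        raise ValueError("pk out of range for this sketch")
    multiplier, offset = _salt_params(salt)
    scrambled = (pk * multiplier + offset) % MODULUS
    chars = []
    while True:  # base-62 encoding, least significant digit first
        scrambled, remainder = divmod(scrambled, len(ALPHABET))
        chars.append(ALPHABET[remainder])
        if scrambled == 0:
            break
    return "".join(reversed(chars))


def decode(token, salt):
    """Recover the integer primary key from a token."""
    multiplier, offset = _salt_params(salt)
    scrambled = 0
    for char in token:
        scrambled = scrambled * len(ALPHABET) + ALPHABET.index(char)
    if scrambled >= MODULUS:
        raise ValueError("token out of range")
    # pow(x, -1, m) computes the modular inverse (Python 3.8+).
    return ((scrambled - offset) * pow(multiplier, -1, MODULUS)) % MODULUS


SALT = "demo-salt"  # hypothetical; a real salt would be kept secret
token = encode(500, SALT)
print(token, decode(token, SALT))
```

With a 32-bit range and 62 symbols the tokens come out at six characters or fewer, which matches the "half a dozen letters" feel described above.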
And if you're, you know, so how do I change?
How do I go through, step through this process?
This is actually a chapter in my book, Django for Professionals, which should be out now
when this podcast is released.
Because it is, I think, you know, it really is tricky to go from ID to slug to UUID, let
alone to hash ID.
And once you've done it all, you can sort of make these trade-offs and think about what to do.
But I think a takeaway is you can do these things. If you are in doubt, use a UUID or a hash ID.
And it's not really that much more work if you do it from the beginning.
Switching over is a little bit of a pain.
So if you've got a model which uses integer primary keys and you want to use a hash ID in, say, an API,
the django-hashid-field package has a REST framework serializer field, which will serialize the integer primary key to a hash ID and vice versa.
So that will handle exposing in your API.
That's great.
You can do something similar to put them into the template context if you need to do that.
If you need to migrate to a UUID, well, first thing, when you're coming up with your model, ask yourself this.
Am I going to need to sync this?
If you are going to need to sync it from a mobile client,
then use a UUID to begin with.
If you're not going to, well, hey, just stick with an ID.
It's easier. It's simpler.
But if you do need to migrate, well, probably add the UUID field,
check everything's working when you've adjusted,
and then switch over the primary key in the field definition
and then create a migration which will remove the auto-created ID field for you.
So the Django migrations package will do that.
But do it slowly.
Add the UUID field first, adjust your API, make sure everything's working, and then switch over.
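The staged approach described above (add the UUID column, backfill, verify, then switch the primary key) can be walked through with the standard-library sqlite3 module. This is only an illustration at the SQL level: the post table and its columns are hypothetical, and in a Django project the migrations framework would generate the schema changes for you.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO post (title) VALUES (?)",
    [("Hello World",), ("My Great Post",)],
)

# Step 1: add the new column alongside the existing integer id.
conn.execute("ALTER TABLE post ADD COLUMN uuid TEXT")

# Step 2: backfill a UUID for every existing row.
for (post_id,) in conn.execute("SELECT id FROM post").fetchall():
    conn.execute(
        "UPDATE post SET uuid = ? WHERE id = ?", (str(uuid.uuid4()), post_id)
    )

# Step 3: enforce uniqueness and check nothing was missed before switching.
conn.execute("CREATE UNIQUE INDEX post_uuid_idx ON post (uuid)")
missing = conn.execute("SELECT COUNT(*) FROM post WHERE uuid IS NULL").fetchone()[0]

# Step 4: switch over. SQLite needs a table rebuild to change the primary key.
conn.execute("CREATE TABLE post_new (uuid TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO post_new (uuid, title) SELECT uuid, title FROM post")
conn.execute("DROP TABLE post")
conn.execute("ALTER TABLE post_new RENAME TO post")

columns = [row[1] for row in conn.execute("PRAGMA table_info(post)")]
row_count = conn.execute("SELECT COUNT(*) FROM post").fetchone()[0]
print(columns, row_count, missing)
```

Keeping the integer id in place until step 4 is what allows the API and pages to be adjusted and tested before anything is removed.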
Yeah, it's not a small undertaking, not to mention existing API endpoints or existing pages.
Try and think about it in advance if you can.
It's slightly more complex, but I would say when in doubt, just default to a UUID or a hash ID, and you'll future-proof it.
A little bit like using a custom user model for most people.
You could also do profiles.
You can change it later downstream, but it's a little bit more work.
Yeah, and so I guess I think the general advice,
when you're designing your application, think about your URL structure.
Think what you want your URLs to look like.
Spend a bit of time because cool URLs, they don't change.
They stay the same forever, and they're reliable, and they're addressable,
and you can bookmark that, and you can go back to it.
So think about your URL structure
and try and design your application nicely around your URLs.
Yeah, and I would say that with a bit of experience,
you know, after the model, after the schema, the URLs are the second most important thing I think about in
terms of architecting a project. Because, yeah, the pages themselves, that can change, but really,
you know, hopefully that doesn't change as much as even, you know, the
views for what's displayed on the page itself. That's more likely to change than your underlying
URL structure. Yeah, and I think of URLs as well as like a power tool for power users of your site.
So it's like the command line interface. That's true. You know, on your computer, if you fire
up the terminal and you can drive your computer from the command line, you can do things very
quickly and very powerfully that might be more long-winded via the GUI. Now, the GUI is
obviously easier, and that's great. It's more accessible, and we love GUIs. But sometimes that
power tool is exactly what you need. And if you've got a really nice URL structure, it just enables
people who are really into your site and into your application to use it more efficiently. Agreed.
So this wasn't the longest episode,
but I think it covers an important point
and something that trips people up.
And, you know, hash IDs in particular,
if you're already familiar with UUIDs,
hash IDs are really cool and worth looking into.
It's a nice little library.
It's a nice little tool if you, you know,
it's this middle ground between the two.
Yeah, exactly.
So as always, you can reach us
at the djangochat.com website.
We're on Twitter at ChatDjango.
If you like this podcast,
please also leave a review on whatever service you use.
We've received some really nice reviews,
but reviews help people find our work
and keep us motivated to keep doing these.
And that's it.
Anything else you want to add, Carlton?
Yeah, no, questions.
Well, you know, send in some things.
We should have an episode on user questions,
or listener questions.
Yes, actually, we should.
We've been getting a number,
and we've done a couple that are full-length episodes,
like the admin,
because we got asked a couple questions on that.
But especially if there's...
It can be a small question.
That would be fun to do a grab bag of user questions.
So send those in.
All right, we'll see everyone next time.
Thanks for listening.
All right, take care.
Bye-bye.
Bye.