Transcript: URLs: PKs, Slugs, & UUIDs

Hello, and welcome to another episode of Django Chat. This week, we're going to talk about URLs,

IDs, slugs, UUIDs, and explain how it all works. I'm Will Vincent, and I'm joined as always by

Carlton Gibson. Hi, Carlton. Hello, Will. All right, so let's get into URLs. So I'm asked a

lot of questions around slugs in particular, which is sort of a Django-specific newspaper term,

but we'll get into why URLs are confusing. So basically, when you're building a web application,

and especially if you're new to Django, you will have two major types of views. You're going to

have a list view. So let's take a blog application. The list view will just show all the blog posts

or some subset of the blog posts. And then you will have, if you use a generic class-based view

in Django, a detail view. So that's an individual page for just that post. And this is where the

question comes up of what do you put in that URL. I would say the first place to start... You're making

a... No, no, I was just listening to you and

thinking it through. I mean, the URL is like the interface, right? So

you've got it in your browser's address bar, and sometimes browsers now don't show the

URL fully. I'm old school, I like to see it. But a nicely designed URL scheme means

that you've got, say, posts, then a forward slash, and then an ID of some kind. So, you know,

GitHub has GitHub, then the organization or the user, then the repo, the issue, and then the issue number, and

you can get to a different issue number just by changing one little bit of

the URL. Each of these path components is meaningful, and you can kind of drive it. Yeah, and

you can share it, and it helps with SEO, versus, you know, a full JavaScript setup, which is

what you were describing before, which Google's starting to be able to figure out how

to search. But URLs are one of the major

components of what the web is, and what the internet is, I guess, but what the web is, and

they're glorious. I love them. I'm just geeking out momentarily. Yeah, well, I think,

for me anyway, until I started building websites, I never thought

about URLs. I think your average consumer doesn't think twice about it. Right, I mean, now

they don't even have to remember them. They just type the name of the site into whatever search

engine. But they do matter. They matter for developers who know what they're doing, who

want to switch through the architecture, and it matters for SEO, and it just matters as a

cleanliness, structural thing. And one of these huge questions is, how do you, especially when

you have a detail page... So, the GitHub example: you'd have, you know, GitHub

slash for me, it's WSVincent. That's a list of all my repos. And then what does it look like

for that individual repo for Django X, which is my starter project? You could do that. You could

have it just be Django X. So that's what GitHub does. That's basically like a slug. So it's just

putting the title into the URL. And you would often, if it's multiple names, you would have a

dash, not an underscore. That's a point of confusion. Why not an underscore?

Well, isn't a dash...

You can do both, but I think it's...

I usually see dashes as the preferred method.

Do you have an opinion on that?

Not really.

Yeah, I mean, dash is probably a bit more readable or something.

There's something in the Django docs about...

And I know this is like an ongoing debate among core people around

like when to use underscores, when to use dashes.

And some examples use one and not the other.

I don't know if that's fully been resolved yet in the documentation.

For a Python identifier, a dash isn't a valid...

It has to be an underscore because otherwise it wouldn't...

So if it's like a variable name or a module name

or a package name in Python code,

it has to use an underscore.

That's why Python...

But in a URL, I think the default is a dash, not an underscore.

So if you're naming the URL, you would use an underscore.

If you're crafting the URL, maybe you'd have a dash in there.

Normally, I think it's that.

Yeah, if you've got a slug.

So what is a slug?

So if you create a blog post model in Django

and you add a slug field, and you can link it to the title, say.

And in the admin, for instance, it will automatically populate the slug field

if you tell it which field to base it on.

So you create a blog post with a title, My Great Post,

and then it will automatically put my-great-post in the slug field.

And that slug field is what will appear in the URL,

is what you later then use in the URL.
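
As a rough illustration of the title-to-slug step described above, here is a minimal sketch that uses only the Python standard library. `simple_slugify` is a hypothetical stand-in written for this example; it approximates what Django's `slugify` does for plain titles, but it is not Django's implementation.

```python
import re
import unicodedata


def simple_slugify(text: str) -> str:
    """Turn a title into a lowercase, hyphen-separated, URL-friendly string."""
    # Fold accented characters to plain ASCII where possible and drop the rest.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then remove anything that is not a word character,
    # whitespace, or a hyphen (punctuation such as ":" or "&" disappears).
    cleaned = re.sub(r"[^\w\s-]", "", ascii_text.lower())
    # Collapse runs of whitespace and hyphens into one hyphen, trim the ends.
    return re.sub(r"[-\s]+", "-", cleaned).strip("-_")


print(simple_slugify("My Great Post"))               # my-great-post
print(simple_slugify("URLs: PKs, Slugs, & UUIDs"))   # urls-pks-slugs-uuids
```

In a real project you would more likely import `slugify` from `django.utils.text`, or set `prepopulated_fields = {"slug": ("title",)}` on the model's admin class so the admin fills in the slug as the title is typed.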

Right. Well, there's a specific SlugField in the Django models, and there's also a slugify function that will...

That will do that. Yeah, it takes a text string and it returns a slug-like string.

Yeah. So I think with slugs, too, there's two things

about it. One is that you can do it in the admin. So again, you know, Django was built for

a newspaper. If someone's going in and making it, it will automatically do it for them, or you can

let them do it manually. Actually, if you're doing it automatically, then you need to

get into the save method, right, and override that to automatically set a slug, I believe. Yeah,

I guess. I mean, if the slug is required, then yeah, it needs to be set. So

if you're going to have the creator do it in the admin, you can just have the

slug field and they'll just type it. But to your point, what's more common is that, based

on the title or whatever, you can auto-populate it, and you do that by overriding the save method,

which is a little black magic-y, but you get used to that.

That's a common thing you'll do in, I don't know,

real-world Django applications, override the save method.

But it's confusing in any case.
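
The save-override pattern mentioned above can be sketched without Django at all. Everything here is an assumption made for illustration: `FakeModel` stands in for `django.db.models.Model`, `Post` is a made-up model, and `make_slug` is a placeholder for a real slugify function. The point is only the shape of the override: fill in the slug when it is blank, then hand off to the parent's `save`.

```python
def make_slug(title):
    # Placeholder slug logic for this sketch; a real slugify does more.
    return "-".join(title.lower().split())


class FakeModel:
    """Hypothetical stand-in for django.db.models.Model."""

    def save(self):
        # A real model would write the row to the database here.
        self.saved = True


class Post(FakeModel):
    def __init__(self, title, slug=""):
        self.title = title
        self.slug = slug
        self.saved = False

    def save(self):
        # Only fill the slug when the author left it blank, so a
        # hand-written slug (and any URL already shared) is preserved.
        if not self.slug:
            self.slug = make_slug(self.title)
        # Delegate to the parent class to do the actual saving.
        super().save()


auto = Post("My Great Post")
auto.save()
print(auto.slug)    # my-great-post

manual = Post("My Great Post", slug="best-post-ever")
manual.save()
print(manual.slug)  # best-post-ever
```

Setting the slug only when it is empty is a design choice: regenerating it on every save would change the URL whenever the title is edited.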

And then when you get to your URL, so now you're in your urls.py file,

now after 2.0, you can use path instead of using regular expressions in there,

and you either will use the primary key or ID.
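
To make the `path` syntax a little more concrete: a route such as `posts/<int:pk>/` names a converter (`int`) and a keyword argument (`pk`) that is passed to the view. The sketch below imitates that capture step with the standard library only. The `resolve` function and the `CONVERTERS` table are hypothetical names for this example; the regular expressions are intended to mirror Django's built-in `int`, `slug`, and `uuid` converters, but this is not how Django's resolver is actually implemented.

```python
import re
import uuid

# Converter name -> (regex for the URL segment, function that casts the match).
CONVERTERS = {
    "int": (r"[0-9]+", int),
    "slug": (r"[-a-zA-Z0-9_]+", str),
    "uuid": (
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
        uuid.UUID,
    ),
}
PLACEHOLDER = re.compile(r"<(\w+):(\w+)>")


def resolve(route, path):
    """Return the captured keyword arguments, or None if path does not match."""
    pattern, casters, pos = "", {}, 0
    for found in PLACEHOLDER.finditer(route):
        kind, name = found.groups()
        regex, caster = CONVERTERS[kind]
        # Literal text is escaped; the placeholder becomes a named group.
        pattern += re.escape(route[pos:found.start()]) + f"(?P<{name}>{regex})"
        casters[name] = caster
        pos = found.end()
    pattern += re.escape(route[pos:])
    matched = re.fullmatch(pattern, path)
    if matched is None:
        return None
    return {name: casters[name](value) for name, value in matched.groupdict().items()}


print(resolve("posts/<int:pk>/", "posts/42/"))                 # {'pk': 42}
print(resolve("posts/<slug:slug>/", "posts/my-great-post/"))   # {'slug': 'my-great-post'}
print(resolve("posts/<int:pk>/", "posts/my-great-post/"))      # None
```

Whichever converter you pick decides what the detail view receives: an integer primary key, a slug string, or a `uuid.UUID` object.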

So the Django ORM will automatically add an ID,

an auto-increment, for you under the hood. So your first blog post has a 1, the second has a 2,

and so on and so forth. You can use that as the ID for a generic class-based

detail view. But confusingly, you can also call it a primary key, pk. And I

had a nice description of this, but I've forgotten off the top of my head the difference between an id

and a pk. I guess the difference between an id and a pk is you should probably use pk, a primary key,

because a primary key can refer to something other than an ID, like a UUID, which we'll get to,

whereas an ID is very specific. Also, id has a specific meaning in a lot of programming languages.

So when you are referencing the ID, you should probably call it a primary key. Yeah, in general, the

pk shortcut will pick out whatever field on the model happens to be the primary key.

Now, if you don't specify a primary key, it will be an auto-increment integer field. Right.

We'll get into changing that primary key. But you might have used a text field, for

instance, if you're just writing a small contacts database just for your friends and

family where, you know, you're not going to have conflicts. Then there's no harm with just

using the text field for name, the CharField for name, as the primary key, because it's

unique. It's just Will and Carlton and, you know, Jessica. As long as you don't have conflicts.

That's the challenge, right, with your slugs. Like, you know, hello-world: you write two

hello-world posts. Or, to get back to GitHub, GitHub partially solves

this because it goes github.com, slash your username, slash your repo name. And so even if I have a repo

called hello-world and you have a repo called hello-world, they're at different URLs because our

usernames are different, and you can't have two repos with the same name under one username. So that's a way that

you can do it. So it's very rare to have a slug that is just, you know, example.com/slug. Usually you

want something prepended, not to make it unique but to make it distinguishable, so you don't have

these conflicts. And if you do have these conflicts, you can do things like automatically add

integers or strings onto the end, but that gets really messy. I mean, so you want either unique

for user or unique for date or unique for some other field, right? And you can add validators

on forms or models to help you enforce these. So Django ships with unique for date and unique

together and other constraints on the model field. Yeah. And it used to be, I don't know the SEO

answer on this. It used to be common: you would have the date in, for example, your blog URL.

But that's not great for SEO in some ways.

And if you update your blog post, then it gets out of date.

For me, I think if you're creating an actual weblog, an actual blog, like a historical record of what you've been doing that you put on the internet, then great.

Have the date in the URL because on this date, I wrote this post.

But people came to realize that they wanted evergreen content.

You know, particularly for marketing sites where they put out two or three really high quality posts, which are evergreen content, which they're going to hope to drive search traffic to and all these things.

You don't want the date in those posts, because

the date is irrelevant. They're evergreen. Right. Or if you write Django tutorials,

like on my personal site, and you update them to the latest version of Django,

you don't want, you know, Django 2.2 on a post from 2016.

It depends. Is it a weblog, or is it evergreen content, for which the date isn't relevant?

Yeah. So that brings us, so we talked about, so the default is probably to use a primary key. I

think when I teach detail views to people, I usually start with a primary key and then

I'll discuss slugs. Slugs are a little bit more complicated: you have to deal with the admin or

override the save method. But there is another option that is probably a better choice, which

is a universally unique identifier, a UUID, which Django added some really nice features around

recently. So do you want to explain what a UUID is? A UUID is this long string. You think of it

as a long string that is universally unique, right? So there's UUID4, which is

a particular algorithm for constructing these unique identifiers, and in some versions part of it is

based on the MAC address of the machine that it was generated on, part of it is based on, you know,

the time, and so the chances of a conflict between these are microscopically slim.

And so, in a way that ID 1 or ID 2 might not be unique, the UUID will be unique. And this is super

good if you're creating model instances in multiple places. So let's say you've got

a Django app with a server, and everybody's using traditional web requests

and they're creating the instances on the server. Well, integer primary keys are no

problem, because the database will ensure that the next one created will get the next primary key,

and you don't have to worry about it. But then let's say you add a mobile client, and that has

offline capabilities, where people are able to create instances on the mobile client and then

sync them to the server later on. All of a sudden, yeah, all of a sudden you've got the danger

of the same ID being created both on the mobile client and on the

server at the same time by different requests. And so the way you get around that is to use UUIDs,

because, you know, no matter where it was created, there's not going to be a conflict. So later on,

when you go to upload the instance that was created on the client in an offline context to

the server, you know there won't be a conflict of the ID. Yeah, and this is a bit similar to the mobile

example when we talked, in our authentication podcast, about why to use tokens rather than session IDs.

Again, this syncing issue crops up. Another reason why a UUID is a good idea is

if you have your ID hardcoded in the URL. Let's say, for example, I've got a list of

clients and each one has an ID, and someone creates a new account and sees,

"Oh, I'm client number 500." Now they know exactly how many clients you have. If you're a banking

site, all these issues... it's just too much information to display publicly. It really

is a security concern in most cases to just put the literal database ID in the URL. I mean, for a

blog it doesn't matter, but if you're building certainly anything enterprise or

anything charging money, you know, just from a security standpoint, a UUID is safer. Yeah, I mean,

you don't want people to just be able to guess the URLs, and UUIDs aren't really

guessable either, right? So you couldn't... Yeah. So if I give

you a URL that's got ID 500 in it, and I wonder what ID 501 is: oh look, does that come up? Yeah.

Whereas if I gave you a UUID, you could type in any random string, and it's likely not to be an entry in the database.

Yeah.
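
A small sketch of the two points above, using Python's standard `uuid` module. The "server" and "phone" counters are hypothetical stand-ins for two places that each hand out auto-increment IDs on their own. Note that it is `uuid.uuid1()` that combines a timestamp with the machine's node (MAC) address, while `uuid.uuid4()`, used here, is random.

```python
import itertools
import uuid

# Two independent writers, each with its own auto-increment counter,
# e.g. the server and an offline mobile client (hypothetical setup).
server_ids = itertools.count(1)
phone_ids = itertools.count(1)
server_pk, phone_pk = next(server_ids), next(phone_ids)
print(server_pk == phone_pk)      # True: both believe their new row is id 1

# The same two writers using random version-4 UUIDs instead.
server_uuid, phone_uuid = uuid.uuid4(), uuid.uuid4()
print(server_uuid == phone_uuid)  # False: a collision is astronomically unlikely
print(server_uuid.version)        # 4
print(len(str(server_uuid)))      # 36 characters, including four hyphens

# Integer ids are trivially guessable: the neighbour of 500 is 501.
# A UUID gives no hint about what any other valid identifier looks like.
```

In a Django model, the usual way to get this behaviour is a field along the lines of `models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)`.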

And then there's a further level.

You actually shared this with me, Carlton, of hash IDs.

So there's a hashids.org site, and there's a django-hashid-field third-party package.

So hash IDs are really nice in that they enable you to have integer primary keys in the database,

but then they create a nice short slug, which is, you know, half a dozen letters long, which isn't

guessable. You can't go from one to two to three by just incrementing it, because the

algorithm that generates the hash ID isn't guessable. But they're much shorter and nicer than

UUIDs. UUIDs are long and ugly and horrible, whereas six, seven characters, brilliant, that's

nice. So hash IDs are lovely. I like them. They're exposable in a way

that primary keys, which you might worry about, are not. You use a salt in them so that

they're not predictable, but they still enable you to use integer primary keys under the

hood.

Yeah, they're nice.
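
To show the general idea (not the real thing), here is a toy, standard-library-only sketch of a salted, reversible mapping from an integer primary key to a short string. This is explicitly not the Hashids algorithm and is not cryptographically strong; `encode`, `decode`, the multiplier, and `SALT` are all assumptions made up for this example. It only demonstrates the property being described: the database keeps integers, the URL shows a short token, and consecutive integers do not produce consecutive-looking tokens.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols
MODULUS = 2 ** 32                                # this sketch handles pks below 2**32
MULTIPLIER = 2654435761                          # odd, so it has an inverse modulo 2**32
INVERSE = pow(MULTIPLIER, -1, MODULUS)           # modular inverse, needs Python 3.8+
SALT = 0x5EEDBEEF                                # hypothetical secret value


def encode(pk):
    """Map an integer primary key to a short token of at most six characters."""
    if not 0 <= pk < MODULUS:
        raise ValueError("pk out of range for this sketch")
    # Scramble the integer reversibly, then mix in the salt.
    number = ((pk * MULTIPLIER) % MODULUS) ^ SALT
    chars = []
    while True:
        number, remainder = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[remainder])
        if number == 0:
            break
    return "".join(reversed(chars))


def decode(token):
    """Recover the integer primary key from a token produced by encode()."""
    number = 0
    for char in token:
        number = number * len(ALPHABET) + ALPHABET.index(char)
    # Undo the salt, then undo the multiplication with the modular inverse.
    return ((number ^ SALT) * INVERSE) % MODULUS


for pk in (1, 2, 3):
    token = encode(pk)
    print(pk, token, decode(token))
```

For real use, the hashids library and the Django packages built on it are the better choice; the sketch is only meant to make "integers under the hood, short opaque strings in the URL" tangible.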

And if you're, you know, so how do I change?

How do I go through, step through this process?

This is actually a chapter in my book, Django for Professionals, which should be out now

when this podcast is released.

Because it is, I think, you know, it really is tricky to go from ID to slug to UUID, let

alone to hash ID.

And once you've done it all, you can sort of make these trade-offs and think about what to do.

But I think a takeaway is you can do these things. If you are in doubt, use a UUID or a hash ID.

And it's not really that much more work if you do it from the beginning.

Switching over is a little bit of a pain.

So if you've got a model which uses integer primary keys and you want to use a hash ID in, say, an API,

The Django hash IDs package has a REST framework serializer field, which will serialize the integer primary key to a hash ID and vice versa.

So that will handle exposing in your API.

That's great.

You can do something similar to put them into template context if you need to do that.

If you need to migrate to a UUID, well, first thing, when you're coming up with your model, ask yourself this.

Am I going to need to sync this?

If you are going to need to sync it from a mobile client,

then use a UUID to begin with.

If you're not going to, well, hey, just stick with an ID.

It's easier. It's simpler.

But if you do need to migrate, well, probably add the UUID field,

check everything's working when you've adjusted,

and then switch over the primary key in the field definition

and then create a migration which will remove the auto-created ID field for you.

So the Django migrations package will do that.

But do it slowly.

Add the UUID field first, adjust your API, make sure everything's working, and then switch over.

Yeah, it's not a small undertaking, not to mention existing API endpoints or existing pages.

Try and think about it in advance if you can.

It's slightly more complex, but I would say when in doubt, just default to a UUID or a hash ID, and you'll future-proof it.

A little bit like using a custom user model for most people.

You could also do profiles.

You can change it later downstream, but it's a little bit more work.

Yeah, and so I guess I think the general advice,

when you're designing your application, think about your URL structure.

Think what you want your URLs to look like.

Spend a bit of time because cool URLs, they don't change.

They stay the same forever, and they're reliable, and they're addressable,

and you can bookmark that, and you can go back to it.

So think about your URL structure

and try and design your application nicely around your URLs.

Yeah, and I would say that with a bit of experience,

you know, after the model, after the schema, the URLs are the second most important thing I think about in

terms of architecting a project, because, yeah, the pages themselves, that can change, but really,

you know, hopefully the URL structure doesn't change as much as even, you know, the

views for what's displayed on the page itself. That's more likely to change than your underlying

URL structure. Yeah, and I think of URLs as well as a power tool for power users of your site.

So it's like the command line interface... That's true. ...you know, on your computer. If you fire

up the terminal, you can drive your computer from the command line. You can do things very

quickly and very powerfully that might be more long-winded via the GUI. Now, the GUI is

obviously easier, and that's great. It's more accessible, and we love GUIs. But sometimes that

power tool is exactly what you need. And if you've got a really nice URL structure, it just enables

people who are really into your site and into your application to use it more efficiently. Agreed.

So this wasn't the longest episode,

but I think it covers an important point

and something that trips people up.

And, you know, hash IDs in particular,

if you're already familiar with UUIDs,

hash IDs are really cool and worth looking into.

It's a nice little library.

It's a nice little tool if you, you know,

it's this middle ground between the two.

Yeah, exactly.

So as always, you can reach us

at the djangochat.com website.

We're on Twitter at ChatDjango.

If you like this podcast,

please also leave a review on whatever service you use.

We've received some really nice reviews,

but reviews help people find our work

and keep us motivated to keep doing these.

And that's it.

Anything else you want to add, Carlton?

Yeah, no, questions.

Well, you know, send in some things.

We should have an episode on user questions,

or listener questions.

Yes, actually, we should.

We've been getting a number,

and we've done a couple that are full-length episodes,

like the admin,

because we got asked a couple questions on that.

But especially if there's...

It can be a small question.

That would be fun to do a grab bag of user questions.

So send those in.

All right, we'll see everyone next time.

Thanks for listening.

All right, take care.

Bye-bye.

Bye.