Transcript: Search
Hi, and welcome to another episode of Django Chat. I'm Carlton Gibson. I'm joined as ever
by Will Vincent. Hi, Will. Hi, Carlton. This week we're going to talk about search,
which is something you might want in your Django application. Yes, it is something you
might want in your Django application, and it is not a built-in feature. So that is a conundrum.
I would say there are two stages to the journey of search in Django. If you're like me,
on my first major Django project I stumbled through Django, and then at a certain point...
This was a school rating search, so I had 120,000 K-12 schools in the US, and at some point
I wanted a search bar. And I was like, okay, well, Django, help me out here. No. Okay.
How do I implement basic search? It's not that it's hard. It's that it requires a pretty deep
understanding of Django, which we'll get into. You have to understand forms and querysets and
passing things around. And then once you get past that hurdle, the second phase is realizing
search is actually really, really, really hard. And everyone expects it to be really good, because
they use Google or DuckDuckGo all day long. Django now happens to have built-in support for
full-text search if you use Postgres, which we'll get into. So we'll get into all that,
starting with the very basics and then advanced options. But basically, search sounds easy until
you've tried it, and then you go, hold the phone. Right. So what's the first step? You've created a
form in your HTML, so you've got a search box and a button that says Search next to it, or
maybe the magnifying glass. Then what? Yes, so what would I say? The first thing is you need some data.
We'll start at first principles: you need to spin up your project, make an app, and then you need
some data. Populate it in the admin, or load it in with a script. So get your models, and then I would
say the way I like to teach this is to think about what it is: it's a form and
a view of some kind. Basically, it's kind of a list view. So you start by saying, okay,
I'm just going to use a list view. Let's say there are 10 rows in my database. List all 10. Okay. Now
I want to filter them, because basically search is a filter. You're saying: of all the things in
my database, how do I filter them based on the user query? So we'll get to the user query,
but first, filters. And actually, I would say the basic way is to use a list view,
and then you can use the filter method.
You can use lookups like contains and icontains, which is case-insensitive, and methods like all, get, and exclude.
And you can do some filtering on your list.
And actually, you maintain the django-filter package, right, Carlton?
Do you want to give that a plug?
Yeah, OK.
So django-filter is great for this exact use case.
You've got a form, which then submits something, probably via a GET request.
So query parameters in the URL, right?
And you want to, first of all, validate that those are sane.
And then you want to take a queryset and filter it by whatever fields you're allowed to filter on.
So let's say you've got an address book with some names, just one single name field.
Like Adam Johnson, Will Vincent, Carlton Gibson, Tom Christie: these are names you've got in there, and you just want something that matches.
So you have a CharFilter, a character filter for filtering on strings, and you set it to filter on that name field, and anything that matches will be returned.
And that's a nice way of filtering down from your address book of 300 to, say, four matches. Yeah, and I
like the way you said that. I'm going to take it slightly differently, more abstractly, because I was thinking in
terms of showing code to people, but on a podcast I can't show code. So the first step is you set up
your basic form, right? And then the key thing that I think is really hard to understand is that you
use the name attribute on the input. This is the query itself, and
you can call it whatever you want. Generally, you would call it q, like when you go to Google
and type in a search, but you have the flexibility to call it whatever you want. It just happens to
often be called q. And then if you're using a ListView, you need to modify the queryset.
You can do that just by setting the queryset attribute, or you can override get_queryset and have a few more
options. And you pass in q as the query that you then filter on. I have a post on this,
which we'll link to. So your description was great and correct, but in terms of the
logistics of "well, how do I get the query?", it's an aha moment for people to see how it's
passed from the form to the view to be used. Yeah, so I would use an actual Django
form here, and django-filter does this under the hood: it creates a Django form, and then you validate
the form so that you know it isn't some crazy value that might do harm to your
database. It's at least going to look like a character field. And then you get the
data out. So it comes in on request.GET, you put it into a Django form and you validate
it, like you should do with all user input ever, you get it out of the form, and then
you pass that value to the queryset, to the filter method of the queryset.
Right. And, for instance, if you've got an integer field and you just pass
the raw value from the query string into an integer field, it's not the right type, because you
need an integer. Whereas if you pass it through a form field, an integer form field, it will
give you back an integer. So it casts it. Or a date field will give you back a date from a string.
If you need that kind of casting, if you're not dealing just with
plain strings, the form can do that transformation for you and clean that step up. So it's
worth always, always, always passing all input from all users, at all times, through either a Django form or
a REST framework serializer or something to sanitize it. And yeah, Mozilla has two very in-depth
guides on this: one on sending form data and, separately, one on validating it. So I'll link to
those. But the good news is if you use a Django form, you get most of that out of the box.
And you also get the form rendered in HTML too.
Yeah, nice. Okay, so you've got the data. Now what? So you can do these basic filters,
as I mentioned, and specify on, say, the name field and the title field, whatever you want.
And you can also chain them together, so you can AND them. But sometimes, and definitely,
you will want to do ORs. So maybe search by the name of the author or the title. And for that,
you need a Q object. Do you want to take a stab at Q objects? Because that's a pretty cool,
I would say intermediate-to-advanced, feature that you use all the time once you know it's there.
Yeah, so Q objects. Exactly. When you call filter and
you put in, I don't know, name equals Adam, right, that's short for name__exact equals Adam,
because there's an implicit lookup there. The lookup is exact or
contains or icontains, any of these lookups, and the default one is exact. So if you don't specify
one, the ORM puts that in for you. And you pass those in as keyword arguments to the
filter method or the exclude method of your queryset. A Q object just lets you take
that pair, that lookup, and create it as a Q object, which you can then pass into the
filter method or the exclude method in place of those keyword arguments. So it takes Qs: if you pass
in just a keyword argument, it turns it into a Q under the hood and then processes it
after. But the great thing about Qs is that you can apply Boolean logic to them,
so they can be ANDed or ORed or combined in whichever way. So they're really handy. It's worth
checking out the docs; we should link to those in the show notes. Yep, we will. So, and then the
form itself: you talked briefly about this, but GET versus POST. These are the two ways you can
send data. Both bundle up the form data, but POST encodes it and sends it to
the server in the body of the request, whereas GET puts it into a query string in the destination URL. So again,
if you do a Google search, go look at the URL, and you will see google.com/search and then q
equals a string that matches the query you made. So in general, the basic rule
of thumb is if something's going to touch the database and update it, you should do a POST, or
if it's something that's secure, like credit card information, you should do a POST. But if it's
just a search query, a lot of the time you can get away with just using a GET. And not only get
away with it; it's much better, because you can bookmark that URL. Yeah, so say you do a search.
The Django issue tracker is a classic example: there are, you know,
1,200 open accepted tickets and 48,000 ways of filtering those tickets.
You learn to filter by component and then, you know, needs docs, and you get it down to four tickets, and you think, yeah, I can take on those four tickets, they're all related.
And then you can just take that URL and bookmark it, and you can go back to that search.
And if you want it, it's there.
Whereas having to go back and rebuild your search from scratch is, well...
Yeah.
That might be particular to your situation.
No, but that applies to anyone.
But I understand.
I've never done that in my life, but I understand that I could.
But have you ever taken a Google URL and sent that to your partner or to your mom?
Yeah, sure.
Well, DuckDuckGo. Team DuckDuckGo. But yeah.
But you're doing exactly the same thing.
You're taking the URL with the query parameters already in, and you're sharing it with somebody else.
Now, in my case, I'm sharing it with my future self, but I could equally be sharing it with Marius.
Well, I have seen developers I respect implement search with POST, and I've had this argument with them.
And to me, it seems like you would always use a GET.
But next time I have this debate with them, maybe I'll share this.
If anyone wants to come on the show and tell us why you should use POST for your searches...
Yeah, I won't name names publicly, but I'll have that debate.
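To make the GET argument concrete, here is a standard-library sketch of what a form declared with method="get" and an input named q turns into. The field names and values are URL-encoded into the query string, so the whole search state sits in a URL that can be bookmarked or shared. The example.com addresses and the ticket-filter parameter names are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(base_url, **params):
    """Build the URL a <form method="get"> would navigate to for these inputs."""
    return f"{base_url}?{urlencode(params)}"


# A single search box whose input is named "q"; spaces are encoded as "+".
url = search_url("https://example.com/search/", q="full text search")
print(url)  # https://example.com/search/?q=full+text+search

# Several filters, like narrowing down a ticket tracker; the whole state lives in the URL.
tickets = search_url("https://example.com/tickets/", component="Forms", needs_docs="1")
print(tickets)  # https://example.com/tickets/?component=Forms&needs_docs=1

# Opening the bookmarked URL later recovers exactly the same parameters.
print(parse_qs(urlsplit(tickets).query))  # {'component': ['Forms'], 'needs_docs': ['1']}
```

In Django, those same values come back as request.GET when the URL is opened again. That is why a GET-based search view can reproduce the results from a bookmark.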
So what's the next stage?
So it'd be nice to go beyond a query to maybe an entire document, a book, an email, and
you can do that with what's called full-text search.
So this is a search we're used to.
This is what Google does, where you can search.
The technical term in Postgres is a document, and that can just refer to any kind of body
of knowledge.
So what are the options for this?
So there are a couple.
We'll get into how that works.
But for standard solutions, you've probably heard of Elastic, Elasticsearch.
You could use Solr.
There are hosted solutions like Algolia and Swiftype, which is now owned by Elastic.
So if you don't want to spin up your own server, there are some hosted solutions.
Or there's a third-party app, django-haystack, which I actually used at one of my companies back in the day.
It has a really nice interface and connects via a backend to Solr, Elasticsearch, or even Whoosh, which is written in pure Python.
So you can use it with SQLite.
So you can try out full-text search without having to get into PostgreSQL and installing that.
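For the django-haystack route, the settings look roughly like this. It's a sketch based on Haystack's documented Whoosh setup. BASE_DIR here is a stand-in for the one your settings.py already defines. Check the Haystack docs for your version before relying on the exact engine path.

```python
# settings.py (excerpt): a sketch of django-haystack with the pure-Python Whoosh backend.
from pathlib import Path

# Stand-in for the BASE_DIR that a generated settings.py already defines.
BASE_DIR = Path.cwd()

INSTALLED_APPS = [
    # ... your existing Django and project apps ...
    "haystack",
]

HAYSTACK_CONNECTIONS = {
    "default": {
        # Whoosh runs in-process, so there is no separate search server to manage.
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        # Directory where Whoosh stores its index files.
        "PATH": str(BASE_DIR / "whoosh_index"),
    },
}
```

You would still define a SearchIndex for each model you want searchable and build the index with Haystack's rebuild_index management command. Moving to Solr or Elasticsearch later is mostly a change of ENGINE plus a URL for the server.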
So yeah, so full-text search.
So what is it?
Actually, before we get to what it is: it's been in Postgres since 2008 and in Django since version 1.10.
So 2016. Marc Tamlyn led that charge.
Were you a Django fellow in 2016?
Were you involved with that?
No. In those days I was, I mean,
I'd been a Django user for quite a while,
and I followed that project with great interest.
I was on the contrib.postgres stuff as soon as it came out.
Because there was a Kickstarter to fund it.
Yeah, yeah, and that was super. You know,
those were good times, because Tom Christie
had the DRF Kickstarter, the REST framework Kickstarter,
that really pushed that forward.
And then there was the contrib.postgres Kickstarter
at about the same time.
You know, Django was...
Can we do that for async?
Because Andrew was talking about some...
I guess it ties into the DSF.
Well, this is exactly where Andrew is at the moment,
thinking about fundraising for the async stuff.
So, I mean, realistically, I think we can go...
You know, we'll get async views in,
and then the question is, well, where does the money come from for,
say, the ORM rewrite?
That's big.
And there are async backends,
and there is the appetite for it, but it's a big job, and that's going to need support somehow.
So whether it's a Kickstarter, a Mozilla grant, or one of the big corporations coming in, you know, I don't know.
Yeah, but it's worked before. So in this case, there is now a dedicated module that wraps up all these Postgres full-text features for you.
And so with full-text search itself, without getting into all of how it works, you can do things like
ranking; you can do indexing, which is important for performance; you can do phrase search.
So just more intelligent queries. You can do stop words,
so ignore words like "the" that are very common.
This is all language-specific.
You can do stemming, which is also language-specific.
So you can match variants of a word.
For example, if someone typed in "running", you can say, well, that matches "run".
Whereas if you're just doing a straight query, there's no intelligence there.
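To build intuition for what stop words, stemming, and a pre-built index buy you, here is a deliberately tiny pure-Python toy. It is not how PostgreSQL implements full-text search. Postgres uses language-specific dictionaries such as the Snowball stemmer. The stop-word list and the suffix-stripping stem() function below are simplified assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

# A tiny, hypothetical stop-word list; real configurations use much longer ones.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def stem(word):
    """Very crude suffix stripping, loosely modeled on the first steps of Porter."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # "running" -> "runn" -> "run", but "falling" keeps its double "l".
        if word[-1] == word[-2] and word[-1] not in "lsz":
            word = word[:-1]
        return word
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lowercase, split into words, drop stop words, then stem what is left."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_index(documents):
    """Inverted index: term -> {doc_id: count}, built once ahead of query time."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for term in tokens(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Return doc ids containing every query term, ranked by total term frequency."""
    terms = tokens(query)
    if not terms:
        return []
    matches = set.intersection(*(set(index.get(t, {})) for t in terms))
    scores = {doc: sum(index[t][doc] for t in terms) for doc in matches}
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


documents = {
    1: "Running a search in Django",
    2: "Django runs the search views and the search forms",
    3: "Postgres full text searching",
}
index = build_index(documents)  # done once, ahead of any query
print(search(index, "running django"))  # [1, 2]
print(search(index, "searches"))  # [2, 1, 3]
```

A suffix-stripping stemmer like this one (or Snowball) will not map an irregular form such as "ran" to "run". Building the index up front means each search is a few dictionary lookups rather than a scan over every document.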
You can do accents, different languages, JSON support. So all the things you would expect from a decent
search, which really comes down to relevance: full-text search gets you there. And it's a really
deep and interesting field, and the docs on the Postgres site, chapter 12, which we'll link to,
are fantastic. And I'm giving a talk at DjangoCon on this, so I'll get into this a little bit. But
I was wondering, have you seen anything on the performance of Postgres full-text
search versus, say, something that's Lucene-backed, like Elasticsearch or Solr? Because, yeah, you
know, if you've already got Postgres up and running, it's quite good; it gets you quite a long way. And
what I'm not sure about is at what point you'd say, I know the quality is better here, and it's
worth the extra ops effort to run Elasticsearch. Totally. Based on what I've
read and the people I've spoken to about it, it is very, very performant. So I would believe that there are cases
where Elastic and these other ones are better, and it might depend on the structure of the
data, but it is very performant, and a lot of folks I know are using this module. It's
all built in, they don't have to rely on an external service, and they're very happy, and these are large
sites. But that would be a good question. If you can avoid spinning up one extra service... I
mean, Elasticsearch isn't the easiest service to run in the world, so if you
can avoid spinning up that extra dependency, you've just got so much more
capacity. Exactly. It's easy, it's built in, it's free.
Yeah, I think we'd like to have someone on from... There are a number of
prominent Django people who I think work for Elastic. We could have
them on and ask them that question. Yeah, it's a good question. I would definitely start with
the built-in search and then, you know, see where you go from there. So the
Postgres search module in Django has a number of classes that make some things easier. We'll link to this; it's
in the docs. There's SearchVector, so you can query against more than one field in your database;
SearchQuery, so you can add stemming and stop words; and SearchRank, so you can
get into all the ranking. And I guess the last big one is that there's also SearchVectorField,
which you can add for performance, but you need to keep it up to date yourself with a trigger. So with all
these things, it really depends: is the data static or is it dynamic? How much is it changing?
With indexing, Postgres has GiST, which is faster for dynamic data, versus GIN for static
data. And it's the same thing with search in general: the question is how fast your data is
changing, because you never want to actually run the query against the raw data. You want a
separate process that pre-processes everything and creates the indexes. But do you need to do that
every hour, every day, every minute? Trade-offs. Trade-offs are kind of fun to work through, but that's
why it's hard to give a blanket statement on the best way to do it. It depends on the data; it depends on
your needs. Yeah, I mean, that's always true. What's interesting is you get
a massively complicated "it depends", right? We've got to our tagline: what's the answer? It depends.
But, you know, my experience is more with Elasticsearch, and you will spend so long
fiddling with indexes, and you will re-index, and that will be the most painful
operation, because you realized you needed to query your data in a slightly different way, or you
weren't getting the results you wanted, so you re-index the whole thing, and that takes hours. And
then it's like, search is tough. It just really is. So yeah, especially if
you're in an e-commerce situation, there are huge teams that focus just on this,
because it is a tough thing. But it's speed of results as well, right? I mean, this is the
other factor: at some point, just doing a full table scan on your database to find
the records which contain the text you're looking for is not going to be
good enough, because it's not going to be fast enough. You need it to be pre-indexed so that you
get a result quickly. You know, Google found two million results in 0.004 seconds or whatever.
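For the Postgres route, the django.contrib.postgres.search classes mentioned above can be sketched roughly as follows, based on the patterns in Django's full-text search docs. The model with title and body fields, and the search_vector SearchVectorField used by the second function, are hypothetical. This only works against a PostgreSQL database with "django.contrib.postgres" in INSTALLED_APPS. The imports sit inside the functions so the module itself can still be imported on other backends.

```python
def ranked_search(queryset, text):
    """Rank rows by how well title and body match text (PostgreSQL only).

    Assumes a hypothetical model with "title" and "body" text fields.
    """
    from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

    # Weight title matches above body matches; the "english" config supplies
    # the stemming and stop-word rules for both the vector and the query.
    vector = SearchVector("title", weight="A", config="english") + SearchVector(
        "body", weight="B", config="english"
    )
    query = SearchQuery(text, config="english")
    return (
        queryset.annotate(rank=SearchRank(vector, query))
        .filter(rank__gt=0)
        .order_by("-rank")
    )


def refresh_search_vectors(queryset):
    """Recompute a stored SearchVectorField in bulk, e.g. from a scheduled job.

    Assumes a hypothetical "search_vector" SearchVectorField on the model,
    ideally backed by a GinIndex (django.contrib.postgres.indexes) for fast lookups.
    """
    from django.contrib.postgres.search import SearchVector

    return queryset.update(
        search_vector=SearchVector("title", weight="A", config="english")
        + SearchVector("body", weight="B", config="english")
    )
```

Once search_vector is populated, you can filter on it directly, for example Book.objects.filter(search_vector=SearchQuery("running", config="english")), so the vectors aren't recomputed for every query. How often to run the refresh, or whether to use a database trigger instead, is the static-versus-dynamic trade-off described above.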
Yeah, yeah. So in addition to my talk, which we'll link to (I'm not sure if it
will be out yet; I'm giving a talk at DjangoCon on this topic), there are some fantastic existing talks
that have criminally low view counts on YouTube, so we'll link them in the notes.
Paolo Melchiorre, sorry if I said that wrong, has a full-text search in Django talk he's given
at, I think, EuroPython. Markus Holtermann gave a talk last year at DjangoCon Europe, On the Lookout for Your
Data. I know there are some talks on Django and Elastic in particular that are worth
looking at, which we'll link to. So it's a deep topic. You know, with this podcast, my tutorial,
and the talk, I really want to get people up to speed to where they can start doing full-text search,
instead of just saying, we'll airdrop you in with a fully configured Django project,
and of course you know how forms and views work. I want to help people ramp up
and see how they can progress. As well, it's a good learning curve, because you don't need...
You know, we finished off talking about full-text search and Postgres and Elastic and
these kinds of things, which are much bigger, but you can get quite a long way with just
filtering in the ORM. Yeah, right. Until you've got thousands of records, like many thousands of
records, an icontains will nine times out of ten get you everything you need. Yeah, that's a good...
Oh yeah, that's the record I wanted. Yeah. And unless you're an e-commerce site, is it worth the time?
You know, maybe not. It is fun to play with, though. So those are the highlights. And definitely for
forms too: use the built-in Django forms. You know, forms are maybe one of the coolest and, I would
say, at least from the beginner's perspective, least appreciated aspects of Django, because you don't
think about the fact that forms are really hard, and there's so much security and thought put into
what we already have in Django. So yeah. And who likes writing that HTML by hand? It's just
going to kill you. Well, it's not so bad; I give an example. But yeah, it's better to use it. As with
everything in Django, if you can rely on the work of millions of smart people, you should do it. So,
all right, those are the highlights for the talk. If you have any feedback, you can find us at @ChatDjango
on Twitter. We're at djangochat.com, and we have a newsletter, which you're welcome to
sign up for if you want regular updates. So that's all for now. We'll see you next time. See you next
time. Bye-bye. Bye.