Transcript: Ethical Ads - David Fischer
Hello, and welcome to another episode of Django Chat, a weekly podcast on the Django Web Framework.
I'm Will Vincent, joined by Carlton Gibson. Hi, Carlton.
Hello, Will. Long time no see.
Yeah. And this week, we are joined by David Fischer from Read the Docs. Hello, David.
Hello. How are you?
Marvelous. Thanks for coming on, David.
Yeah, thank you for coming on. There's a whole bunch of things I want to ask you about,
but maybe we'll just start with your origin story. How'd you get into programming, Python, Django? And then we'll go from there.
Oh, all right. So I have a pretty like standard origin story.
You know, I sort of studied math in college and loved programming before that, wanted to get into it, sort of had a programming heavy undergraduate and went right into like a job at a pretty big tech company right out of college.
So that was a while ago. But now I sort of decided, you know, I'm done working for these big companies. You know, I'm not like super negative about them or anything like that. But I wanted to get my hands a little dirtier, wanted to be a little bit closer, maybe to both the business side and the code side.
And so I started working more for startup-sized companies, and Read the Docs,
actually a couple of years back, not quite three years ago, was a pretty natural
fit for me as they were sort of building out their advertising platform.
Got it.
And when did Python fit into your programming journey?
Oh man, I took like one class that touched on Python in college and it actually helped
me get my first job, which is sort of a random thing.
I did Python in one class, and then this job application was like, people who know Python.
I was like, that's me.
I know Python.
You know, as much as an undergraduate who doesn't know Python knows Python.
But it sort of helped me get my first job.
Didn't do Python at that job for years.
And then maybe four or five years into that job, they were like, do some Python.
So that was how it ended up working.
I was already working on some web stuff, and I ended up picking up Django.
I sort of like looked at a couple of the alternatives at the time, and Django, this was in the 0.96 days, ended up being a great fit.
So that was how I got into Django.
It was a long time ago.
Carlton, what version was it for you?
I don't know, about 1.0, 1.1, something around the 1.0 times.
Because I remember going to a conference and there was this, it was like Django, Django, Django.
So it must have been around the 1.0, around that time.
So maybe for listeners, what is maybe Read the Docs?
What's the quick story on that if they're not familiar?
I'm sure most of our listeners have seen Read the Docs, but maybe they don't know the story
of Read the Docs.
Yeah, it's sort of an interesting story.
And I'm not the main sort of protagonist in this story at all.
Sort of Eric Holscher and Anthony Johnson are sort of the main folks in this story.
I came much later, although I was an early user of Read the Docs.
I think I started using Read the Docs in like 2011.
I, you know, had an account.
You know, I sort of look back in my password manager.
It's like created date.
I think it was 2011 or something like that.
Yeah, exactly.
That's how you figure that out.
But it, I think it started at like Django Dash in 2010,
which was like a one weekend sprint project.
And it was basically like the automation around Sphinx.
That was exactly what it is.
So how do you do continuous documentation, like building off of every commit to master or, you know, something like that?
And so Eric Holscher, I think at the time, was probably one of the main people behind it.
There was a couple other people at that time who ended up not sort of transitioning, not launching it as a company.
And it just sort of, it took off in the Django community and in the wider Python community.
People started using it and Eric had a day job, you know, working a regular job, like a regular person.
And he would get these pages or calls being like, hey, Read the Docs is down.
And he's like, well, you know, I got my regular job, guys, you know, like.
So eventually, it was so big, so many people relied on it.
I think Eric and Anthony decided to launch a company around it.
And basically, it's funded through a combination of advertising on open source documentation.
They sell sort of ad-free commercial documentation that has a few additional features so sort of companies can buy it.
And then there's also sort of like what's called Read the Docs Gold, but it's basically like regular people can pay a little bit of money and you get some extra perks on Read the Docs, like ad-free on Read the Docs.
And you can sometimes designate a project ad-free.
There's a few different things you can do.
So that's sort of where it all comes from.
And, you know, I sort of looked at this and Eric had produced sort of this guide on how their advertising was different from other people.
You know, it was sort of advertising without tracking people.
So pretty much the opposite of what most advertising does.
And I sort of just emailed Eric out of the blue.
I'd met him before at a conference like, you know, probably seven years before.
I'm sure that he had no idea who I was and I barely remembered who he was at that time.
But I emailed him to ask him a few questions about that, and he was basically like, you should
work on our advertising. And I was like, yes, I should. So that was how I got the job at
Read the Docs. I sort of did a few projects as a consultant for them and then transitioned over to
full-time.
Before we go off into the ads story, which there's much more to talk about there,
can I just wind back a little bit? In the early days, was it like Eric just hosting this
on his own server?
Like, because it grew up quite big quite quickly, right?
So where's the cost?
Great question.
It's all right if you've just got your blog,
but you're hosting, but when you're hosting-
And this is where a number of the problems came from.
You know, so it was, when I stepped in here,
I was like, man, this is hosted insanely cheaply.
You know, they're doing, you know, 50X or even more,
maybe a hundred X the traffic of some of the other properties I've worked on before. And they have a
budget, like a monthly, you know, infrastructure budget. That's like the same as other places I've
worked that did, you know, 2% of the traffic that Read the Docs did. So it was, it was built to run
insanely cheaply. Now it has some advantages. Read the Docs is almost entirely hosting static
content that is already built, you know, just straight HTML, CSS, JavaScript that's built
through something, built through a builder. So yes, you have to run builders. That's kind of
expensive. But the hosting and the serving is relatively inexpensive. So there are some
advantages there. But yeah, Read the Docs, our stats are pretty public. We're up front about
them. It did something like in August, 45 million page views last month. So a lot. It does a lot of
traffic is the answer. And that's just the open source side of the hosting. That's not the
commercial hosting. And we have some privacy protection in there too, like, you know, we don't
send anything to analytics if somebody has Do Not Track enabled in their browser, things like that.
So it doesn't count any of that. You run an ad blocker, you don't get counted. This is 45 million
page views discounting all of that. So it does an absolutely massive amount of traffic.
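As a rough illustration of the Do Not Track behavior described here, the sketch below skips analytics when a request carries the `DNT: 1` header. The function name `should_record_pageview` and the plain-dict headers are assumptions made for the example; this is not Read the Docs code, and it does not cover visitors who run an ad blocker, since that happens in the browser.

```python
def should_record_pageview(headers):
    """Return False when the visitor has asked not to be tracked.

    `headers` is a plain dict of HTTP request headers (hypothetical input shape).
    """
    # Header names are case-insensitive, so normalize before looking up DNT.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


# A visitor with Do Not Track enabled is skipped; others are counted.
print(should_record_pageview({"DNT": "1"}))
print(should_record_pageview({"User-Agent": "ExampleBrowser/1.0"}))
```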
And tech people are quite likely to run.
So it would go down all the time because it was hosted on a shoestring budget.
Eventually, Eric sort of moved it over to AWS and had like load balancers, you know,
like things that you would do if you wanted something to stay up.
Yeah, crazy stuff.
Yeah, crazy stuff.
And this worked pretty well on still what was at the time a relatively small budget.
But they launched sort of some crowdfunding campaigns.
Some of these brought in like real amounts of money.
You know, they did like a big crowdfunding push and they got like a real amount of money.
It was like $30,000, which sounds like a lot until you realize that's like a few months of
infrastructure budget. But it was all one-time donations. And so then the month after they got
$30,000, the next month it was like, oh, we brought in $1,000, which is, that's going to
be underwater on the infrastructure budget. So they realized, what do we do here? And the answer
was mostly advertising. Well, I remember that post, maybe if I find it, I'll put it in the links
about, you know, talking basically kind of what you said in the, you know, not joyously jumping
into ads, but basically being like, we need to cover the costs. And I remember thinking it was
really well written and really sort of laid out the dynamic for a lot of people, which is, yeah,
it's hard to charge and can't lose money on something that's... Yeah, that's a side project,
essentially, or it was at one time, you know? Yeah, you're absolutely right. And there is sort
of the reality of the situation that, you know, the budget of Read the Docs is a rounding error
to somebody like Google. It might even be a rounding error to somebody like GitHub, even
pre-Microsoft acquisition. So, like, they could just launch it and if it loses money, whatever,
no big deal. But, you know, for us, it's like real money. You know, commercial hosting does
not bring in as much money as advertising for us right now. Wow. That's interesting because...
Yeah. Yeah. Huh. And, you know, like, we don't have venture capital backing us. There's no sort
of like moneybags, Daddy Moneybags behind us or anything. It's like, oh, we bring in a little
bit more money. That means we can hire one more person. So it's all sort of bootstrapped. There's
no venture capital at all. Right. Well, I think it's in some ways that forced discipline is the
best thing you can have. I mean, it's unpleasant in some ways, but I mean, back 10 years
ago, I was working at a company called Quizlet, which was a top 100 website by traffic with, what
was it, two and a half engineers, not a big budget. And I think in many ways, in my tenure there, which
was about three years, the main thing we did is we didn't harm it, just because I
spent so much time trying to recruit people. You know, we kept it free, we kept it up, and,
you know, those constraints were a good thing. They were frustrating, right? Because you always have
your long list and you're like, man, but it really does, uh, focus priorities. And I think can also
lead you away from sometimes if you have 20 engineers, it's like, well, they gotta be doing
something. And even though maybe that's not what your business needs. Yeah, you're absolutely right.
You know? So yeah, constraints sometimes are, are sort of, that's how invention happens.
For sure. Well, I, uh, I love that, so your code is open source. The ad
server, you know, the server and client are both up there. And so I was actually,
prepping for this interview, going back to the first commit, because I love seeing how,
uh, people build things. And I wonder, I'll put the link in for people, but essentially, I love
actually how simple the project is. Even today, it's essentially a single, um, ad server app within
Django, because it's quite easy to, I would say, bloat out a Django project. And you've been very
constrained. But I wonder if you recall, you know, building, you know, starting from the kind of that
process, right? Like, what did you start with? To the extent you can recall, like that, you know,
the changes over time, because there's a big difference between prototype to first stage
production to, you know, the scale that that you are at now. Well, you're probably looking at a
commit that's not that old. I think it was as of 2018, maybe? Yeah. So basically, when I started
at Read the Docs, the advertising was basically a Django app that was closed source, but got built
into Read the Docs at compile time. So there's sort of these private extensions in Read the Docs
that are in a private repo. The Read the Docs, the main Read the Docs repo is all public, but we have
a couple of private extensions repos. One is for the things that commercial hosting gives you,
but some other things, we have some other things that are closed source for
a variety of reasons. And advertising was just one of those. It was just in a different repo.
It was just one Django app and it was closed source. And so if you looked at the first commit
of the ad server, it's probably just mostly taking a bunch of stuff from this private
Read the Docs repo and bringing it into here.
But yeah, a bunch of things were sort of renamed.
It was still pretty iterative.
Yeah.
I saw, I think, yeah, the first commit was just, you know, first commit, nothing. And then the
second commit was import ad server. But then, I mean, I could see you started with basic auth,
then at some point you added django-allauth, you know, kind of all the, to me, standard steps that
not every Django developer gets to do because often you just parachute into existing projects
and you don't have that flexibility
and you don't do it a lot from scratch.
So I always love seeing actually how
like a production site is done iteratively
because that's not an experience many people have.
Yeah, it's actually sort of a crazy experience
because I got dropped into this project
and we were basically like,
we're gonna break our ad server out from Read the Docs
because previously it was just a Django app in Read the Docs.
And, you know, it does a very large amount of traffic,
something like, you know, 30 million ads a month.
Most of those are not paid ads, but we're talking about 30 million API requests a month.
It's kind of expensive.
And then there might be additional requests on top of that.
So when we broke out the ad server, we had to build something that on day one is going to handle 30 requests a second sustained, 24 hours a day.
No pressure.
No pressure.
And yes, the first time we tried to stand it up, it absolutely fell over.
Okay. What does 30 requests a second sustained look like in terms of infrastructure? You know, what are you running? How many workers are you running? Is it using a threaded model or pre-fork? Are you using Gunicorn? Are you using uWSGI? How do you serve that much traffic?
It is pre-fork. We are using Gunicorn. And I think we're looking at, it's either four or six workers. I know I've tweaked this setting, so I don't remember what it is. I could check for you. But it's either four or six or eight workers per sort of instance. And we're running, I think we're currently at six instances.
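A back-of-the-envelope sketch of what those numbers imply. The 30 requests a second and the six instances come from the conversation; the worker count per instance was not recalled exactly (four, six, or eight), so all three are shown. The sketch assumes synchronous pre-fork workers, where each worker serves one request at a time, so the average time a request may take before the workers saturate is roughly the total worker count divided by the request rate.

```python
REQUESTS_PER_SECOND = 30  # sustained load quoted in the conversation
INSTANCES = 6             # "I think we're currently at six instances"


def time_budget_seconds(workers_per_instance):
    """Average time each request may take before sync workers saturate."""
    total_workers = INSTANCES * workers_per_instance
    # Capacity in requests/second is total_workers / average_request_time,
    # so the largest sustainable request time is total_workers / load.
    return total_workers / REQUESTS_PER_SECOND


for workers in (4, 6, 8):  # the exact setting was not recalled
    budget = time_budget_seconds(workers)
    print(f"{workers} workers per instance: about {budget:.1f}s per request")
```

Under these assumptions, even the smallest configuration leaves most of a second per request, which is consistent with the remark that the setup handles the load fairly well.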
Okay.
So that's about where it is. And that handles it fairly well. And right now we're hosted on Azure. We started out, we were prototyping on Heroku, but now for production, we're on Azure. That's where Read the Docs is. So we decided we'd just be on the same infrastructure.
It is set up slightly differently than how Read the Docs is set up.
Read the Docs is also on Azure, but it's basically just using like base VMs and what are called scale sets, where you can just sort of scale the number of identical VMs that you have.
So that's how Read the Docs is set up.
So Read the Docs actually auto scales.
But that's super interesting because, you know, a lot of people, you know, you'd have no idea what it takes to run a site at bigger scale, right?
You build your little thing locally, okay, fine.
You put up a worker, fine, you run it out.
How do I plan in advance if I want to grow to that kind of traffic?
Well, you need to think, okay, you're going to need half a dozen servers,
you're going to need this kind of infrastructure.
So it's really nice that you can come on and share that kind of information.
And it's much easier to build something that will handle that kind of infrastructure
than it was 10 years ago.
10 years ago, it would have been much harder.
Yeah, no, I mean, this is the cloud thing, right?
If I need six servers, I just get six servers.
It's not...
Yeah.
Click a few buttons in the AWS dashboard or whatever,
you know, the Azure dashboard.
Just drag the slider to the right,
and, you know, my bill also scales linearly.
Okay, so you're running the ad server
as a kind of a massive service on the side.
Yeah, yeah, yeah.
I mean, the big reason why we wanted to push it out of Read the Docs is that we had sort of this vision, which just sort of started to happen a couple months ago, of basically taking the ads that we've served for Read the Docs and making it so that we could help other projects, you know, other tech projects, other sort of similar places like Read the Docs that need to earn some money.
How do we help them run ads?
And we didn't want them hitting sort of readthedocs.org API endpoints.
So we sort of said, hey, we'll break this ad server out. It'll be sort of its own thing. Yes, it's part
of Read the Docs. Yes, Read the Docs is like the primary user of this service. But we wanted
to break it out so it was separate infrastructure, all that kind of stuff.
Right. And so this is where
it gets really exciting, because it's ethical ads, right? Yeah. So I'm putting up a site and
I'm thinking to myself, oh, I need to make some money, but I can't bring myself to put the Facebook
tracking pixels in and the Google tracking pixels in because I just can't bear to be part of the
massive surveillance capitalism world that we live in. And there's an alternative. So tell us
about the ethical side of it and why I might choose ethical. Right. So this is exactly what
I emailed Eric about. It was probably about three years ago now, almost exactly. And I had some
questions about this. How does it work? How does it work relative to something? Because I had some
familiarity with my last job. I wasn't in the marketing department in my last job. I was the
head of development. But the marketing team would come to me all the time. They needed help. And I
was sort of the liaison with the marketing department. I helped the marketing department
anytime they needed any tech stuff. And I can remember sort of like helping them set up
advertising. And I was basically horrified at everything that they were doing. I have a bit
of a background in security. Some of that extends into privacy. And basically, I was horrified.
You know, sort of standard procedure in this world is take your customer list, upload it to Facebook
and tell them, Facebook, I would like to advertise to people similar to these people. That is like,
that is standard procedure in advertising, you know, for anything really, not just SaaS companies,
that's sort of anything. So that's sort of like, I was horrified at that. And so I talked to Eric
about this, what do we do differently? And there's a few different things that we do.
You know, one, we basically, as much as possible, don't set cookies.
There are some cookies that are like borderline unavoidable, like the Cloudflare cookie, if you want.
So if you want your site to be protected by Cloudflare, they sort of set a cookie and some things are sort of unavoidable.
But as much as possible, we try basically none of that.
And there's a few other aspects.
We try to like align ourselves with the site owners, not sort of against the advertisers,
but like advertisers are constantly pushing you to put more tracking. And since with Read the
Docs, we were sort of the publisher and we heard very much from our, from our users, like Read the
Docs regular visitors, they don't want tracking. They don't want cookies. Even when I started at
Read the Docs, I would probably get an email a week about something privacy or security related.
You're running Google analytics. We hate that. You're, you know, whatever, something like this.
So I would get all these emails. So we were basically like, no cookies,
try to align with the site owners. Don't run any resources, nothing at all from the advertisers.
So not just scripts. You can't, you have to take the images from the advertiser and host them
yourselves. Otherwise they'll cookie your users. Um, yeah. Yeah. So just all these sorts of things,
you start to realize, like, all the, I mean, cookies by themselves, there's nothing wrong
with cookies, but like, you can do a lot of bad things with cookies if you really want to.
So, yeah, all of this, it's a shady industry. There's good players in this
industry, there's bad players in this industry, but like, we try to be one of the good guys.
There you are. Can't ask for more than that. I mean, it's Read the Docs, so you're dealing
with people who understand some of the implications of it. Um, yeah, you know, not all developers care
about their privacy, but enough do that it's a meaningful differentiator for you.
We heard from them that they do. So, you know, a lot. I would get literally an email a week,
you know, something related to this, you know, either, you know, turn off Google Analytics or,
you know, respect Do Not Track or, you know, something. So we took a lot of the steps there
around Do Not Track. I don't know if you're familiar with it. It's sort of like a pseudo
standard. Um, it's not a real standard, or rather there are real standards around it,
but like, there's no teeth around it. You can say, yeah, I support Do Not Track.
I tell you that you're being tracked. Boom. Support. Yeah. Sounds like something, uh,
corporations would create. Absolutely. But, um, you know, the EFF sort of has their own idea of
what Do Not Track stands for. And they sort of, if you subscribe
to that, there's sort of real things that you're supposed to do, you know, keep server logs,
no more than 10 days if they contain an IP address, you know, things like this. Read the
Docs, actually, we did not originally support this, but we've moved and now we're in compliance.
With the EFF?
Yeah, with this sort of pseudo standard that the EFF has put out for Do Not Track,
both on the Read the Docs side and on the advertising side.
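One way to picture the log retention point mentioned above is a pruning pass that drops any log record that still carries an IP address once it is older than 10 days. This is only a sketch: the record shape (a dict with `timestamp` and an optional `ip`) and the function name are hypothetical, and it is not a statement of everything the EFF policy asks for.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=10)  # the 10-day window mentioned in the conversation


def prune_log_entries(entries, now):
    """Keep entries with no IP address, or with an IP but inside the window.

    Each entry is a dict with a timezone-aware `timestamp` and an optional
    `ip` key (hypothetical record shape).
    """
    cutoff = now - RETENTION
    return [
        entry
        for entry in entries
        if entry.get("ip") is None or entry["timestamp"] >= cutoff
    ]


now = datetime(2020, 9, 30, tzinfo=timezone.utc)
logs = [
    {"timestamp": now - timedelta(days=2), "ip": "203.0.113.7"},   # recent, kept
    {"timestamp": now - timedelta(days=30), "ip": "203.0.113.8"},  # old with IP, dropped
    {"timestamp": now - timedelta(days=30), "ip": None},           # old, no IP, kept
]
print(len(prune_log_entries(logs, now)))
```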
Yeah. Well, I mean, we can...
Sorry, go ahead, Carlton.
Well, what I'd like to ask is, if I'm a site developer, and can I ask you for your
expertise here: I don't want to put Google Analytics on my site, and
I'm picking Google Analytics as the big one, because I'm concerned about these privacy
issues and I just don't want to be part of that. But I would like some analytics.
I'd like to be able to, you know, and I can grep my logs, I guess, and get some idea of how
busy my website is. But what would you recommend if I'm a small website developer?
What can I put on that enables me to do some analytics, but ethically?
It's hard. I'm going to be 100% honest with you. It's very hard. There's this newer startup called
Plausible Analytics, and that's exactly what they bill themselves as. And some people will say,
well, that's all marketing, but it partially is and it partially isn't. They're doing some
good stuff there. So I don't want to speak negatively about them. They're doing really
good things. And having more alternatives to Google Analytics is a good thing. Not cookieing
users is a good thing. People who say, oh, but just grep the server logs. Those people,
in my opinion, have not run a real business. They don't really understand what they're talking
about. You get so little from that compared with what you can get from JavaScript-based analytics.
Even if a number of users are blocking it, you still get just so much. You get things like
the time on the page, you get whether they scrolled or not. You can attach actions to
specific things like, did somebody click this button? Maybe that button doesn't trigger another
page view or something like that. You can get all this additional data. It's much easier to filter
out bots. Read the Docs traffic is like half bots. Some of those are malicious, but most of them are
just not. They're just search indexes indexing Read the Docs. And so, you know, we would have
so little data. We actually do use Google Analytics on Read the Docs still. We're always
sort of evaluating a few alternatives here. We've thought about maybe running it server-side,
where you actually hit a Read the Docs endpoint, and we basically strip out a bunch of stuff and
then send it off to Google Analytics from the server-side. And you could do things like drop
IP addresses, anonymize user
agents. You'd not have any
cookies for users, so
you wouldn't have any Google Analytics cookies for users
so Google wouldn't be able to tell
hey, this is exactly this person. So
these are advantages, but on a site like Read the Docs
you're talking about 45 million
additional server-side
API requests that then have to hit Google
while you wait for a response,
right? So there's real
drawbacks to this. So you've got to really care about your privacy.
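The server-side idea described above (strip identifying details before anything is forwarded to a third-party analytics service) might look something like the sketch below. The field names (`path`, `ip`, `user_agent`, `cookies`) and both function names are made up for the example, and the forwarding request itself is left out, since that extra round trip per page view is exactly the cost being discussed.

```python
def coarse_user_agent(user_agent):
    """Reduce a full user agent string to a broad browser family."""
    # Order matters: Chrome user agents also contain the word "Safari".
    for family in ("Firefox", "Chrome", "Safari"):
        if family in user_agent:
            return family
    return "other"


def scrub_hit(hit):
    """Return a reduced copy of an analytics hit (hypothetical dict shape).

    The IP address and cookies are deliberately not copied, so the
    downstream service cannot tie the hit to a specific visitor.
    """
    return {
        "path": hit["path"],  # which page was viewed is kept
        "user_agent": coarse_user_agent(hit.get("user_agent", "")),
    }


raw = {
    "path": "/en/latest/",
    "ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/80.0",
    "cookies": {"session": "abc123"},
}
print(scrub_hit(raw))
```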
Yeah, and you could host your own solution
but, you know,
we're talking about a service that's 45 million requests a month. Standing up something that's
going to run analytics on that is hard. It will be expensive. It will cost many hundreds of dollars
a month just in infrastructure. And we tried it a little bit and many of these solutions fell over
at that scale. So there's drawbacks. It's hard. It feels like to me, just broadly with advertising,
even more than usual, there's just this divide between, if you have a little bit of money,
you can not see any ads, and if you don't, you're just gonna get bombarded. I mean, like YouTube, right?
The last two years, YouTube went from almost no ads to now two ads on the start, ads every three
minutes. And probably like a lot of people, I don't watch regular TV really ever. And when I
do, it's for live sporting. And it's just awful because of the ads, right? Everything is a
streaming service. And which ties back to, so YouTube, they have this premium thing and they
have a free month trial and I've been using it. And it's just like, I'm almost tempted to spend
you know, 12 bucks a month just to not have the ads. So I guess, which, you know, is just to say,
if there was some global, like, Chrome ad thing where I just never saw an ad on a Chrome site,
um, you know, that would be appealing. But that's because I have some capital and I value that
time, and I guess...
Totally.
But I don't like that about the world. I feel that shouldn't
be the case. I mean, that's like capitalism to the whatever power.
Put yourself, though, into the advertiser's mindset. Like, the people who are willing to pay $10 a month
or $12 a month or whatever it is to not see ads, unfortunately, those are also the most
valuable people to advertise to that there are. So you can't charge a market rate for this. You have
to actually charge way over a market rate for something like this, uh, because, like, the people
who will pay $12 a month to not see ads are worth more than $12 a month in advertising.
That's a good point.
Yeah, it's a bad thing. Like, I don't know exactly what the solution here is.
I've had on my list to talk about a solution. I've had on my list to try out
this Pi-hole thing, where you set up a Raspberry Pi on your local network and you set it up
as the DNS resolver for your router, so that all traffic goes via it, and it's
got a blacklist and it just won't load any ads or anything. And I hear glowing reviews. I haven't had
the, you know, day and a half...
This is your man-of-the-people insight, Carlton.
Well, a local solution,
it may not be, you know, it doesn't solve the...
You know, I know a lot of people who run them, so, you
know, they work really well. You will run into some issues where, like, some sites are just going
to not work, and you're going to have to basically, somebody's going to send you a link, you're going
to see, oh, this doesn't resolve at all. You're gonna have to log into your Pi-hole and
fix something and check it out again. But, like, you will not see ads essentially at all, anywhere.
And again, like, the advertising companies are probably unhappy about this, but, like,
again, those are the people who are the most valuable to advertise to. Yeah, we at Read the
Docs, we've sort of, like, decided, you know, ad blockers, there's nothing you can really do about
it. People are going to ad block. You can try to sort of maybe urge them not to. You could try to
diversify your revenue stream. And that's essentially the tactic that we've taken.
You can think about this the other way too, is so many people are blocking ads. And yes,
as a result of that, the ad industry has gotten, they've had to get more intrusive, more ads,
bigger ads, et cetera. So that's sort of one response to this. But the other side of this is
like people are essentially boycotting advertising. Like, you know, it's a very wide boycott. A lot
of people, they hate it. And for a few different reasons, I think tracking is part of that
problem. I think just more intrusive ads is part of that problem. Although let me tell you,
I remember what ads were like on the internet 15 years ago. They were terrible, you know,
pop-under ads, pop-up ads, like all those things, you know, interstitials. I mean,
those are still there. Those are still there. But like, you know, pop-under ads are literally the
bane of everything. I still get probably an email a week at the Read the Docs advertising email
to be like, you're losing money by not running, like, the worst ads the internet possibly has.
And they're certainly right. Like, could we make more money by selling out our user base?
Yes. True. Yeah, we could. But you always have to remember that at
the end, your heart will be weighed against a feather, you know? Oh yeah. Yeah. So we don't do
that, but we, you know, I get an email a week probably about it. It's also that, that short
term, long-term thing where you can always, if you do any sort of analytics, it's always going
to show that you should do it because you can't measure the long-term, you can't measure the
subjective brand quality. I mean, even for someone who has ads on their site, like I was at Quizlet
top 100 website. We had a new Google rep every six months, you know, another 23-year-old, and every time
I was like, is there actually something useful you can tell us? And, you know, I don't blame the
rep, but all they could say is, a bigger ad, more prominently up there. And the problem is, once you
put it on there, you get addicted to the revenue from it and you can't take it off, and then your
site looks like Facebook looks like now. Yeah. So it's, um, I admire sites that don't
have it. I mean, yeah, I know, I understand how it happened.
Yeah. I mean, you know, if we put a second
ad on Read the Docs, for example, like, that would probably double the revenue, right?
Right.
Like, the revenue is good, but we could probably double it by just putting a second ad
on there. But we don't want to do that. You know, like, there are drawbacks to this, and you hit it
on the head, like, you got to think long term versus short term.
But I think users can't articulate that,
right? Like, sometimes people will tell me with my personal site, they love the design, and really what
they're saying is they love no ads. Yeah. Like, I don't fool myself. I'm like, is it the design? I
think it's that there's no ads on it, or there's very small ads on it. But Carlton, you
were gonna, you had a point.
Well, I was just gonna perhaps segue back to the, um, the
Django setup of the ad... Yeah, yeah, yeah. ...server, um, because, you know, you must have some interesting
stories there. Perhaps we can go through the third-party apps that you...
Sure, yeah. I think I went
through them to re-familiarize myself with them, uh, for this, because, you know, like, some of them
you don't work with every single day, so you don't even remember. You're like, what
was that for? But, uh, yeah.
Okay, so here's one I see on the list we've got in the show notes
here: you've got django-ratelimit. Yeah, yeah. So that's a brilliant app, and we should talk about
that, because I think Django doesn't come with rate limiting built into it. I know DRF has got
rate limiting on the API views, but tell us about Django rate limit. Yeah, we actually use it for
probably something different than what a lot of people use it for. A lot of people probably use
it for things like rate limiting logins or something like that. We actually use it for
rate limiting advertising. There's sort of a maximum amount of ads that anybody can click on
in any sort of amount of time. There's a maximum amount of ads that, you know, a real person could
view in a reasonable amount of time. And we sort of use it for those kinds of features. So it's,
for us, it's kind of a security feature or, you know, an ad fraud feature. Ad fraud is real. Like,
you know, I spend more than my fair share of time dealing with it. And, you know, like,
it's one of the easy things when you're just Google or Facebook, you just handle it at huge
scale. But for somebody like us, it's hard. And especially when all the advertising was run on
Read the Docs, and we have sort of an incentive to report things correctly. But when we now have
sort of these third-party publishers who were running ads on their site, that's only started,
I think I mentioned a couple months ago, and they sort of are getting paid out on this,
they have different incentives. And their incentives don't necessarily align with ours,
and we have to make sure that things are legitimate.
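A minimal sketch of the kind of per-visitor cap described here, written in plain Python rather than with django-ratelimit itself. The class name, the limit of 3 clicks per 60 seconds, and keying on an IP address are all assumptions for illustration, not details of the actual ad server.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._events = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key):
        """Return True and record the event if `key` is still under its limit."""
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # over the cap: treat the click as suspect, do not bill it
        events.append(now)
        return True


# Hypothetical usage: count at most 3 ad clicks per minute from one address.
click_limiter = SlidingWindowLimiter(limit=3, window=60)
if not click_limiter.allow("203.0.113.7"):
    pass  # e.g. ignore the click for billing purposes
```

In a Django view the key would more likely combine the client address with something like the user agent, and the counters would live in a shared cache rather than in process memory.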
Yeah, okay.
You know, we basically have advertising for developers.
Developers know very well how to automate
clicking on ads or viewing ads or whatever, you know, and...
Well, yeah, you know, I've just learned asyncio,
so now I can click on that ad, you know, concurrently.
Absolutely.
Hundreds of times a second.
Well, yeah, I've used some version of that. Like, we've had to log in to go to the local beach here, and I almost wanted to just set up a thing, you know, to slam the site. I didn't, no, I think I was too busy with kids. But it's right there for you, you know, you can script-kiddie it and it just works. I think, what was it, django-ratelimit, I think it made the newsletter, the Django newsletter, recently.
Oh, that thing. Oh yeah, I think so. I think it was on there pretty recently. But yeah, so we use it actually for something a little different than that. allauth sort of has its own rate limiting built in, so we use that for authentication, and we use that both on our ad server and on Read the Docs, allauth that is. But yeah, django-ratelimit is specifically for sort of manually rate limiting advertising as an ad fraud feature.
Can I ask?
So you're an international platform, and there's a number of packages around, you know, country-specific things.
I guess broadly, can you talk about the challenges of moving beyond just being U.S.-based and a global supportive Django project?
Because that's a whole other thing.
Got it.
Yeah, it's hard, and I actually almost don't want to talk about it too much because I don't think that Read the Docs spends, like, an appropriate amount of effort there.
You'd be surprised, but something like 92% of all documentation on Read the Docs is just English.
What percentage is the U.S.?
Oh, you mean traffic-wise.
So traffic is totally different.
Yeah, there's internationalization of language and then there's traffic.
Yeah, so traffic-wise, it's about a quarter North America, U.S. and Canada.
So like 21%, 22% U.S. is sort of the U.S. percentage.
But like in terms of language, like written spoken language, it's 92% English and virtually all of the remainder is Chinese.
So everything else is a rounding error, sub 1%.
But like the support for multi-language sites, right?
If my docs are translated, I can serve them in English and in German and in French.
But the effort to do that is just monumental.
We do it on Django, but we have like literally a whole team of people who, you know, and volunteers who do the translation on each language.
And it's just the amount of effort for a solo, you know, for a smaller project like, you know, my Django filter.
There's no way I could translate the docs for Django filter.
It just couldn't happen.
Yeah.
And this is actually an area.
So Anthony, this is an area of interest of his.
One of my coworkers at Read the Docs.
He really, he like, this is an area that's definitely one of his interests.
And he has a few ideas here, like integrating with some of these third-party translation services, like Transifex or something like that. You know, Sphinx supports something similar to what Django supports, where you have these sort of .po files that you can upload to a service and hopefully get a translation for them and then serve them.
It is a little tricky to set up a multi-language project on Read the Docs, but it is possible.
We have a few projects doing it.
Probably the biggest one is Godot, which is like a game engine, a C++ game engine.
They're a huge project on Read the Docs, totally not in the Python community.
Until you start looking at how does Read the Docs make money and what are our biggest traffic projects,
I would have not been familiar with them at all because, you know, it's not in the Python
community. It's not something that I work with on a daily basis, but they have translations for
probably a dozen languages for their documentation. Okay. They are also a very, they are very
expensive to host because their documentation builds take like 20 minutes each. And so it's
like, oh, we committed to the main repository. Now we have to build documentation for 20 languages.
Well, that's one thing I wanted to ask is how clever is the caching on the docs rebuild?
Like, do you kind of check sums up front on the docs folder or that kind of thing to say,
hey, no, do you know what, whilst there's been a commit, nothing's changed, we don't
have to rebuild?
We used to and we removed most of them.
And the reason why is because it's actually extremely hard to do it correctly, because
a lot of times Sphinx projects will use some auto API or something like this that's
referencing code.
So it's actually building,
it's building docs from code directly.
And so it might be a change that isn't in the docs
directory, but because of a change in the code directory,
it affects the output of the docs.
So we just decided at some point, this is too hard. We'll just waste resources and have our builds be correct.
So you're actually in danger of using more energy trying to guess whether you should rebuild than you would actually rebuilding?
Yeah, yeah. It ended up being just sort of a problem that we decided was too hard to solve. And interestingly enough, I think a lot
of the CI services went the same route. So we, we sort of model a lot of what we do off of some of
these CI services like Travis or, or CircleCI. And that's, that's what they're doing. You know,
they're, they're not trying to get clever and say, oh, well, we think the tests are going to pass
because you didn't change anything over here. They just rerun it. Wow. Okay. Okay. Interesting.
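A rough sketch of the "checksum the docs folder" idea raised in the question, and of why it gives wrong answers once docs are generated from code. The function names and the directory layout are hypothetical, and this is not how Read the Docs implemented its earlier checks.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Return a SHA-256 digest over every file path and its contents under root."""
    digest = hashlib.sha256()
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Hash the relative path too, so renames change the digest.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(docs_dir, previous_digest):
    """Naive check: rebuild only when something under docs_dir changed."""
    return tree_digest(docs_dir) != previous_digest
```

If a docstring under a source directory changes and the docs pull it in through an API documentation extension, `needs_rebuild("docs", old_digest)` still returns False even though the rendered output would differ. Hashing the whole repository avoids that, but then almost every commit triggers a rebuild anyway, which is the trade-off described above.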
I really want to ask about Stripe, but maybe it just works brilliantly for you and the new API has been no big deal switching over.
Um, we're actually not using most of the new APIs. We're actually using it to pay out publishers. It's sort of a little bit different.
Connect, or not?
Yeah, Connect. So basically publishers can sign up for a Stripe account. We don't get their bank account information, but we can basically transfer money to them. So that's what we're using it for. Probably most of our payouts are still PayPal, though, and we don't have a fully automated solution there yet. This is one of the things when you stand up your own advertising network over, like, a couple months: lots of things are not automated yet.
Well, even Carbon Ads is PayPal. I thought it would be a little more advanced, but it's not. I think a lot of the new Stripe stuff, too, is around subscriptions and, um, I'm blanking on the European law, but there's all sorts of things around that, so it's less so for sort of one-offs. Though Stripe is finally adding, I think they just added, like, tax support. Because for someone my size, if I added Stripe, to collect tax per state, per country is impossible to do, even if I tried to, which I have. It's just like, well, I guess it's a no-go. But Stripe is slowly rolling up all these things around analytics, you know, around TaxJar and all these other third-party things you can sub in to do taxes appropriately.
But we're actually not using it for subscriptions, which I think is, like, a lot of what...
Yeah, I guess some of the new stuff is, yeah, that causes a lot.
Yeah, we use it mostly for, you know, sending invoices to advertisers and paying out publishers. Those are sort of our main things. And the Stripe Connect stuff has worked fairly well, although there's sort of a beta for, like, if you have your balance in Stripe in U.S. dollars and you want to pay somebody whose balance is not in U.S. dollars, or actually even just not in the U.S., it's a problem. Stripe has beta support for it, so we have applied to join the beta. So right now we can only pay out publishers via Stripe in the U.S., but maybe that'll change in a month. Everybody else has to use PayPal or something else, or they have to give us their bank information, which, as much as possible, we want to not do.
Well, that's
like my mom. Not to pick on my mother, but as an example, you know, she wanted to write a check as opposed to enter her credit card, and I was like, you know your check has your routing and your account number on it, right? Like, you know that, and your address.
Yeah, checks are way less secure.
Oh my God, they're amazingly insecure. Handing that to a stranger, the things I could do if someone gave me a check. But anyways.
Well, DjangoCon US has been in San Diego the last two years, will be again next year.
Can you just briefly talk about, you wrote the bid for that, right?
I wrote the bid for that.
So I'll combine this with sort of the San Diego Python stuff.
So we have a couple, there's a couple of people who are maybe bigger, they have a bigger sort
of maybe name in the Python community than I do.
I really focus my efforts locally.
I'm not big on social media.
I have very little social media presence.
But Trey Hunner, who's one of the other San Diego Python organizers, he was basically like, David, you should write the bid for DjangoCon US to come to San Diego.
I was like, all right, fine, Trey, I'll do this.
And so I wrote it.
It was pretty convenient, actually, because at the time I was working in this office in downtown San Diego.
And actually the San Diego Tourism Authority, which is like this quasi government, well, actually they are governmental.
They're part of the local government.
They were in the same building.
So I just sort of stopped by their office and was like, I have this conference, you know, it's this big, it's going to probably bring in this many dollars to San Diego. I need help writing a bid. I wrote the whole bid, but they pointed me in the right direction so much. You know, DjangoCon was basically like, here's our target budget for hotels, here's our target budget for some other things.
And I brought that to the San Diego Tourism Authority and they're like, this is perfect.
These whole hotels in this area, you can't even talk to them.
They helped me so much because I would have spent so much time talking to the wrong hotels that are just double the budget and stuff like that.
So, yeah, it helped a lot.
I ended up writing the bid.
I think there were a couple other bids, but we got accepted.
And I'm sure it's probably more expensive than some of the other places they've hosted, but it's certainly not Bay Area prices or anything like that.
No, for sure.
I mean, San Diego is a lovely place to visit. I've only recently come to appreciate all the work that DEFNA and you, Jeff Triplett, and others there do to organize these conferences. Because, I mean, DjangoCon Europe just happened, and it's so much work to do a conference, and it's, um, you know, very much not publicly seen. So it's interesting just to hear about what it takes to put that on.
Yeah. The bid is relatively minor compared to, like, the operations and all the other things. So really, Jeff Triplett, and I don't want to just call out one person, you know, that whole team is really what makes DjangoCon a success, DjangoCon US anyway. So my contribution is so minor by comparison. But yeah, so San Diego Python, getting back to that. You know, I'm sort of one of the
co-organizers. I sort of took over as probably like the main organizer in maybe 2012. The person
who was the organizer before moved to the Bay Area, which is a common problem in San Diego for
organizers. You know, they just sort of like make a name for themselves or, you know, take the next
step in their career. And the logical next step is move to the Bay Area and make way more money.
So I was the person who, I'm never leaving San Diego. I love it here. It's great. So I was sort
of the person who ran it by default because I was not moving away. But now we have this great team,
you know, like that, that's one of the big, one of my big successes, in my opinion, is that
now there's a bunch of other organizers. And when I need a month off or like something else like
that, Django or San Diego Python continues without me. My, my, my wife and I had a daughter
four years ago and I sort of didn't do any San Diego Python stuff for a year and it worked.
Like, people still went and there were still talks, and, you know, the other organizers really made that happen. So I'd say that's my biggest success, that the group will continue when I'm gone.
Well, I mean, in any organization. I mean, here in Boston there's a Django Boston group, which I think is the third largest by members, um, after...
We modeled ours after that one.
Oh, okay. Yeah. So it's, uh, John Baldwin and, I'm sorry, I'm blanking on
the other. Um, but it's, you know, it's, it's one or two people who do, you know, all the work and
it's quite a lot of work. Um, but it's such a great, uh, attribute for the community. Um, and
even, I guess, even as big as Boston is, which is a pretty big developer community, you know,
we've had, we used to have 80, you know, 50 to 80 people. Um, the Python meetings of course are
like 300, 400, but, um, it's a great...
That's way bigger than us, but yeah. Okay.
Well, but I would say, again, mindful of time, but in Boston, web is not that big a thing.
It's so much more data science or even hardware stuff.
There's not as much web stuff, whereas when I was in the Bay Area, it was very much consumer web kind of things.
There's much more of a, I don't know, PhD-level programming, and the web is sort of this thing you sprinkle on top to deploy it to customers.
Sure. Carlton, what's the scene on the coast of Spain?
Well, where I am, there's me. Um, there's a few little local web dev shops. There's not much here. Down in Barcelona there's a good amount. There's a Python Barcelona meetup there, which, you know, is good. I'm really bad because I've got so many children, I just don't go. I'm like, yeah, no, I could take out this massive chunk of my life to go down to Barcelona and then hang out till quite late at night and then drive home. I can't do that, so I don't go down very often. But they're a great crowd, and, you know, it's active and they still keep going. Now I think they've got an online round table tomorrow discussing, you know, the changes in remote working and all the rest, because they brought in new laws in Spain, like yesterday, about remote working and how that's going to affect everything. So it's, you know, a really active community. I mean, Barcelona, yeah, there's quite a good tech scene there.
I would love to ask you more questions,
but I think we're close to time. Um, we're gonna have links to the code. I definitely recommend people take a look at the ethical ad server, especially if you look at the models.py file. That alone is one of the rare readable production-level code snippets I've seen, well commented and everything else. So I definitely recommend that. ethicalads.io, right? That's the site if people want to have ethical ads on their site.
Yeah.
Um, anything else as we head out you want to promote or shout out?
Uh, you know, no. I just think San Diego Python is one of my biggest successes, I feel, so I'm happy that we talked about that. I'm happy we were able to fit that in. Read the Docs is fantastic. I love them too. And, you know, at Read the Docs I mostly work on the advertising side. I work sometimes on, like, the security and privacy stuff on Read the Docs as well. Probably my biggest success there, and maybe I'll shout out to Cloudflare, thank you for that, is all the stuff where we now have HTTPS on the thousands of custom domains for docs on Read the Docs. That's all courtesy of Cloudflare. That would cost us, like, thousands of dollars a month. That alone would probably double our infrastructure budget if we were paying retail price for that. So I'll thank them for that.
I mean, I use Cloudflare. I feel like there should be a whole
course on Django plus Cloudflare
since it's so powerful.
Yeah.
Read the Docs is actually doing
some really cool stuff there,
but maybe there's not time for that.
We have this whole thing
where when there's a new docs build,
we're purging from the cache
just those docs
and all those sorts of things.
When we rolled that out,
docs, especially if you were browsing them
in Asia or Europe,
they got so much faster,
especially Asia
because it's far from our main data center.
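A hedged sketch of what purging "just those docs" after a build could look like against Cloudflare's purge-by-URL endpoint. It only builds the HTTP requests and does not send them. The function name, the placeholder zone ID and token, and the batch size of 30 URLs per request are assumptions for illustration, not a description of what Read the Docs actually runs.

```python
import json
import urllib.request

# Cloudflare's cache purge endpoint for a zone.
PURGE_URL = "https://api.cloudflare.com/client/v4/zones/{zone_id}/purge_cache"


def build_purge_requests(zone_id, api_token, urls, batch_size=30):
    """Build one POST request per batch of URLs to purge from the CDN cache.

    batch_size is an assumed per-request limit; check the current API
    documentation for the real value on your plan.
    """
    requests = []
    for start in range(0, len(urls), batch_size):
        body = json.dumps({"files": urls[start:start + batch_size]}).encode("utf-8")
        requests.append(
            urllib.request.Request(
                PURGE_URL.format(zone_id=zone_id),
                data=body,
                method="POST",
                headers={
                    "Authorization": f"Bearer {api_token}",
                    "Content-Type": "application/json",
                },
            )
        )
    return requests


# Hypothetical usage after a docs build finishes:
# for request in build_purge_requests(ZONE_ID, API_TOKEN, rebuilt_page_urls):
#     urllib.request.urlopen(request)
```

Purging only the pages of the project that was rebuilt keeps the rest of the cache warm, which is where the speedup for far-away readers comes from.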
Well, maybe we'll have to have you on again.
I mean, yeah, just before this,
I was manually purging some pages on LearnDjango
that I updated and I'm like,
yeah, I know I need to automate this,
but I just can't be bothered.
Yeah.
So I can't even imagine at the scale that you all are at.
Oh, yeah.
And yeah, I won't go into it
because I know we're at time.
But yeah, it's super cool.
I didn't work on that.
So I don't want to take credit for that.
That was mostly Eric Holscher.
But the long and short of it
is you don't go through the UI there
clicking purge, purge, page by page, copy-pasting each URL.
Oh, absolutely not. No.
Well, thank you so much for coming on, David. Um, we'll have links to everything in the show notes. Really appreciate it, and glad Jeff connected us, Jeff Triplett, because I was complaining about ads and he was like, you should use Ethical Ads. And I was like, what's that? And then he's like, and there's someone there you can email. I was like, oh, okay. So here we are.
Yeah. Advertising, it's a crazy business. Lots of bad stuff, but we're trying to do good there
as much as one can.
So as ever, we are at DjangoChat.com,
ChatDjango on Twitter, and we'll see you all at the next episode. Bye-bye.
Join us next time. Bye-bye.
Thank you.