Transcript: Web Security - Mackenzie Jackson
Welcome to another episode of Django Chat, a podcast on the Django web framework. I'm Carlton Gibson, joined as ever by Will Vincent. Hello, Will.

Hi, Carlton.

Today we've got Mackenzie Jackson from GitGuardian with us. Hello, Mackenzie, thank you for coming on the show.

Hey guys, great to be here. Thanks for having me.

Welcome, welcome, Mackenzie.

Oh, go on.

Go on, go on. No, say what you're gonna say. Go.

I'll just say that I briefly forgot that this is an audio-only podcast, and I was waving at the camera like a lunatic, but I'm realizing now that it's audio only, so I'll stop making obscene gestures.

No, no, no, it's good. I can wave back, and then you'll laugh and the audience won't know why. So, Mackenzie, we always ask this: who are you? Tell us why you're on this podcast. How did you find Python? How did we meet? What's your backstory?

Yeah, I've got an interesting backstory. I started off my first life as an architect, a building architect, and I hated it. I spent most of my job trying to automate it by learning to code in some of those big systems that we have, called BIM systems. And then I was figuring out, why don't I just skip this architecture thing and do what I want to do, which is write software? So I went down that path and had my own startup for a while called Compago, which is still around today; it's headquartered in Australia. I haven't been involved for a while. I was there for about four years as CTO, and then I left, I guess, when the company got big enough that it needed a real CTO. But the one thing that I loved about it is that we were a care provider building technology in the healthcare space, which meant we had to comply with a lot of different regulations, like HIPAA, and other things. Going down that path, I really learned how vulnerable software was to lots of different things and how to secure it, and I got really into that. So when I left, I decided to focus on security, and that's been pretty much my jam. I work now for a company called GitGuardian, which is a code security platform, and one of the coolest things about my job is that I get to work with some research teams. We discover how attackers are exploiting code, we create talks about that, and then we get to go to some cool conferences, which is where I meet cool people like Carlton, because we met at PyCon Italy, at some after-party at a pizza place, I believe. You were the keynote, and so I was trying to hassle you to use your keynote powers to get more drinks than the one-drink limit that was allowed at the party, if I remember correctly.

Something like that. Anyway, that's not suitable for the show, so let's move on. So, attack reporting. As part of my Fellow role I joined the Django security team, and we get quite a lot of reports; every month we'll have reports coming in. So do you do a lot of that forensic work, trying to find exploits? Or do you work with teams that do?

Yeah, we work with teams, within GitGuardian and also external ones, that run some large-scale research projects into how attackers are operating, particularly around exploiting secrets and credentials, and also exploiting misconfigurations. One of the cool things that we really like to get into is, when an attack happens, trying to recreate that attack path and figure out not only how they got in, but what tools they used. Can we recreate the system and point out exactly what made it vulnerable? Because the nature of security vulnerability reporting is that if an application has a security flaw that gets exploited, the blog post that inevitably comes out from the company is extremely limited and won't give too much information. So it's taking that and trying to figure out what's behind it.

Really. Well, even with the issues on Django, we'll put out a post and say, look, there's a DoS vulnerability, or it's vulnerable to maliciously crafted input, but we won't necessarily say exactly how you do the attack, for one reason: we don't want to make it too easy for the people who are attacking unpatched Djangos.
I guess it's difficult to know how much you should say, because, in the same breath, I kind of think an attacker is not going to be put off by the lack of detail in the report. They're just going to work it out. I don't know.

Yeah, I think exactly. There are varying levels of thought behind this, and you have to be responsible in how you disclose information. We certainly wouldn't just put out a cheat sheet of how to do it, but it's at least finding out at what point initial access was made. Was it a phishing campaign? How did the whole thing start and unravel? Now, when you're talking about CVEs, or vulnerabilities in dependencies and things like that, you have to be a lot more responsible in how you disclose, because, as you would know, two years on there are still people running vulnerable packages. So you have to be a bit more careful there. But in terms of recreating an attack, it's about showing the logic behind it, because I think a lot of people don't understand attackers' logic, if that makes sense.

Well, perhaps one example you might be able to give: we've had various issues fixed over time about enumeration attacks, where people make a request, a perfectly legitimate request, but it somehow reveals something about the application, or the IDs, or something like that. Then they make the next one, and the next one, and by doing that they're able to get a sense of the size of the application, or predict IDs, or predict URLs. In and of itself that's not really a vulnerability, but perhaps they then use it as part of a bigger attack. I don't know.

Yeah, exactly: exploiting logic flaws in how things are done. A lot of attacks happen because people use the wrong function. Say you're generating a random number: there are multiple ways to generate a random number, and some are predictable using maths. So if you're using random numbers to grant access, you have to be sure to use the right kind. Now, that doesn't mean the insecure version is pointless; it has its uses. It's just that the logic behind how you're implementing it is flawed, and people can often figure that out.

I've had one more question that came up from what you said, and, you know,
I see Will's got a question or something.
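To make the random-number and enumeration points concrete, here is a small sketch contrasting the two kinds of generator Python ships. The function names are illustrative, not from any library. The `random` module is documented as unsuitable for security purposes, while `secrets` is meant for tokens (Django also provides `django.utils.crypto.get_random_string` for this kind of job).

```python
import random
import secrets
import uuid


def weak_reset_token() -> str:
    # Illustrative anti-pattern: `random` is a Mersenne Twister PRNG meant for
    # simulations. Its internal state can be reconstructed from enough observed
    # outputs, so tokens built from it can be predicted. Don't do this.
    return "%032x" % random.getrandbits(128)


def strong_reset_token(nbytes: int = 32) -> str:
    # `secrets` draws from the operating system's cryptographically secure RNG.
    return secrets.token_urlsafe(nbytes)


def public_id() -> str:
    # A random UUID4 as the externally visible identifier, instead of exposing
    # sequential primary keys (1, 2, 3, ...) that reveal size and invite enumeration.
    return str(uuid.uuid4())
```

A random public identifier only removes the easy enumeration path; it is not a substitute for checking that the requesting user is allowed to see the object.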
But I kind of always say that you must update.
You must be on a secure version.
You mustn't use an end-of-life version of Django.
And I say that simply because I kind of think that as soon as these reports are out in the open, within a short period of time there are kits you can download and run automatically, which test every known exploit.
And it's not a question of if your system is cracked.
It's just when it's cracked.
Is that fair?
Is that a reasonable approach, or am I being a bit overcautious, in your opinion?

Well, no, I don't think you're being overly cautious. You absolutely have to patch regularly, and definitely patch anything that has vulnerabilities against it. It's absolutely vital to be able to do that. What we typically find in companies that can't patch regularly is... well, you should always have a system in place to patch regularly, regardless of whether or not there's some critical CVE, because if you're in the habit of doing it regularly, it becomes easier when you need to do it and it's critical to do it right. I think a lot of people will only patch when something's critical, but that can cause all kinds of havoc; it becomes a scary thing to do, and people often shy away from it. Patching regularly is absolutely fundamental, especially when there's a vulnerability out against your version. The argument against patching regularly (it's a bit of a weak argument, but just to make it) is that when a new version comes out, it can often take about three months to figure out if it's vulnerable to anything, especially if it's a big change. So the question is: do you patch immediately when a new version is out, or do you wait? I think the solution is to patch whenever there's a known vulnerability against what you're running, and then stick to a regular, consistent patching routine for everything else. Because if there is a vulnerability, people will be able to find it. They build scripts to exploit these automatically. We're not talking about the most sophisticated actors: if there's a CVE against it, you've basically handed out the cheat codes for exploiting your application. So you should definitely patch.

Yeah, right. Okay. And I guess, on that point about new versions, for Django, say 5.0 is about to come out, and you might wait for 5.0.1 or 5.0.2 if you're particularly worried about those first regressions. But meanwhile, 4.2.x is still being released with the security updates, so you should definitely be getting those each month.
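Carlton's point, that a supported series such as 4.2.x keeps receiving security releases, is easiest to act on when the check runs automatically. A minimal sketch, assuming plain `X.Y.Z` version strings and a minimum version you maintain by hand (`MINIMUM_DJANGO` below is a hypothetical value, not an official recommendation); `importlib.metadata.version` is standard library. Dedicated tools such as pip-audit go further by checking installed packages against published vulnerability data.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Hypothetical floor: bump this by hand whenever a security release lands for your series.
MINIMUM_DJANGO = "4.2.7"


def release_tuple(v: str) -> Tuple[int, ...]:
    """'4.2.10' -> (4, 2, 10); stops at the first non-numeric part ('5.0rc1' -> (5,))."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = MINIMUM_DJANGO) -> bool:
    # Compare numerically: as plain strings, "4.2.10" < "4.2.7" would be True.
    return release_tuple(installed) < release_tuple(minimum)


def check_installed(package: str = "django", minimum: str = MINIMUM_DJANGO) -> str:
    """Report whether the installed distribution is at or above the chosen floor."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed"
    if needs_upgrade(installed, minimum):
        return f"{package} {installed} is older than {minimum}: upgrade"
    return f"{package} {installed} is at or above {minimum}"
```

Run from CI, a check like `check_installed()` turns "are we behind on patches?" into a routine step rather than a scary one-off.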

Yes, absolutely. Hopefully people here are using Django, or well-supported frameworks like that.

They should be, on this show, I tell you. On our podcast.

Yeah, because there are a lot of frameworks that don't go through that; they don't have a security team going through them. So you're already on a good step if you're using Django.

Oh, well, thank you. We'll put that as the pull-out quote on the website.

Well, since you mentioned that, I'll throw this question to you, because Carlton and I were not sure how to address it. Flask, which is another big Python web framework, recently had some discussion in the community about something unrelated to security. But it's an example of something widely used that, as far as I'm aware, does not have a formal security team. So I'm curious if you saw any of that. There was an issue with Flask-Login, which is a third-party package that hasn't been maintained. A new version of Flask came out, and all of a sudden login was breaking for everyone. There were some comments about that, with people thinking that Flask was, you know, Microsoft or something and had all that support, when really it's one person and a handful of volunteers maintaining it. So I just want to address that, because it's been out there. But I'm curious where you sit. When you look at web frameworks, there's the whole gamut: there's Django, and then there are many widely used frameworks that may not have robust security processes. I'm not putting that on Flask, but a lot of these projects may have lots of users and only a handful of people doing the work.

I mean, absolutely. And it's always a weigh-up when you're using what may be cutting edge. At one point Django was cutting edge, right out there; you were a trendsetter if you were using it. A lot of the time security doesn't come into the equation until later on. When we first built Compago, the startup that I was in, we had a .NET back end and a React front end. Why? Because we had a .NET guy and we had a React guy. That's what we had, right? Everyone was like, why aren't you using Node and React? Because we didn't have a Node guy; we had a .NET guy. So when it comes to these types of frameworks, if you have the foresight (and I think people that have been around for a while will know and understand this), different frameworks will do different things. They may be more secure. They may be faster. They may be able to handle large quantities of data better. So there are lots of considerations, and often security is an afterthought.
But there are a couple of things to look for when choosing a framework. One: does it have a long history of being maintained? And I don't necessarily mean years and years; I mean, is it constantly being maintained? Is there a community around it? If those are the case, you can start to tick things off and feel a bit better. Is there a security team for it? Well, that's not normal for everything, but it should definitely help you make those decisions. So if you are in the fortunate position to be able to pick frameworks, look at the community and at the team that maintains it, because you will be surprised. Take the example of ua-parser-js. I apologize to everyone, this is a Node package, but there'll be a Python equivalent. ua-parser-js was just a package that told you what operating system your users were viewing your app on; something very simple. It had 10 million weekly downloads, and it was maintained by one guy, named Faisal. His npm account got hacked, and then someone published a malicious version of it. This is one guy maintaining this thing, and it would have passed everything you'd normally check. So you have to also consider the team behind these packages, not just whether it's popular.
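One defence against the hijacked-maintainer scenario is to install only the exact artifacts you have already reviewed, so a newly published release is never pulled in automatically. pip supports this natively with hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file). The sketch below only illustrates the underlying check; `sha256_of` and `matches_pinned_hash` are hypothetical helper names.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: Path, expected_hex: str) -> bool:
    """True only if the file is byte-for-byte the artifact whose hash you recorded."""
    return hmac.compare_digest(
        sha256_of(path).encode(), expected_hex.strip().lower().encode()
    )
```

In pip terms the same idea is a line such as `somepackage==1.2.3 --hash=sha256:<digest>` installed with `pip install --require-hashes -r requirements.txt`. Pinning does not help if the compromised version is the one you pinned; what it buys is time for a malicious release to be spotted and removed before you adopt it.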

Yeah. So, thinking about that, PyPI has done an awful lot recently about tightening up, just within the Python ecosystem. You need to use two-factor auth now if you're a popular project, and they've got this Trusted Publishers thing. What's that?

So PyPI, and npm, and GitHub too, are all forcing two-factor authentication now, because this was pretty much one of the main ways attackers were creating malicious packages. What would typically happen is that people would specialize in phishing these maintainers and getting into their accounts. Even with a well-structured phishing campaign it isn't easy, but it's certainly a lot easier to get in if there's no 2FA. And these supply chain attacks can have massive implications, because you're not just attacking this one system; you're potentially creating a vulnerability in millions of applications. One thing to remember about attackers is that they operate on economics, like a normal business does, where there's a risk-reward calculation: is the reward that I could potentially get going to outweigh the risk of attacking this system? When you're talking about things like PyPI packages that are used by millions of different applications, the reward is potentially massive, because you could install even a crypto miner that gets released onto a million websites. Something that simple can create pretty good profits. So it's really good that these package managers are implementing more security around this. 2FA may not seem like a lot, but it really strips back the number of account takeovers. It's just significantly harder because of that extra step,

because they've got to somehow trick you into entering that as well, at the same time.

Yeah, exactly. And the step above that is Trusted Publishers on PyPI, which is another set of requirements that PyPI puts on publishers to make sure that doesn't happen, by making sure there aren't long-lived passwords or tokens being used to publish. So when you look at a package, that's just another tick box.
And when you're choosing packages, you don't need to spend hours on it. It should be easy to look at something and say, this is secure. That's why something like Trusted Publishers is really powerful: you can quickly look at it and know that the package meets at least that minimum criteria, so you can move forward. It's actually quite powerful in terms of being able to quickly decide what to introduce into your project, and whether it's going to be secure.

Okay. I'm getting a little niggle inside, though, because with some of these metrics that you sometimes get on projects, it can be: did you tick this box, did you tick that box? And some of them are, are you using a particular feature on GitHub? And it's a bit like, well, no, we're not using that. Say, for instance, Django. Django's got its own release process, it's got its own security process, it's got a security archive, it has its own way of handling CVEs. It's all top quality, really, but it's not going through the GitHub security advisories panel, therefore we don't get the tick in the box on that metric. So, sorry, I sometimes get a little bit like, oh, I hate these metrics.

I can totally understand that. But the rebuttal that I would have is that I find these metrics are more for smaller packages, so that you can use them with confidence, rather than for something that's massive and trusted by people. If that doesn't have the tick box, you can still get it passed. But if I just need a package that's going to do a small job, can I trust putting it into production or not? If it has the tick box, that's a good step. But it's not the be-all and end-all, and no security measure will be the be-all and end-all. I work for a vendor, and we have people that come up to us and say, okay, here's my list of requirements that I need to tick for my SOC certification, or for this or for that. Where does your product fit in? Oh, you don't fit into this tick box? Oh, I don't need it, then. So they're not worried at all about getting hacked; they're worried about compliance. And I guess that's important, but...

So I understand where you're coming from. Yes: how good is the standard? How good is the box on the questionnaire? Because if the box on the questionnaire isn't right, it's no good at all.

Okay, yeah. And how do you make a questionnaire that fits everyone, right?

Yeah, it's the same questionnaire box for such wildly different applications.

Well, two-factor authentication is interesting, because this is a thing in Django: there is no built-in two-factor auth. There is a third-party package that Jazzband maintains. There's also been separate discussion around auth in Django recently. Carlton's been advocating for some changes, because there are a number of things, off the top of my head, that you'd do differently in Django if you were doing it today. For example, first name and last name are the default fields. Well, that doesn't fit a lot of the world's population. Also, it defaults to username and password, and most people want email. But maybe I'll... Carlton, I just wound you up. Go.
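Since two-factor auth came up: the six-digit codes most authenticator apps show are time-based one-time passwords (TOTP, RFC 6238), an HMAC over a 30-second time counter. The standard-library sketch below is for illustration only. It is not a Django API (Django has no built-in 2FA), nor the API of the third-party package mentioned above, and a real deployment should use a maintained library plus rate limiting and replay protection. It also does not cover passkeys, which use public-key cryptography rather than a shared secret.

```python
import base64
import hashlib
import hmac
import secrets
import struct
import time
from typing import Optional


def new_base32_secret(nbytes: int = 20) -> str:
    """Random shared secret, base32-encoded as authenticator apps expect."""
    return base64.b32encode(secrets.token_bytes(nbytes)).decode("ascii")


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # "dynamic truncation" picks 4 bytes from the MAC
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP whose counter is the number of `step`-second intervals."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


def verify_totp(key: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes up to `window` steps either side, to tolerate clock drift."""
    now = time.time() if at is None else at
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(key, counter + drift, digits).encode(), submitted.encode())
        for drift in range(-window, window + 1)
        if counter + drift >= 0
    )
```

In use, the server stores `new_base32_secret()` for the user, shows it once (typically as a QR code) so their app can import it, and calls `verify_totp(base64.b32decode(secret), code)` at login.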

Well, literally. The whole point about Django is that it's a batteries-included framework, right? It's meant to provide the batteries, but not any old battery. For instance, it used to provide comments, but comments is something you can build yourself, or that can be maintained in the ecosystem. And it's a bit of a burden to maintain, because there are so many opinions about what it might have and so many different ways it could go. So django.contrib.comments was pulled out, and it's a third-party package now. But auth really is a battery that Django has to provide, because it's so central and it's so hard, and if you get it wrong the consequences are so bad. So yeah, we've got good auth, but we don't have this two-factor bit yet, and for me that's kind of a missing battery. It would be really nice if we could do something there, you know, one-time passwords, or, I don't know, passkeys are the new thing. Tell us about passkeys and one-time passwords. What are all these things, Mackenzie?

There's a lot happening and changing in authentication, because authentication remains a big weak link, especially our reliance on things like API keys that are just sprawling everywhere, because they're handled by so many different people. So one of the things people are trying to do to remove these points of vulnerability is to create basically the same systems, but with credentials that are only valid once and are created for the purpose of that session. You have something like a dynamic API key that's managed by a vault or something: the API key is created, you use it, and then you destroy it at the end. It's only valid one time, and its lifetime is a matter of minutes, at most.

Or seconds, yeah.

Yeah, or however long it takes, right? And I think we can expect these to really start taking over, along with role-based authentication that's being implemented in lots of different ways. Because one of the problems that we've also been facing is that when you're trying to manage passkeys and passwords and API keys and all of that, and you need to do multiple different jobs, you can be tempted to create one admin key that does all those different jobs. Then I don't have to manage all these different keys, right? But then that key becomes so sensitive. So you want role-based authentication, where you restrict permissions to the absolute bare minimum, and the credential is created for the purpose you're trying to achieve, with the minimum permissions needed. And with infrastructure, as we get into infrastructure as code and all of these different systems and secrets vaults, we can actually tie them all together so that it works really nicely. I think we're not using these tools to their full extent yet, but they're becoming more and more expected. Using your analogy, I guess we can expect frameworks to start putting in these different batteries as we go.
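The short-lived, single-use, narrowly scoped credential described above can be sketched in a few lines. This is a toy illustration, not how any particular vault works: it assumes a single server-held signing key (`SIGNING_KEY` is hypothetical), encodes an expiry and a list of scopes into an HMAC-signed token, and keeps an in-memory set of redeemed token IDs to make each token single-use. Signed tokens like this are roughly what JWTs standardize, but this is not a JWT implementation, and a real secrets manager would issue and revoke credentials on the backing service itself.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import List, Optional, Set

# Hypothetical server-side signing key; in a real system this would itself
# live in a secrets manager, not in code.
SIGNING_KEY = secrets.token_bytes(32)

# IDs of tokens already redeemed. In-memory only: a real deployment needs
# shared storage (and cleanup) so every server sees the same state.
_used_token_ids: Set[str] = set()


def _sign(payload: str) -> str:
    mac = hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(mac).decode()


def issue_token(scopes: List[str], ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Mint a token that expires after `ttl_seconds` and carries only `scopes`."""
    now = time.time() if now is None else now
    claims = {"jti": secrets.token_hex(8), "exp": now + ttl_seconds, "scopes": sorted(scopes)}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    return payload + "." + _sign(payload)


def redeem_token(token: str, required_scope: str, now: Optional[float] = None) -> bool:
    """Accept a token once, before expiry, and only for a scope it was issued for."""
    now = time.time() if now is None else now
    try:
        payload, signature = token.split(".")
    except ValueError:
        return False
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return False  # tampered with, or signed with a different key
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if now >= claims["exp"]:
        return False  # expired: a short lifetime limits the damage of a leak
    if required_scope not in claims["scopes"]:
        return False  # least privilege: not issued for this job
    if claims["jti"] in _used_token_ids:
        return False  # single use: already redeemed
    _used_token_ids.add(claims["jti"])
    return True
```

The design choice is the one Mackenzie argues for: instead of one long-lived admin key, each job gets a credential that only works for its own scope and only for a short time.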

Well, I think that's one thing that's been discussed a few times and hasn't quite happened yet: Django hasn't really got a solution for secrets handling in place. You start off, you get a settings file, and in there the real secrets are your database password and your SECRET_KEY, which is used for signing and whatnot. So it's important that you don't commit those to Git (and we can talk about GitGuardian in a minute), and the first step has always been to stick those in an environment variable. What would be nice, I think, for Django to have is a pluggable interface around that. Okay, if you're using env vars, you get it from there, but then there are all these other mechanisms, like vaults and secret managers, where we could swap out the backend and you could be using those as well. I think it'd be nice to have something like that in the Django space.

Yeah, for sure. Secrets management is going to persist as a problem that we have to deal with. It's funny: one of the most common secrets that GitGuardian detects is Django secret keys, because often people get excited, they've created their first Django project, they git add everything and commit it, and then all of a sudden they've published their Django secret key. Now, it's probably not that interesting to an attacker, at least if it's your first project and you're just having a play, but the systems don't know that, so they alert on it. I kind of feel like that's a good process, because if you leak something on GitHub, you're going to get an email about it, and then you're forced to learn how to do it securely from the start.
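Carlton's idea of a pluggable secrets interface might look something like the following. To be clear, Django has no such API today: `SecretsBackend`, `EnvBackend`, `DictBackend`, `MissingSecret`, and `get_secret` are hypothetical names for this sketch. A vault or cloud secret-manager backend would implement the same `get` method using that vendor's client library.

```python
import os
from typing import Mapping, Optional, Protocol, Sequence


class SecretsBackend(Protocol):
    """Hypothetical interface: anything with get(name) -> value or None."""

    def get(self, name: str) -> Optional[str]: ...


class EnvBackend:
    """Read secrets from environment variables (the usual starting point)."""

    def __init__(self, environ: Mapping[str, str] = os.environ) -> None:
        self.environ = environ

    def get(self, name: str) -> Optional[str]:
        return self.environ.get(name)


class DictBackend:
    """Stand-in for a vault or secret manager; also handy in tests."""

    def __init__(self, values: Mapping[str, str]) -> None:
        self.values = dict(values)

    def get(self, name: str) -> Optional[str]:
        return self.values.get(name)


class MissingSecret(RuntimeError):
    pass


def get_secret(name: str, backends: Sequence[SecretsBackend]) -> str:
    """Return the first backend's value for `name`; fail loudly if none has it."""
    for backend in backends:
        value = backend.get(name)
        if value is not None:
            return value
    raise MissingSecret(f"secret {name!r} not found in any configured backend")


# Hypothetical usage in settings.py:
# SECRET_KEY = get_secret("DJANGO_SECRET_KEY", [EnvBackend()])
```

Failing loudly when a secret is missing, rather than falling back to a default, avoids quietly running production with a placeholder key.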
And when you're talking about environment variables and .env files, and vaults and secret managers, there are huge arguments about what to use and when to use them. And I think I differ from most of the security community in what I think.

No, cool. Well, tell us what you think, because this is one reason we get stuck: people say, oh, we want this, but then there's a disagreement, but what about that, and we can't quite agree on what we should have, so we don't have anything.

Yeah. So, a funny story about this: at PyCon Italy, where I met Carlton, I gave my talk, and it was on how to securely manage secrets in Python projects. I mentioned multiple ways to do it, but one of the ways, and the way that I particularly like, is using environment variables and .env files. The talk before mine (I don't know if this was planned by the organizers) was entirely about why you shouldn't use environment variables for this. So, I like environment variables, but I want to say off the bat that they're not the most secure thing to use. If you ask a security person how you should manage your secrets, the official answer is that you should use something like a vault or a secrets manager that's dedicated to that. That's a server; it creates dynamic secrets just in time; you can connect it to authenticate your developers so that they have access to secrets, or so that their apps have access to secrets that no developer ever sees. And it becomes this heavily complicated thing. What will happen most of the time is that developers will interact with it once and go, this is such a pain in the ass to interact with, and then they create secrets.txt on their desktop and store all their API keys there, because then they don't have to deal with this heavy system that some security person spent a whole year implementing, with 400 pages of docs on how to use it correctly. That creates another problem. And that's also why secrets end up in your history: you've been told, okay, I need you to create this feature, and you need to connect to some kind of data bucket to do it. So to start off with, you just hard-code the secrets, because doing it properly is such a pain. But don't worry, because by the time code review comes around you'll have removed them, not knowing that the secret is now in your Git history. No one's seen it, so no one actually knows that it's there, and that creates a big problem. I know this is getting very long.
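The "removed before code review, but still in history" problem is why scanners look at every commit's diff rather than just the current tree. Below is a deliberately tiny illustration of that idea, not how GitGuardian or any other product works: it scans the text of `git log -p --all` output (a real Git command; `--all` includes every branch) for added lines that look like a hard-coded Django `SECRET_KEY` assignment. The single regex is an assumption and will miss most real secrets; dedicated scanners use many detectors plus validity checks.

```python
import re
import subprocess
from typing import Iterator, List, Tuple

# Hypothetical single detector: an added diff line assigning a long string
# literal to SECRET_KEY. Real scanners use many detectors, not one regex.
SECRET_KEY_LINE = re.compile(r"""^\+\s*SECRET_KEY\s*=\s*['"][^'"]{20,}['"]""")


def scan_patch_text(patch: str) -> Iterator[Tuple[str, str]]:
    """Yield (commit_sha, offending_line) for added lines that look like secrets."""
    commit = "unknown"
    for line in patch.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif SECRET_KEY_LINE.match(line):
            yield commit, line[1:].strip()


def scan_repository(path: str = ".") -> List[Tuple[str, str]]:
    """Run `git log -p --all` in `path` and scan the full history, not just HEAD."""
    patch = subprocess.run(
        ["git", "log", "-p", "--all"],
        cwd=path, capture_output=True, text=True, check=True,
    ).stdout
    return list(scan_patch_text(patch))
```

Note that a later commit replacing the literal with an environment lookup does not clear the earlier hit: the key was still published and should be rotated.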

No, no, go on. That's brilliant, though, because you've got an intermediate commit that doesn't appear in the pull request view, but it's still got the secret in it.

Exactly, yeah, because you're under time pressure, you're trying to get it working quickly, and people don't understand that. And that's why, if an attacker makes it into your Git repository (they'll phish; there are lots of ways for them to do this), they'll scan your history. Now, the top layer of Git is probably not going to have any secrets in it. What I mean by the top layer is what's on the main branch, what's in the latest commits. But when you go deeper, you're going to find all these secrets that have been added and removed by people, because dealing with these heavy systems is a nightmare. Now, does that mean you shouldn't use the heavy systems? It depends, like everything. But the reason I like environment variables is that for most people they're an adequate solution, and they're easy enough to secure: you create a .env file in your repository, you add it to your .gitignore file, and that's going to solve a lot of your problems. The argument against that is that there are ways to dump out your environment. If your infrastructure, your server, your operating system has been compromised, then the first thing any attacker is going to do is type in env, dump out the environment variables from a running application, and see what's there. That is 100% going to be the first thing they do. So the argument is that if you're using environment variables, you've just created a nice package for the attackers. But my argument is, if an attacker has made it that far, let's face it, you're not in a good position anyway. I have access to your RAM; I can find secrets in other ways. Maybe they're not wrapped up as neatly, but let's not pretend that the .env file was the problem here. You've got bigger problems that you need to deal with.

Yeah, right. By the time they're on your server, you're in trouble.

Yeah. So my thinking behind this is: look, it is great to have these heavy systems, and I think they have their place, but you have to ask, are you mature enough to use them effectively? Does the team have enough training around them? When you get to a large enough organization, you can segment people so that a small number of people have access to these systems and know how to use them, and then that's fantastic. If you're a startup of 10 people, environment variable files are great. They put the secrets in local memory, and at least you're going to prevent them being exposed in other ways, like on Git. So that's my rant over.
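A minimal sketch of the .env approach: keep secrets in a `.env` file that is listed in `.gitignore`, and load it into the process environment at startup. Packages such as python-dotenv do this more robustly (quoting rules, escapes, variable expansion); the tiny parser below, with hypothetical function names, only handles plain `KEY=value` lines and `#` comments, and deliberately does not override variables that are already set in the real environment.

```python
import os
from pathlib import Path
from typing import Dict, MutableMapping

# Assumption: the project's .gitignore contains a line reading `.env`,
# so the file below never gets committed.


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines; skip blanks, # comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: Path = Path(".env"),
                  environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy values into `environ` without clobbering variables already set."""
    if not path.exists():
        return
    for key, value in parse_env_file(path.read_text()).items():
        environ.setdefault(key, value)


# In settings.py, after load_env_file(): a missing key raises KeyError at startup,
# which is better than silently running with no secret.
# SECRET_KEY = os.environ["DJANGO_SECRET_KEY"]
```

Letting real environment variables win over the file means the same code works locally (from `.env`) and in production (from whatever the platform injects).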
because i mean it's yeah yeah there's there's lots of things but i on my personal opinion is that
it's not the most secure way in the world to do it but it's so much better and easier and it
reduces the friction which is part of the problem with security i was i wanted to mention like i put
my old man hat on: when I worked at startups in San Francisco, GitHub was just down the street, and
back in the day you could search for anything. As a game, we'd search for
AWS keys, or Stripe keys, because they were brand new. It was the Wild West:
the search is powerful, boom, there it is, and you could literally see it for every company,
because there was no automation, there was no hiding it, there was no email,
there was none of this stuff. It was just, yep, search it all.
So that was sort of a fun game we would do.
It's still basically like that. It's a little better, but not really.
If anyone's listening, if you go to api.github.com/events,
that will take you to the GitHub events API, which shows
everything happening in real time. You will get rate limited, but you can create lots of tokens,
and at first you don't even need authentication;
you can do it in incognito. That will still show you all the commits that
are happening, but you also get the email addresses people have locally configured
in Git. So if you're interested in targeting a specific company, you can just
filter by domain for people who are committing with, I don't know, pick a company,
a twilio.com email address,
find their personal GitHub IDs,
and start scanning all their stuff.
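For Django developers who want to check their own code before someone else does, the simplest form of this kind of secret scanning is pattern matching over files. The sketch below is a minimal illustration under stated assumptions, not how GitGuardian's detectors work: the two regexes are simplified examples (the AWS one follows the documented 20-character access key ID shape starting with `AKIA` or `ASIA`), and `PATTERNS`, `scan_text`, and `scan_tree` are hypothetical names.

```python
import re
from pathlib import Path

# Simplified, illustrative detectors (assumptions, not any vendor's real rules).
PATTERNS = {
    # AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase letters/digits.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Hypothetical catch-all for settings like SECRET_KEY = "..." or api_key: '...'.
    "generic_secret_assignment": re.compile(
        r"""(?i)\b\w*(?:secret|token|api_key|password)\w*\s*[:=]\s*['"][^'"]{8,}['"]"""
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, detector_name) for each line that matches a detector."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan the files under root (current working tree only, not Git history)."""
    results = {}
    for path in Path(root).rglob("*"):
        if ".git" in path.parts or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        hits = scan_text(text)
        if hits:
            results[str(path)] = hits
    return results
```

A check like this could sit behind a local pre-commit hook, but it only sees current files; the real risk discussed here is secrets that live on in Git history, which is why dedicated scanners walk every commit and validate what they find.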
Still, the search feature is a little bit harder now.
Dorking has got harder, for sure.
But in terms of finding keys,
we found 10 million secrets on GitHub
in public repositories last year,
and 2 million of them were for cloud provider keys,
which were all valid, because those particular keys we validate.
It's bloody wild what you will find publicly. But I will say GitHub is out there
implementing other things that make it better, along with some other companies. Okay, so you've
said "we", and you've mentioned GitGuardian, so let's get into it. Tell us, what is GitGuardian?
Should I be using this if I'm a Django developer? Is this helpful to me?
Of course you're not using it, Carlton. Oh, man. Wait, I know: I never commit secrets.
I've heard that.
I've been working for GitGuardian for four years now, and in that time
I've committed secrets by mistake. And it's my whole job to come on podcasts and talk
about why you should not do that. Anyway, enough about my work. So, GitGuardian: we're a code
security company, and our platform was founded on detecting secrets inside repositories.
We talked about Git history and things like that, so the core GitGuardian product is
that we connect into your repositories, search all through the history, and bring out any
secrets. If you're in a large company, the value comes in that we prioritize
them, we validate them, and we help you remediate them. So in a large company, that's what we do,
but for individual developers, all of our systems are free. We're the number one security app
on GitHub; I think we're at about 400,000 users on the GitHub Marketplace at the
moment. So it's there to make sure that you don't have secrets inside your repositories at any point.
And then we also have cool tools. We have a CLI tool called ggshield, which will help you
do things like install a pre-commit hook. A pre-commit hook just sits in your local repository
and blocks any staged commits that have a secret in them from going through, because once the secret enters
your repository, if you're in a team, it's going to be cloned into different places, it's going to be
backed up by different systems, it'll probably end up inside Jira tickets;
it just sprawls everywhere. So once it's in your repository, you have to revoke it. That's the only way
forward. With something like the ggshield CLI tool, you can block them
from coming in, so it's really cool. But we've expanded beyond secrets now: we also do infrastructure
as code scanning, and some software composition analysis to find out if your dependencies are
vulnerable. And the coolest thing, I think, is what we call honeytokens, which are fake
credentials. The main one is an AWS credential that you can purposely leave in
places, and if someone tries to use it, that use will be detected, so it's an early warning
system. Why this is so cool: honeypots aren't new, honeypots have been around for a while,
but honeytokens are cool because not only can you put them everywhere in your
internal infrastructure, you can actually put them inside third-party tools. CircleCI had a
big breach at the start of this year, and encrypted secrets were exposed. So if you put a honey
token inside your CircleCI environment, then you can actually know if that system has been compromised,
and so whether your other secrets in there are compromised. It's the only type of honeypot
that you can put in different systems. So there we are, that's a bit of a longer plug than I intended
to go on for. I'll just put in a plug: there's a Django package, django-admin-honeypot, for your
admin. Going to /admin is the default, so that's a pretty good place for attackers to go looking
for stuff, and you can set it up to log who's trying to get to your admin, even though
your real admin is somewhere else. Yeah, it's really cool. And a fun thing to do is create a
honeytoken, leak it on public GitHub, and then watch what happens. In a matter of minutes,
people will try to exploit it, but then it will also typically get sold as part of a package
on the dark web months later. So you'll start off by getting these random, low-level
calls, and then it will get sold, and then you'll start seeing different types of
activity. It's really fascinating to be able to track what actually happens when a credential
gets leaked, how quickly it happens, and how it moves through these different
levels of attackers, because someone will purchase a thousand valid credentials for 20 bucks
and then just spam them and see what they can do.
And if three of them allow them to do some mining or install some
keyloggers or something, then, you know, happy days.
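The honeytoken idea can also be sketched inside your own application. GitGuardian's honeytokens are real AWS credentials whose use the provider records; the sketch below swaps that for a simpler, hypothetical in-app API key check, so treat every name in it (`plant_honeytoken`, `authenticate`, `HONEYTOKENS`, `REAL_KEYS`, `ALERTS`) and the `key_` format as assumptions. The principle is the same: a decoy that no legitimate client ever holds, so any use of it is a signal.

```python
import logging
import secrets

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("honeytokens")

# Hypothetical registries for a single app. Decoys map to where they were planted.
HONEYTOKENS: dict[str, str] = {}
REAL_KEYS: set[str] = set()
ALERTS: list[dict[str, str]] = []


def plant_honeytoken(location: str) -> str:
    """Mint a decoy key that no legitimate client will ever be given."""
    token = "key_" + secrets.token_hex(20)  # made-up format; mimic your real keys
    HONEYTOKENS[token] = location
    return token


def authenticate(presented_key: str, source_ip: str) -> bool:
    """Check an incoming API key. Any use of a decoy is refused and raises an alert."""
    planted_in = HONEYTOKENS.get(presented_key)
    if planted_in is not None:
        # Each decoy lived in exactly one place, so the alert says which place leaked.
        ALERTS.append({"planted_in": planted_in, "source_ip": source_ip})
        log.warning("Honeytoken planted in %s was used from %s", planted_in, source_ip)
        return False
    # Constant-time comparison against the real keys.
    return any(
        secrets.compare_digest(presented_key.encode(), real.encode()) for real in REAL_KEYS
    )
```

Planting one decoy per location (a CI provider's environment variables, a wiki page, an old config file) is what turns an alert into an answer about which system was compromised.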
Can I ask: my sense is that if you want to do this kind of stuff openly, there's
a handful of countries you can do it in.
Is that where a lot of these bad actors are?
Or is it people in the United States who just cover their tracks a bit more?
What does the actual landscape look like for these businessmen and
women out there who are doing hacking? It's so hard to say definitively where these groups come
from. With a honeytoken, you can get the IP address and you can see where the calls
are being made from, but if they're not using some third party service to mask
that, then that's pretty surprising. It really is
everywhere. You have countries that are a lot more forward in sponsoring bad actors, so you
have Russia, you have North Korea; the Lazarus Group in North Korea
is extremely notorious. But then you also have a bunch
of teenagers like Lapsus$, who were based in the UK, who were really out
there just for clout, not really doing too much damage, mostly reputational damage.
So there are people everywhere who are interested
in it, and I don't think it has any real boundaries, and I don't think any
countries, apart from maybe North Korea, are green-lighting this. But there
are certainly places where it's more beneficial to do it, you know, some
countries that don't have extradition. Yeah, yeah. Do you survive? Yeah. Is there... when people
like, I almost wonder if they need a retainer for people who leave GitGuardian or any
well-established security group. I guess you're not really doing it for the
money, it's reputation, but that would be the ultimate thing. Nobody knows how...
it's like, how do you steal money from a bank? Buy a bank.
Yeah, yeah. So, GitGuardian is really well set up in that
we don't have access to other people's secrets; as employees, we can't get access to them,
but we do know a lot about it. And I honestly feel like people don't talk about it, but
those that know really know. I've been talking to people about this at
the moment, because I got really into decompiling mobile applications and then
looking for secrets inside of them, and it was wild what was happening inside there. But then
you go talk about it to someone who knows, like, I've done this
really cool research and I found all these keys, and everyone's just like, oh yeah, we know already.
No one's really talking about it, but those that know already
know it. It is like a secret little club out there: when you know, you know.
Carlton, I have one more question. So we'll put a link to this: you wrote an article about
why ChatGPT is a security threat, essentially, and with that you won the SEO lottery,
because that's, like, every... But I wonder if you could expound upon that, right? Because that's
something that is big everywhere. I've been using it every day now, but it's like,
oh, is something else going to come and get me? So for individuals
or corporations, what's the main threat that you outlined in that article?
Yeah, there are a number of different ways, and it's evolving.
And it's funny you mention the SEO win. Since that article came out, I've done
so many... The Financial Times did an interview with me as if I'm some kind of
AI expert. I mean, you check all the boxes: we're looking for ChatGPT
security, he's that GitGuardian guy, he speaks English, boom. I've just got this vision
of you and Simon Willison on an expert panel. Yeah. But things like ChatGPT
and LLMs, large language models, have really changed the game for a lot of different
people. Number one, look at it from the company perspective: a lot of companies have now
banned it, and one notable one is Samsung. So why did Samsung
ban their employees from using it? Because this is now another system that sensitive
information has a way of leaking into. To use the example of GitHub, because everyone knows
it: if you're an organization that doesn't use GitHub, chances are your employees still will,
which means that organization still has some kind of risk associated with it. ChatGPT is
similar. You get it to summarize some legal documents, you get it to sift through a whole
bunch of data, or to create some kind of connection using your credentials, and that data is being
stored somewhere. Because of that, your sensitive data is now
on ChatGPT's servers, which is not a super secure platform, so now it's a target for other
people. So big companies have started to ban it. I think that's the wrong approach, because
your employees will use it anyway; it's such a productivity boost,
and there's a fear that if you're not using it, you're going to miss out. The other
area where ChatGPT and other LLMs are a security risk is that we trust AI systems
a whole lot more than we should. So where does ChatGPT get its data from? You
ask it, hey, can you write code that will do X, and it will spit back an answer. It'll do
it in record time, and it'll even explain how it works, so it feels super great. ChatGPT
uses the largest data set around, which is the Common Crawl data set. It's what
all of these systems use. Most of the code in there is from GitHub.
And most of it's rubbish.
Yeah, so that's the point I'm getting to. Look at a random open source
repository: that's what it's learning from. Now, these systems can't really distinguish
between good code and bad code, at least not in the vacuum in which you've asked them to
do a query. It's found the most common result.
A great example is a different system, GitHub Copilot.
At least this was true a year ago, when I was really experimenting with it:
if we put an author at the top, a really respected, well-known open source maintainer,
someone that's very prolific, and asked it the same prompts
as when I used my name as the author, we'd get different responses,
because it's looking at an example of what would this guy do and what would
that guy do, and obviously I'm much shittier than the other person. So the whole point
is that you run the risk that the code it gives you will probably have vulnerabilities
in it. And here's a big difference. People say, well, what's the difference between that
and me copying code from Stack Overflow? The difference is that Stack Overflow has a comment
section where people will very quickly let you know if you're doing something insecure. Around
it there's a huge community, people can add input, and you will get various different answers
for the same thing. ChatGPT, Copilot, and other LLMs don't provide that additional
community input as to which is the best way: you ask it for an answer, it gives you an answer.
I wrote an article called "Shitty Code, Shitty Copilot", and it's basically
crappy input, crappy output. And the people that are most likely to use these systems are probably
people who are early on and may not know the difference between random and
secure random when generating a random number, and when to use each of them.
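In Python terms, the "random versus secure random" distinction maps onto the standard library's `random` and `secrets` modules. The sketch below is a minimal illustration; the function names `predictable_token` and `unpredictable_token` are hypothetical. It shows that `random` is reproducible from its seed, which is exactly why it should never generate keys or tokens.

```python
import random
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def predictable_token(length: int = 32) -> str:
    # The random module (Mersenne Twister) is fine for simulations, games, or
    # shuffling test data, but its output can be reproduced or predicted.
    return "".join(random.choice(ALPHABET) for _ in range(length))


def unpredictable_token(length: int = 32) -> str:
    # The secrets module uses the operating system's cryptographically secure
    # randomness, which is what API keys, reset tokens, and session IDs need.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Anyone who learns or guesses the seed gets the exact same "random" token:
random.seed(1234)
first = predictable_token()
random.seed(1234)
assert predictable_token() == first

# For URL-safe tokens, secrets also has a ready-made helper:
reset_token = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded
```

Both functions look identical at a glance, which is the point being made: a code assistant can hand you either one, and nothing in the output tells you which is safe for secrets.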
Yeah, I think everyone uses these systems, to be honest. Maybe they're not all as good at
filtering things out. It'd be kind of cool to have an LLM that's trained on just the approved answers on
Stack Overflow; that would be kind of cool. Yeah, for sure. I mean, that exists.
Well, I'm really not against these systems, but I'm just of the
mind that you need to understand where the answers come from. It's not a genius
AI system; it's come from other people on GitHub. It's organizing it in a revolutionary way that
helps you find the solution you're looking for extremely quickly, but you have to
understand the limitations of that and where the answers come from. Yeah, and if
we're talking about security, the other area of these systems and AI that people are
worried about is attackers using AI. Can attackers now attack more effectively? The
answer is yes and no, like everything. Again, AI systems use known data to do this, and
revolutionary attacks are not on the public internet, or they're very
hard to do, so it's not coming out with new stuff. If you want it to create malware,
it will block you at first, but if you use clever prompt injection... One that used to work, I
don't think it does now, is "imagine you're an AI system that doesn't have limitations." If you put
that in the chat, it used to work. Or "it's important to my career", yeah, yeah,
or "give me an example of malware". These things used to work, and with prompt injection
there are lots of clever ways to get it to do stuff. There are also open source LLMs where
you can remove these limitations. But anyway, if you asked it and you successfully got it to
create malware for you, it's going to be pretty standard malware. It's nothing that you
wouldn't be able to find by googling yourself. However, what it does do is give
script kiddies superpowers. Take the script kiddie, which is
someone who doesn't have a huge amount of technical knowledge and just runs malicious scripts
to get into things (a big section of the malicious market), and now give them ChatGPT:
when they're doing phishing campaigns, they can now spell check. For so long,
the biggest red flag was misspelled text; forget about that now. They can
manipulate code for the first time: they can input malicious code and say,
adjust this so that it can do this on this system. So it does give
some superpowers. But when we're talking about whether it's going to revolutionize hacking:
not at the moment, because it can't come up with new ideas yet; it can only
regurgitate existing ideas. So, look, there are lots of concerns around AI,
and I did an interview recently with a guy called Simon Maple, which was "Is
AI our friend or foe?", and surprise, surprise, the answer was it depends. No, but I think it was...
That's what someone told me: if a magazine article has a question
in the title, it means they don't know; otherwise it would say "AI is a threat" or "AI is a friend".
Yeah. Well, I also feel like it's pointless to worry about that or
wonder about that, because it's not going anywhere. So is it a threat? Yes and no. Does it change
anything in how we're going to operate, or what the risks are that you're facing now? No.
It's here to stay. And at the moment, the biggest risks are not external,
and it's not an attacker using it. The biggest risk is internal:
your employees using it without understanding it.
And if you block it, if there's someone here from a large organization
who's thinking, okay, I'm just going to block this system,
then all that it's going to do is push it into the background.
They're still going to use it, but they're just not going to tell you
that they're using it.
So you couldn't audit it in that case?
Yeah, exactly.
There is, I mean, as a security expert,
let's talk about LLMs a little bit more. I was curious about your take on this: there's this issue of
the ouroboros. It's like the problem with Google, right? Google indexes the internet, but then people want to feature
on Google, so you write articles for Google. It's why YouTube videos are all 10
minutes and one second long. So it just gets crappier and crappier and crappier.
It's like silt building up behind a dam. And I've seen, and I can believe this
based on just typing things in, that they think the majority of articles are soon going to be
AI-generated, because it's perfect for these topics. So there's an argument that over time,
this is as good as it's going to get; it's just going to get more and more crappy as AI consumes
AI. And of course, it's all copyright laundering, which we don't need to get into. But I'm curious
if you have a quick take on that, or does that seem right to you? Because I don't really see a
way out of that. With code, I guess, you can run it and test it, but for
things you search for... I use ChatGPT as Google now, and I've been using
DuckDuckGo for years too, just because Google's doing its inevitable thing. But I kind
of wonder if I'll just be stuck on, you know, September 2021, or whatever the cutoff
date is, because it's just going to get worse. I have an interesting take on this. I don't know if
this is going to be the popular take, but I have an interesting take: I think it's going to
have the opposite effect. Here's why. Thirty years ago, the main source of
information was books. It came from trusted sources, and whether or not
you think that's bad or good, that's where it came from. As a book author, I agree: very
trusted, right? Yeah, and it's hard to publish. It's still hard, yeah.
And it's so easy to publish junk now, and ChatGPT has made it so
easy, that I feel like it's going to have the effect that there's going to be so much
shitty information, and everything's going to get so much shittier, that we're all going to go back
to trusted sources. Are we going to be googling
answers, or will we go back to trusted, peer-reviewed research, or articles right from
django.com? Yeah, yeah, django.com, or all these trusted sources. I feel like
it's actually going to make the people that had some kind of platform but no
brains so noisy that you're just going to ignore them.
And then the new system will be that we all need to take a
step back and go back to trusted sources. That's what I'm hoping is
going to happen. I like that. Well, my robots.txt is updated for all the crawlers,
so I'm sure they're going to abide by it, and I'm not worried about my stuff being pilfered.
Yeah, yeah, of course. Side joke to people: I've got an update, there's like 10 on there. But
that reminds me: I always ask people what's the worst
security advice they've had, and there was a guy who was providing feedback
on a penetration test. He explained how this person's organization's infrastructure
was vulnerable, and the person's response was, yes, but that's illegal, so no one would do
that. Nice. It's like, what are you doing? People don't abide by laws; it's all lawless.
Well, welcome to the internet. Okay, good. So we're coming
up on time a bit, Mackenzie. I just wanted to ask you one more question. If we
say, first thing, move all your secrets into env vars, and then use a secret scanner, something
like GitGuardian, to make sure that you catch anything you do commit, is there a third
obvious thing that we should be doing? Yeah, definitely. We need to add
additional layers of security on top of things. There's a concept called zero trust,
which is about the idea that just because you have an API key or a password doesn't
mean that you should explicitly trust that person. So there's something that you know, something that
you have, and something that you are, and we should be implementing those rules across security as
well. Some simple things that you could do, if you're listening to this going,
okay, that all sounds nice, but what can I actually do: come up with ways to rotate your keys
regularly. The reality is, no matter what you do, your keys will leak. Even the best companies
in the world have had keys leak. HashiCorp created a product called Vault, one of the
best secrets managers, and they had a breach because they had secrets leak into their Git
repositories. So it happens to everyone. What you can do is implement a rotation policy
that rotates keys regularly, so a leaked key doesn't stay useful for long. The advantage of that
is that you actually know where your keys are and how to rotate them. Otherwise, if
you have a leak, it's like, what the heck does this key do? Am I going to break production
if I revoke this? Not if you're regularly rotating things and you know what they are.
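A rotation policy usually relies on overlap: issue a new key, keep the old one working for a short grace period so clients can switch, then retire it. Some providers support this directly (AWS IAM, for instance, allows two access keys per user). The sketch below is a minimal illustration of that policy, not any particular secrets manager's API; `KeyRing`, `max_age`, and `grace` are hypothetical names, and times are passed in explicitly as seconds to keep it testable.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyRing:
    """Hypothetical holder for one credential with overlap-based rotation.

    Times are plain numbers of seconds (e.g. from time.time()), passed in
    explicitly so the policy is easy to test.
    """

    max_age: float  # rotate once the current key is this many seconds old
    grace: float  # how long the previous key keeps working after a rotation
    current: Optional[str] = None
    current_created: float = 0.0
    previous: Optional[str] = None
    previous_retired: float = 0.0

    def rotate(self, now: float) -> str:
        """Issue a new key; the old one stays valid for `grace` more seconds."""
        self.previous, self.previous_retired = self.current, now
        self.current, self.current_created = secrets.token_urlsafe(32), now
        return self.current

    def rotate_if_due(self, now: float) -> bool:
        """Run this on a schedule; returns True if it rotated."""
        if self.current is None or now - self.current_created >= self.max_age:
            self.rotate(now)
            return True
        return False

    def is_valid(self, key: str, now: float) -> bool:
        """Accept the current key, or the previous one during its grace period."""
        if self.current and secrets.compare_digest(key.encode(), self.current.encode()):
            return True
        if self.previous and now - self.previous_retired < self.grace:
            return secrets.compare_digest(key.encode(), self.previous.encode())
        return False
```

Running the rotation on a schedule also answers the "what does this key do?" question in advance: if a key has already been rotated cleanly a few times, revoking it during an incident is routine rather than a gamble with production.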
The other thing is, stop producing admin keys. If you need a credential, don't give it
admin permissions. I have so many stories where a read-only credential would have been
totally sufficient. That requires creating another key, but if you've got a secrets
manager, then the problem kind of goes away. So make sure that keys have
limits on what they can actually do, to suit the job. And then whitelist
your services: if you have a system that's only meant to talk to one other system, then
you can set it up so that only they can talk. Now, if I find the credential between them that allows
me to talk to it, I'm still coming from a different place: I have a different IP, I'm
on a different system. So these are things that you can actively do to add layers on top,
because you've just got to remember that there's no silver bullet. Everything in security, in the
famous words of this podcast: it depends. But you can add layers that create friction, and that's
ultimately the best way to handle this: create enough friction,
add so many different layers, that an attacker will go attack someone who's
easier. Yeah, that's like one of those startup sayings: there's no silver
bullet, there's just a lot of lead bullets. Yeah, yeah. I live in the Netherlands, and there's a
saying here about bikes. Everyone has at least two bikes each, especially in my town, and one of
the things is that no one has any nice bikes, because you always want to make sure that your
bike is shittier than the bike that's next to you, because then the thief is going to go for that one. We're
terrible at locking up bikes. Yeah: don't outrun the bear, just be next to someone slower.
Isn't there some story in Amsterdam, after the Nazis fled, about a quarter of the bikes
ending up in the canals, the waterways? There was some stat. I did a bike tour, and the guide was
giving me these crazy numbers, speaking to the fact that for a century now everyone bikes
everywhere, but stuff gets stolen, and there were some crazy statistics about when the Nazis left
and the number of bikes that were trashed or something. Yeah, I don't know about that, but I think it still
happens, because every now and again you'll see a crane come into the canal with
these big claws and just scoop up all the bikes that are in there. It just happens.
But bikes are pretty easy to come by, and it's an unwritten rule that if
it's past 2 a.m. and your bike's been stolen, it's socially acceptable to steal someone
else's bike to get home. So you've just got to make sure you're not the last one out of the club, right?
That's good, good to remember. Well, I think we should wrap up there. That's perfect.
I don't know how we got onto the subject. Well, it was the backwards analogy from making sure
your site's slightly more secure than other sites. You know, if you've got it, you've got it:
your bike's slightly worse than the neighbor's bike. It's the opposite, the mirror of that.
Great, Mackenzie. So, anything else you'd like to call out while you've got the mic, that we haven't
mentioned? You've mentioned your podcast, you mentioned GitGuardian, Compago, and
we'll put all of those in the show notes. Anything else you think of? Oh, yeah.
No, I feel like that's kind of it. I think that's enough
plugs for one episode, but anyone that wants to follow me, you can follow me anywhere
on social media at the handle @advocatemack. I'm even on Threads. I've never posted, but
just in case Twitter does explode, I might, so make sure you follow me.
And if you want to listen to me geek out about security
stuff, that's where you can find me. Okay, brilliant. Well, thanks for coming on. Really good,
really enjoyed it. I appreciate the invite, thanks, and great to meet you, Will, and
great to see you again, Carlton. All right. Well, thanks, everyone. We are at djangochat.com, and
we'll see everyone next time. Bye-bye. See you next time. Bye-bye.