
Transcript: Web Security - Mackenzie Jackson

welcome to another episode of django chat a podcast on the django web framework i'm carlton

gibson joined as ever by will vincent hello will hi carlton hello will today we've got mackenzie

jackson from gitguardian with us hello mackenzie thank you for coming on the show

hey guys great to be here thanks for having me welcome welcome to mackenzie oh go on go on go

on go on no say what you're gonna say go i'll just say that i've i briefly forgot that this

is an audio only podcast and i was waving at the camera like a lunatic but but i'm realizing now

that i it's audio only so i'll stop making obscene gestures no no no it's good i can wave back and

then you know you'll laugh the audience won't know why so mackenzie come on we always ask this

which is who are you tell us why you're on this podcast how did you find python

how did we meet you know what's your backstory yeah i've got an interesting yeah

interesting interesting backstory i i was actually um i started off life in my first

my first life as an architect like a building architect and i hated it and i spent most of

my job trying to automate automate it by learning to code um in some of those big systems that we

have called you know bim systems and um and then i was kind of figuring out like why don't i just

skip this architecture thing and do what i want to do which is write software so then i uh kind

of went down that path and had my own startup for a while called conpago which is it's still

around today it's headquartered in australia i haven't been involved i was in there for about

four years as a cto um and then i i left i guess when the company got big enough that it needed a

real cto and and so i and then but the one thing that i loved about um uh coding a bit of a

subsection is that we were a care provider building technology and kind of healthcare

space which meant we had to comply with a lot of different areas like HIPAA compliance and other

things and going down that I really learned how vulnerable software was to lots of different

things and how to secure it and I got really into that so when I left I decided to kind of focus on

security and that's been pretty much my jam. I work now for a company called GitGuardian which

is a code security platform and uh one of the the coolest things about my job is i get to work with

some research teams we discover you know how attackers are kind of exploiting code uh we

create talks about that and then we get to go to some cool conferences which is where i meet cool

people uh like carlton because we met at pycon in italy um which was yeah i think probably over

a pizza i believe i think it was at some after party and you were the keynote and so i was trying

to hassle you to use your keynote powers to get more drinks than the one drink limit that was

allowed at the party if i remember correctly something like that anyway that's not suitable

for the show so let's let's move on um so attacks and reporting so i've you know as part of my

fellow role i joined the django security team and we get quite a lot of um reports for you know you

know every month we'll have reports going so you do a lot of that forensic work trying to find

exploits or you did you have them or you work with teams that do yeah we kind of work with work with

teams that um within gitguardian and also external that really look at um some large-scale research

projects into how attackers are operating um so particularly around exploiting secrets exploiting

credentials uh also exploiting like misconfigurations and one of the cool things that we really like to

get into is when an attack happens to try and recreate that um attack path to try and figure

out exactly you know not only like how did they get in but what tools did they use can we recreate

the the system and point out exactly what made it vulnerable because the nature of security

vulnerability reporting is that if an application has a security flaw that gets exploited

then the blog post that inevitably comes out from the company is extremely limited and will you know

won't give too much information so it's kind of taking that and then trying to figure out

what what's behind it really well i mean so even with the um the issues that are on django we

we just kind of like we'll put a post we'll say look there's a dos vulnerability or

it's vulnerable to maliciously crafted input but we won't necessarily say exactly how you

you know you do the attack but for one reason we don't sort of want to make it too easy for

the people who are attacking unpatched djangos so

i guess it's difficult to know how much you should say because i kind of also in the same

breath i kind of think an attacker is not going to be put off by the lack of detail in the report

they're just going to work it out i don't know yeah i think exactly so the the there's there's

varying levels of thought behind this and you have to be responsible in how you disclose you know

information um and we certainly wouldn't just put out um kind of like a cheat sheet of you know how

to do it but at least finding out you know at at what point at what point was initial

access made um where did they where did they actually was it a phishing campaign how did the

whole thing start and unravel um now when you're talking about cves or like vulnerabilities in

dependencies and things like that then well it's a lot uh you have to be a lot more responsible

in how you disclose that because as you would know uh two years on there's still people running

vulnerable vulnerable packages and things so you you have to be a bit more careful but in terms of

recreating it and not necessarily just just showing the logic behind which the attack is used because

i think a lot of people don't understand attackers logic if that makes sense well one perhaps one

example you might be able to give is um we had um various issues fixed over time about enumeration

attacks where you know people are able to they'll make a request and it's perfectly

legitimate request but it somehow reveals something about the the application or the

ids or something like that and then they they make the next one and they make the next one

and by doing that kind of thing they're able to kind of get a size of the application or predict

ids or predict urls and in it in and of itself that's not really a vulnerability but perhaps

then they use that as part of a bigger attack and you know i don't know yeah exactly like

exploiting logic flaws on how to how to do how to do things um and you know like a lot and a lot of

a lot of kind of attacks happen because people use you know the wrong the wrong function so the

wrong you know you're generating a random number there's multiple ways to generate a random number

some are predictable using maths but so if you're trying to use something to create access via these

random numbers then you you know you you have to be sure to use the the right thing now that

doesn't mean that the un the insecure version is pointless it's it has its reason um it's just kind

of the logic behind which you're implementing it is it's kind of flawed and people can often like

figure figure that out i've had one more question that came up from what you said and i'll you know

I see Will's got a question or something.
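
The point above about there being multiple ways to generate a random number can be sketched in Python. This is only an illustration, not anything shown in the episode: the function names `predictable_code` and `access_token` are made up for the example. The standard library's `random` module is a deterministic generator that is fine for simulations, while the `secrets` module draws from the operating system's cryptographic source and is the one intended for anything that grants access.

```python
import random
import secrets


def predictable_code(seed: int) -> str:
    """Six-digit code from a seeded PRNG (hypothetical helper).

    Anyone who learns or guesses the seed can reproduce the same code,
    which is why this kind of generator should not guard access.
    """
    rng = random.Random(seed)
    return f"{rng.randrange(10**6):06d}"


def access_token(nbytes: int = 32) -> str:
    """URL-safe token from the OS CSPRNG (hypothetical helper)."""
    return secrets.token_urlsafe(nbytes)


first = predictable_code(1234)
replayed = predictable_code(1234)  # same seed, so the same "random" code
token = access_token()             # not reproducible from any seed
```

The insecure generator still has its uses, as said above; the flaw is in using it where an attacker benefits from predicting the next value.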

But I kind of always say that you must update.

You must be on a secure version.

You mustn't use an end-of-life version of Django.

And I say that simply because I kind of think as soon as these reports are out in the open, within a short period of time, there's just kits you can download, which you can automatically run, which test every available exploit known.

And it's not a question of if your system is cracked.

It's just when it's cracked.

Is that fair?

is that a reasonable approach or am i being a bit over cautious in your opinion well i mean like

there's there's uh no i don't think you're being overly cautious i think that's like you absolutely

have to patch uh regularly and definitely patch anything that has vulnerabilities against it

against it like it's absolutely vital to be able to do that um and what you i think what we typically

find in companies that can't patch regularly is that they if you you should always have a system

in place to patch regularly regardless of whether or not there's some critical cve because if you're

in a habit of doing it regularly it becomes easier when you need to do it and it's critical to do it

right and i think a lot of people kind of will patch when something's critical but you know that

can you know that can cause all kinds of havoc and it becomes a scary thing to do and i think people

often shy away from it but you know patching regularly is uh it is absolutely fundamental

especially when there's a vulnerability out against it the argument the kind of against

patching regularly it's a bit of a weak argument but just to make it is that it takes about when

a new version comes out some often it can take about three months to figure out if it's vulnerable

to anything especially if it's like a big change in something so then you know the arguments is

do you patch immediately when a new version is out or do you kind of wait and i think the solution

is you know like patch regularly whenever there's a vulnerability known against it and then stick

to a regular patching routine for everything else where uh it's consistent and you you know like

you're you're doing it but if there is a vulnerability people will be able to find it

they build scripts to be able to like exploit these automatically you know it's we're not talking

about uh we're not talking about the most sophisticated actors if there's a cve against it

like you've you've basically given the cheat codes to how to exploit your application so you should

definitely patch yeah right okay okay and i guess just that point about new versions i guess for

django say um you know 5.0 is about to come out and you might wait for 5.0.1 or 5.0.2 if you're

particularly worried you know about those first regressions but meanwhile 4.2 point whatever is

still being released with the security updates so you should definitely be getting that each month
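
As a rough sketch of the "stay on the latest patch release of your series" idea above, the check below compares an installed version with the newest patch release in the same series. The helper names and the version strings in the usage note are made up for illustration, and the parser only understands plain final releases such as `4.2.7`, not pre-releases.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'X.Y' or 'X.Y.Z' into a 3-tuple (final releases only)."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) < 3:
        parts.append(0)  # treat '4.2' as '4.2.0'
    return parts[0], parts[1], parts[2]


def is_behind_on_patches(installed: str, latest_in_series: str) -> bool:
    """True when a newer patch release exists in the same X.Y series."""
    inst = parse_version(installed)
    latest = parse_version(latest_in_series)
    if inst[:2] != latest[:2]:
        raise ValueError("versions are from different release series")
    return inst < latest
```

For example, `is_behind_on_patches("4.2.5", "4.2.7")` returns `True`, meaning a routine patch upgrade is due even if you are deliberately holding off on the next major release.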

yes yes yeah i mean absolutely if you're lucky enough to hopefully people like here are using

django you know using well-supported frameworks like that they should be on this show i tell you

on our podcast yeah because i mean well there's a lot of frameworks that aren't

you know that don't go through that they don't have the security team

going through it so yeah you you're already on the good step if you're using django

oh well thank you we'll put that on the pull out quote on the website

well since since you mentioned that i'll i'll throw this question to you because carlton and i were

not sure how to address it. So Flask, which is another big Python web framework, recently there

was some discussion in the community about something unrelated to security. But that's

an example of something widely used that, as far as I'm aware, does not have a formal security team,

for example. And so I'm curious, if you saw any of that, there was an issue with Flask-Login,

which is essentially a third-party package where it hasn't been maintained.

And so there's a new version of Flask and all of a sudden login was breaking for everyone.

And there were some comments about that and people thinking that Flask was this, you know, Microsoft or something

and had all the support when really it's like one person and a handful of volunteers maintaining.

So I guess more like I just want to address that because it's been out there.

But I'm curious, like where you sit, when you see web frameworks, you see that, right?

there's the whole gamut there's django and then there's many widely used frameworks that are

you know may not have robust security things i'm not saying putting that on flask but like

a lot of these projects may have lots of users and it's a you know handful of people doing it

i mean absolutely and it's always just like um a weigh-up between using like what may be cutting

edge like at one point django was like cutting edge right out there you were a trendsetter if

you're using it you know and so um a lot of security often doesn't come into the equation

until later on too you know when i when we first built conpago the startup that i was in we had

a dot net back end and a react front end why because that was like we had a dot net guy and

we had a react guy exactly like you know that's what we like that's what we had right so we're

just you know everyone's like why aren't you using node and react because we didn't have a node guy

where we had a .NET guy.

So when it comes to like these types of frameworks

and things like that, you know, like if you have the foresight

and I think, you know, people that have been around

for a while will be able to know and they understand it

and, you know, certain different frameworks

will do different things.

They may be more secure.

They may be faster.

They may be able to handle large quantities of data better.

So there's like lots of considerations

and often security is an afterthought.

but you you really there's a couple of things to look for uh when choosing a framework one does it

have a long history of of being maintained and i don't necessarily mean like years and years and

years i mean you know is it constantly being maintained is there a community around it

because if if those are the cases and you should you know you can you can start to

tick off and feel a bit feel a bit better is there a security team for it well that's

not normal you know for for everything but that should definitely help you kind of make those

decisions so if you are in the fortunate position to be able to pick frameworks then

looking at the community looking at a team that maintains it because you know you will be surprised

you know what happens is the example of the um of ua parser uh this is i apologize for everyone

this is a node you know package that's okay there'll be a there'll be a there'll be a python

equivalent but you know ua parser was uh just a package that let you know what operating system

your users were you know viewing your app on what it's something very simple it was used by

had 10 million weekly downloads um and it was maintained by one guy named faisal and uh his

npm his node his npm account got hacked and then someone created a malicious version for it you know

this is because you know this is this one guy maintaining this thanks you know thanks thanks

and it would have passed everything,

but you have to also consider the team behind these,

not just, is it popular?

Yeah, and think, go on, so thinking about that,

I mean, PyPI have done an awful lot recently

about tightening up, just within the Python ecosystem,

about tightening up, you need to use two-factor auth now

if you're a popular project,

and they've got this trusted publishers thing,

what's that?

So in terms of, so PyPI has actually,

as has npm, and also like github is also forcing two-factor authentication because this

was pretty much one of the main ways attackers were kind of creating malicious uh applications

is that what would typically happen is that people would specialize in kind of phishing

uh these you know these maintainers getting into their accounts um and if there's no two-factor

authentication you know like a well-structured phishing campaign isn't easy but it's certainly a

lot easier if there's no 2fa to get into it and um these supply chain type of attacks can have

massive uh implications you know because you're not just attacking this one system you know you're

potentially uh creating a vulnerability on millions of applications so one thing to remember

about attackers is that they operate on economics like a normal business does where there's a risk

reward like is the risk of me attacking this system going to outweigh the reward that i you

know that i could get potentially and when you're talking about systems you know like pypi packages

that are being used by millions of different applications then you know the rent the reward

is massive potentially because you could install you know even a crypto miner that gets released

onto a million websites you know can create something so simple can create you know pretty

good profits um so it's it's it's really good that these uh that these package managers are

actually implementing you know more security implementations around this because it it like

2fa may not seem like a lot but you will really strip back the amount of account takeovers

from that it's just significantly harder because of that extra step because they've got to somehow

trick you into entering that as well at the same time yeah yeah and and exactly and then like

the step above that is the trusted publishers i think by pypi which you know like are they

which is another set of requirements that pypi you know puts on these publishers to make sure

that that doesn't happen um so you know making sure that there's not long-lived passwords

that's being accessed in there so that when you look at that it's just another tick box

um and when you're choosing packages you know just you don't need to spend hours on it it should be

easy to look at something and say oh this is secure and that's why something like the trusted

publishers is really powerful because you know you can quickly look at that and know that it

meets the required like the criteria of at least that minimum so you can move forward it's actually

quite powerful in terms of being able to quickly make decisions of what to introduce into your

project okay and is it going to be secure okay um i'm getting a little niggle inside though because

there's some of some of these metrics that you you sometimes get on projects they can be like

your did you tick this box did you tick that box and it's like some of them are are you using a

particular feature on github and it's a bit like well no i'm not we're not using that particularly

say for instance django django's got its own release program um release process it's got its

own security process it's got security archive it does its way of handling cves it's all you know

top quality really but it's not going through the github security advisories panel therefore we don't

get the tick in the box on that metric and you're sorry i sometimes get a little bit like oh i hate

these metrics but i i totally i can totally understand that but when you're the the rebuttal

that i would have to that is that i find that these metrics are more for smaller packages yeah

right you know be able to use them uh with confidence rather than you know something

that's massive you know that's that's trusted by people if it doesn't have the tick box then

you know like you you can you can still get it passed but in terms of like if i'm just looking

for i just need a package that's going to be able to do the small job can i trust putting this into

my production or not if it has the tick box then that's a good step you know but it's not the be

all and end all and no security will will be the be all and end all and i work for a vendor

and you know we have people that come up to us and just be like okay so here's my list of

requirements that i need to tick for for my soc certification or for this or for that you know

like where does your product fit in oh you don't fit into this tick box oh i don't need it then

it's kind of like so they're not worried at all about getting hacked they're worried about like

compliance and i guess that's important but so i understand yes the how how good is the stand

where you're coming from how good is the box on the questionnaire is it you know because if the

box on the questionnaire isn't right it's no good at all okay yeah and how do you make a questionnaire

that fits everyone right you like that yeah it's the same questionnaire box for for such wildly

different applications well the the two-factor authentication is interesting because this is a

thing in Django with there is no built-in two-factor auth. There is a third-party package

that Jazzband maintains, but maybe there's been separate discussion around auth in Django

recently. Carlton's been advocating for some changes because there's a number of, I'll just say off the

top, there's a number of things, Django, if you were going to do it today, like there's first

name, last name is the default. Well, that doesn't fit a lot of the world's population, for example.

Also, it defaults to username, email, username and password.

Most people want email, but maybe I'll, Carlton, I just wound you up, go.

Well, like literally, so like the whole point about Django is a batteries included framework,

right?

It's meant to provide the batteries and, but it's not any old battery because like, for

instance, it used to provide comments, but comments isn't something you can't build yourself

or can't be maintained in the ecosystem.

And it's, it's not, it's not, it's a bit of a burden to maintain because there's so

many you know opinions about what it might have and so many different ways it could go so

django.contrib.comments was pulled out and it's a third-party package now but auth auth really is a

battery that django has to provide because it's so central and it's so hard and if you get it wrong

the consequences are so bad that that's a battery django should provide and so yeah we we've got good

auth we've got good central central but we don't have this two-factor bit yet and for me it's it's

kind of like that's a missing battery that would be really nice if we could do something that's a

you know one-time passwords or i don't know what the pass keys are the new things tell us about

pass keys and one-time passwords what are all these things mckenzie because there's a there's

a lot happening in the changing on authentication because your authentication remains kind of uh

like a like a big a big weak link and especially our reliance on different things

like api keys and uh and that that are kind of just sprawling everywhere because they're handled

by so many different people so some of the things that people are trying to do to essentially remove

these points of the vulnerabilities is to create basically the same systems but only valid once

and created for the purpose of that session so you know like you have something like a dynamic api key

that's managed by a trust you know like a vault or something where the api key is created you then

use it and then you then destroy it at the end and it's only valid for one time and its lifetime

is a matter of minutes or at most yeah seconds yeah yeah or you know whatever however long that

that it that it takes right you know um and so these are kind of really and i think we can expect

um these to to really start taking over along with kind of role-based authentication that's

being implemented in lots of different ways because one of the problems that we also have

been facing is that when you're trying to manage you know pass keys and passwords and api keys and

all of that you know you can be tempted to create you know you need to do multiple different jobs

if i create one admin key to be able to do all those different jobs then i don't have to manage

all these different keys right but then if that that key becomes so sensitive so uh you know having

having role-based authentication where you know you restrict what what you know to the absolute

bare minimums and your authentication is created you know for the purpose that you're trying to do

with all the minimum permissions that you're trying to do and with infrastructure when we're

kind of getting into you know infrastructure as code and on all of these different systems and

secrets vaults and we can actually tie them all together so that it works really you know really

nicely and i think that um we're not using these tools to the full extent but they're becoming more

and more uh expected and certain things that i mean like we i guess using your analogy i guess

we can expect you know frameworks to start putting in these different batteries um you know as we go

go down well i think i think that's one thing that's sort of it's it's been discussed a few

times and it hasn't quite happened yet but it's like django hasn't really got a solution for

secrets handling in in place um so you start off you get a you get a settings file and in there

the real secrets are your your database password and your um your secret key which is

used for um signing and and whatnot so it's important that you you don't commit those to

git and we can talk about git guardian in a minute and and whatnot but the first part has

always been stick those in a setting in an environment variable and kind of what would

be what would be nice i think for django to have is a kind of um a pluggable um interface around

that so okay if you're using env vars you get it from there but then there's all these other mechanisms

like you know vaults and secret managers and things that we we could sort of swap out the

back end and you could be using those as well i think it'd be nice to have something like that

in the django space yeah yeah for sure i mean it's uh secrets management is going to persist to be a

problem that we're going that we have to kind of uh deal with and i mean people may have it's you

know it's funny one of the one of the common passwords that gitguardian detects you

know is the django is the django secret keys because often people get excited they created

their first django project they git add all and commit to to git and then all of a sudden they've

you know released this this secret django key now it's probably not that interesting to

an attacker at least if it's your first project and you're kind of having a play you know but

systems don't know that so uh they alert on it um but i kind of feel like that's a good process

because if you leak something on github you're going to get an email about it and then you kind of

are forced to learn how to securely do it from the start.
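
A minimal sketch of the "stick those in an environment variable" approach discussed above, so the secret key and database password never sit in a committed settings file. `SECRET_KEY` and `DATABASES` are real Django settings, but the helper `require_env`, the `MissingSecret` exception and the variable names `DJANGO_SECRET_KEY` and `DATABASE_PASSWORD` are assumptions made up for this example.

```python
import os


class MissingSecret(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_env(name: str) -> str:
    """Return the named environment variable or fail loudly at startup."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecret(f"environment variable {name} is not set")
    return value


# In a settings module this might be used roughly like so
# (the environment variable names are hypothetical):
#   SECRET_KEY = require_env("DJANGO_SECRET_KEY")
#   DATABASES["default"]["PASSWORD"] = require_env("DATABASE_PASSWORD")
```

Failing at startup rather than falling back to a default value means a missing secret is noticed immediately, and there is no placeholder key in the repository for a scanner, or an attacker, to find.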

And, you know, when you're talking about environment variables

and .env files, I mean, and vaults and secret managers,

I mean, there's huge arguments about what to use

and when to use them.

And I think I differ from most of the security community

from what I think.

No, cool.

Well, tell us what you think because this is one reason we get stuck

is that people say oh we want this but then there's a disagreement but what about that and

we can't quite agree on what we should have so we don't do have any anything it's like we're not

yeah so so a funny so a funny story about this is that at pycon italy where i met carlton i had my

talk um and you know my talk was on how to securely manage secrets in python projects and

you know one of i mentioned multiple ways to do it but one of the ways and the way that i like

particularly like is using environment variables and dot env files the talk before me i don't know

if this was like organized by the plan by the organizers but the talk before me the entire talk

was about why you shouldn't use environment variables for it so and so i i like environment

variables but i want to start off the bat and say that they're not the most secure thing to use them

so if you if you ask a security person you say how should i manage my secrets then the official

answer is you should use something like a vault or a secrets manager that's dedicated to that

that's a server it's going to create dynamic secrets so just in time you can connect to that

to you know to to authenticate your developers so that they have access to secrets or their apps

have access to secrets that no developers you know and it becomes like this heavily complicated thing

and what will happen most of the time is that they will interact with that once i go this is

such a pain in the ass to interact with the system and then they create secrets.txt on their desktop

and they store all their api keys there because then they don't have to deal with this heavy

system that some security person spent a whole year implementing that has you know 400 pages

of docs to go through of how to correctly use it and and that creates another problem and then

that's why secrets end up in your history because you you've been told okay i need you to create

this feature and you need to connect to some kind of data bucket to do it so you to start off with

you just hard code the secrets because to do it properly such a pain yeah that you will you know

that it's a pain but don't worry because by the time code review comes around you'll have removed

that not knowing that that secret is now in your git history but no one's seen it so no one actually

knows that it's there and that creates like a a big problem i know this is getting very long

no no that's fine go on but that's brilliant though because you've got an intermediate commit that

doesn't appear in the pull request for you but it's still got the secret in it but you just

exactly yeah yeah exactly because you know like because you know like you because you just you've

you're under time pressure you're trying to use it quickly and people don't understand that and

that's why uh an attacker if they make it into your git repository and they'll phish there's lots

of ways for them to do this um they'll scan your history now the top layer of git's probably not

going to have any secrets in it um like by what i mean by the top layers is kind of like what's

on the main branch you know what's in the the latest commits on on everything but when you go

deeper you're going to find all these secrets that have been added and removed from people because

dealing with these heavy systems is is a nightmare now does that mean you shouldn't use the heavy

systems i i really think it it you know like it's gonna it depends like everything but why i like

environment variables is because for most people that's an adequate you know solution and it's easy

enough to secure you create a dot env file in your repository you create a dot gitignore file

and that's going to solve a lot of your problems the argument against that is that there are ways

to dump out your environment if you're kind of if your infrastructure

your server or you know yeah your other your has been your operating systems have been compromised

then the first thing any attacker is going to do is type in env and dump out the environment

variables from a running application and see what's in it that that is 100 percent going to

be the first thing that they're going to do and so the argument against it is that if you're using

environment variables you've just created a nice package for every you know for the attackers but

my argument is if an attacker's made it that far like let's face it like you're not in a good

position anyway like i can i have access to your ram i can find secrets in other ways maybe it's

not wrapped up but let's not pretend that like the env file was the problem here like like you've got

bigger problems that you need to deal with yeah right by the time they're on your server you're

in trouble yeah so like my kind of thinking behind this is like look it is great to it is great to

have these heavy systems in there and i think they have their place um but you have to understand

like are you mature enough to effectively use them is that does the team have enough training

you know around around that because um when you get to a large enough organization you could segment

people out so that you know small small number of people have access to these machines they know how

to use these systems um then that's fantastic if you're a startup of 10 people you know environment

variable files are great it will put them in local memory and at least you're going to you know

prevent them being exposed in other ways like on git so you know that's that's yeah my my rant over
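The .env approach described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny parser below handles plain KEY=VALUE lines and nothing else, the function names are made up for this example, and EXAMPLE_API_KEY is a hypothetical setting name. The .env file itself would be listed in .gitignore so it never reaches the repository.

```python
import os


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_dotenv(path=".env"):
    """Copy values from the file into os.environ without overriding existing ones."""
    try:
        with open(path, encoding="utf-8") as handle:
            parsed = parse_dotenv(handle.read())
    except FileNotFoundError:
        return {}
    for key, value in parsed.items():
        os.environ.setdefault(key, value)
    return parsed


def get_secret(name):
    """Return a required setting, failing loudly if it is missing."""
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(f"missing required setting: {name}") from None


# Hypothetical usage in a settings module:
# load_dotenv()
# api_key = get_secret("EXAMPLE_API_KEY")
```

Failing loudly on a missing setting is a deliberate choice in this sketch: a silent default makes it tempting to hardcode a fallback secret in source, which is exactly what the .env file is meant to avoid.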

because i mean it's yeah yeah there's there's lots of things but my personal opinion is that

it's not the most secure way in the world to do it but it's so much better and easier and it

reduces the friction which is part of the problem with security i was i wanted to mention like i put

my old man hat on when i worked at startups in san francisco github was like down the street and

back in the day you could just search for anything so you could search you know as like a game we'd

get aws keys you could get stripe keys because they were brand new you know so wild wild west

like you just hey the search is powerful boom there it is and like you could literally see it

for every company

because there was no automated,

there was no hiding it,

there was no email,

there was none of this stuff.

It was just like, yep, search, search it all.

So that was sort of a fun game we would do.

It's still basically like that.

It's a little better, but not.

If anyone's listening,

if you go to api.github.com forward slash events,

then what that will take you to

is the GitHub events API.

everything happening in real time oh yeah you will get rate limited but you can create lots of tokens

and like you don't even need authentication at first

like you can do it in incognito and that will still come up and in that you have all the commits that

are happening but you also have the email addresses from people their locally configured

git email address so if you're interested in targeting like a specific company you can just

filter on domains for people that are committing with a i don't know pick a company

at twilio.com email address

and find their personal GitHub IDs

and start scanning all their stuff.

It's still, the search feature is a little bit harder.

Doxxing's got harder for sure.

But in terms of finding keys,

we found 10 million secrets on GitHub

in public repositories last year.

And 2 million of them were for cloud provider keys.
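A scan of the kind being described can be sketched as a pattern match over text. This is a toy illustration, not how GitGuardian or any real scanner works: the two patterns below are assumptions chosen for the example (the AKIA/ASIA prefix format commonly documented for AWS access key IDs, and a PEM private key header), and production tools use many more detectors and also check whether a key is still valid.

```python
import re

# Illustrative patterns only; real scanners ship hundreds of detectors.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (line_number, detector_name) for every line that looks like a secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Running something like this over every commit in a repository's history, not just the latest files, is what surfaces secrets that were added and later removed.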

so like which were all valid because we validate those particular keys

so like it's bloody wild what you will find publicly but i will say github are out there

implementing other things that make it better along with like some companies okay so you've

said we then you've mentioned git guardian so let's get going okay tell us so what is git guardian

you know should i be using this if i'm a django developer is this helpful to me

of course of course you're not using it carlton oh man wait i know i never i never commit secrets

i've heard that

i've been working for git guardian for four years yeah four years now and in that time

i've committed secrets by mistake so like it's my whole job to come on podcasts and talk

about why you should not do that anyway uh enough about my work so git guardian so we're a code

security company um and we our platform was founded on detecting secrets inside repositories

so we talked about you know git history uh things like that so the the core git guardian product is

that we connect into your repositories and we will search all through the history and bring out any

secrets and we can if you're in a large company the value of it comes in that we will prioritize

them we will validate them and we will help you remediate them so in a large company that's what

but for individual developers you know all of our systems are free we're the number one security app

on github so i think we're about 400 000 uh users on github um on their github marketplace at the

moment um so just to make sure that you don't have secrets inside your repositories at any any point

and then we also have cool tools uh we have a cli tool called ggshield which will help you

do things like uh install a pre-commit hook that will sit between you know your local repository

uh well pre-commit sits just in your local repository it just kind of blocks any commits

going through getting staged that have a secret in there because once it once the secret enters

your repository if you're in a team it's going to be cloned into different areas it's going to be

backed up by different systems probably it'll end up inside like jira tickets or you know like

they just sprawl everywhere so once it's in your repository you have to revoke it that's the only way

forward you know but with the ggshield cli tool you can block them um block them

coming in so it's really cool but we've expanded beyond secrets now we also do you know infrastructure

as code scanning um some uh software composition analysis to find out if your dependencies are

vulnerable and the coolest thing i think is what we call honey tokens um which are fake

credentials so like the main one is the aws credential that you can purposely leave in

places and if someone tries to use it it's like an early warning system that systems will be

detected and why this is so cool is honeypots aren't new honeypots have been around for a while

but why honey tokens are cool is that not only can you put honey tokens kind of everywhere in your

internal infrastructure you can actually put them inside third-party tools so like circle ci had a

big breach the start of this year and encrypted secrets were discovered so if you put a honey

token inside your circle ci environment then you can actually know if that system has been compromised

you know are your other secrets in there compromised so it's the only type of honeypot

that you can put in different systems so there we are that's a bit of a longer plug than i intended

to go out for but i just put a plug for there's there's a django package django honeypot for your

your admin so going to slash admin is the default and so that's a pretty like good place to go look

for stuff and so you can set it up and like log who's trying to get to your admin even though

your admin's somewhere else yeah yeah it's really cool and what's a fun thing to do is create a

honey token leak it on public github and then watch what happens in a matter of like minutes

people will try and exploit it but then it will also typically get sold like as part of a package

like on the dark web months later so you'll you'll start off by getting these random calls like low

level calls and then it will kind of get sold and then you'll start seeing like different types of

activity it's really fascinating to be able to track what actually happens when a credential

gets leaked you know like and how quickly it happens and how it moves through these different

levels of attackers because someone for 20 bucks will purchase a thousand valid credentials and

And then just spam them and see what they can do.

And, you know, and if three of them allow them to do some mining or install some key

loggers or something, then, you know, happy days.
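The honey token idea discussed above can be modelled in miniature. This is a toy sketch under loud assumptions: real honey tokens are credentials whose use is monitored on the provider side, whereas here the "monitoring" is just a dictionary lookup, and the class and method names are invented for this example.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def make_honey_token(label):
    """Create a decoy key ID that only looks like a cloud credential."""
    key_id = "AKIA" + "".join(secrets.choice(ALPHABET) for _ in range(16))
    return {"key_id": key_id, "label": label}


class HoneyTokenRegistry:
    """Remember where each decoy was planted and record any attempt to use it."""

    def __init__(self):
        self._tokens = {}
        self.alerts = []

    def plant(self, label):
        token = make_honey_token(label)
        self._tokens[token["key_id"]] = label
        return token["key_id"]

    def observe(self, key_id, source_ip):
        """Return True and record an alert if the key is one of our decoys."""
        label = self._tokens.get(key_id)
        if label is None:
            return False
        self.alerts.append({"label": label, "source_ip": source_ip})
        return True
```

Because nothing legitimate ever uses a decoy, any alert is meaningful, and the label tells you which system the credential was taken from.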

Can I ask, my sense is that, like, if you want to do this kind of stuff openly, there's

like a handful of countries you can do it in.

Is that where a lot of, like, these bad actors are?

Or is it people in, you know, the United States who just cover their tracks a bit more?

like what is what does the actual landscape look like of these you know uh business businessmen and

women out there who are doing hacking it's so hard to definitively say like where these groups come

from uh so like with a honey token like you can get the ip address and you can see where the calls

are being made from but i mean if that's you know if they're not if they're not using some third

party service to mask that then you know that's pretty surprising but you know it really is

everywhere you have you have countries that are a lot more forward in sponsoring bad actors so you

have you know there's russia that you know there's north korea right like north korea like the main

you know the lazarus group in north korea they're extremely notorious but then you also have a bunch

of teenagers like lapsus$ which were based in the uk that were you know like that were really out

there for for just clout um not really doing too much damage but just kind of reputational damage

um so there's like yeah i mean there's people kind of everywhere that are that are interested

in it um and i don't think it has any any any kind of real boundaries and i don't think any

countries apart from maybe north korea are kind of green lighting this but there's

certainly people where it's more beneficial to to to do it i will you know some different

countries that don't have extradition yeah yeah do you survive yeah is there um when people

like i almost wonder if they need like a retainer for people who leave git guardian or any like

well-established security group like i mean i guess you know you're not really doing it for the

money it's reputation but you know that would be the ultimate thing is to you know nobody knows how

it's like you know how do you how do you steal money from a bank it's like buy a bank it's like

yeah yeah yeah it's it so i mean like git guardian's really well set up in that like

we don't have access to other people's secrets like we we can't get access to like as an employee

but we do know a lot about it but i i honestly feel like people don't talk about it but those

but people know like those that know really know like the i i was i've been talking to people at

the moment i got really into reverse reverse uh reverse compiling mobile applications and then

looking for secrets inside of them and it was like wild what was happening inside there but then you

know like you go talk about it to someone that knows you know someone you know i've found this

really cool research and i found all these keys and everyone's just like oh yeah we know already

like no one's kind of like talking about it but those that know kind of already

already know it. It is like a secret little club out there when you know you know.

Carlton, I have one more question. So we'll put a link to this. You wrote an article about

why ChatGPT is a security threat, essentially. And that just like, you win the SEO lottery,

because that's like every... But I wonder if you could expound upon that, right? Because that's

you know, something that is big everywhere. Like I've been using it every day now, but it's like,

oh, like, you know, something else is going to come and get me. So for, yeah, so for individuals

or corporations, what's sort of the main threat that you outlined in that article?

Yeah, there's, I mean, there's a number of different ways that, and it's kind of evolving.

And it's funny, you mentioned like the SEO win. Since that article came out, I've been published

did so many like the financial times did an interview with me as if i'm some kind of like

ai expert like i mean you check all the boxes we have boxes here we're looking for chat gpt

security he's at git guardian he speaks english boom i've just got this vision

of you and simon willison on a panel expert panel uh you know yeah uh but so the things like chat

gpt and llms large language models like have have really changed the game for a lot of different

people so number one you look at it from the company perspective a lot of companies have now

banned this and one notable one is samsung so you kind of look at okay why did samsung

ban their employees using it it's because this is now another system that you know sensitive

information has a way of leaking into so using the example of github because everyone knows

it if you're an organization and you don't use github the chances are your employees still will

which means that that organization still has some kind of risk associated to it chat gpt is kind of

similar so you get it to summarize some legal documents you get it to sift through a whole

bunch of data or create some kind of uh connection using these these credentials that data is being

stored somewhere right and because of that that means that your sensitive data is now

on chat gpt's servers which is not you know a super secure platform so now it's a target for other

people so big companies have kind of have started to ban it i think that's the wrong approach because

i think that your employees will use it anyway because it's such a productivity boost you know and

i feel like there's a fear that if you're not using this you're going to miss out the other

area of chat gpt that's a security risk or other llms is that you trust ai systems

a whole lot more than what you should so where does chat gpt get its data from you know so you

ask it hey can you create a coding instruction that will do x and it will spit back it'll do

it in record time and it'll even like explain how it works so it feels like super great chat gpt

uses the largest data

set around, which is

the common crawl data set. It's what

all of these systems use. Most of the

code in there is from GitHub.

And most of it's rubbish.

Yeah, so that's kind of the point I'm getting

at. Look at a random open source

repository. That's what it's

learning against. Now, these systems

can't really distinguish between good code

and bad code, at least

not in the vacuum of which you've asked them to

do a query. It's found the

most common result.

A great example is a different system, Copilot, GitHub Copilot.

Whereas, at least this was true a year ago

when I was really experimenting with it.

If we used the author, if we put in at the top an author

as a really respected, well-known open source maintainer,

someone that's very prolific, and asked it the same prompts

as if I used my name as the author, we'd get different responses.

like because like it's kind of looking at an example of what would this guy do and what would

this guy do and obviously i'm much shittier than than the other person so yeah so the whole point

is that you actually come to the risk of like the code that it gives you will have like vulnerabilities

in it probably um and here's a big difference as people say well what's the difference between that

and me copying code from stack overflow and the difference is that stack overflow has a comment

section where people will very quickly let you know if you're doing something insecure or around

it there's a huge community around that people can add input you will get various different results

for the same thing chat gpt copilot other llms you know don't provide that additional

community input as to which is the best way you ask it for an answer it gets an answer um you know

and i wrote an article called shitty code shitty co-pilot and it's you know basically you know that

crappy input crappy output and the people that are most likely to use these systems are probably

people like you know that are early on and may not know the difference between random and random

secure and when you know like when generating a random number and when to use both of them
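The "random versus secure random" distinction mentioned here maps, in Python, onto the standard library's random and secrets modules (the speaker does not name a language, so that mapping is an assumption of this example). A minimal sketch of why it matters for anything security-sensitive such as tokens or reset links:

```python
import random
import secrets


def predictable_token(seed):
    """Token from a seeded general-purpose PRNG: same seed, same token."""
    rng = random.Random(seed)
    return "".join(rng.choice("0123456789abcdef") for _ in range(32))


def secure_token():
    """Token from the operating system's cryptographic randomness source."""
    return secrets.token_hex(16)
```

The random module is designed for simulations and games, where reproducibility is a feature; its output can be reconstructed by someone who learns or guesses the generator state. The secrets module exists specifically for passwords, tokens, and similar values.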

yeah i think everyone uses these systems to be honest i mean maybe they're not as good at like

filtering things out i mean it'd be kind of cool to have an llm that's just approved answers on

stack overflow that would be kind of cool yeah that for sure for sure i mean that exists

well i'm really not i'm really not against like these systems too but i'm just kind of of the

mind that you you just you need to understand where the answers come from it's not a genius

ai system it's come from other people on github it's organizing it in a revolutionary way that

helps you find the solution that you're looking for extremely quickly but you know you have to

understand the limitations of that and you know where the answers come from yeah and i mean if

we're talking about security too the other area that of these systems and ai that people are

worried about are attackers using ai so can attackers now you know attack more effectively and the

answer is yes and no like everything uh so again ai systems use known data to do this

revolutionary attacks you know are not on the public internet or they're very you know very

hard to do and it's not coming out with new stuff so if you want it to create malware

it will block you at first but if you use clever prompt injection like one that used to work i

don't think it does now is like imagine you're an ai system that doesn't have limitations if you put

that in chat you know it used to i think it's important to my career you know like yeah yeah

like or give me an example of malware you know that these things used to work and prompt injection

is you know there's lots of clever ways to get it to do stuff there's also open source llms and

you can remove these limitations but anyway um if you asked it and you successfully got it to

create malware for you it's going to be pretty standard malware like it's nothing that you

wouldn't be able to find by googling yourself however you know what it does do is it gives

script kiddies superpowers so you know like when you combine the script kiddie which is a you know

someone that doesn't have a huge technical knowledge that just can run malicious scripts

to get into things it's a big section of the malicious market you know you now give them chat

gpt so that when they're doing phishing campaigns they can now do spell check like you know how long

was it that the biggest red flag was misspelt stuff well forget about that now um you know they can

they can manipulate the code for the first time they can input malicious code and say

adjust this so that it can do this on this system or you know um and these things so it does give

some superpowers but when we're talking about like is it going to revolutionize the hacking

not at the moment because it needs to have it can't come up with new ideas yet it can only

regurgitate existing ideas so i mean this is like look there's lots of concerns around ai

um and i did an interview recently which was uh with a guy called simon maple which was like is

ai our friend or foe and um surprise surprise the answer was it depends no but i think it was

that's what someone told me like if there's a magazine article and it has it has a question

in the title like it doesn't know otherwise it would say ai is a threat yeah ai is a friend

yeah well i also i mean i look i i i feel like it's pointless to worry about that or wonder

wonder about that because it's not going anywhere so like is it a threat yes no does it change

anything in how we're going to operate or what the risks are that you're facing now no

so like it's here to stay um and at the moment the biggest risks are not external as in

it's not an attacker using it,

the biggest risk is internal,

your employees using it,

but not understanding.

And if you block it,

if there's someone here

from a large organization

that's thinking,

okay, I'm just going to block this system,

then all that it's going to do

is send people to the background.

They're still going to use it,

but they're just not going to tell you

that they're using it.

So you couldn't audit it then in that case?

Yeah, exactly.

There is, I mean,

as a security expert,

let's talk about LLMs a little bit more. I was curious your take on, there is this issue of like

the Ouroboros of like, now that we have, you know, so much of, you know, so it's like Google's,

the problem with Google, right? Google searches the internet, but then people like want to feature

on Google. So then you write articles for Google. It's like why YouTube videos are all like 10

minutes in one second, right? Like, so it becomes like, just gets crappier and crappier and crappier.

It's like silt building up behind like a dam. And that's, I've seen, and I can believe this

based on just typing in that they think the majority of articles are soon going to be

AI-generated because it's perfect at these topics. And so there's an argument that over time,

this is as good as it's going to get. It's just going to get more and more crappy as AI consumes

AI. And of course, it's all copyright laundering, which we don't need to get into. But I'm curious

if you have a quick take on that or does that seem right to you because to me i don't really see a

way out of that you know both in terms of i guess code you can run and you can test it but like

things you search for i mean i use chat gpt as google now because and i used i've been using

duck duck go for years too also just because google's doing its inevitable thing but i kind

of wonder like maybe maybe i'll just be stuck on like you know september 2021 or whatever is the

date because it's just going to get worse i i have an interesting take on this so i don't know

this is going to be like the popular take but i have an interesting take i think it's going to

have the opposite effect so here's why like 30 years ago the main source of

information was all books it came from trusted kind of sources and you could say whether or not

you think that's bad or good but you know you that's where you as a book author i agree very

trusted right yeah and it's hard to publish it it's still hard yeah it's

hard yeah and it's like it's so easy to publish junk now and now chat gpt has made it so

easy that i feel like it's going to have the effect that the that there's going to be so much

shitty information that everything's going to get so much shittier that we're all going to go back

to trusted sources instead of instead of kind of going on so like are we going to be like googling

answers or will we go back to the trusted peer-reviewed research or articles right from

django.com yeah yeah yeah django.com or you know like all these you know i feel like i feel like

it's actually going to kind of make it make the people that had some kind of platform but no

brains like it's going they're going to become so noisy that you're just going to ignore them

and then and then what will become kind of the new system will be that we all need to take a

step back and go back to trusted sources and that's kind of all that's what i'm hoping that's

going i i like i like that well you know i have i've my robots.txt is updated for all the crawlers

so i'm sure they're going to abide by it and so i'm not worried about my stuff being pilfered

yeah yeah of course yeah side joke to people i've got an update there's like 10 on there but

that reminds me of uh that i always ask people kind of like what's the worst

security advice you had and there was a guy that was that was providing um providing feedback

on a on a penetration test and he explained how this person's organization their infrastructure

was vulnerable and the person's response to that would be yes but that's illegal so no one would do

that nice it's like it's like what are you doing people don't abide by laws everything would just

be lawless and it's like well welcome to the internet okay good so we're right we're coming

up on time a bit mackenzie i had i just wanted to ask you one more question so if you if we would

say first thing move all your secrets into env vars and you know then use a secrets scanner or something

like git guardian and make sure that you catch anything you do commit is there a third kind of

obvious thing that we should be doing that you would say um yeah definitely so uh we need to add

additional layers of security upon things so you know there's a concept called zero trust

which is talking about you know like just because you have an api key or you have a password doesn't

mean that you need to trust explicitly that person so there's something that you know something that

you have and something that you are and we should be implementing those rules across security as

well so um some some simple things that you could do you know if if you're listening to this go like

okay that all sounds nice but what can i actually do so come up with ways to rotate your keys

regularly the reality is no matter what you do your keys will leak you know the best companies

in the world their keys will leak hashicorp they created a product called vault it's one of the

best secrets managers you know they had a breach because they had secrets leak into their git

repositories so it happens to everyone so what you can do is you can implement a rotation policy

that rotates keys regularly so that's one thing so that that doesn't happen the advantage of that

is that you actually know where your keys are or you know how to rotate them because like then if

you have a leak it's kind of like oh what the heck does this key do am i going to break production

if i revoke this like if you're regularly rotating things and you know what they are
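A rotation policy starts with knowing which keys exist and how old they are. The sketch below assumes a hypothetical inventory that maps a key name to its creation time; the names and the 90-day limit in the usage comment are illustrative, not a recommendation from the conversation.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(inventory, max_age_days, now=None):
    """Return the names of keys older than max_age_days, sorted alphabetically.

    inventory maps a key name to a timezone-aware creation datetime.
    """
    now = now or datetime.now(timezone.utc)
    limit = timedelta(days=max_age_days)
    return sorted(name for name, created in inventory.items() if now - created > limit)


# Hypothetical usage:
# inventory = {"payments-api": datetime(2023, 1, 10, tzinfo=timezone.utc)}
# keys_due_for_rotation(inventory, max_age_days=90)
```

Keeping an inventory like this is what gives the side benefit described above: when a key does leak, you already know what it is for and what breaks if you revoke it.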

the other thing is like stop stop producing admin keys like if you need a credential don't give it

admin credential i had so many stories of when you know a read-only credential would have been

totally sufficient but you know that requires creating another key but if you've got a secrets

manager then the problem kind of goes away so you know make sure that keys have like

limitations of what they can actually do to to suit you know to suit the job and then whitelist

your services if you have a system that's only meant to talk to this other system right then

you can set it up so that they can only talk now if i find the credential between them that allows

me to talk to it then i'm still coming i'm coming from a different area i have a different ip i'm

from a different system so these are things that you can kind of actively do to add layers on top
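The allowlisting layer can be sketched with Python's standard ipaddress module. The network range below is an assumption made up for this example; the point is only that a stolen credential presented from an unexpected address is still rejected.

```python
import ipaddress

# Hypothetical range for the one internal service that is allowed to call us.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.5.0/24")]


def is_allowed(source_ip, networks=ALLOWED_NETWORKS):
    """Return True only if the caller's address falls inside an allowed network."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in networks)
```

In practice this check usually lives in a firewall, security group, or the provider's key restrictions rather than in application code, but the logic is the same.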

because you just got to remember that there's no silver bullet you know everything about security the

famous words of this podcast it depends but you can add layers in that create friction and that's

ultimately kind of like the best way to to handle this is to create enough friction um you know that

that we're adding so many different layers to to an attacker yeah they go attack someone who's

easier yeah that's like that's like one of those yeah startup sayings like there's no silver

bullet there's just a lot of lead bullets yeah yeah i live in the netherlands and there's a

saying here like bikes everyone has like at least two bikes each especially in my town and one of

the things is like no one has any nice bikes because you always want to make sure that your

bike is shittier than the bike that's next to you because then the person's going to because we're

terrible at locking up bikes like yeah don't outrun the bear just be next to someone yeah slower

that's that's like yeah isn't there some story in in like amsterdam after like the nazis like fled

that there was some like a quarter of the bikes or like in the canals the waterways

or there was some stat i did a bike tour and he was giving me these crazy numbers speaking to the

fact that like you know i guess for a century now everyone bikes everywhere but everyone like stuff

gets stolen and there were some crazy statistics about when the nazis left and the number of bikes

that were trashed or something yeah i think that's i don't know that i think that still

happens because every now and again you'll see the canals a crane will come into the canal with

these big claws and they'll just scoop up all the bikes that are in there because it just happened

but you know bikes are bikes are pretty easy to come by and and it's an unwritten rule that if

you know if it's past 2 a.m and your bike's been stolen it's socially acceptable to steal someone

else's bike to get home so you just got to make the last one from the club as well right

that's good that's good good to remember well i think we should wrap up there that's perfect

i don't know how we got onto the subject well it was it was the backwards analogy from making sure

your site's slightly more secure than other sites you know if you've got it you've got it

your bike's slightly worse than the neighbor's bike it's the opposite the mirror of that

great mackenzie so anything else that you'd like to call out while you've got the mic that we haven't

mentioned you've mentioned your podcast um you mentioned git guardian um compago you've

mentioned we'll put all of those in the show notes anything else that you think oh yeah um

no i mean i feel like i feel like that's i mean that's kind of it i think that's enough enough uh

enough plugs for for one episode but anyone that wants to follow me um you can follow me anywhere

on social media at the handle at advocate mac um i'm even on threads i've never posted but you know

just in case twitter does explode and then i might do so make sure you follow me but uh

yeah and then i said i mean if you want to take everyone to listen to me geek out about security

stuff that's where that's where you can find me okay brilliant well thanks for coming on really good

really enjoyed it i appreciate the invite thanks and great to great to great to meet you will and

great to see you again carlton all right well thanks everyone we are at djangochat.com and

we'll see you everyone next time bye-bye see you next time bye-bye