Transcript: Auphonic - Georg Holzmann

00:00.0

Hello and welcome to another episode of Django Chat. This week we're joined by Georg Holzmann,

00:10.5

the creator of Auphonic, which handles audio processing and uses Django. We'll dive into

00:15.0

all those things, how to scale it, and it should be a great discussion. So Carlton,

00:18.8

how are you doing? I'm very well, Will. Thank you very much. How are you, Georg?

00:23.3

Yeah, thanks. I'm fine. And thanks for the invitation. I'm curious what to speak about

00:28.1

here. Oh, I think there's too many things to speak about. I guess let's start with how did

00:32.9

you get into programming? And there's so much we can speak about the audio part, but also around

00:37.0

the Django part. So maybe, you know, what's your background? How did you

00:40.8

get into programming? Yeah, so, well, I got into programming quite early. I think I was

00:48.5

in primary school, and my dad always had an old computer, and I always managed to destroy the

00:55.5

computer and play around with it. And at some point, yeah, I started programming. I don't know why,

01:02.1

actually. I think some friend was showing it to me. And the first programming language was

01:08.9

Turbo Pascal, and then I learned C++ and C and all these more low-level things. And yeah, into

01:17.8

Python I came actually quite late; it was at university. Okay, did you study computer science

01:24.7

there, then? Not really. I actually studied audio engineering. Okay. But it was not

01:33.7

really only audio; it was more like electrical engineering and computer science and a lot of

01:39.8

signal processing, machine learning, and all these things. So basically it was computer science,

01:45.8

yes. Okay, so you're doing, like, high- and low-pass filters and, you know, all those fancy things there? Yeah,

01:52.5

exactly. Okay. And this is also why I got into Python at some point, because we were using,

01:59.2

for a project, NumPy and SciPy and all these things when they were at an early stage. And yeah,

02:05.9

so I came into Python and I loved the language a lot, and so I got into programming. Okay, and

02:14.6

this led straight on to Auphonic then, which we should say is yours. So tell us about Auphonic: what

02:19.1

is Auphonic? Yeah, Auphonic was built a little bit later then, and yeah, this is basically

02:26.4

also about audio processing and some other things. It started as a web service; now there

02:33.6

are also some desktop apps. But basically you can upload an audio file, and then this file is analyzed on

02:39.7

our servers, and then, according to the analysis, it is processed and encoded into different formats

02:46.9

and then distributed to other services and things like that.

02:51.3

Right. But this sounds, I mean, this is using exactly the techniques

02:55.1

that you studied at school, right?

02:58.7

The signal processing, the cleaning up.

03:00.8

Yes, exactly.

03:01.5

The sound qualities. Yeah, okay.

03:03.4

And so I guess the question is, is most of that done in Python

03:06.8

or is that done in, you know, other languages?

03:09.6

Yeah, the signal processing and machine learning things

03:12.3

are mainly built in Python.

03:15.2

So we use a lot of NumPy and machine learning libraries, different machine learning libraries.

03:22.0

And, yeah, some parts are also written in C, which are optimized a little bit.

03:28.3

And for that, we mostly use Cython, if you know this.

03:33.0

Yeah, yeah, yeah.

03:33.5

Which is basically, I guess everyone on your podcast knows this, but it's basically a compiler from Python to C.

03:41.3

And yeah, then we also had to build a web system around it.

03:47.0

And I think this was 2009 or 2010.

03:51.1

And back then, basically everyone was using Ruby on Rails.

03:56.1

And this was very popular.

03:58.0

And I was thinking, yeah, I don't want to get into a whole other software stack.

04:04.6

And I was looking at what's going on in the Python world as well.

04:08.2

And found Django.

04:09.9

And yeah, since then, I've been a Django user.

04:13.2

Okay, fantastic.

04:14.1

So you've been in it for the long haul.

04:15.7

You've seen the introduction of, I don't know,

04:19.9

let's say migrations and the big changes around 1.5 and then 1.7.

04:24.5

I started with South.

04:27.0

This was the migrations library before migrations were built into Django.

04:30.6

Right.

04:30.8

And yeah, but my main focus was never on the web tools.

04:39.1

It was more on the signal processing and machine learning part.

04:42.2

And the web tools,

04:44.5

I just picked them up alongside the other things.

04:48.7

It was never my main focus.

04:50.9

Yeah, okay.

04:51.6

I just had to learn them because I needed them.

04:56.3

And so, go on, Will.

04:58.7

No, you go ahead, Carlton.

04:59.4

Well, I was going to say, so the main use for Django here

05:04.6

is to build the API around the data processing pipelines that you have?

05:09.0

Yes, so there are various things in our system.

05:12.7

So, of course, the first thing is the web GUI.

05:16.4

So how you see the website or how you can create these productions, these audio productions.

05:23.3

So this is the client side, front end.

05:26.3

Then there is the server side.

05:30.0

So after you upload an audio file, you have to process and analyze it, which takes quite some processing power.

05:36.8

So we need to have a distributed system where we can process all the audio files.

05:43.8

So as a task queue, for that, we use django-celery and RabbitMQ.

05:51.0

Okay, and how have you found working with Celery and scaling RabbitMQ and things like that over the years?

05:56.3

Because, I mean, you've been obviously at this for quite a while now, and so you'll have seen it evolve.

06:02.6

You must have some war stories there.

06:05.1

Oh, yeah. I'd just like to add, too, I've found with Auphonic, I've never had an issue with it.

06:10.1

I mean, just last night I was having an issue with our podcast host, which seems far simpler than what you're doing.

06:16.2

So however you're doing it, it's very speedy and I've never had an issue, which is unusual dealing with audio for me online.

06:24.9

Okay, that's great. Well, there were, of course, issues in the past, but not for you.

06:34.3

Nope.

06:35.1

Yeah, so, yeah, for us, RabbitMQ and Celery worked quite well.

06:41.5

I mean, there were some issues at some point.

06:45.1

Also, during some migrations there were some problems, but nothing major.

06:51.1

So I cannot tell anything bad about it.

06:55.7

No, it's not so much about bad things.

06:57.8

It's more like as you scale out these things.

07:01.8

Well, Carlton has feelings on Celery.

07:03.8

No, and on RabbitMQ, you know, I've got scars from trying to make that work how I wanted

07:11.5

to. And yes, for quite a while now I've been firmly in the Redis camp, and

07:17.8

using that, and it's been working well and I've found it very scalable. But, you know, RabbitMQ is

07:24.7

still there and still has lots of adherents, so I'm just wondering. You know, the reason I ask

07:30.1

is to try and draw out your experience, which sounds very positive. Yeah, so our setup is maybe

07:35.6

a bit special, because we only have very long-running tasks. Right. Because if you

07:42.1

process one audio file, it usually takes, well, it depends on your file, but it takes quite long.

07:47.7

There are not thousands of small tasks; there are just a few long-running tasks.

07:53.9

So yeah, I think for our use case, it does not matter much if you use RabbitMQ or something else.

08:00.9

So it's basically the same, I guess.
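The episode doesn't spell out how Auphonic's workers are configured. As a hypothetical sketch of the "few, long-running tasks" setup described here, below is a Django settings fragment. It assumes Celery 4 or later loaded with `app.config_from_object("django.conf:settings", namespace="CELERY")` (the namespaced style, not the older django-celery integration); the broker URL and time limits are placeholders, not values from the conversation.

```python
# settings.py (fragment): hypothetical Celery settings for a queue of few, long jobs.
# With namespace="CELERY", CELERY_TASK_ACKS_LATE maps to Celery's task_acks_late, etc.

CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"  # placeholder RabbitMQ URL

# Acknowledge a message only after the task has finished, so a job whose worker
# crashes mid-way is redelivered instead of silently lost.
CELERY_TASK_ACKS_LATE = True
CELERY_TASK_REJECT_ON_WORKER_LOST = True

# Each worker process reserves only one task at a time. With long jobs, the
# default prefetching could leave queued files waiting behind a busy process
# while other processes sit idle.
CELERY_WORKER_PREFETCH_MULTIPLIER = 1

# Time limits in seconds (made-up values): the soft limit raises
# SoftTimeLimitExceeded inside the task, the hard limit kills the process.
CELERY_TASK_SOFT_TIME_LIMIT = 2 * 60 * 60
CELERY_TASK_TIME_LIMIT = CELERY_TASK_SOFT_TIME_LIMIT + 10 * 60
```

The trade-off: reserving one message per process and acknowledging late costs a little throughput, which hardly matters when each task runs for minutes; for thousands of short tasks, Celery's defaults are usually the better fit.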

08:03.8

Yeah, okay, interesting.

08:04.6

Do you think that with your setup, it seems like if there was ever a case for potentially serverless, it would be what you have, right?

08:09.9

Because I imagine you have spikes of activity and then inactivity.

08:13.6

I wonder, what does your traffic look like?

08:16.2

Do you have those spikes or is it more kind of level?

08:19.0

How does it look?

08:19.6

Because I know you're global.

08:20.5

Yes, we have these spikes, but they are not that big, I would say. So the average is...

08:27.9

so the standard deviation is not so big, I would say, because, you know, in Europe they

08:33.9

are processing now, maybe, and then in the US it's a little bit later, so it's not such a

08:40.9

problem. But of course the infrastructure... So maybe let's speak a bit about the infrastructure.

08:47.1

Yeah, yeah. So we do not use Amazon AWS or things like that. We have root servers which

08:57.2

are rented, because they are much cheaper for our use case, since we need a lot of memory and

09:04.2

a lot of processing power. Yeah. Therefore, of course, many root servers don't do much

09:09.7

most of the time, but when we have spikes in processing we need all of them. So of course

09:16.9

these spikes are important and determine how many servers we have to use. But in the end

09:23.0

it's still, I would say, much cheaper compared to AWS or other services. And so I...

09:30.2

Are you managing the actual hardware yourself there, or is that hardware managed? Because this

09:34.5

is one of the selling points that Amazon tell us, or, you know, it's not just Amazon, it's all

09:38.9

these providers. They sort of say, look, you don't have to go to the data center to

09:43.7

change the hard disk, you know, these kinds of activities that you had to do back when you were

09:48.4

renting space in the colo. Yeah, no, we don't manage the hardware. The hardware is managed, but

09:54.2

we have to manage the whole software, from the operating system up to all the stacks you have to

10:00.7

install. But the hardware is managed. I mean, maybe you know this host, Hetzner, from

10:06.2

Germany. Yeah, they are a very big host in Germany and very cheap for their

10:12.0

root servers. Right, okay. Can you give us a sense of how big the Auphonic team is and kind of the

10:17.9

sort of traffic you're dealing with now versus, I guess, 10 years ago, or what, eight, nine years ago

10:22.9

when you started? Because, you know, to me you're just a service that I've seen recommended

10:27.6

and that works. So I'd love to hear more about the company, if there are other people

10:32.6

involved, and as much as you feel like you can say about that. Yes, sure. So, yes, we are about five

10:40.1

people. Okay. And, I mean, you mean the amount of audio we process? Sure, just how has

10:48.6

traffic evolved? Okay. How has that changed over time? I mean, has this been

10:54.3

linear, have you been spiking? Well, that's interesting. It's not linear. I mean, it is a little bit

11:01.7

linear as well, of course, but then there are always spikes when some other user groups find out about

11:08.1

this service and bring in a lot more people; then there is a spike.

11:13.9

And then it was just linear again.

11:16.7

So it's...

11:18.0

It's spiky and linear, and yeah. Will? Sorry, I was going to ask, because I know that

11:24.0

when we started this podcast, we started using a service called Zencastr

11:30.2

which I found out uses you. So I'm curious, what's the mix of services that

11:37.0

use Auphonic versus people like us who do it directly? Do you have a sense of what that

11:41.5

mix is: companies versus individuals using it directly? Okay, yeah, this is maybe half

11:47.5

and half. For the other listeners: we have an API where other services

11:54.0

can integrate our services and algorithms, so that the users of the third-party services don't have

12:01.2

to know that there is Auphonic inside, like you did with Zencastr, for example. Right. And so there

12:06.4

are some companies which are integrating our services into their systems. And yeah, then there

12:12.1

is our website and direct interface

12:15.3

like you are using now, I guess.

12:17.7

And so, yeah, the share is half-half, I would say.

12:20.8

What's your sense of how people find you?

12:22.3

Because I think the only reason I found you

12:25.1

is because I think I saw Tim Ferriss mention

12:27.6

that he used you directly.

12:28.9

And I thought, oh, that sounds like

12:30.2

what we're using with Zencastr.

12:32.6

And then Carlton and I had a number of issues

12:34.7

with doing it web-based.

12:36.0

And so it was really kind of circumstantial

12:38.1

that I even found Auphonic itself.

12:40.4

Is there a standard path for how clients kind of find you? No, I don't know. So basically

12:46.5

we don't do any marketing or things like that, so usually it's word of mouth, especially in

12:53.2

the... It's kind of like the classic: if you build it, they'll come. You built it and they came.

12:57.2

Yeah, so at the beginning, it all started with a podcaster

13:05.3

from Germany. I was listening to his podcast, and he always said, yeah, everything is

13:11.1

so complicated, all the audio things and generating all the file formats. And then I thought, well,

13:17.4

that's actually quite easy, so let's build something. And it started with some small scripts.

13:22.3

And this is really one of the famous podcasters here in Germany, so after we released it, a lot of

13:29.0

people used us already. Right, okay, so you had a kind of critical mass to begin with. Yeah, this was...

13:34.3

And we got a lot of good feedback, of course, and lots of training data and test data. And yeah,

13:40.3

and then we got a grant from the government here in Austria. So we got some money, basically, and

13:47.6

could build a prototype, or continue to build the system. And with this money I could hire someone

13:56.3

who helped to build the system, and then actually another one as well. And then after this

14:02.5

grant, some other podcasters already got to know our system. I don't know why, but somehow it

14:09.4

spread over the ocean to the US, and there were other people there who were playing around with

14:16.6

it. And yeah, after this grant we had to introduce a pricing model. In the beginning

14:22.1

everything was free for everyone. Right. And then we had to introduce some pricing, because

14:29.1

otherwise we could not live from it anymore. Yeah, because initially you were funded by the

14:34.1

government grant. Yes, exactly. Right, okay, fine. But that seems like it maybe lets you

14:39.1

take a... I mean, you seem to have a long-term approach to this, whereas, I mean, I used to be

14:43.2

out in San Francisco, and if you take venture capital money, then pretty much within 18 months,

14:48.5

and really 12 months, you have some very high goals you need to hit. You're not allowed to

14:53.6

take your time. Yeah. Really, it sounds fantastic that you had that

14:58.7

ability, instead of being pressured to introduce things faster before they were ready. Yeah, sure.

15:03.7

And, well, the nice thing we had in our situation is that we did not have to do any fancy

15:13.8

marketing or other kinds of strategies to get more users. I mean, that's good to do if

15:20.6

you're interested in these things, but we were all engineers, basically, and we did not know anyone

15:26.0

who could do these other things, so we just tried to avoid them. Well, this is marketing now. I've

15:35.3

seen you've been on a couple of podcasts, so that's marketing, of course. That's, yeah,

15:41.6

that's amazing. Well, a question I had is: when you created this, were there any

15:48.1

competing services that did that? And what do you see in terms of competition? Because

15:52.1

I'm not aware of any, but now that we have it as part of our post-production

15:58.4

flow, I can't imagine doing our podcast without it. There are no direct competitors which

16:05.4

build the same thing, because we are of course very specialized in this podcasting use case, or

16:13.1

also other spoken-word recordings, I would say, like conference recordings and lectures and things like

16:19.8

that. I mean, there are of course other audio software companies like iZotope, which

16:27.1

build their tools more for editing. So basically there is no automated way

16:35.1

like the way we do it. That's why we try to focus on this automation and workflow aspect

16:41.0

and not to try to build an editor, which of course would also be very useful,

16:47.6

but it's a lot of work if you want to build it right, and then you get into

16:54.4

direct competition with Audition or iZotope or other companies. Yeah, and those tools

17:00.6

are super powerful but super complicated. And yes, you know, I consider myself an audio

17:06.7

beginner, and there's just no way I can apply the right filters in anything like a

17:12.1

time-efficient way. So for me to be able to upload the file and it comes back with a

17:18.7

noticeable difference in sound quality, you think, yeah, that's amazing, that's just brilliant. Yeah, so

17:22.8

of course there are also disadvantages. In our system, basically, you get... This was exactly

17:35.4

the initial goal when we started the system: that users who

17:35.4

don't have a lot of audio knowledge, or users who have really lots of audio which cannot

17:42.4

be handled manually, can just use this tool and get out audio which is okay. But of course,

17:48.7

if you have very specialized use cases and you want to get out every detail, then these

17:55.4

editing tools are of course much more powerful, because you can really work on

18:02.9

every detail. But of course it needs more time and knowledge. Yeah. But I guess one thing they always

18:08.4

talk about in product development is to focus on your particular niche, and yes, don't worry about

18:12.6

trying to serve the other niche. So, you know, the fact that you don't have these features is a selling

18:17.8

point, right? Yes. Yeah, it's very clear what you do. Well, so maybe we can talk about some of those

18:22.1

features, because there are a lot that are awesome. I wonder, what would you say are the main

18:27.2

features that people kind of come to you for? I mean, for us, when I'm doing the audio

18:32.2

files, I love that you have the presets, just in terms of the web base. That's great, you know,

18:36.8

because we do the same thing for every podcast, so I can just load our presets, and then we usually do,

18:42.0

you know, basically go for almost everything: the compression, leveling, normalization, noise, hum.

18:47.8

I guess that's a broad question, but as you think of the features that you have now on the site, what

18:52.9

do you think are the core features? And then we can get into some of the more advanced ones, because I

18:56.6

know you have some new features you've just launched. Yes, so I would say the most important

19:02.2

feature, where everything started, is our leveler. So what does it do? It levels out the audio, which

19:12.2

means that if you have multiple speakers, like we are having a conversation now, then all

19:18.5

these speakers can have different levels and loudness values, and so you have to balance

19:25.6

them; otherwise you would always have to use your volume control and adjust for the levels.

19:31.8

But this is actually a very complicated task, because it is easy if you just have speakers,

19:38.1

but in audio you also have bigger parts where there is, for example, just background noise.

19:44.5

Then of course you should not amplify this background noise like you would do with the

19:48.5

speakers. Or there are also music parts, which should be handled quite differently, because

19:54.3

in music you want to have more inner dynamics. If you have speakers, you want them to sound

20:01.7

equally loud, but music should have some differences, of course. And therefore you have to analyze

20:08.1

the audio first and see: there are different speakers, there are music parts, there are just

20:13.2

background noises and things like that. So basically, like an audio engineer would do it. And then

20:20.0

you have to balance these different parts and use compressors and limiters and things like that.
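Georg doesn't describe the leveler's internals, so the following is only a toy sketch of the balancing idea, assuming segments have already been labeled "speech", "music" or "noise" by some classifier. Speech is brought to a common level, music only partway (to leave it some dynamics), background noise is never boosted, and no gain may push a peak past a ceiling. The -20 dBFS target, the 0.5 music factor and the -1 dBFS ceiling are invented parameters, and plain RMS stands in for a proper loudness measure; real processing would add the compressors and limiters mentioned above.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    label: str      # "speech", "music" or "noise", from a (hypothetical) classifier
    samples: list   # floating-point samples in [-1.0, 1.0]


def rms_db(samples):
    """RMS level in dBFS (a crude stand-in for a real loudness measure)."""
    if not samples:
        return -math.inf
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else -math.inf


def peak_db(samples):
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else -math.inf


def segment_gain_db(seg, target_db=-20.0, music_factor=0.5, ceiling_db=-1.0):
    level = rms_db(seg.samples)
    if level == -math.inf:          # empty or digitally silent: leave it alone
        return 0.0
    gain = target_db - level        # speech: all the way to the target
    if seg.label == "music":
        gain *= music_factor        # music: only part of the way
    elif seg.label == "noise":
        gain = min(gain, 0.0)       # background noise: may be lowered, never boosted
    # Never push the segment's peak above the ceiling.
    return min(gain, ceiling_db - peak_db(seg.samples))


def level(segments, **params):
    """Apply one gain per segment; dynamics inside a segment are left untouched."""
    out = []
    for seg in segments:
        factor = 10 ** (segment_gain_db(seg, **params) / 20)
        out.append(Segment(seg.label, [s * factor for s in seg.samples]))
    return out
```

For example, a quiet speech segment at -40 dBFS gets +20 dB, a hiss at the same level is left as it is, and loud speech is turned down to the same target, which is the "no more reaching for the volume control" effect described above.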

20:26.5

You said you're using machine learning for a lot of this. When you're talking about the different

20:30.7

parts, I'm imagining, you know, with my machine learning thinking: okay, so are you

20:36.0

tagging particular parts as: this is speech, this is music, this is, you know, some other

20:42.3

category, and then you apply a different filter or set of filters depending on what gets tagged by

20:46.9

the machine learning algorithm? Yes, exactly. So we analyze the audio and classify various

20:54.9

things like different speakers, music parts, and different noise parts, or whether the background noise

21:01.9

changes, for example. Yes, this is another algorithm we have; it's called noise

21:08.0

reduction. Basically, this algorithm first analyzes the audio and sees where there are different

21:14.2

background noise scenarios. For example, if you record in a room and then you go outside,

21:21.7

then there is another background noise scenario outside. So we have to segment the audio first into

21:27.9

these parts and then do noise reduction in the first part and then in the second one. Right.
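The conversation doesn't say how Auphonic detects a change of background-noise scenario. One hypothetical way to do the segmentation step is to track the noise floor from per-frame levels and start a new segment whenever it jumps by more than a threshold; the sketch below does exactly that. The frame size, window length and 6 dB threshold are invented, and the per-segment noise reduction itself (for example, spectral subtraction using each segment's own noise estimate) is not shown.

```python
import math


def frame_levels_db(samples, frame=1024):
    """Per-frame RMS level in dBFS (silent frames clamp to -120 dB)."""
    levels = []
    for i in range(0, len(samples) - frame + 1, frame):
        chunk = samples[i:i + frame]
        mean_sq = sum(s * s for s in chunk) / frame
        levels.append(10 * math.log10(mean_sq) if mean_sq > 0 else -120.0)
    return levels


def noise_floor(levels, window=20):
    """Estimate the background level at each frame.

    Speech frames are louder than the noise, so the minimum over a window
    tracks the noise. Taking the larger of the trailing and leading minima
    keeps a step change (room -> street, or back) sharp in both directions;
    near the file edges only the full window is used.
    """
    n = len(levels)
    floors = []
    for i in range(n):
        trailing = min(levels[max(0, i - window + 1): i + 1])
        leading = min(levels[i: i + window])
        has_trailing = i >= window - 1
        has_leading = i + window <= n
        if has_trailing and has_leading:
            floors.append(max(trailing, leading))
        elif has_leading:
            floors.append(leading)
        elif has_trailing:
            floors.append(trailing)
        else:
            floors.append(min(trailing, leading))
    return floors


def segment_by_noise(levels, window=20, jump_db=6.0):
    """Split frames into (start, end, floor_db) runs with a stable noise floor."""
    floors = noise_floor(levels, window)
    if not floors:
        return []
    segments, start = [], 0
    for i in range(1, len(floors)):
        if abs(floors[i] - floors[start]) > jump_db:
            segments.append((start, i, floors[start]))
            start = i
    segments.append((start, len(floors), floors[start]))
    return segments
```

Each returned run carries its own noise-floor estimate, which is what lets the reduction step treat the indoor and outdoor parts differently instead of applying one noise profile to the whole file.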

21:32.5

There's no tooling to do that in something like Audacity? You'd

21:36.6

have to do it by hand, you'd have to sort of manually identify the segments? Yes, in

21:41.8

most of the editors you do this by hand. And basically we always try to automate these

21:49.4

steps you have to do by hand. We try to automate these things with machine learning and then

21:55.7

just apply the algorithms like you would in an audio editor, for example. Okay, okay, fantastic,

22:01.1

wow. And then you also have, I mean, there's chapter marks, which I believe is

22:07.7

something you had before the sort of Apple Podcasts and stuff had as a feature.

22:12.8

Is that right?

22:13.6

Yes, I think in Apple Podcasts, it's, no, that's not really true.

22:17.9

So this is a very complicated topic because there is so much confusion about it.

22:24.4

So what are chapter marks? A chapter mark is basically just a timestamp and a title.

22:29.7

So you say at this time, this chapter starts, like in video, where this has existed for quite a long time already.

22:36.1

And the problem with audio is that there are many file formats, like MP3, MP4, or Opus, or whatever,

22:45.4

and these chapter marks were usually only defined for video. So in the MP4 container

22:51.3

everything was defined quite well. It is still very complicated, but at least it was defined.

22:55.8

And at the beginning, Apple was mostly using MP4 audio, so AAC audio in an MP4 container.

23:05.8

And the Apple Podcasts app has supported chapter marks in MP4 files for a very long time.

23:13.7

But most podcasters use MP3 files, and it did not recognize chapters in MP3 files.

23:20.0

And there was a very old specification for MP3, so actually for ID3, which is the metadata standard for MP3,

23:28.6

for how to put chapter marks in MP3 files too, but nobody used this specification.
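In an MP3 file, each chapter becomes an ID3v2 CHAP frame, as defined in the ID3v2 Chapter Frame Addendum. The hand-rolled sketch below follows that addendum's layout: a null-terminated element ID, start and end times in milliseconds, byte offsets set to 0xFFFFFFFF to mark them unused, and an embedded TIT2 sub-frame for the title, all wrapped in an ID3v2.4 tag with syncsafe sizes. It is not Auphonic's writer. The `chp0`-style element IDs are arbitrary, and it omits the CTOC table-of-contents frame that some players also expect, which a production implementation (or a tagging library) would add. ID3v2.3 tags would instead need plain 32-bit frame sizes and a different text encoding.

```python
import struct


def syncsafe(n):
    """Encode n as a 4-byte ID3v2.4 syncsafe integer (7 bits per byte)."""
    if not 0 <= n < 1 << 28:
        raise ValueError("value does not fit in a syncsafe integer")
    return bytes([(n >> 21) & 0x7F, (n >> 14) & 0x7F, (n >> 7) & 0x7F, n & 0x7F])


def frame(frame_id, body):
    """ID3v2.4 frame: 4-character ID, syncsafe body size, two zero flag bytes."""
    return frame_id.encode("ascii") + syncsafe(len(body)) + b"\x00\x00" + body


def text_frame(frame_id, text):
    return frame(frame_id, b"\x03" + text.encode("utf-8"))  # 0x03 = UTF-8 encoding


def chap_frame(element_id, start_ms, end_ms, title):
    """CHAP frame: element ID, start/end time in ms, unused byte offsets, title."""
    body = element_id.encode("latin-1") + b"\x00"
    body += struct.pack(">IIII", start_ms, end_ms, 0xFFFFFFFF, 0xFFFFFFFF)
    body += text_frame("TIT2", title)  # embedded sub-frame holding the chapter title
    return frame("CHAP", body)


def chapters_to_tag(chapters, duration_ms):
    """chapters: list of (start_ms, title) sorted by start; each ends where the next begins."""
    frames = []
    for i, (start, title) in enumerate(chapters):
        end = chapters[i + 1][0] if i + 1 < len(chapters) else duration_ms
        frames.append(chap_frame(f"chp{i}", start, end, title))
    payload = b"".join(frames)
    # Tag header: "ID3", version 2.4.0, no flags, syncsafe payload size.
    return b"ID3" + bytes([4, 0, 0]) + syncsafe(len(payload)) + payload
```

This also shows why a chapter mark is "just a timestamp and a title": everything else in the frame is container bookkeeping.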

23:33.9

And yeah, then we basically just implemented the specification, and then more and more podcatchers added

23:41.9

support for it. And I think since last year the Apple Podcasts app also supports MP3 chapters

23:48.8

as well. Right, good, because if enough files are using it, then they'll have to support it.

23:54.3

Yes, it seems so. Yeah, fine. Well, speaking of mobile apps, I mean, you have... I'm so impressed.

24:00.6

I keep finding new features that you have.

24:02.4

I mean, you have mobile apps as well, right?

24:04.6

For Auphonic, for Android and iOS.

24:07.6

And so what's the, how recently was that?

24:10.3

How did that come about, right?

24:11.3

Because that's another thing to build

24:12.4

and maintain on top of everything else.

24:15.1

Yes, well, basically we started the mobile app

24:20.1

because a friend of the first developer at Auphonic

24:24.0

wanted to do a project with us

24:26.4

and he's a web and mobile developer

24:29.5

and yeah, now he works at Facebook, so he's quite good at these things. So he said yes, he

24:36.6

wanted to build an iOS app for us. And the problem on iOS for a long time was that you could not upload audio

24:44.2

files on a website. I think they've changed it now, but for a long time you could not select

24:51.5

audio files and upload them to Auphonic, for example. So the idea was to build just a

24:57.7

simple recorder and then use our API in the mobile app, so that you can basically export files from

25:05.7

your phone to Auphonic. And yeah, this was basically the start, and then we also did an Android

25:13.6

version. And on Android the situation was even more complicated, because there was no usable audio

25:22.8

editor on Android, and then we thought, well, we could also build a little audio editor. Right,

25:28.3

brilliant. And yes, how does that go with device compatibility? Because that's

25:34.3

the great challenge on Android, right? It works on the Samsung that you've got in the office, but

25:38.8

not on, you know, some device out there in the street. Yeah, but actually there are not so many

25:45.6

problems, but of course that's more difficult. Okay, interesting. And so you talk about

25:50.7

your API there, and I guess that's built with Django and Django REST framework? Yes. Okay, yeah,

25:57.4

so if you have problems, talk to Carlton. Ah, you are developing this? Yeah, he's the co-maintainer.

26:02.9

Okay, very nice, so thanks a lot. I don't do too much. But, so you've been using that

26:12.5

from, I mean, REST framework... well, it wasn't quite around in 2010, but like 2012, 2013, I think?

26:20.6

Yeah, I would have to look up the exact date when we released it. Right. When you think of all these

26:27.2

technologies that you're juggling, I assume the web piece just sort of follows the

26:32.4

audio part? Or how much time do you spend on just scaling up the web part, because that's

26:37.9

sort of the front door for everything? I'm just curious, you know, where you spend your

26:42.5

time now, given that you're at scale. And if you're interested, we can talk about it: I know you

26:46.3

just introduced a new leveling algorithm. Yeah, I mean, this is always different. Sometimes

26:53.8

we work more on the web part or other parts, and sometimes more on the algorithms. But yeah, I mean,

27:00.8

scaling was not so much of an issue, I would say. Well, basically, we always

27:07.8

have to fix some things when you see it getting very hot. But yeah, actually, I

27:16.2

mean, you know, if the traffic goes up, we just rent some more servers. So basically, that's not so

27:22.3

complicated, right? Because in our case, I mean, the website itself does not need much scaling, because

27:31.4

we don't have hundreds of thousands of users every minute, of course. Yeah, so I'm imagining you could

27:37.5

run the website on, you know, a medium-sized server and it would chug away quite happily

27:41.5

and give you lots of spare capacity.

27:43.6

And then the need to scale is the back-end processing units.

27:47.7

Exactly.

27:48.1

So the database and all the web front-end things,

27:53.4

they are just on one big root server.

27:57.2

And then we have various other servers for the audio processing

28:01.0

and for the long-running tasks.

28:04.5

Okay.

28:05.1

Processed audio files: I believe it's 30 days that you store them

28:08.3

and then they go away?

28:10.0

Is that correct now?

28:10.5

Yes, 21 days.

28:12.0

I assume that came about because you looked around

28:15.6

and said, oh my God, we have all this data.

28:17.9

Or did you have that from the beginning,

28:19.4

sort of a limit on how long you would host processed audio?

28:22.7

I think we had this quite early

28:24.9

because otherwise you would need a lot of storage.

28:30.3

Yeah.

28:31.9

And yeah, there are, of course, also data protection reasons.
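The 21-day limit is the only detail given about retention. A minimal sketch of a periodic cleanup job enforcing it might look like the following. It assumes, hypothetically, one directory per production on the server's local disk and uses the directory's modification time as its age; a Django app would more likely store a creation timestamp on the production's model and delete by querying that.

```python
import shutil
import time
from pathlib import Path

RETENTION_DAYS = 21  # the retention period mentioned in the episode


def expired_dirs(root, now=None, days=RETENTION_DAYS):
    """Yield per-production directories last modified more than `days` ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    for path in Path(root).iterdir():
        if path.is_dir() and path.stat().st_mtime < cutoff:
            yield path


def purge(root, now=None, days=RETENTION_DAYS, dry_run=False):
    """Delete expired production directories; return their names, sorted."""
    removed = []
    # Materialize the list first so we are not deleting while iterating the directory.
    for path in list(expired_dirs(root, now, days)):
        if not dry_run:
            shutil.rmtree(path)
        removed.append(path.name)
    return sorted(removed)
```

Running it from cron (or a scheduled task) with `dry_run=True` first is a cheap way to check what would be deleted before enabling the real purge.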

28:38.0

Can I ask there, do you store on disks, physical disks on your rented servers, or are you using cloud storage?

28:45.9

No, we store on our root servers, on the disks.

28:48.7

Right, okay.

28:49.6

So it really is all hardware in the data center.

28:54.5

Yes, exactly.

28:55.5

Fantastic.

28:56.3

I always like to ask this question.

28:57.6

So if you could just wave a magic wand and add a new feature to Auphonic, what would it be?

29:04.1

Oh.

29:05.2

Because I assume you have, I mean, obviously, I don't have a sense of what is truly challenging or what you think your customers are demanding.

29:11.7

I mean, but yeah, I'm curious where you see that need, or, you know, if you had all this time, where you would spend it.

29:18.7

Well, that's difficult to say.

29:21.2

Our future feature list is very long.

29:24.4

Okay, yeah, sure.

29:25.8

Or maybe there's something that you don't even know how to tackle, but it's sort of an unresolved problem in processing audio.

29:33.0

Because, again, Carlton and I don't know that space at all.

29:36.0

Yeah, okay, in that direction.

29:38.1

What would be really cool is if you have, let's say,

29:41.1

you have very bad audio, like an MP3 at 32 kilobits per second,

29:48.4

so everything is compressed already completely

29:52.1

and you can barely hear any detail in it anymore.

29:57.0

And then one could build an algorithm

29:59.6

which makes the audio sound like the original again.

30:03.0

Right, okay. This would be great. Yeah, that really is a magic wand. But yeah, at some point you

30:13.1

will lose information, and then it's difficult to restore it again. But

30:19.3

you do see these kinds of AI or ML applications where they kind of guess

30:26.1

what the missing data is and interpolate like that, and they, you know, sometimes come up with good

30:31.0

results. Yes, so there are of course also people who are trying exactly that with

30:37.0

badly encoded audio, but yeah, this is of course not possible in every situation,

30:43.0

unfortunately. One of the existing features you have that actually we're

30:48.4

not using, which people have asked for, is that you link in with speech recognition, right, where

30:53.6

someone can link up a third-party transcription service, so it uses a

30:59.6

speech recognition system. Yeah. And was that something that you had, again, from the beginning,

31:08.4

or something that users asked for, that integration? Because it looks really

31:14.4

nice, because that is another step in the toolchain of producing a podcast: you do the leveling

31:19.1

and then the transcription. I mean, it's something we should add. Yes, this was not there from

31:24.7

the beginning, but this was always very interesting for us, because especially for

31:31.7

podcasting, you know, podcasts are not searchable. So if you

31:36.8

have a transcript, then you can search within the podcast, so that's of course perfect. But

31:43.9

yeah some years ago there were no services which produced a reasonable output for a reasonable

31:51.1

price well said yes it was actually i think two years ago or three years ago when

31:58.9

the first api was the no it was not the first one but one one of the first ones

32:05.3

was the the google cloud speech api which which had an acceptable quality and also a reasonable

32:13.0

price and yeah now there are some there are various others other apis as well which can do that

32:19.3

So, back to the beginning: actually, we wanted to build our own speech recognition system. But then we thought we cannot do everything, because it's really very time-consuming, especially if you want to build it for multiple languages. You have to have specialists for every language and a lot of data for every language.

32:43.4

That's why we decided to integrate various third-party services for speech recognition.

32:50.4

So we still do our own pre-processing and slice the audio into small parts,

32:56.9

so we cut out the music parts and keep only the speech,

33:00.8

and then send these slices to the external services, to the speech recognition services,

33:05.5

and then combine it again so that you have the time codes in the transcript

33:09.3

and so that you know at which time who is speaking what.
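The slice-and-recombine step described above can be sketched as follows. This is a minimal illustration, not Auphonic's code: it assumes each external speech API returns word timings relative to the start of the slice it was sent, and the `Slice`, `Word`, and `merge_transcripts` names are hypothetical. Shifting each word by its slice's offset in the original recording restores global time codes, and carrying a speaker label per slice covers the "who is speaking what" part.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slice:
    """A speech-only segment cut out of the full recording (hypothetical type)."""
    start: float   # offset of this slice within the original audio, in seconds
    speaker: str   # speaker label assigned during pre-processing


@dataclass
class Word:
    """One recognized word as returned by an external speech API (assumed shape)."""
    text: str
    start: float   # seconds, relative to the beginning of its slice
    end: float


def merge_transcripts(slices: List[Slice],
                      results: List[List[Word]]) -> List[Tuple[float, float, str, str]]:
    """Combine per-slice recognition results into one time-coded transcript.

    results[i] holds the words recognized in slices[i]. Each word is shifted by
    its slice's offset so the time codes refer to the original recording.
    """
    if len(slices) != len(results):
        raise ValueError("need exactly one result list per slice")
    merged = []
    for piece, words in zip(slices, results):
        for w in words:
            merged.append((piece.start + w.start, piece.start + w.end,
                           piece.speaker, w.text))
    merged.sort(key=lambda item: item[0])  # chronological order across all slices
    return merged


if __name__ == "__main__":
    slices = [Slice(12.5, "Georg"), Slice(0.0, "Will")]
    results = [[Word("thanks", 0.5, 1.0)], [Word("hello", 0.25, 0.75)]]
    print(merge_transcripts(slices, results))
    # → [(0.25, 0.75, 'Will', 'hello'), (13.0, 13.5, 'Georg', 'thanks')]
```

Sorting after the merge means slices can come back from the external services in any order, which matters if they are sent off in parallel.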

33:14.6

We should be using this, Will.

33:15.8

Why aren't we using this?

33:16.9

This sounds amazing.

33:18.2

Use the royal we, Carlton.

33:19.7

You're welcome to hop on that.

33:23.7

Yeah, I think it's so brilliant

33:25.6

how you've stayed kind of within your lane

33:28.1

yet still expanding features

33:29.8

because, I mean, podcasts in general

33:32.7

are going through quite a bit of consolidation right now.

33:35.6

But it seems like what you're doing is so much better

33:38.4

and so hard to do well that it's maybe not as appealing for Spotify or someone

33:44.7

to say, oh, we'll just do Auphonic. Or perhaps, is that an existential threat? Do

33:49.8

you think down the line that one of these places will say, we'll just scoop that part

33:54.1

in? Because again, unless you really know what you're doing, it seems very difficult to

33:58.1

have a high-quality podcast without using a tool like Auphonic. Well, that's not true: if you're an

34:03.7

audio engineer, you can just do it yourself, of course. Yeah, sure, sure, sure, but most... but that's,

34:09.0

you know, what, one percent, if that? You know, everyone else who's, um, you know... Yeah, yeah,

34:15.9

thanks. But to your question: yeah, I think our service is just very specialized, and

34:22.9

a lot of time has already gone into really optimizing it for this use case.

34:31.7

I mean, of course someone else can build similar things, but it's a lot of work, and

34:37.8

you would need, I think, a much bigger team than ours, if you don't build it for yourself.

34:44.2

Sure. And yeah, I don't think there is so much money in it that you would throw a

34:51.4

20-person team on it, but I don't know. Right, so it's a nice niche, and as long as you

34:57.6

stay the right size and don't scale up too big, yeah, maybe you can survive in that

35:03.5

space. Can you give us a sense of how many... what metric do you use for size? Is it the size of

35:09.8

the files processed? Is it, you know, hours? When you internally look at your metrics, what are the...

35:15.9

how do you manage growth? Because I think there are a number of different ways to potentially

35:20.3

measure that. Usually we take... I mean, also for our users, the important thing is

35:27.9

the length of the audio file. So in our system, two hours of audio is

35:35.4

free for everyone, and if you process more audio, then you can buy additional

35:41.9

credits, and the credits are in hours of audio. So if you have one hour of credits, you can process

35:48.6

one hour of audio. So therefore, of course, the most important measure, I would say, is

35:53.5

the hours of audio we process. Maybe... well, because I guess, yeah, because audio files of one hour

36:00.7

can be, I mean, very different in size. Was that... yes. So that seems smart from a marketing

36:07.9

perspective, but, you know, for someone like you who really knows audio, I mean, the costs

36:12.2

could be wildly different for one hour of, you know, one file versus another. So yes, the size of

36:18.4

the audio file itself does not say much, because if you compare an MP3 file to a WAV file

36:25.2

with a high bit depth,

36:29.5

then the MP3 file is very small, of course.

36:32.7

But then if you process the file,

36:34.7

you have to decode the file again to get the raw data.

36:38.0

And then it doesn't matter if it was an MP3 or WAV file

36:41.6

because you need the same processing amount

36:45.0

or processing power for it.
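A quick calculation shows why file size is a poor proxy for processing cost. The sketch below assumes the audio is decoded to CD-quality PCM (44.1 kHz, 16-bit, stereo) and that the MP3s are constant-bitrate; the function names are hypothetical, and `credit_hours` only restates the one-hour-of-credits-per-hour-of-audio rule mentioned earlier, without any rounding the real service may apply.

```python
HOUR = 3600  # seconds


def decoded_pcm_bytes(duration_s: int, sample_rate: int = 44_100,
                      channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Bytes of raw PCM after decoding: what the processing actually works on.

    Depends only on duration and the decoded format, not on how the file was encoded.
    """
    return duration_s * sample_rate * channels * bytes_per_sample


def cbr_file_bytes(duration_s: int, bitrate_kbps: int) -> int:
    """Approximate size of a constant-bitrate file, ignoring headers and metadata."""
    return duration_s * bitrate_kbps * 1000 // 8


def credit_hours(duration_s: int) -> float:
    """Credits used under a one-hour-of-credit-per-hour-of-audio rule (no rounding assumed)."""
    return duration_s / HOUR


if __name__ == "__main__":
    pcm = decoded_pcm_bytes(HOUR)
    on_disk = {
        "MP3 at 32 kbps": cbr_file_bytes(HOUR, 32),
        "MP3 at 128 kbps": cbr_file_bytes(HOUR, 128),
        "16-bit stereo WAV": pcm,  # uncompressed: roughly the PCM size plus a small header
    }
    for label, size in on_disk.items():
        print(f"{label}: {size / 1e6:.1f} MB on disk, "
              f"{pcm / 1e6:.1f} MB of PCM to process, {credit_hours(HOUR)} credit hours")
```

Under these assumptions, one hour is about 14.4 MB as a 32 kbps MP3 and 57.6 MB at 128 kbps, yet each decodes to the same 635.04 MB of PCM, which is why billing by duration follows the real workload more closely than billing by upload size.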

36:46.4

Oh, interesting.

36:47.2

So I'm curious, because you went to university

36:50.2

for audio engineering,

36:52.0

what did you think you were going to be doing

36:53.9

at this point in your career

36:54.8

And kind of what are your friends from university doing?

36:57.0

Because I'm sure they're not all doing startups.

36:59.4

No, it's very different.

37:01.3

So, well, actually, after university, I started, so I did my master's studies here in Austria.

37:09.1

And then I started a PhD in Germany, in Berlin, also about machine learning and audio.

37:16.5

It was, it is called, this field is called music information retrieval.

37:19.9

It's where you try to extract information from audio with machine learning techniques. But yeah,

37:25.7

then I did not finish this PhD, because basically I was the only one there who did these things,

37:32.8

and then I started a job at a web company and got to know all these web things, also Django,

37:39.6

so they also used Django there. And yeah, then I thought, okay, I can also build something for

37:45.7

myself. And I also got to know this podcaster who always had this problem with audio processing and

37:51.8

encoding, and I just tried to build these things myself. Yeah, no, I mean, it sounds like the

37:58.8

perfect pipeline. You know, you've done the signal processing, you've done the machine learning,

38:02.5

you're into the audio stuff, you've done the web programming. It's like you've got all

38:07.3

the parts together and you've just created this awesome business out of it. That's exactly... I still

38:11.9

had no idea about doing business, but yeah. That's okay. Well, yeah, that's not the lesson.

38:18.1

So did you go to a web place that was doing audio stuff, or did you just separately go to a web

38:26.1

place and then later say, oh, I can combine these two? No, it was not about audio. It was

38:31.0

something completely different, but at that time I was interested in web things, because I had never done

38:39.4

big web projects. So I just wanted to get into this topic, and yeah, for that it worked out quite well.

38:46.7

Yeah, well, it's still the case, certainly in the U.S., that it's very difficult to

38:53.0

learn web development in school. There are some efforts on it, sure, but basically it has to be hands-on,

39:00.5

and even something like Django is largely not taught at all, and if it is taught, it's by an

39:06.4

adjunct and it's sort of an elective. So hopefully that will change, but it's hard to

39:11.0

get a background in web technologies, even though the majority of undergraduate computer science

39:17.2

graduates probably go on to work on something web-related. So there's definitely an educational

39:23.1

mismatch there. Right, in the end you just have to do it, because all the resources are out on the web,

39:28.0

of course, and you just have to do it and learn it by yourself. Yes, spoken like an engineer.

39:35.4

So what are you working on right now?

39:36.8

Yeah, so as I said in the beginning, Auphonic started as a web service with this Django

39:44.4

part and et cetera.

39:46.5

But we also have a desktop application with our algorithms.

39:51.0

So it doesn't include all the features of the web service.

39:55.1

But the advantage here is, of course, that you don't have to do the processing in the

40:00.6

cloud and you don't have to upload files.

40:03.5

So the processing is just done on your computer, which of course needs processing power, but you don't have to upload or download anything.

40:10.3

And, yeah, we also have a different pricing model for this desktop app.

40:14.7

So it's just a one time purchase.

40:16.5

Whereas in the web service, you have to pay for credits,

40:21.7

based on how much you process.

40:21.7

But, yeah, what I wanted to tell is that we are working on a new version of the desktop software at the moment.

40:28.7

because it was very difficult to get all these

40:31.9

Python and machine learning tools to desktop computers.

40:36.2

And especially difficult was the GUI part

40:39.7

because in the desktop apps,

40:49.3

we used the wxWidgets framework with its Python bindings.

40:49.3

And it was so frustrating and so difficult.

40:52.3

And now we have rewritten everything with web-based tools.

40:56.4

So we now use Electron and...

40:59.0

Okay, so it's Electron.

41:00.2

...combined with Python.

41:01.4

But yeah, Electron is still a very big framework

41:05.5

and not so nice to handle,

41:07.9

but at least now everything is HTML-based,

41:11.3

the whole user interface,

41:13.3

and the backend is all done in Python.

41:15.0

And so as a long-term goal,

41:18.3

we could maybe use the same interface

41:20.4

for the web and for the desktop version

41:22.4

and just reuse all the components

41:24.5

and use Python as the processing engine. And are you bundling Python in the application? Yes.

41:30.5

So you're using, like, the PyBee project there, or something similar? No, we are using... well, there

41:38.9

are various ways you can do that. First, we are compiling everything to C with Cython and all

41:44.5

these tools, and we are using a tool called PyInstaller, which creates such a bundle

41:50.8

with Python and all the dependencies you have.

41:55.3

And then this is bundled into one binary

41:56.9

and you can afterwards distribute this binary.

42:00.6

Okay, fantastic.

42:01.5

And then because that's Electron-based,

42:04.0

that will be on Windows and Mac and Linux as well?

42:07.3

Yes, exactly.

42:08.7

Okay.

42:09.1

And then the Electron app just also takes this Python binary

42:14.4

and includes it in their framework.

42:17.4

Wow.

42:17.8

Yeah.

42:18.1

That's really cool.

42:19.2

Too many tools.
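The episode does not say how the Electron front end talks to the bundled Python binary. One common pattern, shown here purely as an assumption, is for Electron's main process to spawn the binary (for example with Node's `child_process.spawn`) and exchange newline-delimited JSON over stdin/stdout. The command names (`ping`, `analyze`) and the message shape below are hypothetical.

```python
import json
import sys
from typing import Callable, Dict, TextIO


# Hypothetical commands; the real app's backend API is not described in the episode.
def ping(params: dict) -> dict:
    return {"pong": True}


def analyze(params: dict) -> dict:
    # Placeholder: a real backend would run the audio algorithms on params["path"]
    return {"path": params.get("path"), "status": "done"}


COMMANDS: Dict[str, Callable[[dict], dict]] = {"ping": ping, "analyze": analyze}


def serve(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read one JSON request per line and write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        request_id = None
        try:
            request = json.loads(line)
            request_id = request.get("id")
            handler = COMMANDS[request["cmd"]]
            reply = {"id": request_id, "ok": True,
                     "result": handler(request.get("params", {}))}
        except (json.JSONDecodeError, KeyError, AttributeError, TypeError) as exc:
            # Malformed JSON, non-object payloads, or unknown commands become error replies
            reply = {"id": request_id, "ok": False, "error": repr(exc)}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # the front end reads replies as they arrive, so don't buffer


if __name__ == "__main__":
    serve()
```

Keeping the protocol to plain JSON lines means the same Python entry point works whether it runs from source or as a PyInstaller-built binary, and the HTML front end never needs to know it is talking to Python.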

42:20.5

Yeah, I mean, Python doesn't really have a good story for... I mean, it's good on the command line,

42:27.7

it's good for building applications, but it doesn't have this story about desktop integration or

42:32.1

even front-end web integration. So yeah. Well, it is good for desktop, I think, but

42:38.4

the missing part is the GUI part. Yeah, yeah. Awesome. Well, thank you so much for coming on and

42:45.2

sharing this story. We love using Auphonic. And actually, what I love in a way, too,

42:50.1

is that the only reason I thought you might use Django is that I saw your signup page, and

42:54.2

you're using /accounts/ and username, email, password. And I thought, oh, that looks,

42:59.1

that looks like Django. And I just sent you an email. It's like, Oh yeah, it is Django.

43:03.3

So there are all these... you know, a big thing for us in this podcast is trying to

43:07.2

highlight all the different ways Django is being used in all these different realms, which are

43:12.0

largely kind of hidden, because, you know, for you it's sort of a secondary thing, but still a

43:16.7

key part of your, you know, pipeline, of your process. Yeah, and what I find quite important is

43:22.9

that Django is already quite old, and it still evolves, and it's still a good choice if you would

43:29.5

start today, I would say. That is perfect, and this is of course very important if you are a

43:34.3

small company like we are. You cannot rebuild everything from scratch after five years or so. Yeah.

43:40.1

Yeah, no, but also as well, you need to know that Django is going to continue working.

43:43.5

You know, the next release isn't going to totally change the API or, you know.

43:47.9

Exactly.

43:49.1

All those sorts of things.

43:50.0

And those stability guarantees and the deprecation policy are important.

43:53.8

Yeah.

43:54.3

Well, actually, maybe one last one.

43:56.4

If there's something you could change about Django, what would that be?

44:01.8

Or do you have thoughts on the future async wave? Would that impact Auphonic at all?

44:08.3

Or is that not so much?

44:10.1

I don't know. I have not done so much with it until now, because, I mean, the thing is, our

44:17.7

processing is done in a different stage. It's done offline, basically, in the queue, so here we

44:25.0

would not need this async thing. But yeah, I think for user interface improvements it's

44:32.2

of course very, very interesting, but I have to play around with it a little bit more, I

44:38.2

think. So if listeners want to use Auphonic, should they just go to the main website, or

44:43.0

where should they be directed? Yes, sure. Our website is auphonic.com, and yeah, just try our

44:50.0

system: two hours per month for free. And if you have any questions or feedback, we are always very

44:57.1

happy to get feedback and also error reports and whatever. Great. Okay, well, thank you

45:04.7

so much for coming on and for your time. Really interesting, Georg. Thank you. Bye.

45:10.2

Bye, thank you. Ciao.