transcript-site/content/post/Latent Space/Latent-Space-Truly-Serverless-Infra-for-AI-Engineers---with-Erik-Bernhardsson-of-Modal.lrc
2024-05-22 14:52:44 +08:00


[by:whisper.cpp]
[00:00.00] (upbeat music)
[00:02.58] - Hey everyone, welcome to the Latent Space podcast.
[00:07.96] This is Alessio, partner and CTO in residence
[00:10.40] at Decibel Partners, and I'm joined by my co-host,
[00:12.66] Swyx, founder of Smol AI.
[00:14.60] - Hey, and today we have in the studio
[00:16.56] Erik Bernhardsson from Modal, welcome.
[00:18.56] - Hi, it's awesome being here.
[00:20.64] - Yeah, awesome seeing you in person.
[00:22.40] I've seen you online for a number of years
[00:24.96] as you were building on Modal.
[00:26.28] And I think you're just making a San Francisco trip
[00:29.16] just to see people here, right?
[00:31.04] I've been to two Modal events in San Francisco here.
[00:33.48] - Yeah, that's right, we're based in New York,
[00:34.84] so I figured sometimes I have to come out to
[00:37.72] the capital of AI and make a presence.
[00:40.20] - What do you think is the pros and cons
[00:41.94] of building in New York?
[00:43.44] - I mean, I never built anything elsewhere.
[00:45.32] Like I lived in New York the last 12 years.
[00:47.60] I love the city, obviously there's a lot more stuff
[00:49.80] going on here and there's a lot more customers
[00:51.16] and that's why I'm out here.
[00:52.52] I do feel like for me where I'm in life,
[00:54.32] like I'm a very boring person,
[00:55.84] like I kind of work hard and then I go home
[00:57.76] and hang out with my kids, like I don't have time
[01:00.60] to go to like events and meetups and stuff anyway.
[01:03.28] So in that sense, like New York is kind of nice.
[01:04.92] Like I walk to work every morning,
[01:06.52] it's like five minutes away from my apartment.
[01:07.88] It's like very time efficient in that sense.
[01:09.84] - Yeah, yeah.
[01:11.28] Sounds like a good life.
[01:12.44] So we'll do a brief bio and then we'll talk about
[01:14.76] anything else that people should know about you.
[01:16.64] Actually, I was surprised to find out you're from Sweden.
[01:19.28] You went to college at KTH.
[01:21.32] - Yeah, yeah.
[01:22.60] - And your master's thesis was on implementing
[01:24.36] a scalable music recommender system.
[01:26.16] - Yeah. - I had no idea.
[01:27.28] - Yeah, yeah, yeah.
[01:28.12] So I actually started physics,
[01:29.20] but I grew up coding and I did a lot of programming
[01:31.24] competition and then as I was like thinking about,
[01:33.48] you know, graduating, I got in touch with an obscure
[01:36.60] music streaming startup called Spotify,
[01:39.32] which was then like 30 people.
[01:40.72] And for some reason I convinced them like,
[01:42.08] why don't I just come and like write a master's thesis
[01:44.00] with you and like I'll do some cool collaborative
[01:45.44] filtering despite not knowing anything
[01:46.76] about collaborative filtering really, I sort of, you know,
[01:48.36] but no one knew anything back then.
[01:49.76] So I spent six months at Spotify,
[01:51.80] basically building a prototype of a music recommendation
[01:54.60] system and then turned that into master's thesis.
[01:56.80] - Yeah.
[01:57.64] - And then later when I graduated,
[01:58.60] I joined Spotify full time.
[02:00.16] - Yeah, yeah.
[02:01.00] So that was the start of your data career.
[02:02.96] You also wrote a couple of popular open source tooling
[02:06.16] while you were there.
[02:07.32] And then you joined, is that correct or?
[02:09.28] - No, that's right.
[02:10.12] I mean, I was at Spotify for seven years.
[02:11.32] That's a long time, and Spotify was a wild place
[02:13.92] early on.
[02:14.76] I mean, the data space was also wild place.
[02:16.48] I mean, it was like Hadoop cluster in the like
[02:18.56] foosball room on the floor.
[02:20.24] There's a lot of crude, like very basic infrastructure
[02:22.92] and I didn't know anything about it.
[02:24.56] And like I was hired to kind of figure out data stuff.
[02:27.88] And I started hacking on a recommendation system
[02:31.20] and then, you know, got sidetracked
[02:33.12] and a bunch of other stuff.
[02:33.96] I fixed a bunch of reporting things
[02:35.44] and set up A/B testing and started doing like
[02:37.36] business analytics and later got back
[02:38.84] to music recommendation system.
[02:40.04] And a lot of the infrastructure didn't really exist.
[02:42.04] Like there was like Hadoop back then,
[02:43.52] which is kind of bad and I don't miss it,
[02:46.20] but spent a lot of time with that.
[02:48.20] As a part of that, I ended up building a workflow engine
[02:50.76] called Luigi, which briefly ended up being
[02:53.24] somewhat widely used by a bunch of companies.
[02:56.20] Sort of like, you know, kind of like Airflow,
[02:57.64] but like before Airflow,
[02:59.04] I think it did some things better, some things worse.
[03:01.36] I also built a vector database called Annoy,
[03:02.88] which is like for a while, it was actually
[03:04.28] quite widely used in 2012.
[03:06.00] So it's like way before like all this like vector database
[03:08.48] stuff ended up happening.
[03:09.76] And funny enough, I was actually obsessed
[03:11.40] with like vectors back then.
[03:12.52] Like I was like, this is gonna be huge.
[03:13.72] Like just give it like a few years.
[03:15.56] I didn't know it was gonna take like nine years.
[03:17.12] And then it's gonna suddenly be like 20 startups
[03:18.96] doing vector databases in one year.
[03:20.72] So it did happen in that sense.
[03:21.96] I was right.
[03:22.80] I was glad I didn't start a startup
[03:23.76] in the vector database space.
[03:25.32] I would have started way too early.
[03:26.92] But yeah, that was, yeah, this was a fun
[03:29.24] seven years of Spotify, it was a great culture,
[03:31.12] a great company.
[03:31.96] - Yeah, just to take a quick tangent
[03:33.68] on this vector database thing,
[03:34.76] 'cause we probably won't revisit it,
[03:36.20] but like, has anything architecturally changed
[03:38.32] in the last nine years?
[03:39.88] Or...
[03:40.72] (laughing)
[03:42.80] - I mean, sort of like, I'm actually not following
[03:44.68] like super like closely.
[03:46.20] I think, you know, some of the best algorithms
[03:48.92] are still the same as like hierarchical,
[03:50.60] navigable, small world or whatever.
[03:52.48] - Exactly.
[03:53.32] Yeah, HNSW.
[03:54.66] I think now there's like product quantization.
[03:56.68] There's like some other stuff
[03:57.52] that haven't really followed super closely.
[03:59.24] I mean, obviously like back then it was like,
[04:00.64] you know, and always like very simple.
[04:01.88] It's like a C++ library with Python bindings
[04:04.36] and you can mmap big files into memory
[04:07.04] and do some lookups.
[04:08.12] And I used like this kind of recursive,
[04:10.72] like hyperplane splitting strategy,
[04:12.88] which is not that good,
[04:14.18] but it sort of was good enough at that time.
[04:16.12] But I think a lot of like H&SW is still like
[04:18.76] what people generally use.
[04:20.24] Now of course like databases are much better
[04:22.76] in the sense like to support like insertion updates
[04:24.88] and stuff like that.
[04:25.72] And I never supported that.
[04:26.84] Yeah, this is sort of exciting to finally see
[04:28.48] like vector databases becoming a thing.
[04:30.36] - Yeah, yeah.
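The recursive random splitting Erik describes can be sketched roughly like this (a toy illustration of the general technique, not Annoy's actual implementation): pick two random points, split the set by which of the two each point is closer to (equivalently, by a hyperplane between them), and recurse; a query then descends to one leaf and scans it, which is what makes the search approximate.

```python
import random

def build_tree(points, leaf_size=4, rng=random):
    """Recursively partition points with random hyperplane splits (sketch)."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    # Pick two random points; the implicit splitting hyperplane is
    # equidistant to them, so we just compare squared distances.
    a, b = rng.sample(points, 2)
    left, right = [], []
    for p in points:
        da = sum((pi - ai) ** 2 for pi, ai in zip(p, a))
        db = sum((pi - bi) ** 2 for pi, bi in zip(p, b))
        (left if da < db else right).append(p)
    if not left or not right:  # degenerate split: stop recursing
        return {"leaf": points}
    return {"a": a, "b": b,
            "left": build_tree(left, leaf_size, rng),
            "right": build_tree(right, leaf_size, rng)}

def query(tree, q):
    """Descend to a single leaf and scan it (the approximate part)."""
    while "leaf" not in tree:
        da = sum((qi - ai) ** 2 for qi, ai in zip(q, tree["a"]))
        db = sum((qi - bi) ** 2 for qi, bi in zip(q, tree["b"]))
        tree = tree["left"] if da < db else tree["right"]
    return min(tree["leaf"],
               key=lambda p: sum((qi - pi) ** 2 for qi, pi in zip(q, p)))
```

The real Annoy builds a forest of such trees and searches several of them with a priority queue, which is how it recovers accuracy lost by descending to a single leaf.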
[04:31.32] And then maybe one takeaway, your most interesting lesson
[04:34.44] from Daniel Ek.
[04:35.28] - I mean, I think Daniel, like, you know,
[04:38.72] he started Spotify when he was young.
[04:40.08] Like he was like 25, something like that.
[04:42.40] And that was like a good lesson.
[04:43.24] But like he, in a way, like, I think he was a very good leader.
[04:46.64] Like there weren't any, like, scandals or anything.
[04:49.24] No, he wasn't very eccentric at all.
[04:50.88] He was just kind of like very level headed,
[04:53.24] like just ran the company very well,
[04:54.92] like never made any like obvious mistakes or,
[04:57.08] I think it was like a few bets that maybe like in hindsight
[04:59.16] were like a little, you know, like we took us, you know,
[05:01.64] too far in one direction or another.
[05:03.08] But overall, I mean, I think it was a great CEO,
[05:05.32] like definitely, you know, up there, like generational CEO,
[05:08.48] at least for like Swedish startups.
[05:09.96] - Yeah, yeah, for sure.
[05:11.64] Okay, we should probably make our way towards Modal.
[05:14.24] So then you spent six years as CTO of Better.
[05:17.56] - Yeah.
[05:18.40] - As the CTO, and then you scaled up
[05:19.96] to like 300 engineers.
[05:21.52] - I joined as a CTO when there was like no tech team.
[05:23.84] And yeah, there was a wild chapter in my life.
[05:25.80] Like the company did very well for a while.
[05:28.20] And then like during the pandemic.
[05:29.52] - Less well.
[05:30.36] - Yeah, it was kind of a weird story.
[05:31.20] But yeah, it kind of collapsed.
[05:32.12] And then they actually went public, too.
[05:33.24] - Laid off people poorly.
[05:34.64] - Yeah, yeah, it was like a bunch of stories.
[05:36.36] Yeah, I mean, the company like grew from like 10 people
[05:38.92] when I joined, to 10,000, now it's back to a thousand.
[05:40.80] And yeah, they actually went public a few months ago.
[05:42.72] It's kind of crazy.
[05:43.56] They're still around, like, you know,
[05:44.40] they're still, you know, doing stuff.
[05:46.08] So yeah, very kind of interesting six years of my life
[05:49.20] for non-technical reasons, mostly like,
[05:51.04] but yeah, like I managed like 300, 400,
[05:52.36] - Management, scaling.
[05:53.20] - Yeah, like learning a lot of that,
[05:54.20] like recruiting, I spent all my time recruiting
[05:55.88] and stuff like that.
[05:56.72] And so managing at scale, it's like nice.
[05:59.28] Like now in a way, like when I'm building my own startup,
[06:01.20] like that's actually something I like don't feel
[06:03.04] nervous about at all.
[06:03.88] Like I've managed at scale.
[06:04.76] Like I feel like I can do it again.
[06:06.28] It's like very different things that I'm nervous about
[06:07.76] as a startup founder.
[06:09.08] But yeah, I started Modal three years ago
[06:10.52] after sort of, after leaving Better,
[06:12.20] I took a little bit of time off during the pandemic.
[06:14.12] And but yeah, pretty quickly I was like,
[06:16.16] I got to build something.
[06:17.04] I just want to, you know,
[06:18.16] and then yeah, Modal took form in my head, took shape.
[06:21.88] - And as far as I understand,
[06:23.12] and maybe we can sort of trade off questions.
[06:24.92] So the quick history is started mode in 2021,
[06:27.84] got your seed with Sarah from Amplify 2022.
[06:30.64] Last year you just announced your series A with Redpoint.
[06:32.92] - That's right.
[06:33.76] - And that brings us up to mostly today.
[06:36.24] Most people I think were expecting you
[06:37.76] to build for the data space.
[06:39.84] - But it is the data space.
[06:40.92] - It is the data space.
[06:42.64] When I think of data space,
[06:43.48] so I come from like Snowflake, BigQuery,
[06:46.20] you know, Fivetran, Airbyte, and that kind of stuff.
[06:48.12] - Yeah.
[06:48.96] - And what Modal became
[06:51.04] is more general purpose than that.
[06:52.76] - Yeah, yeah.
[06:54.24] I don't know, it was like fun.
[06:55.20] I actually ran into like Edo Liberty,
[06:56.64] the CEO of Pinecone, like a few weeks ago.
[06:58.16] And he was like, I was so afraid
[06:59.68] you were building a vector database.
[07:01.40] (laughing)
[07:02.64] No, I started modal because, you know,
[07:05.32] like in a way like I work with data,
[07:06.92] like throughout my most of my career,
[07:08.36] like every different part of the stack, right?
[07:10.32] Like I've done everything from business analytics
[07:12.76] to like deep learning, you know,
[07:14.72] like building, you know,
[07:15.92] training neural networks at scale,
[07:17.84] like everything in between, right?
[07:19.04] And so one of the thoughts,
[07:20.36] like in one of the observations I had
[07:21.84] when I started Modal, or like why I started, was like,
[07:23.96] I just wanted to make,
[07:25.04] build better tools for data teams.
[07:26.52] And like very, like that's sort of abstract thing.
[07:28.68] But like, I find that the data stack is, you know,
[07:31.24] full of like point solutions that don't integrate well.
[07:33.96] And still when you look at like data teams today,
[07:36.32] you know, like every startup ends up building
[07:38.16] their own internal Kubernetes wrapper, whatever.
[07:40.84] And, you know, all the different data engineers
[07:42.72] and machine learning engineers
[07:43.56] end up kind of struggling with the same things.
[07:45.88] So I started to think about like,
[07:46.92] how do I build a new data stack,
[07:49.40] which is kind of a megalomaniac project?
[07:51.08] Like, because you kind of wanted to like
[07:52.56] throw out everything.
[07:53.40] - So it's like, the modern data stack is over.
[07:55.28] (laughing)
[07:56.12] - Yeah, like a post-modern data stack.
[07:58.40] And so I started to think about that.
[08:00.08] And a lot of it came with like,
[08:01.08] like more focus on like the human side of like,
[08:02.68] how do I make data teams more productive?
[08:04.08] And like, what is the technology tools that they need?
[08:06.24] And like, you know, drew out a lot of charts
[08:08.44] of like, how the data stack looks,
[08:09.76] you know, what are the different components?
[08:11.28] And one that's actually very interesting
[08:12.44] is workflow scheduling,
[08:13.36] 'cause it kind of sits in like a nice sort of,
[08:15.32] you know, it's like a hub in the graph of like data products.
[08:18.24] But it was kind of hard to like kind of do that in a vacuum
[08:21.32] and also to monetize it to some extent.
[08:22.84] And I got very interested in like the layers below
[08:25.44] at some point.
[08:26.28] And like, at the end of the day,
[08:28.20] like most people have code to have to run somewhere.
[08:31.04] So I think about like,
[08:31.88] okay, well, how do you make that nice?
[08:34.04] Like, how do you make that?
[08:35.04] And in particular, like the thing I always like
[08:36.44] thought about like developer productivity is like,
[08:38.00] I think the best way to measure developer productivity
[08:40.56] is like in terms of the feedback loops.
[08:41.72] Like how quickly when you iterate, like when you write code,
[08:44.68] like how quickly can you get feedback?
[08:46.04] And at the innermost loop,
[08:46.96] it's like writing code and then running it.
[08:48.84] And like, as soon as you start working with the cloud,
[08:50.80] like it's like, takes minutes suddenly
[08:52.60] 'cause you have to build a Docker container
[08:53.88] and push it to the cloud and like run it, you know.
[08:55.68] So that was like the initial focus for me.
[08:57.52] It was like, I just want to solve that problem.
[08:59.16] Like I want to, you know, build something that lets
[09:01.80] you run your own thing in the cloud and like retain this sort of,
[09:03.64] you know, the joy of productivity
[09:05.92] as when you're running things locally.
[09:07.52] And in particular, I was quite focused on data teams
[09:09.56] 'cause I think they had a couple of unique needs
[09:11.84] that wasn't well served by the infrastructure at that time
[09:14.36] or like still isn't like, in particular, like Kubernetes.
[09:16.92] I feel like it's like kind of worked okay for backend teams,
[09:19.68] but not so well for data teams.
[09:21.16] And very quickly, I got sucked into like a very deep
[09:23.04] like rabbit hole of like-
[09:24.00] - Not well for data teams because of burstiness.
[09:25.80] - Yeah, for sure.
[09:26.64] So like burstiness is like one thing, right?
[09:28.08] Like, you know, like you often have this like fan out.
[09:30.20] You want to like apply some function
[09:31.52] over very large assets.
[09:32.84] Another thing tends to be like hardware requirements.
[09:34.68] Like you need like GPUs.
[09:35.68] And like I've seen this with many companies.
[09:37.24] Like you go, you know, the data scientists
[09:38.76] go to a platform team and they're like,
[09:39.92] can we add GPUs to the Kubernetes?
[09:41.48] They're like, no, like that's, you know, complex.
[09:43.64] We're not gonna, or like, so like just getting GPU access.
[09:46.20] And then like, I mean, also, like, data code,
[09:48.28] like frankly, or like machine learning code,
[09:50.24] like tends to be like super annoying
[09:52.48] in terms of like environments.
[09:53.52] Like you end up having like a lot of like custom
[09:55.56] like containers and like environment conflicts.
[09:58.08] And like it's very hard to set up like a unified container
[10:01.56] that like can serve like a data scientist.
[10:03.92] Because like there's always like packages that break.
[10:05.76] And so I think there's a lot of different reasons
[10:07.84] why the technology wasn't well suited for data teams.
[10:11.44] And I think the attitude at that time was often like,
[10:13.28] you know, like you had friction
[10:14.80] between the data team and the platform team.
[10:16.36] Like, well, it works for the backend stuff.
[10:18.24] You know, why don't you just like, you know, make it work.
[10:20.24] But like, I actually felt like data teams, you know,
[10:22.32] or at this point now, like there's so much,
[10:24.84] so many people working with data and like they,
[10:26.36] to some extent like deserve their own tools
[10:28.08] and their own tool chains.
[10:28.92] And like optimizing for that is not something
[10:31.04] people have done.
[10:31.88] So that's sort of like very abstract,
[10:33.72] philosophical reason why I started Modal.
[10:35.12] And then, and then I got sucked into like rabbit hole
[10:37.04] of like container cold start and, you know, like whatever,
[10:40.20] Linux, page cache, you know, file system optimizations.
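The bursty fan-out pattern Erik mentions, applying some function over a very large set of items all at once and then scaling back down, can be sketched with a plain thread pool standing in for a fleet of on-demand containers (a toy illustration, not Modal code; `fan_out` is a hypothetical name):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(fn, items, workers=16):
    """Bursty fan-out sketch: apply fn to many items at once, then wind down.

    In a serverless runtime each call could land in its own container;
    here a local thread pool stands in for that, preserving input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))
```

The point of the real thing is that "workers" are containers spun up on demand, so a data team can burst to hundreds of parallel executions without asking a platform team to provision anything.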
[10:43.44] - Yeah, tell people, I think the first time I met you,
[10:46.00] I think you told me some numbers, but I don't remember.
[10:48.08] Like what were the main things you were unhappy
[10:50.24] with in the status quo, and then you built your own container
[10:52.32] stack as well?
[10:53.16] - Yeah, I mean, like in particular it was like,
[10:54.40] in order to have that loop, right?
[10:56.08] You want to be able to start like take code on your laptop,
[10:59.32] whatever and like run in the cloud very quickly
[11:01.24] and like running in custom containers
[11:02.60] and maybe like spin up like a hundred containers,
[11:04.16] a thousand, you know, things like that.
[11:05.48] And so container cold start was the initial,
[11:07.64] like from like a developer productivity point of view,
[11:09.40] it was like really what I was focusing on is,
[11:11.92] I want to take code, I want to stick it in container,
[11:13.52] I want to execute in the cloud and like, you know,
[11:14.96] make it feel like fast.
[11:16.40] And when you look at like how Docker works for instance,
[11:18.60] like Docker, you have this like fairly convoluted,
[11:21.00] like very resource inefficient way, they, you know,
[11:23.60] you build a container, you upload the whole container
[11:25.68] and then you download it and you run it.
[11:27.68] And Kubernetes also like not very fast
[11:29.40] at like starting containers.
[11:30.24] So like I started kind of like, you know,
[11:31.92] going a layer deeper like Docker is actually like, you know,
[11:34.00] there's like a couple of different primitives,
[11:35.08] but like a lower level primitive is runc,
[11:36.96] which is like a container runner.
[11:38.44] And I was like, what if I just take the container runner,
[11:40.80] like runc, and I point it to like my own root file system
[11:44.52] and then I built like my own virtual file system
[11:46.44] that exposes files over network instead.
[11:49.68] And that was like the sort of very crude version of Modal.
[11:51.40] It's like, now I can actually start containers very quickly
[11:54.06] because it turns out like when you start a Docker container,
[11:56.28] like first of all, like most Docker images
[11:58.72] are like several gigabytes.
[11:59.76] And like 99% of that is never going to be consumed.
[12:02.36] Like there's a bunch of like, you know,
[12:03.80] like time zone information for like Uzbekistan,
[12:06.16] whatever, like no one's going to read it.
[12:07.92] And then there's a very high overlap
[12:09.60] between the files that are going to be read.
[12:10.64] There's going to be like lib torch or whatever,
[12:12.04] like it's going to be read.
[12:12.88] So you can also cache it very well.
[12:14.16] So that was like the first sort of stuff we started working on
[12:16.48] was like, let's build this like container file system.
[12:19.60] And, you know, a couple of like, you know,
[12:21.24] just using runc directly.
[12:22.76] And that actually enabled us to like get to this point
[12:25.16] of like, you write code and then you can launch it in the cloud
[12:27.80] within like a second or two, like something like that.
[12:30.08] And, you know, there's been many optimizations since then,
[12:32.16] but that was sort of a starting point.
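The insight behind the container file system Erik describes, that most image files are never read and the files that are read overlap heavily across images, can be illustrated with a toy content-addressed lazy loader (all class and variable names here are hypothetical, not Modal's actual code):

```python
class LazyImageFS:
    """Toy lazy container filesystem sketch.

    Instead of downloading a multi-gigabyte image up front, files are
    fetched from a remote store only on first read, and cached by
    content hash so identical files (libtorch, glibc, ...) are shared
    across different images and containers.
    """

    def __init__(self, manifest, remote, cache):
        self.manifest = manifest  # path -> content hash (small, eager)
        self.remote = remote      # hash -> bytes (the "network" store)
        self.cache = cache        # content-addressed cache, shared
        self.fetches = 0          # remote round-trips actually made

    def read(self, path):
        digest = self.manifest[path]
        if digest not in self.cache:
            self.cache[digest] = self.remote[digest]  # fetch on first use
            self.fetches += 1
        return self.cache[digest]
```

Startup cost is then proportional to the files a container actually touches, not the image size, which is why cold start can drop from minutes to a second or two.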
[12:33.44] - Can we talk about the developer experience as well?
[12:36.76] I think one of the magic things about Modal is
[12:39.60] at the very basic layers, like a Python function decorator,
[12:42.84] it's just like stub, well, not,
[12:45.00] but then you also have a way to define a full container.
[12:48.00] What were kind of the design decisions that went into it?
[12:50.12] Where did you start?
[12:51.08] How easy did you want it to be?
[12:52.44] And then maybe how much complexity did you then
[12:55.04] add on to make sure that every use case fit?
[12:57.16] - I mean, Modal, I almost feel like it's like
[12:58.72] almost like two products kind of glued together.
[13:00.80] Like there's like the low level like container runtime,
[13:02.96] like file system and all that stuff, like in Rust.
[13:04.52] And then there's like the Python SDK, right?
[13:06.28] Like how do you express applications?
[13:07.88] And I think, I mean, Swyx, like,
[13:09.52] I think your blog post, the self-provisioning runtime,
[13:11.44] was like to me always sort of,
[13:12.60] for me like an eye-opening thing.
[13:13.80] It's like, so I didn't think about like.
[13:15.04] - You wrote your post four months before me.
[13:16.88] - Yeah?
[13:17.72] - The software 2.0, infra 2.0.
[13:20.16] - Yeah, well, I don't know.
[13:21.00] Like convergence of minds, like.
[13:22.20] - I guess we're like both thinking, maybe you put,
[13:25.32] I think better words than like, you know,
[13:26.96] maybe something I was like thinking about for a long time.
[13:28.96] - Yeah, and I can tell you how I was thinking about it
[13:30.72] on my own, but I wanna hear it.
[13:31.56] - Yeah, yeah, I would love it.
[13:32.88] And like to me, like what I always wanted to build was like,
[13:35.52] I don't know, like I don't know if you use like Pulumi.
[13:37.32] Like Pulumi is like nice, like in the sense,
[13:38.72] like it's like Pulumi is like,
[13:39.96] you describe infrastructure in code, right?
[13:42.24] And to me, that was like so nice.
[13:44.12] Like finally I can like, you know, put a for loop
[13:46.44] that creates S3 buckets or whatever.
[13:48.36] And I think like Modal sort of goes one step further
[13:50.56] in the sense that like, what if you also put the app code
[13:53.12] inside the infrastructure code and like glue it all together
[13:55.56] and then like you only have one single place
[13:57.00] that defines everything.
[13:58.08] And it's all programmable.
[13:59.44] It doesn't have any config files.
[14:00.84] Like Modal has like zero config, there's no config.
[14:02.96] It's all code.
[14:03.80] And so that was like the goal that I wanted, like part of that.
[14:06.28] And then the other part was like,
[14:07.44] I often find that so much of like my time was spent
[14:09.92] on like the plumbing between containers.
[14:13.12] And so my thinking was like, well,
[14:14.68] if I just build this like Python SDK
[14:16.92] and make it possible to like bridge like different containers
[14:19.96] just like a function call.
[14:20.88] Like, and I can say, oh, this function runs in this container.
[14:23.96] And this other function runs in this container.
[14:25.72] And I can just call it just like a normal function.
[14:28.16] Then, you know, I can build these applications
[14:30.36] that may span a lot of different environments.
[14:32.52] Maybe they fan out, start other containers.
[14:34.68] But it's all just like inside Python.
[14:36.32] You just like have this beautiful kind of nice like DSL
[14:38.88] almost for like, you know,
[14:40.32] how to control infrastructure in the cloud.
[14:42.32] So that was sort of like how we ended up
[14:44.12] with the Python SDK as it is,
[14:45.72] which is still evolving all the time.
[14:47.04] By the way, we keep changing syntax quite a lot.
[14:48.56] 'Cause I think it's still somewhat exploratory.
[14:51.40] But we're starting to converge on something
[14:52.56] that feels like reasonably good now.
[14:54.48] - Yeah, along the way, you, with this expressiveness,
[14:57.52] you enabled the ability to, for example,
[15:00.24] attach a GPU to a function.
[15:01.76] - Totally, yeah.
[15:02.60] It's like, you just like say, you know,
[15:03.68] on the function decorator, you're like GPU equals,
[15:05.72] you know, A100 and then, or like GPU equals, you know,
[15:09.32] A10 or a T4 something like that.
[15:10.76] And then you get that GPU and like, you know,
[15:12.20] you just run the code and it runs.
[15:13.32] Like you don't have to, you know,
[15:15.12] go through hoops to, you know,
[15:16.48] start an EC2 instance or whatever.
[15:18.60] - Yeah.
[15:19.44] - So it's all code.
[15:20.28] - Yeah, so on my end, the reason I wrote
[15:21.92] self-provisioning runtimes was I was working at AWS
[15:24.80] and we had AWS CDK, which is kind of like, you know,
[15:29.00] the Amazon basics Pulumi.
[15:29.00] - Yeah, totally.
[15:29.84] - And then, and then like, it creates,
[15:33.56] it compiles to CloudFormation.
[15:33.56] - Yeah.
[15:34.40] - And then on the other side, you have to like,
[15:35.24] get all the config stuff and then put it
[15:36.80] into your application code and make sure that they line up.
[15:39.68] So then you're writing code to define your infrastructure,
[15:42.72] then you're writing code to define your application.
[15:44.52] And I was just like, this is like,
[15:46.16] obvious that it's going to converge, right?
[15:47.48] - Yeah, totally.
[15:48.72] But isn't there like, it might be wrong,
[15:52.88] but like, was it like SAM or Chalice or one of those?
[15:52.88] Like, isn't that like an AWS thing
[15:54.40] that where actually they kind of did that?
[15:56.44] I feel like there's like one problem.
[15:57.28] - SAM, yeah, yeah, yeah, yeah, yeah.
[15:59.52] Still very clunky.
[16:00.48] - Okay.
[16:01.32] - It's not as elegant as Modal.
[16:02.80] - I love AWS for like, the stuff it's built,
[16:05.60] you know, like historically in order for me to like,
[16:07.56] you know, what it enables me to build.
[16:09.20] But like, AWS is always like struggle
[16:10.64] with developer experience.
[16:11.68] Like, and that's big.
[16:13.36] I mean, they have to not break things.
[16:15.32] - Yeah, yeah, and totally.
[16:16.40] And they have to, you know,
[16:17.36] build products for very wide range of use cases.
[16:20.32] And I think that's hard.
[16:21.16] - Yeah, yeah, so it's easier to design for.
[16:23.36] Yeah, so anyway, I was pretty convinced
[16:25.84] that this would happen.
[16:26.80] I wrote that thing.
[16:27.84] And then, you know, imagine my surprise
[16:29.12] that you guys had it on your landing page at some point.
[16:31.60] - Yeah.
[16:32.44] - I think Akshat was just like--
[16:33.28] - Oh, is that it? - Just throw that in there.
[16:34.40] - Did you trademark it?
[16:35.56] - No, but I definitely got sent a few pitch decks
[16:38.44] with my post on there.
[16:39.56] - Nice. - And it was like,
[16:40.40] really interesting.
[16:41.36] This is my first time like,
[16:42.40] kind of putting a name to a phenomenon.
[16:43.92] - Yeah. - And I think
[16:44.76] this is useful skill for people
[16:46.08] to just communicate what they're trying to do.
[16:47.72] - Yeah, no, I think it's a beautiful concept, yeah.
[16:49.96] - Yeah, yeah.
[16:50.92] But obviously you implemented it.
[16:52.48] What became more clear in your explanation today
[16:55.00] is that actually you're not that tied to Python.
[16:57.08] - No, I mean, I think that all the lower level stuff
[16:59.96] is, you know, just running containers
[17:01.80] and like scheduling things and, you know,
[17:04.04] serving container data and stuff.
[17:05.24] So, like one of the benefits of data teams is obviously like,
[17:07.48] they're all like using Python, right?
[17:09.04] So that made it a lot easier.
[17:10.56] I think, you know, if we had to focus on other workloads,
[17:13.28] like, you know, for various things,
[17:14.40] like we've like been kind of like half thinking
[17:16.56] about like CI or like things like that.
[17:18.36] But like, anyway, that's like harder
[17:20.04] 'cause like you also, then you have to be like,
[17:21.84] you know, multiple SDKs, whereas, you know,
[17:25.04] focus on data teams, you can only, you know,
[17:26.80] Python like covers like 95% of all teams.
[17:29.20] That made it a lot easier.
[17:30.04] But like, I mean, like definitely like in the future,
[17:31.60] we can add other support, like supporting other languages.
[17:34.08] JavaScript for sure is the obvious next language.
[17:37.12] But, you know, who knows, like, you know, Rust, Go, R,
[17:40.56] whatever, PHP, Haskell, I don't know.
[17:42.44] - You know, I think for me, I actually am a person
[17:45.56] who like kind of liked the idea
[17:47.72] of programming language advancements
[17:49.88] being improvements in developer experience.
[17:52.56] But all I saw out of the academic sort of PLT type people
[17:56.20] is just type level improvements.
[17:58.20] And I always think like, for me, like one of the core reasons
[18:00.96] for self-provisioning runtimes and then, well, like Modal,
[18:03.12] it's like, this is actually a productivity increase.
[18:05.72] - Totally.
[18:06.56] It's a language level thing, you know,
[18:07.96] you managed to stick it on top of an existing language,
[18:10.08] but it is your own language.
[18:11.44] - Yeah.
[18:12.28] - DSL on top of Python.
[18:13.12] - Yeah.
[18:13.96] - And so language level increase on the order
[18:15.28] of like automatic memory management, you know,
[18:17.28] you could sort of make that analogy that like,
[18:19.48] maybe you lose some level of control,
[18:21.44] but most of the time you're okay
[18:22.84] with whatever Modal gives you.
[18:24.44] And like, that's fine.
[18:25.44] - Yeah, I mean, that's how I look at it too.
[18:28.04] Like, you know, you look at developer productivity
[18:29.68] over the last number of decades, like, you know,
[18:31.76] it's come in like small increments of like, you know,
[18:34.28] dynamic typing or like, it's like one thing,
[18:36.24] it's not suddenly like for a lot of use cases,
[18:37.68] you don't even care about type systems
[18:39.04] or better compiler technology or like, you know,
[18:41.52] the cloud or like, you know, relational databases.
[18:43.64] And, you know, I think, you know,
[18:44.88] you look at like that, you know, history,
[18:47.08] it's a steadily, you know, it's like, you know,
[18:49.84] the developers have been getting like probably 10x
[18:52.24] more productive every decade for the last four,
[18:55.28] four decades or something.
[18:56.12] It's kind of crazy, like on an exponential scale,
[18:58.00] we're talking about like a 10,000x,
[19:00.76] you know, improvement in developer productivity.
[19:02.40] What we can build today, you know, is arguably like,
[19:05.00] you know, a fraction of the cost of what it, you know,
[19:06.56] took to build it in the 80s.
[19:07.92] Maybe it wasn't even possible in the 80s.
[19:09.28] So that, to me, like, that's like so fascinating.
[19:11.40] I think it's going to keep going for the next few decades.
[19:13.68] - Yeah, yeah.
[19:14.88] - Another big thing on the AI infra wish list
[19:17.56] was truly serverless infrastructure.
[19:19.92] The other, on your landing page,
[19:21.32] you called them native cloud functions,
[19:23.44] something like that.
[19:24.76] I think the issue I've seen with serverless
[19:26.88] has always been people really wanted it to be stateful,
[19:30.00] even though stateless was much easier to do.
[19:32.36] And I think now with AI,
[19:34.08] most model inference is like stateless,
[19:36.68] you know, outside of the context.
[19:37.92] So that's kind of made it a lot easier to just put a model,
[19:41.24] like an AI model, on Modal to run.
[19:44.60] How do you think about how that changes,
[19:46.24] how people think about infrastructure too?
[19:48.20] - Yeah, I mean, I think Modal is definitely going
[19:50.24] in the direction of like doing more stateful things
[19:52.20] and working with data and like high IO use cases.
[19:55.16] I do think one like massive serendipitous thing
[19:57.72] that happened like halfway, you know,
[19:59.20] a year and a half into like the, you know,
[20:00.88] building Modal was like gen AI started exploding.
[20:03.20] And the IO pattern of gen AI,
[20:05.04] it's like fits the serverless model like so well,
[20:07.88] because it's like, you know,
[20:09.28] you send this tiny piece of, like a prompt, right?
[20:11.68] Or something like that.
[20:12.64] And then like you have this GPU
[20:13.92] that does like trillions of flops.
[20:15.68] And then it sends back like a tiny piece of information,
[20:17.88] right?
[20:18.72] And that turns out to be something like, you know,
[20:19.80] if you can get serverless working with GPU,
[20:21.84] that just like works really well, right?
[20:23.68] So I think from that point of view, like serverless always,
[20:26.08] to me, felt like a little bit of like a solution
[20:27.88] looking for a problem.
[20:28.92] I don't actually, like, don't think like backend
[20:30.64] is like the problem that sort of needs it.
[20:32.72] Or like not as much, but I look at data
[20:34.56] and in particular like things like gen AI,
[20:35.84] like model inference, like it's like clearly a good fit.
[20:38.44] So I think that is, you know, to a large extent,
[20:41.48] explains like why we saw, you know,
[20:43.24] the initial sort of like killer app
[20:45.44] for Modal being model inference,
[20:47.24] which actually wasn't like necessarily what we're focused on,
[20:49.60] but that's where we've seen like by far
[20:51.00] the most usage and growth.
[20:52.36] - And this was before you started offering like fine tuning
[20:55.48] or language models.
[20:56.32] It was mostly stable diffusion.
[20:58.92] - Yeah, yeah.
[20:59.76] - Like Modal, like I always built it
[21:01.16] to be a very general purpose compute platform,
[21:03.16] like something where you could run everything.
[21:04.28] And I used to call Modal
[21:05.12] like a better Kubernetes for data teams for a long time.
[21:08.04] What we realized was like, yeah, that's like, you know,
[21:10.04] a year and a half in like we barely had any users
[21:12.36] or any revenue and like we were like, well, maybe we should look
[21:14.84] at like some use, trying to think of use cases.
[21:16.60] And that was around the same time stable diffusion came out.
[21:19.68] And the beauty of Modal is like,
[21:21.00] you can run almost anything on Modal, right?
[21:23.40] Like model inference turned out to be like the place
[21:25.12] where we found initially or like clearly this has like 10x
[21:28.24] like better ergonomics than anything else.
[21:30.12] But we're also like, you know,
[21:31.48] going back to my original vision,
[21:32.72] like we're thinking a lot about, you know, not, okay,
[21:35.08] now we do inference really well.
[21:36.16] Like what about training?
[21:37.00] What about fine tuning?
[21:37.84] What about, you know, end to end life cycle deployment?
[21:39.48] What about data pre-processing?
[21:40.76] What about, you know, I don't know, real-time streaming?
[21:43.16] What about, you know, large data munging?
[21:45.96] Like there's just data observability.
[21:47.92] I think there's so many things like kind of going back
[21:50.60] to what I said about like redefining data stack,
[21:52.44] like starting with the foundation of compute.
[21:55.28] Like one of the exciting things about Modal is like,
[21:57.24] we've sort of, you know,
[21:58.56] we've been working on that for three years
[21:59.84] and it's maturing,
[22:00.68] but like this is so many things you can do,
[22:03.08] like with just like a better compute primitive
[22:05.70] and also go up to stack
[22:06.64] and like do all this other stuff on top of it.
[22:08.84] - Yeah.
[22:09.68] How do you think about a rather like,
[22:11.52] I would love to learn more about the underlying infrastructure
[22:14.20] and like how you make that happen.
[22:16.08] Because with fine tuning and training,
[22:18.24] it's like static memory.
[22:20.04] Like you know exactly what you're gonna load in memory once.
[22:22.56] And it's kind of like a set amount of compute,
[22:24.36] versus inference, which, just like data, is very bursty.
[22:27.48] How do you make batches work
[22:29.88] with a serverless developer experience?
[22:32.60] You know, like what are like some fun technical challenges
[22:35.10] to make sure you get max utilization on these GPUs?
[22:38.00] What we hear from people is like, we have GPUs,
[22:40.40] but we can really only get like, you know,
[22:42.28] 30, 40, 50% maybe utilization.
[22:45.12] - Yeah.
[22:45.96] - What's some of the fun stuff you're working on
[22:46.90] to get a higher number there?
[22:48.28] - Yeah, I think on the inference side,
[22:49.60] like that's where like, you know,
[22:50.84] like from a cost perspective
[22:51.96] and like utilization perspective,
[22:53.24] we've seen, you know, like very good numbers.
[22:55.12] And in particular, like it's our ability
[22:56.48] to start containers and stop containers very quickly.
[22:59.08] And that means that we can autoscale extremely fast
[23:01.84] and scale down very quickly,
[23:03.50] which means like we can always adjust the sort of capacity,
[23:05.88] the number of GPUs running to the exact traffic volume.
[23:09.28] And so in many cases, like that actually leads
[23:11.12] to a sort of interesting thing where like,
[23:12.24] we obviously run our things on like the public cloud,
[23:14.08] like AWS, GCP, and Oracle.
[23:16.32] But in many cases, like users who do inference
[23:19.44] on those platforms or those clouds,
[23:21.72] even though we charge a slightly higher price per GPU hour,
[23:25.12] a lot of users, like, moving their large-scale
[23:26.68] inference use cases to Modal,
[23:27.68] like end up saving a lot of money.
[23:29.08] 'Cause we only charge for like the time
[23:30.70] the GPU is actually running.
[23:32.20] And that's a hard problem, right?
[23:33.28] Like, you know, if you have to constantly adjust
[23:35.36] the number of machines, if you have to start containers,
[23:37.20] stop containers, like that's a very hard problem.
[23:38.92] And starting containers quickly is a very difficult thing.
[23:41.48] I mentioned we had to build our own file system for this.
[23:44.28] We also, you know, built our own container scheduler
[23:46.48] for that.
[23:47.32] We've implemented recently CPU memory checkpointing
[23:50.04] so we can take running containers
[23:51.40] and snapshot the entire CPU state, like including registers
[23:54.40] and everything and restore it from that point,
[23:57.12] which means we can restore it from an initialized state.
[23:59.84] We're looking at GPU checkpointing next,
[24:01.52] it's like a very interesting thing.
[24:02.72] So I think for this kind of stuff,
[24:03.84] that's where serverless really shines,
[24:06.84] because you can drive, you know,
[24:08.32] you can push the frontier of latency
[24:10.96] versus utilization quite substantially,
[24:13.44] you know, which either ends up being a latency advantage
[24:15.44] or a cost advantage or both, right?
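The latency-versus-utilization trade-off Erik describes can be made concrete with a back-of-envelope cost model. All the numbers below are illustrative assumptions, not Modal's or anyone's actual pricing:

```python
# Back-of-envelope sketch: with bursty traffic, an always-on fleet sized for
# peak pays for idle GPUs, while per-second serverless billing pays only for
# busy time. All numbers are made-up illustrations, not real pricing.
def always_on_cost(peak_gpus, hours, price_per_gpu_hour):
    # Provision for peak and pay for it the whole time.
    return peak_gpus * hours * price_per_gpu_hour

def serverless_cost(busy_gpu_hours, price_per_gpu_hour):
    # Pay only for GPU-time actually spent serving requests.
    return busy_gpu_hours * price_per_gpu_hour

# Bursty day: a peak of 10 GPUs, but only 30 busy GPU-hours in total.
on_demand = always_on_cost(peak_gpus=10, hours=24, price_per_gpu_hour=2.0)
scaled = serverless_cost(busy_gpu_hours=30, price_per_gpu_hour=2.5)
print(on_demand, scaled)
```

Even at a 25% higher hourly rate, scaling to zero between bursts comes out far cheaper, which is the saving described above for bursty inference traffic.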
[24:17.56] On training, there's probably arguably like less
[24:19.20] of an advantage to doing serverless, frankly,
[24:21.24] 'cause you know, you can just like spin up a bunch of machines
[24:23.16] and try to saturate, like, you know,
[24:25.00] train as much as you can on each machine.
[24:27.20] For that area, like we've seen like, you know,
[24:29.12] arguably like less usage, like for Modal,
[24:31.48] but there are always like some interesting use cases.
[24:32.76] Like we do have a couple of customers,
[24:34.00] like Ramp for instance, like they do fine-tuning with Modal
[24:36.00] and they basically like one of the patterns they have
[24:38.08] is like very bursty type fine tuning,
[24:39.68] where they fine tune 100 models in parallel.
[24:41.56] And that's like a separate thing
[24:42.56] that model does really well, right?
[24:43.52] Like we can start up 100 containers very quickly,
[24:46.12] run a fine tuning training job on each one of them
[24:48.84] that only runs for, I don't know, 10, 20 minutes.
[24:51.32] And then, you know, you can do hyper parameter tuning
[24:53.28] in that sense, like just pick the best model
[24:54.76] and things like that.
[24:55.60] So there are like interesting training use cases.
[24:56.92] I think when you get to like training
[24:58.08] like very large foundation models,
[24:59.48] that's a use case we don't support super well
[25:01.12] 'cause that's very high IO, you know,
[25:03.16] you need to have like InfiniBand and all these things.
[25:05.16] And those are things we haven't supported yet
[25:07.40] and might take a while to get to that.
[25:09.32] So that's like probably like an area
[25:10.60] where like we're relatively weak.
[25:11.96] - Yeah, have you cared at all
[25:13.32] about lower level model optimization?
[25:15.76] There's other cloud providers that do custom kernels
[25:18.52] to get better performance, or is that not a focus,
[25:20.60] given that you're not just an AI compute company?
[25:24.32] - Yeah, I mean, I think like we want to support
[25:26.04] like a generic, like general workloads in a sense
[25:28.12] that like we want users to give us a container essentially
[25:30.24] or, you know, code, and then we want to run that.
[25:32.92] So I think, you know, we benefit from those things
[25:36.44] in the sense that like we can tell our users, you know,
[25:39.04] to use those things.
[25:40.44] But I don't know if we want to like poke into users' containers
[25:43.20] and like do those things automatically.
[25:44.48] That's sort of, I think a little bit tricky
[25:46.08] from the outside to do
[25:46.92] 'cause we want to be able to take like arbitrary code
[25:49.20] and execute it, but certainly like, you know,
[25:51.04] we can tell our users to like use those things.
[25:52.80] - Yeah, I may have betrayed my own biases
[25:55.84] because I don't really think about Modal
[25:57.52] as for data teams anymore.
[25:59.76] I think you started AI.
[26:00.96] I think you're much more for AI engineers
[26:03.08] and my favorite anecdotes, which I think you know,
[26:06.20] but I don't know if you directly experienced it.
[26:08.84] I went through the Vercel AI Accelerator,
[26:10.56] which you supported.
[26:11.96] And in the Vercel AI Accelerator,
[26:13.80] a bunch of startups gave like free credits
[26:15.44] and like signups and talks and all that stuff.
[26:18.00] The only ones that stuck
[26:18.92] are the ones that actually appealed to engineers
[26:21.00] and the top usage, the top tool used, by far, was Modal.
[26:24.56] - That's awesome.
[26:25.40] - For people building with AI apps.
[26:27.04] - Yeah, I mean, it might be also like a terminology question,
[26:29.32] like the AI versus data, right?
[26:30.64] Like I've, you know, maybe I'm just like old and jaded,
[26:32.84] but like I've seen so many like different titles.
[26:34.92] Like for a while it was like, you know,
[26:37.32] I was a data scientist and a machine learning engineer.
[26:39.96] And then, you know, there was like analytics engineers
[26:42.00] and then it was like AI, you know, so like to me, it's like,
[26:44.92] I just like, in my head, that's to me just like data,
[26:48.64] or like engineer, you know, like I don't really,
[26:50.92] 'cause that's why I've been like, you know,
[26:51.96] just calling it data teams.
[26:53.56] But like, of course, like, you know, AI is like, you know,
[26:56.48] like such a massive fraction of our like workloads.
[26:59.48] - It's a different Venn diagram of things you do, right?
[27:02.64] So the stuff that you're talking about
[27:04.02] where you need like InfiniBand
[27:05.84] for like highly parallel training,
[27:08.00] that's not, that's more of the ML engineer,
[27:09.72] that's more of the research scientist.
[27:11.12] - Yeah, yeah.
[27:11.96] - And less of the AI engineer,
[27:13.24] which is more sort of try to work at the application.
[27:15.96] - Yeah.
[27:16.80] I mean, to be fair to it, like we have a lot of users
[27:18.36] that are like doing stuff
[27:19.60] that I don't think fits neatly into like AI.
[27:22.40] Like we have a lot of people using like
[27:23.68] more of a web scraping, like it's kind of nice.
[27:25.40] Like you can just like, you know, fire up like a hundred
[27:27.92] or a thousand containers running Chromium
[27:29.52] and just like render a bunch of web pages
[27:30.88] and takes, you know, whatever.
[27:32.16] Or like, you know, protein folding, is that,
[27:35.04] I mean, maybe that's, I don't know, like,
[27:36.72] but like, you know, we have a bunch of users doing that
[27:38.60] or like, you know, in terms of in the realm of biotech,
[27:41.72] like sequence alignment, like people using,
[27:43.76] or like a couple of people using like Modal
[27:45.68] to run like large, like mixed integer programming problems,
[27:48.12] like, you know, using Gurobi or like things like that.
[27:50.16] So video processing is another thing that keeps coming up.
[27:53.02] Like, you know, let's say you have like petabytes of video
[27:55.72] and you want to just like transcode it,
[27:56.88] like, or you can fire up a lot of containers
[27:58.64] and just like run FFmpeg or like,
[28:00.56] so there are those things too.
[28:01.68] Like, I mean, like that being said,
[28:03.04] like AI is by far our biggest use case, but,
[28:05.16] you know, like, again, like Modal
[28:06.40] is kind of general purpose in that sense.
[28:08.04] - Yeah.
[28:08.88] Well, maybe let's stick with the stable diffusion thing
[28:10.48] and then we'll move on to the other use cases
[28:12.12] or AI that you want to highlight.
[28:14.28] The other big player in my mind is Replicate.
[28:16.96] - Yeah.
[28:17.80] - In this era.
[28:18.64] They're much more, I guess, custom built for that purpose,
[28:21.76] whereas you're more general purpose.
[28:23.20] How do you position yourself with them?
[28:26.48] Are they just for like different audiences
[28:28.12] or are you just head-on competing?
[28:29.72] - I think there's like a tiny sliver of the Venn diagram
[28:32.48] where we're competitive
[28:33.52] and then like 99% of the area we're not competitive.
[28:37.16] I mean, I think for people who,
[28:39.64] if you think of like frontend engineers,
[28:40.68] I think that's where like really they found good fit.
[28:42.28] It's like, you know, people who built a cool web app
[28:44.36] and they want some sort of AI capability
[28:46.36] and they just, you know, an off the shelf model
[28:48.56] is like perfect for them.
[28:49.92] That's like, I like use Replicate, that's great, right?
[28:52.80] Like, I think where we shine is like custom models
[28:55.56] or custom workflows, you know,
[28:57.44] running things at very large scale.
[28:58.80] We need to care about utilization, care about costs.
[29:01.16] You know, we have much lower prices
[29:02.92] because we spend a lot more time
[29:03.92] optimizing our infrastructure, you know,
[29:05.68] and that's where we're competitive, right?
[29:06.96] Like, you know, and you look at some of the use cases
[29:08.60] like Suno is a big user.
[29:10.36] Like, they're running like large scale like AI.
[29:12.00] - Oh, we're talking with Mikey in a month.
[29:14.00] - Yeah, so I mean, they're using Modal
[29:15.76] for like production infrastructure.
[29:16.88] Like they have their own like custom model,
[29:18.76] like custom code and custom weights, you know,
[29:20.36] for AI generating music, Suno.ai, you know,
[29:22.88] that those are the types of use cases
[29:24.52] that we like, you know, things that are like very custom
[29:26.68] or like it's like, you know,
[29:28.24] and those are the things like
[29:29.24] it's very hard to run a Replicate, right?
[29:30.96] And that's fine.
[29:31.80] Like I think they focus on a very different part
[29:33.80] of the stack in that sense.
[29:35.08] - And then the other company pattern
[29:36.80] that I pattern match you to is Modular.
[29:39.64] - Is it the names?
[29:40.96] - No, no, no.
[29:42.20] But yes, the name is very similar.
[29:44.16] I think there's something that might be insightful there
[29:46.24] from a linguistics point of view.
[29:47.60] Oh, no, they have Mojo, the sort of Python SDK.
[29:50.52] And they have the Modular Inference Engine,
[29:52.00] which is their sort of, their cloud stack,
[29:54.08] their sort of compute inference stack.
[29:56.00] I don't know if anyone's made the comparison to you before,
[29:57.96] but like I see you evolving a little bit in parallel there.
[30:01.60] - No, I mean, maybe.
[30:03.28] Yeah, like it's not a company.
[30:04.44] I'm like super like, I mean,
[30:06.20] I know the basics, but like,
[30:07.72] I guess they're similar in the sense
[30:08.88] like they want to do a lot of,
[30:10.40] they have sort of big picture vision.
[30:12.40] - Yes, they also want to build very general purpose.
[30:14.40] And they also are marketing themselves as like,
[30:17.72] if you want to do off the shelf stuff, go somewhere else.
[30:20.44] If you want to do custom stuff,
[30:21.44] we're the best place to do it.
[30:22.48] - Yeah, yeah.
[30:23.32] - There is some overlap there.
[30:24.60] There's not overlap in the sense that you are a closed-source
[30:27.56] platform; people have to host their code on you.
[30:30.60] Whereas for them, they're very insistent
[30:32.24] on not running their own cloud service.
[30:34.36] They're boxed software.
[30:35.48] - Yeah, yeah.
[30:36.32] - They're licensed software.
[30:37.16] - I'm sure their VCs at some point
[30:38.60] might force them to reconsider.
[30:40.28] - No, no, Chris is very, very insistent
[30:42.44] and very convincing.
[30:43.40] (laughing)
[30:44.76] So anyway, I would just make that comparison,
[30:47.52] let people make the links if they want to.
[30:48.88] But it's an interesting way to see the cloud market develop
[30:52.04] from my point of view.
[30:52.88] 'Cause I came up in this field thinking cloud is one thing.
[30:55.88] And I think your vision is like something slightly different.
[30:58.40] And I see the different takes on it.
[31:00.36] - Yeah, and like one thing I've, you know,
[31:02.40] like I've written a bit about it in my blog too.
[31:04.08] It's like, I think of us as like a second layer
[31:06.24] of cloud provider in the sense that like,
[31:07.56] I think Snowflake is like kind of a good analogy.
[31:09.60] Like Snowflake, you know, is infrastructure
[31:12.08] is a service, right?
[31:12.92] But they actually run on like major clouds, right?
[31:15.44] And I mean, like you can like analyze this very deeply.
[31:18.16] But like one of the things I always thought about is like,
[31:19.52] why did Snowflake eventually, like, win over Redshift?
[31:21.64] And I think Snowflake, you know, to me, won
[31:24.68] because like, I mean, in the end,
[31:25.84] like AWS makes all the money anyway.
[31:27.32] And like Snowflake just had the ability
[31:29.76] to like focus on like developer experience
[31:32.68] or like, you know, user experience.
[31:34.24] And to me, like really proved that you can build
[31:36.56] a cloud provider, a layer off from, you know,
[31:39.56] the traditional like public clouds.
[31:40.76] And in that layer, that's also where I would put Modal.
[31:44.32] It's like, you know, we're building a cloud provider.
[31:45.88] Like, we're, you know, we're like a multi-tenant environment
[31:48.52] that runs the user code,
[31:49.80] but also building on top of the public cloud.
[31:51.36] So I think there's a lot of room in that space.
[31:53.12] I think it's very sort of interesting direction.
[31:55.56] - How do you think of that compared
[31:57.20] to the traditional past history?
[31:59.56] Like, you know, yeah, AWS,
[32:01.36] then you had Heroku, then Render, then Railway.
[32:04.48] - Yeah, I mean, I think they're all,
[32:06.04] those are all like great.
[32:06.88] Like, I think the problem that they all faced
[32:09.08] was like the graduation problem, right?
[32:11.28] Like, you know, Heroku or like, I mean, like also like,
[32:14.16] Heroku, there's like a counterfactual future of like,
[32:16.80] what would have happened if Salesforce didn't buy them, right?
[32:18.88] Like, that's a sort of separate thing.
[32:20.08] But like, I think what Heroku, I think always struggled with
[32:23.04] was like, eventually companies would get big enough
[32:26.36] that you couldn't really justify running in Heroku.
[32:28.68] So they would just go and like move it to, you know,
[32:30.64] whatever AWS or, you know, in particular.
[32:32.96] And you know, that's something that keeps me up at night too.
[32:34.92] Like, what does that graduation risk like look like for modal?
[32:37.92] I always think like the only way to build
[32:39.68] a successful infrastructure company in the long run
[32:41.40] in the cloud today is you have to appeal
[32:43.68] to the entire spectrum, right?
[32:45.16] Or at least like the enterprise.
[32:46.56] Like you have to capture the enterprise market.
[32:48.44] But the truly good companies capture the whole spectrum, right?
[32:50.68] Like, I think a company is like,
[32:52.16] I don't know, like Datadog or Mongo or something like that,
[32:53.92] where like, they both capture like the hobbyists
[32:56.32] and acquire them, but also like, you know,
[32:58.24] have very large enterprise customers.
[33:00.12] I think that arguably was like where,
[33:01.96] in my opinion, like Heroku struggled, was like,
[33:04.68] how do you maintain the customers
[33:06.36] as they get more and more advanced?
[33:07.72] I don't know what the solution is,
[33:08.88] but I think this, you know,
[33:11.04] that's something I would have thought about deeply
[33:12.36] if I was at Heroku at that time.
[33:14.32] - What's the AI graduation problem?
[33:16.16] Is it, I need to fine tune the model,
[33:18.04] I need better economics,
[33:19.36] any insights from customers?
[33:21.48] - Yeah, I mean, better economics certainly.
[33:23.08] But although like, I would say like, even for people who like,
[33:25.64] you know, need like thousands of GPUs,
[33:27.68] just because we can drive utilization so much better.
[33:29.80] Like we, there's actually like a cost advantage
[33:32.12] of staying on Modal.
[33:33.36] But yeah, I mean, certainly like, you know,
[33:34.72] and like the fact that VCs like love, you know,
[33:36.88] throwing money at least used to, you know,
[33:38.88] at companies who need it to buy GPUs.
[33:40.76] I think that didn't help the problem.
[33:42.32] And in training, I think, you know,
[33:43.52] there's less software differentiation.
[33:45.44] So in training, I think there's certainly like better economics
[33:47.48] of like buying big clusters.
[33:48.96] But I mean, I hope it's gonna change, right?
[33:51.12] Like I think, you know, we're still pretty early in the cycle
[33:53.44] of like building AI infrastructure.
[33:55.96] And I think a lot of these companies over in the long run,
[33:59.00] like, you know, except maybe the super big ones,
[34:01.44] like, you know, Facebook and Google,
[34:03.12] they're always gonna build their own ones.
[34:04.16] But like everyone else, like, to some extent, you know,
[34:06.84] I think they're better off like buying platforms.
[34:09.28] And you know, someone's gonna have to build those platforms.
[34:11.52] - Yeah, cool.
[34:13.28] Let's move on to language models.
[34:15.48] And just specifically that workload,
[34:17.12] just to flesh it out a little bit.
[34:18.72] You already said that Ramp is like fine-tuning 100 models
[34:22.28] like at once, simultaneously, on Modal.
[34:24.92] Closer to home, my favorite example is Erik-bot.
[34:28.56] Maybe you want to tell that story?
[34:30.08] - Yeah, I mean, it was a prototype thing we built for fun,
[34:32.84] but it was pretty cool.
[34:33.68] Like we basically built this thing that hooks up to Slack.
[34:35.96] It like downloads all the Slack history
[34:38.04] and, you know, fine-tunes a model based on a person.
[34:40.20] And then you can chat with that.
[34:41.72] And so you can like, you know, clone yourself
[34:43.40] and like talk to yourself.
[34:44.24] It's like, I mean, it's like a nice little demo.
[34:46.16] And it's like, I think like it's fully self-contained.
[34:48.92] Like there's a Modal app that does everything, right?
[34:51.12] Like it downloads Slack, you know,
[34:52.64] integrates the Slack API, like downloads the stuff,
[34:55.04] the data, like just runs the fine-tuning.
[34:57.04] And then like creates like dynamically
[34:58.76] an inference endpoint and it's all like self-contained
[35:01.16] and like, you know, a few hundred lines of code.
[35:02.36] So I think it's sort of a good kind of use case for Modal,
[35:05.20] like it kind of demonstrates a lot of the capabilities
[35:07.32] of Modal.
[35:08.16] - Yeah.
[35:08.98] - And now on a more personal side,
[35:09.96] how close did you feel Erik-bot was to you?
[35:13.72] - It definitely captured the like, the language.
[35:16.90] - Uh-huh.
[35:18.12] - Yeah.
[35:18.96] I mean, I don't know, like the content,
[35:20.52] I always feel this way about like AI.
[35:22.12] And it's gotten better,
[35:22.96] but like you look at like AI output of text.
[35:25.64] Like, and it's like, when you glance at it,
[35:27.60] it's like, yeah, this seems really smart, you know,
[35:29.96] but then you actually like look a little bit deeper.
[35:31.48] It's like, what does this mean?
[35:32.92] What does this person say?
[35:33.76] It's like kind of vacuous, right?
[35:35.00] And that's like kind of what I felt like, you know,
[35:36.68] talking to like my clone version.
[35:38.52] Like it's like says like things like the grammar is correct.
[35:41.20] Like some of the sentences make a lot of sense,
[35:42.96] but like, what are you trying to say?
[35:44.40] Like there's no content here, I don't know.
[35:46.48] I mean, it's like, I got that feeling also with ChatGPT
[35:49.00] in the like early versions, right?
[35:50.16] Now it's like better, but.
[35:51.28] - That's funny.
[35:52.12] Yeah, I built this thing called Smol Podcaster
[35:53.68] to automate a lot of our back office work.
[35:56.08] So to speak, and it's great at transcription,
[35:58.60] it's great at doing chapters.
[36:00.60] And then I was like, okay,
[36:01.76] how about you come up with a short summary?
[36:03.80] And it's like, it sounds good,
[36:05.44] but it's like, it's not even in the same ballpark
[36:07.20] as what we end up writing.
[36:08.84] And it's hard to see how it's going to get there.
[36:11.32] - Oh, I have ideas.
[36:12.84] - Yeah.
[36:13.76] - I'm certain it's going to get there,
[36:15.44] but like, I agree with you, right?
[36:17.12] And like, I have the same thing.
[36:18.24] I don't know if you've read, like, AI-generated books.
[36:20.68] Like they just like kind of seem funny, right?
[36:22.72] Like they're off, right?
[36:23.84] But like you glance at them and it's like,
[36:25.24] oh, it's kind of cool.
[36:26.52] Like looks correct,
[36:27.56] but then it's like very weird when you actually read them.
[36:29.72] - Yeah.
[36:30.88] So for what it's worth,
[36:32.20] I think anyone can join the Modal Slack.
[36:33.60] It's just linked at the bottom.
[36:34.76] - Yeah, totally.
[36:35.60] If you go to modal.com, there's a button in the footer.
[36:38.48] - Yeah, and then you can talk to Erik-bot.
[36:40.40] And then sometimes I really like pinging Erik-bot,
[36:42.68] and then you answer afterwards,
[36:44.08] but then you're like,
[36:44.92] - Really?
[36:45.76] - Yeah, mostly correct.
[36:46.60] Like whatever.
[36:47.44] - Cool.
[36:48.28] - Any other broader lessons, you know,
[36:49.68] just broadening out from like the single use case
[36:52.00] of fine tuning, like what are you seeing people do
[36:55.20] with fine tuning or just language models
[36:57.44] on modal in general?
[36:58.60] - Yeah, I mean, I think language models is interesting
[37:00.60] because so many people get started with APIs,
[37:04.16] and that's just, you know,
[37:05.24] they're just dominating a space
[37:06.44] in particular OpenAI, right?
[37:07.80] And that's not necessarily like a place
[37:09.32] where we aim to compete.
[37:10.64] I mean, maybe at some point,
[37:11.48] but like it's just not like a core focus for us.
[37:13.20] And I think it's sort of separately a question
[37:15.32] of whether there's economics in that long term,
[37:16.80] but like, so we, we tend to focus on more
[37:19.08] like the areas like the around it, right?
[37:21.00] Like fine tuning, like another use case we have
[37:23.76] is a bunch of people, Ramp included,
[37:25.44] is doing batch embeddings on Modal.
[37:27.32] So let's say, you know, we have like a,
[37:29.24] actually we're like writing a blog post
[37:30.52] like we take all of Wikipedia
[37:32.84] and like parallelize embeddings in 15 minutes
[37:35.36] and produce vectors for each article.
[37:37.80] So those types of use cases,
[37:39.12] I think modal suits really well for,
[37:40.72] I think also a lot of like custom inference,
[37:42.76] like we have like that.
[37:43.60] - Yeah, when you say parallelize,
[37:45.44] I think you should give people an idea
[37:47.08] of the order of magnitude of parallelism
[37:48.84] because I think people don't understand how parallel.
[37:51.60] So like, I think your classic hello world with Modal
[37:54.32] is like some kind of Fibonacci function, right?
[37:57.12] - Yeah, we have a bunch of different ones.
[37:57.96] - A recursive function.
[37:58.80] - Yeah, yeah, I mean, like, yeah,
[38:00.04] I mean it's like pretty easy in Modal,
[38:01.20] like fan out to like, you know,
[38:02.92] at least like a hundred GPUs like in a few seconds.
[38:05.00] And, you know, if you give it like a couple of minutes,
[38:06.96] like we can, you know, you can fan out
[38:08.48] to like thousands of GPUs.
[38:09.72] Like we run it relatively large scale.
[38:12.36] And yeah, we've run, you know, many thousands of GPUs
[38:16.40] at certain points when we need it.
[38:17.84] You know, big backfills
[38:18.96] or some customers had very large compute needs.
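The fan-out being described has roughly this shape. A local stdlib sketch, with the `embed` function as an invented stand-in; on Modal the same map pattern would fan out across hundreds of remote containers rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor

def embed(article: str) -> list[float]:
    # Stand-in for a real embedding model: length and word-gap count.
    return [float(len(article)), float(article.count(" "))]

articles = [f"wikipedia article {i}" for i in range(1000)]

# Fan out: map the function over every item in parallel.
# On Modal the same shape runs across many remote containers
# instead of local threads.
with ThreadPoolExecutor(max_workers=64) as pool:
    vectors = list(pool.map(embed, articles))

print(len(vectors))  # → 1000
```

The appeal of the platform version is that the map call looks the same while the workers scale to thousands of GPUs.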
[38:21.00] - Yeah, yeah.
[38:21.84] And I mean, that's super useful for a number of things.
[38:25.28] So one of my early interactions with modal as well
[38:27.52] was with smol developer,
[38:28.92] which is my sort of coding agent.
[38:30.72] The reason I chose modal was a number of things.
[38:32.72] One, I just wanted to try it out.
[38:33.92] And I just had an excuse to try it.
[38:35.76] Akshat offered to onboard me personally.
[38:37.52] - Yeah, good excuse.
[38:38.76] - But the most interesting thing was that
[38:40.48] you could have that sort of local development experience,
[38:43.12] running it on my laptop,
[38:44.40] but then it would seamlessly translate to a cloud service.
[38:47.16] Or like cloud hosted environment.
[38:49.32] And then it could fan out with concurrency controls.
[38:51.72] So I could say like, because like, you know,
[38:53.44] the number of times I hit the GPT-3 API at the time
[38:57.16] was going to be subject to the rate limit from there.
[38:59.80] But I wanted to fan out
[39:00.84] without worrying about the kind of stuff.
[39:02.88] With modal, I can just kind of declare that in my config.
[39:06.04] And that's it.
[39:06.88] - Oh, like a concurrency limit?
[39:07.88] - Yeah.
[39:08.72] - Yeah, there's a lot of control there.
[39:09.56] - Yeah, so I just want to highlight that to people.
[39:11.56] I was like, yeah, this is a pretty good use case for like,
[39:13.72] writing this kind of LLM application code
[39:16.68] inside of this environment
[39:18.08] that just understands fan out and rate limiting natively.
[39:22.52] You don't actually have an exposed queue system,
[39:24.48] but you have it under the hood.
[39:25.56] - Totally.
[39:26.40] - That kind of stuff.
[39:27.30] - It's self-provisioning code.
[39:29.08] (laughing)
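The concurrency control described here is essentially a semaphore around the fan-out. A minimal local sketch (the `call_llm` stand-in and the numbers are invented; Modal lets you declare the limit on the function instead of managing one by hand):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 5                        # illustrative concurrency limit
gate = threading.Semaphore(MAX_IN_FLIGHT)
lock = threading.Lock()
in_flight = 0
peak = 0

def call_llm(prompt: str) -> str:
    """Stand-in for a rate-limited API call; the semaphore caps
    how many calls can be in flight at once."""
    global in_flight, peak
    with gate:
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.001)                # pretend network latency
        result = prompt.upper()
        with lock:
            in_flight -= 1
    return result

# Fan out 200 calls, but never more than MAX_IN_FLIGHT at a time.
with ThreadPoolExecutor(max_workers=50) as pool:
    out = list(pool.map(call_llm, [f"prompt {i}" for i in range(200)]))

print(peak <= MAX_IN_FLIGHT)  # → True
```

Declaring the cap in config, as described above, hides this bookkeeping entirely.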
[39:30.80] - So the last part of modal I wanted to touch on
[39:33.32] and obviously feel free,
[39:34.24] I know you're working on new features,
[39:36.64] was the sandbox that was introduced last year.
[39:39.72] And this is something that I think was inspired
[39:42.52] by Code Interpreter.
[39:43.36] You can tell me the longer history behind that.
[39:45.52] - Yeah, like we originally built it for the use case.
[39:48.52] Like there was a bunch of customers
[39:50.16] who looked into code generation applications
[39:52.56] and then they came to us and asked us,
[39:54.84] is there a safe way to execute code?
[39:56.76] And yeah, we spent a lot of time
[39:57.76] on like container security.
[39:58.76] We use gVisor, for instance,
[40:00.12] which is a Google project
[40:01.20] that provides pretty strong isolation of code.
[40:03.84] So we built a product where you can like basically
[40:05.76] like run arbitrary code inside a container
[40:07.84] and monitor its output, or like get it back in a safe way.
[40:12.72] I mean, over time it's like evolved into more of like,
[40:15.32] I think the long-term direction
[40:16.44] is actually I think more interesting,
[40:17.64] which is that I think modal as a platform
[40:20.92] where like, I think the core like container infrastructure
[40:24.08] we offer could actually be like,
[40:26.24] unbounded from like the client SDK
[40:28.52] and offer to like other,
[40:30.36] we're talking to a couple of like other companies
[40:32.20] that want to run through their packages,
[40:34.96] like run, execute jobs on modal like kind of programmatically.
[40:39.76] So that's actually the direction like sandbox is going.
[40:41.76] It's like turning into more like a platform for platforms
[40:44.04] is kind of how I've been thinking about it.
[40:45.24] - Oh boy, platform, that's the old Kubernetes line.
[40:48.36] - Yeah, yeah, yeah, but it's like, you know,
[40:50.24] like having that ability to like programmatically,
[40:53.32] you know, create containers and execute them,
[40:55.64] I think it's really cool.
[40:57.52] And I think it opens up a lot of interesting capabilities
[41:00.36] that are sort of separate from the like core Python SDK in modal.
[41:04.92] So I'm really excited about it.
[41:06.24] I mean, it's like one of those features
[41:07.32] that we kind of released and like, you know,
[41:09.56] then we kind of look at like what users actually build with it
[41:11.84] and people are starting to build like kind of crazy things.
[41:13.84] And then, you know, we double down on some of those things
[41:16.16] 'cause when we see like, you know, potential new product features.
[41:19.24] And so sandbox, I think in that sense,
[41:20.76] it's like kind of in that direction,
[41:22.56] we found a lot of like interesting use cases
[41:24.80] in the direction of like platformized container runner.
[41:27.76] - Can you be more specific about what you're double down on
[41:30.12] after seeing users in action?
[41:32.08] - Yeah, I mean, we're working with like some companies that,
[41:35.24] I mean, without getting to specifics,
[41:36.96] like that need the ability to take their users code
[41:40.72] and then launch containers on modal.
[41:44.00] And it's not about security necessarily,
[41:45.76] like they just want to use modal as a back end, right?
[41:47.92] Like they may already provide like Kubernetes as a back end,
[41:50.76] Lambda as a back end,
[41:51.76] and now they want to add modal as a back end, right?
[41:53.76] And so, you know, they need a way to programmatically define jobs
[41:57.60] on behalf of their users and execute them.
[41:59.48] And so, I don't know, that's kind of abstract,
[42:01.80] but does that make sense?
[42:02.76] - Yeah, I totally get it.
[42:03.60] It's sort of one level of recursion
[42:05.56] to sort of be the Modal for their customers.
[42:08.96] - Exactly, yeah, exactly.
[42:10.20] And Cloudflare has done this,
[42:11.80] you know, Kenton Varda from Cloudflare
[42:13.64] was like the tech lead on this thing,
[42:15.16] called it sort of functions as a service, as a service.
[42:17.28] - Yeah, that was exactly right.
[42:19.52] FaaSaaS.
[42:20.84] - FaaSaaS.
[42:21.68] - FaaSaaS.
[42:22.52] - Yeah, like, I mean, like that,
[42:23.84] I think any base layer, second layer cloud provider
[42:27.72] like yourself, compute provider like yourself should provide.
[42:30.76] You know, it's a marker of maturity and success
[42:32.80] that people just trust you to do that.
[42:34.68] They'd rather build on top of you than compete with you.
[42:37.16] The more interesting thing for me is like,
[42:38.84] what does it mean to serve a computer,
[42:41.28] like an LLM developer rather than a human developer, right?
[42:44.76] Like that's what a sandbox is to me.
[42:46.64] - Yeah, for sure.
[42:47.48] - That you have to redefine modal
[42:48.72] to serve a different non-human audience.
[42:51.24] - Yeah, yeah, yeah.
[42:52.24] And I think there's some really interesting people,
[42:54.00] you know, building very cool things.
[42:55.40] - Yeah, so I don't have an answer,
[42:57.16] but you know, I imagine things like,
[42:59.52] hey, the way you give feedback is different.
[43:02.24] You maybe have to like stream errors, log errors differently.
[43:06.28] I don't really know.
[43:07.56] - Yeah.
[43:08.40] - Obviously there's like safety considerations.
[43:10.08] Maybe you have an API to like restrict access to the web.
[43:13.12] - Yeah.
[43:13.96] - I don't think anyone would use it,
[43:16.00] but it's there if you want it.
[43:17.04] - Yeah, yeah.
[43:18.16] - Any other sort of design considerations?
[43:20.12] I have no idea.
[43:21.44] - With sandboxes?
[43:22.28] - Yeah, yeah.
[43:23.12] Open-ended question here.
[43:24.48] - Yeah, I mean, no, I think, yeah,
[43:26.28] the network restrictions I think make a lot of sense.
[43:28.80] Yeah, I mean, I think, you know, long-term like,
[43:31.56] I think there's a lot of interesting use cases
[43:32.92] where like the LLM instead in itself can like decide,
[43:36.12] I want to install these packages and like run this thing.
[43:38.48] And like, obviously for a lot of those use cases,
[43:40.52] like you want to have some sort of control
[43:42.32] that it doesn't like install malicious stuff
[43:44.12] and steal your secrets and things like that.
[43:45.68] But I think that's what's exciting
[43:47.72] about the sandbox primitive is like,
[43:48.92] it lets you do that in a relatively safe way.
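A minimal sketch of the execute-code-and-collect-output loop that a sandbox primitive wraps, using a plain subprocess; note that a child process alone is NOT a security boundary, which is why the real thing uses gVisor-style isolation as discussed above:

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -> tuple[int, str, str]:
    """Run a code string in a separate interpreter process and capture
    its output. This only illustrates the execute-and-collect loop;
    real isolation of untrusted code needs a proper sandbox."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr

rc, out, err = run_untrusted("print(6 * 7)")
print(rc, out.strip())  # → 0 42
```

The timeout and captured streams are the small but essential parts: generated code that hangs or spews errors has to be reported back, not trusted.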
[43:50.72] - Do you have any thoughts on the inference wars?
[43:54.84] A lot of providers are just rushing to the bottom
[43:57.36] to get the lowest price per million tokens.
[43:59.76] Some of them, you know, Sean ran the numbers.
[44:02.52] They're just losing money.
[44:03.68] The physics of it just
[44:05.80] don't work out for them to make any money on it.
[44:08.32] How do you think about your pricing
[44:10.08] and like how much premium you can get
[44:12.64] and you can kind of command versus using lower prices
[44:15.84] as kind of like a wedge into getting there,
[44:17.64] especially once you have a model instrumented.
[44:20.08] What are the trade-offs and any thoughts on strategies?
[44:23.68] - I mean, we focus more on like custom models
[44:25.40] and custom code.
[44:26.44] And I think in that space, there's like less competition.
[44:29.44] And I think we can have a pricing markup, right?
[44:32.20] Like, you know, people will always compare our prices
[44:34.56] to like, you know, the GPU power they can get elsewhere.
[44:36.64] And so how big can that markup be?
[44:38.68] Like it never can be, you know,
[44:39.68] we can never charge like 10X more,
[44:41.24] but we can certainly charge a premium.
[44:42.48] And like, you know, for that reason,
[44:43.40] like we can have pretty good margins.
[44:44.92] The LLM space is like the opposite.
[44:46.36] Like the switching costs of LLMs is zero.
[44:48.64] Like, if all you're doing is like straight up,
[44:50.64] like at least like open source, right?
[44:52.40] Like if all you're doing is like, you know,
[44:54.08] using some, you know, inference endpoint
[44:56.80] that serves an open source model.
[44:58.80] And, you know, some other provider comes along
[45:00.36] and like offers a lower price.
[45:01.60] You're just going to switch, right?
[45:02.44] So I don't know, to me that reminds me a lot of like,
[45:05.36] all these like 15-minute delivery wars
[45:07.28] or like, you know, like Uber versus Lyft, you know,
[45:09.92] and like maybe going back even further,
[45:11.32] like I think a lot about like the sort of, you know,
[45:13.36] flip side of this, like the actual positive side
[45:15.56] of it is like, I thought a lot about like the fiber optics
[45:17.84] boom of like '98, '99, like the other day, you know,
[45:21.36] and also like the overinvestment in GPUs today, like,
[45:24.64] yeah, like, you know, I don't know, like in the end,
[45:26.72] like I don't think VCs will have the return they expected,
[45:29.76] like, you know, in these things,
[45:31.60] but guess who's going to benefit?
[45:32.84] Like, you know, it's the consumers.
[45:34.52] Like someone's like reaping the value of this.
[45:36.84] And that's I think an amazing flip side is that, you know,
[45:39.60] we should be very grateful, you know, the fact that like,
[45:41.92] VCs want to subsidize these things, which is, you know,
[45:45.08] like you go back to fiber optics,
[45:46.32] like there was an extreme like overinvestment
[45:48.68] in fiber optic networks in like '98.
[45:50.80] And no one made money who did that.
[45:52.96] But consumers, you know, got tremendous benefits
[45:56.80] of all the fiber optic cables that were laid, you know,
[45:58.96] throughout the country in the decades after.
[46:01.20] I feel something similar about like GPUs today
[46:03.88] and also like specifically looking like more narrowly
[46:05.68] at like the LLM inference market, like, that's great.
[46:08.00] Like, you know, I'm very happy that, you know,
[46:11.16] there's a price war.
[46:12.80] Modal is like not necessarily like participating
[46:15.40] in that price war, right?
[46:16.24] Like I think, you know, it's going to shake out
[46:17.84] and then someone's going to win
[46:19.16] and then they're going to raise prices or whatever.
[46:20.68] Like we'll see how that works out.
[46:22.28] But for that reason, like we're not like hyper focused
[46:24.76] on like serving, you know, just like straight up,
[46:27.32] like here's an endpoint to an open source model.
[46:29.96] We think the value in modal comes from all these, you know,
[46:32.84] the other use cases, the more custom stuff,
[46:34.84] like fine tuning and complex, you know,
[46:36.76] guided output, like type stuff.
[46:38.28] Or like also like in other like outside of LLMs,
[46:40.44] like focus a lot more like image, audio, video stuff.
[46:43.48] 'Cause that's where there's a lot more proprietary models.
[46:45.96] There's a lot more like custom workflows.
[46:47.52] And that's where I think, you know, modal is more, you know,
[46:51.04] there's a lot of value in software differentiation.
[46:53.52] I think focusing on developer experience
[46:55.28] and developer productivity.
[46:56.80] That's where I think, you know,
[46:57.68] you can have more of a competitive mode.
[47:00.48] - I'm curious what the difference is going to be now
[47:03.12] that it's an enterprise.
[47:04.08] So like with DoorDash, Uber,
[47:06.80] they're going to charge you more and like as a customer,
[47:08.84] like you're going to decide to not take a Uber.
[47:10.96] But if you're a company building AI features
[47:13.04] in your product using the subsidized prices,
[47:15.44] and then, you know, the VC money dries up in a year
[47:18.52] and like prices go up, it's like,
[47:20.56] you can't really take the features back.
[47:22.76] Without a lot of backlash,
[47:23.84] but you also cannot really kill your margins
[47:26.08] by paying the new price.
[47:27.48] - So I don't know what that's going to look like, but.
[47:29.44] - But like margins are going to go up for sure.
[47:31.08] But I don't know if prices will go up.
[47:32.60] Cause like GPU prices have to drop eventually, right?
[47:36.44] So like, you know, like in the long run,
[47:38.36] I still think like prices may not go up that much,
[47:41.60] but certainly margins will go up.
[47:42.80] Like I think you said,
[47:43.64] Swyx, that margins are negative right now.
[47:45.24] Like, you know, obviously that's not sustainable.
[47:49.48] So certainly margins will have to go up.
[47:50.92] Like some companies are going to have to make money
[47:52.28] for it in this space.
[47:53.32] Otherwise like they're not going to provide the service.
[47:55.16] But that's equilibrium too, right?
[47:56.40] Like at some point, like, you know,
[47:57.80] the sort of stabilizes and one or two
[48:00.24] or three providers make money.
[48:02.48] - Yeah, what else is maybe underrated about Modal,
[48:05.24] something that people don't talk enough about,
[48:08.00] or yeah, that we didn't cover in the discussion.
[48:11.40] - Yeah, I think, what are some other things?
[48:13.96] We talked about a lot of stuff.
[48:15.04] Like we have the bursty parallelism.
[48:16.56] I think that's pretty cool.
[48:17.84] Working on a lot of like, trying to figure out like,
[48:20.28] you know, like kind of thinking more about the roadmap,
[48:22.00] but like one of the things I'm very excited about is
[48:24.56] building primitives for like more like
[48:26.32] I/O intensive workloads.
[48:27.88] And so like we're building some like crude stuff right now
[48:30.92] where like you can like create like direct TCP tunnels
[48:33.04] to containers and that lets you like pipe data.
[48:35.08] And like, you know, we haven't really explored it
[48:37.28] as much as we should,
[48:38.20] but like there's a lot of interesting applications.
[48:39.80] Like you can actually do like kind of real time video stuff
[48:42.44] in Modal now, because you can like create a tunnel to,
[48:45.68] exactly you can create a raw TCP socket to a container,
[48:48.84] feed it video and then like, you know,
[48:50.72] get the video back.
[48:51.68] And I think like it's still like a little bit like,
[48:54.56] you know, not fully ergonomically like figured out,
[48:56.84] but I think there's a lot of like super cool stuff.
[48:59.40] Like when we start enabling those more like high I/O workloads.
[49:04.40] I'm super excited about.
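The raw TCP piping described here, sketched locally: a tiny echo server stands in for a container behind a tunnel, and the client pushes bytes in and reads them back. Everything here is an invented local stand-in for the direct-socket pattern, not Modal's tunnel API:

```python
import socket
import threading

def echo_server(server: socket.socket) -> None:
    conn, _ = server.accept()
    with conn:
        while chunk := conn.recv(4096):  # stream bytes back as they arrive
            conn.sendall(chunk)

# Stand-in for a container exposed through a TCP tunnel.
server = socket.socket()
server.bind(("127.0.0.1", 0))            # let the OS pick a free port
server.listen(1)
threading.Thread(target=echo_server, args=(server,), daemon=True).start()

# Client side: open a raw TCP socket, pipe bytes in, read them back.
client = socket.create_connection(server.getsockname())
client.sendall(b"frame-0001")            # e.g. one chunk of a video stream
client.shutdown(socket.SHUT_WR)          # signal end of input
data = b"".join(iter(lambda: client.recv(4096), b""))
client.close()
print(data)  # → b'frame-0001'
```

A raw socket like this is what makes latency-sensitive streaming (audio, video frames) possible where an HTTP request-per-chunk would not be.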
[49:05.84] I think also like, you know, working with large data sets
[49:07.96] or kind of taking the ability to map and fan out
[49:10.84] and like building more like higher level,
[49:12.28] like functional primitives, like filters and group-bys
[49:14.84] and joins like, I think there's a lot of like
[49:16.60] really cool stuff you can do.
[49:18.00] But this is like maybe like, you know, years out like.
[49:21.24] - Yeah, we can just broaden out from Modal a little bit,
[49:23.36] but you have a lot of great tweets,
[49:25.40] so it's very easy to just kind of go through them.
[49:28.52] Why is Oracle underrated?
[49:30.44] - I love Oracle's GPUs.
[49:32.52] I don't know why, you know,
[49:34.16] what the economics looks like for Oracle,
[49:36.26] but I think they're great value for money.
[49:38.04] Like we run a bunch of stuff in Oracle
[49:39.76] and they have bare metal machines,
[49:41.20] like two terabytes of RAM, they're like super fast SSDs.
[49:44.16] You know, I mean, we love AWS and GCP too.
[49:46.40] We have great relationships with them.
[49:47.72] But I think Oracle's surprising like, you know,
[49:50.06] if you told me like three years ago
[49:51.32] that I would be using Oracle cloud, like what, wait, why?
[49:55.20] But now I'm, you know, I'm a happy customer.
[49:57.00] - And it's a combination of pricing
[49:58.80] and the kinds of SKUs I guess they offer.
[50:01.92] - Yeah, great, great machines, good prices, you know.
[50:04.96] - That's it.
[50:05.80] - Yeah, yeah.
[50:06.64] - That's all you care about.
[50:07.48] - Yeah, the sales team is pretty fun too, like I like them.
[50:09.16] - In Europe, people often talk about Hetzner.
[50:11.80] - Yeah, like we've focused on the main clouds, right?
[50:14.88] Like we've, you know, Oracle, AWS, GCP,
[50:16.60] we'll probably add Azure at some point.
[50:18.08] I think, I mean, there's definitely a long tail
[50:19.92] of like, you know, CoreWeave, Hetzner,
[50:22.62] yeah, like Lambda, like all these things.
[50:25.14] And like over time, I think we'll look at those too.
[50:27.14] Like, you know, wherever we can get the right, you know,
[50:29.10] GPUs at the right price.
[50:31.26] Yeah, I mean, I think it's fascinating.
[50:32.46] Like it's a tough business.
[50:35.14] Like I wouldn't want to try to build like a cloud provider.
[50:37.94] You know, it's just, you just have to be like incredibly
[50:40.46] focused on like, you know, efficiency and margins
[50:43.02] and things like that.
[50:43.86] But I mean, I'm glad people are trying.
[50:45.90] - Yeah, and you can ramp up on any of these clouds
[50:48.62] very quickly, right?
[50:49.46] - Yeah, I mean, yeah, like, I think so.
[50:52.00] Like, you know, what modal does is like programmatic,
[50:54.96] you know, launching and termination of machines.
[50:57.16] So that's like, what's nice about the clouds is, you know,
[51:00.44] they have relatively like mature APIs for doing that,
[51:03.52] as well as like, you know, support for Terraform,
[51:05.44] for all the networking and all that stuff.
[51:07.12] That makes it easier to work with the big clouds.
[51:09.24] But yeah, I mean, some of those things, I think, you know,
[51:11.20] I also expect the smaller clouds to like embrace those things
[51:14.00] in the long run, but I also think, you know,
[51:16.04] we can also probably integrate with some of the clouds.
[51:17.88] Like even without that, there's always an HTTP API
[51:20.96] that you can use, just like script something
[51:23.66] that launches instances, like through the web.
[51:25.82] - Yeah, yeah.
[51:26.66] I think a lot of people are always curious about
[51:28.18] whether or not you will buy your own hardware someday.
[51:31.26] I think you're a pretty firm in that.
[51:32.62] It's not your interest.
[51:33.98] But like, your story and your growth does remind me
[51:37.34] a little bit of Cloudflare, which obviously, you know,
[51:40.58] invest a lot in its own physical network.
[51:42.66] - Yeah, I don't remember like early days,
[51:44.62] like did they have their own hardware or?
[51:46.70] - No, they push out a lot with like agreements
[51:49.18] through other, you know, providers.
[51:52.26] - Yeah, okay, interesting.
[51:53.10] - But now it's all their own hardware.
[51:55.50] - Yeah. - Sorry, I understand.
[51:57.54] - Yeah, I mean, my feeling is that
[52:00.02] when you're venture funded startup,
[52:01.62] like buying physical hardware is maybe not the best use
[52:05.60] of the money.
[52:06.44] - I really wanted to put you in a room
[52:08.14] with Eiso Kant from Poolside.
[52:10.54] - Yeah.
[52:11.38] - 'Cause he has the complete opposite view.
[52:12.22] - Yeah. - It is great.
[52:13.80] - I mean, I don't like, I just think for like
[52:15.18] a capital efficiency point of view,
[52:16.40] like do you really want to tie up that much money
[52:18.06] in like, you know, physical hardware
[52:19.34] and think about depreciation and like,
[52:21.18] like as much as possible, like I, you know,
[52:24.74] I favor a more capital efficient way of like,
[52:27.10] we don't want to own the hardware,
[52:28.22] 'cause then, and ideally we want to,
[52:30.50] we want the sort of margin structure to be sort of like,
[52:33.66] 100% correlated, revenue and COGS, in the sense that like,
[52:36.70] you know, when someone comes and pays us,
[52:38.98] you know, one dollar for compute, like, you know,
[52:41.02] we immediately incur a cost of like whatever,
[52:43.54] like 70 cents, 80 cents, you know,
[52:45.40] and there's like complete correlation
[52:46.96] between cost and revenue.
[52:48.36] 'Cause then you can leverage up in like,
[52:49.88] a kind of a nice way, you can scale very efficiently,
[52:52.08] you know, like that's not, you know,
[52:54.28] turns out like that's hard to do.
[52:55.96] Like you can't just only use like,
[52:57.32] spot and on-demand instances.
[52:58.52] Like over time, we've actually started adding
[53:00.72] a pretty significant amount of reservations too.
[53:02.84] So I don't know, like reservations is always like,
[53:04.72] one step towards owning your own hardware.
[53:07.08] Like, I don't know, like, do we really want to be,
[53:09.00] you know, thinking about switches and cooling
[53:12.12] and HVAC and like power supplies?
[53:14.16] - Exactly, recovery.
[53:15.46] - Yeah, like, is that the thing I want to think about?
[53:17.62] Like, I don't know, like I like to make developers happy,
[53:19.82] but who knows, like maybe one day, like,
[53:21.68] but I don't think it's gonna happen anytime soon.
[53:24.10] - Yeah, obviously, for what it's worth,
[53:26.22] obviously I believe running in cloud,
[53:28.82] but it's interesting to have the devil's advocate
[53:31.94] on the other side.
[53:32.94] The main thing you have to do is be confident
[53:34.70] that you can manage your depreciation better
[53:36.66] than the typical assumption, which is two to three years.
[53:40.10] - Yeah, yeah.
[53:40.94] And so when you have a CTO that tells you,
[53:42.92] no, I think I can make these things last seven years,
[53:45.72] then it changes the math.
[53:46.56] - Yeah, yeah, but, you know,
[53:48.24] are you deluding yourself then?
[53:49.64] - Yeah.
[53:50.48] - That's the question, right?
[53:51.40] It's like the Waste Management scandal.
[53:53.32] Do you know about that?
[53:54.16] Like, they had this whole
[53:55.12] accounting scandal back in the '90s.
[53:57.24] This garbage company, like,
[54:00.32] they like started assuming their garbage trucks
[54:03.20] had a 10 year depreciation schedule,
[54:05.68] booked like a massive profit, you know,
[54:07.60] the stock went up, like, you know,
[54:09.76] and then it turns out actually the,
[54:10.96] all those garbage trucks broke down and like,
[54:13.34] you can't really depreciate them over 10 years.
[54:15.26] And so, so then the whole company, you know,
[54:17.04] they had to restate all their earnings.
[54:20.14] - Nice.
[54:21.46] Let's go into some personal nuggets.
[54:24.00] You received the IOI Gold Medal,
[54:26.34] which is the International Olympiad in Informatics.
[54:29.66] - 20 years ago.
[54:30.82] - Yeah.
[54:31.66] How are these models, like, going to change
[54:35.36] competitive programming?
[54:36.38] Like, do you think people will still love the craft?
[54:39.66] I feel like over time we're kind of like,
[54:41.84] programming has kind of lost maybe a little bit
[54:44.96] of its luster in the eyes of a lot of people.
[54:48.00] Yeah, I'm curious to see what you think.
[54:51.32] - I mean, maybe, but like, I don't know, like, you know,
[54:54.00] I've been coding for almost 30 or more than 30 years.
[54:56.52] And like, I feel like, you know, you look at like,
[54:59.16] programming and, you know, where it is today
[55:01.08] versus where it was, you know, 30, 40, 50 years ago.
[55:05.52] There's like, probably a thousand times more developers
[55:08.60] today than, you know, so like, in every year,
[55:10.74] there's more and more developers.
[55:12.10] And at the same time, developer productivity
[55:13.82] keeps going up.
[55:14.70] And when I look at the real world,
[55:16.26] I just think there's so much software
[55:18.14] that's still waiting to be built.
[55:20.10] Like, I think we can, you know,
[55:21.80] 10X the amount of developers and still, you know,
[55:24.34] have a lot of people making a lot of money, you know,
[55:27.14] building amazing software and also being,
[55:29.50] while at the same time being more productive.
[55:31.26] Like, I never understood this, like, you know,
[55:33.18] AI is going to, you know, replace engineers.
[55:35.54] That's very rarely how this actually works.
[55:38.22] When AI makes engineers more productive,
[55:40.68] like the demand actually goes up
[55:42.28] because the cost of engineers goes down
[55:43.84] because you can build software more cheaply.
[55:45.32] And that's, I think, the story of software in the world
[55:47.52] over the last few decades.
[55:48.76] So, I mean, I don't know how this,
[55:50.48] like relates to like, competitive programming is a,
[55:53.12] I don't know, kind of going back to your question.
[55:55.56] Competitive programming to me was always kind of a weird,
[55:57.68] kind of, you know, niche, like kind of, I don't know,
[56:00.36] I loved it.
[56:01.20] It's like puzzle solving.
[56:03.20] And like, my experience is like, you know,
[56:05.48] half of competitive programmers are able to translate that
[56:08.66] to actual, like, building cool stuff in the world.
[56:11.86] Half just, like, get really, you know,
[56:13.70] sucked into this, like, puzzle stuff
[56:15.10] and, you know, it never loses its grip on them.
[56:18.46] But, like, for me, it was an amazing way
[56:20.30] to get started with coding or get very deep into coding
[56:23.90] and, you know, kind of battle off with, like,
[56:26.26] other smart kids and traveling to different countries
[56:29.02] when I was a teenager.
[56:30.30] - So, I was just going to mention, like,
[56:31.38] it's not just that he personally is a competitive programmer.
[56:34.46] Like, I think a lot of people at Modal are competitive programmers.
[56:37.98] I think you met Akshat through--
[56:39.30] - Akshat, co-founder, is also IOI Gold Medal.
[56:41.78] By the way, Gold Medal doesn't mean you win.
[56:43.90] Although we actually had an intern that won IOI.
[56:47.24] Gold Medal is, like, the top 20, 30 people, roughly.
[56:49.96] - Yeah, obviously, it's very hard to get hired at Modal.
[56:52.28] But what is it like to work with, like,
[56:54.52] such a talent density, like, you know,
[56:56.68] how is that contributing to the culture at Modal?
[56:59.04] - Yeah, I mean, I think humans are the root cause
[57:01.88] of, like, everything at a company, right?
[57:04.00] Like, you know, bad code is because of, like, a bad human,
[57:06.90] or, like, whatever, you know, bad culture.
[57:08.30] So, like, I think, you know, like, talent density is very important
[57:11.08] in, like, keeping the bar high and, like, hiring smart people.
[57:13.46] And, you know, it's not always, like, the case
[57:15.14] that, like, hiring competitive programmers is the right strategy,
[57:17.54] right? If you're building something very different,
[57:19.12] like, you may not, you know.
[57:20.32] But we actually end up having a lot of, like,
[57:22.12] hard, you know, complex challenges.
[57:24.74] Like, you know, I talked about, like, the cloud,
[57:27.58] you know, the resource allocation.
[57:29.72] Like, turns out, like, that actually, like,
[57:31.26] you can phrase that as, like, a mixed integer programming problem.
[57:33.54] Like, we now have that running in production.
[57:35.04] Like, constantly optimizing how we allocate cloud resources.
[57:37.84] There's a lot of, like, interesting, like, complex,
[57:39.68] like, scheduling problems, and, like,
[57:41.68] how do you do all the bin packing of all the containers?
[57:43.84] Like, so, you know, I think, for what we're building,
[57:46.64] you know, it makes a lot of sense to hire these people
[57:48.32] who, like, like those very hard problems.
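The bin-packing problem he mentions, in miniature: a first-fit-decreasing heuristic sketch (not Modal's actual MIP formulation; all numbers are invented) showing the shape of packing container memory requests onto machines:

```python
def first_fit_decreasing(requests: list[int], capacity: int) -> list[list[int]]:
    """Pack memory requests (GB) onto machines of fixed capacity.
    A classic heuristic; a real scheduler (or a MIP solver) would
    also weigh CPUs, GPUs, locality, and preemption."""
    machines: list[list[int]] = []
    free: list[int] = []
    for req in sorted(requests, reverse=True):  # place biggest first
        for i, slack in enumerate(free):
            if req <= slack:                    # first machine it fits on
                machines[i].append(req)
                free[i] -= req
                break
        else:                                   # no fit: provision a new machine
            machines.append([req])
            free.append(capacity - req)
    return machines

placement = first_fit_decreasing([9, 8, 2, 2, 5, 4], capacity=10)
print(len(placement))  # → 4
```

A mixed-integer formulation of the same problem can trade optimality for solve time, which is the kind of thing that makes these scheduling problems genuinely hard at scale.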
[57:50.56] Yeah.
[57:52.62] - And they don't necessarily have to know the details of the stack.
[57:53.72] They just need to be very good at algorithms.
[57:55.80] - No, but, like, my feeling is, like, people who are, like,
[57:58.70] pretty good at competitive programming,
[58:00.32] they can also pick up, like, other stuff, like, elsewhere.
[58:03.48] Not always the case, but, you know,
[58:05.24] there's definitely a high correlation.
[58:06.88] - Oh, yeah. I'm interested in that, just because,
[58:09.62] you know, like, there's competitive mental talents
[58:12.62] in other areas, like, competitive, um, speed memorization.
[58:16.02] Yeah. Whatever.
[58:16.98] And, like, you know, you don't really see those transfer.
[58:19.66] And I always assumed, in my narrow perception,
[58:22.26] that competitive programming is so specialized.
[58:24.96] It's so obscure, even, like, so divorced from real-world,
[58:28.46] uh, scenarios that, um, it doesn't actually transfer that much.
[58:31.50] But obviously, I think, for the problems that you work on,
[58:33.00] it does.
[58:33.82] - But it's also, like, you know, frankly,
[58:35.70] it's, like, translates to some extent,
[58:37.62] not because, like, the problems are the same,
[58:39.36] but just because, like, it sort of filters for the, you know,
[58:41.34] people who are, like, willing to go very deep
[58:43.64] and work hard on things, right?
[58:45.60] Like, I feel like a similar thing is, like,
[58:48.44] a lot of good developers are, like, talented musicians.
[58:51.74] Like, why? Like, why is this a correlation?
[58:53.74] And, like, my theory is, like, you know,
[58:55.48] it's the same sort of skill. Like, you have to, like,
[58:57.02] just hyper-focus on something and practice a lot.
[58:59.62] Like, and, and there's something similar that I think
[59:01.32] creates, like, good developers.
[59:02.78] Yeah.
[59:03.70] - Sweden also had a lot of very good Counter-Strike players.
[59:06.48] I don't know, why did Sweden have
[59:08.74] fiber optics before all of Europe?
[59:10.70] I feel like, I grew up in Italy,
[59:12.88] and our internet was terrible.
[59:15.42] And then, I feel like, all the Nordics
[59:17.58] are, like, amazing internet.
[59:18.88] I remember getting online, and people in the Nordics
[59:21.18] are, like, five-ping, ten-ping.
[59:23.06] Yeah, we had very good network back then.
[59:25.48] Yeah, do you know why?
[59:26.86] I mean, I'm sure, like, you know, I think the government,
[59:29.96] you know, did certain things quite well, right?
[59:32.36] Like, in the '90s, like, there was, like,
[59:34.16] a bunch of tax rebates for, like, buying computers.
[59:36.22] And I think there were similar, like, investments
[59:37.90] in infrastructure. I mean, like, and I think, like,
[59:39.80] I always think about, you know, it's like,
[59:41.44] I still can't use my phone in the subway in New York.
[59:43.90] And that was something I could use in Sweden in '95.
[59:47.10] You know, we're talking, like, 40 years almost, right?
[59:49.34] Like, like, why?
[59:51.24] And I don't know, like, I think certain infrastructure,
[59:53.44] you know, Sweden was just better, I don't know.
[59:55.62] Yeah.
[59:56.54] And also, you never owned a TV or a car?
[59:59.22] Never owned a TV or a car. I never had a driver's license.
[60:01.38] How do you do that in Sweden, though?
[60:03.08] Like, that's cold.
[60:03.92] I grew up in a city. I mean, like,
[60:05.42] I took the subway everywhere, with a bike or whatever.
[60:08.76] Yeah. I always lived in cities, so I don't, you know,
[60:11.40] I never felt the need. I mean, like,
[60:14.16] me and my wife have a car, but like, I--
[60:16.80] That doesn't count.
[60:18.14] I mean, it's in her name, 'cause I don't have a driver's license.
[60:20.48] She drives me everywhere. It's nice.
[60:23.04] Nice.
[60:23.88] Yeah, it's fantastic.
[60:24.84] I was gonna ask you, like, the last thing I had
[60:26.84] on this list was your advice to people thinking
[60:28.88] about running some sort of run-code-in-the-cloud startup,
[60:31.42] is only do it if you're genuinely excited
[60:33.46] about spending five years thinking
[60:34.82] about load balancing, page faults, cloud security, and DNS.
[60:37.06] So, basically, like, it sounds like you're summing up
[60:38.56] a lot of pain running Modal.
[60:40.98] Yeah. Yeah.
[60:41.82] Like, one thing I struggled with, like,
[60:43.06] I talked to a lot of people starting companies
[60:45.40] in the data space or, like, AI space or whatever,
[60:48.02] and they sort of come at it at, like, you know,
[60:49.88] from, like, an application developer point of view,
[60:51.58] and they're like, "I'm gonna make this better."
[60:53.24] But, like, guess how you have to make it better?
[60:54.86] It's like, you have to go very deep
[60:56.26] on the infrastructure layer.
[60:57.26] And so, one of my frustrations has been, like,
[60:59.04] so many startups are, like, in my opinion,
[61:00.38] like, Kubernetes wrappers,
[61:01.48] and not very, like, thick wrappers,
[61:02.80] like, fairly thin wrappers.
[61:04.12] And I think, you know, every startup is a wrapper,
[61:06.24] to some extent, but, like, you need to be, like,
[61:07.60] a fat wrapper.
[61:08.44] You need to, like, go deep and, like, build some stuff.
[61:10.32] And that's, like, you know, if you build a tech company,
[61:12.36] you're gonna want to have-- you're gonna have to spend,
[61:14.24] you know, five, 10, 20 years of your life, like,
[61:16.96] going very deep and, like, you know,
[61:18.40] building the infrastructure you need
[61:19.96] in order to, like, make your product truly stand out
[61:22.20] and be competitive.
[61:23.48] And so, you know, I think that goes for everything.
[61:25.40] I mean, like, you're starting a whatever, you know,
[61:28.10] online retailer of, I don't know, bathroom sinks.
[61:31.14] You have to be willing to spend 10 years of your life
[61:33.50] thinking about, you know, whatever, bathroom sinks.
[61:36.18] Like, otherwise, it's gonna be hard.
[61:38.06] - Yeah, I think that's good advice for everyone.
[61:39.58] And, yeah, congrats on all your success.
[61:41.62] It's pretty exciting to watch it.
[61:43.46] It's just the beginning.
[61:44.30] - Yeah, yeah, yeah, it's exciting.
[61:45.86] And everyone should sign up and try at modal.com.
[61:48.62] - Yeah, now it's GA.
[61:49.46] - Yeah.
[61:50.46] - Used to be behind a wait list.
[61:51.82] - Yeah.
[61:52.66] - Awesome, Eric.
Thank you so much for coming on.
[61:54.30] - Yeah, it's amazing.
[61:55.14] - Thanks.
[61:56.74] (upbeat music)