[by:whisper.cpp]
|
|
[00:00.00] Hey everyone, welcome to the Latent Space Podcast.
|
|
[00:08.04] This is Alessio, partner and CTO in residence at Decibel Partners, and I'm joined by my
|
|
[00:12.10] co-host Swyx, founder of Smol AI.
|
|
[00:14.32] Hey, and today we have Ben Firshman in the studio.
|
|
[00:17.52] Welcome Ben.
|
|
[00:18.52] Hey, good to be here.
|
|
[00:19.52] Ben, you're a co-founder and CEO of Replicate.
|
|
[00:22.04] Before that, you were most notably founder of Fig, which became Docker Compose.
|
|
[00:25.76] You also did a couple of other things before that, but that's what a lot of people know
|
|
[00:29.16] you for.
|
|
[00:30.16] What should people know about you outside of your LinkedIn profile?
|
|
[00:35.16] Yeah, good question.
|
|
[00:37.04] I think I'm a builder and tinkerer in a very broad sense, and I love using my hands to
|
|
[00:41.56] make things, so I work on things maybe a bit closer to tech, like electronics.
|
|
[00:47.50] I also build things out of wood, and I fix cars, and I fix my bike, and build bicycles,
|
|
[00:55.08] and all this kind of stuff, and there's so much I think I've learned, transferable
|
|
[01:00.72] skills from just working in the real world to building things in software, and so much
|
|
[01:07.12] about being a builder both in real life and in software that crosses over.
|
|
[01:11.60] Is there a real world analogy that you use often when you think about a code architecture
|
|
[01:15.92] or problem?
|
|
[01:17.60] I like to build software tools as if they were something real.
|
|
[01:21.76] So I wrote this thing called the command line interface guidelines, which was a bit like
|
|
[01:26.44] sort of the Mac human interface guidelines, but for command line interfaces, I did it
|
|
[01:30.06] with the guy I created Docker Compose with and a few other people, and I think something
|
|
[01:35.32] in there, I think I described that your command line interface should feel like a big iron
|
|
[01:39.84] machine where you pull a lever and it goes clunk, and things should respond within like
|
|
[01:44.92] 50 milliseconds as if it was like a real life thing, and another analogy here is like in
|
|
[01:50.36] the real life, when you press a button on an electronic device, and it's like a soft
|
|
[01:54.90] switch, and you press it and nothing happens, and there's no physical feedback about anything
|
|
[01:59.28] happening, and then like half a second later something happens, like that's how a lot of
|
|
[02:03.12] software feels, but instead like software should feel more like something that's real
|
|
[02:06.76] where you touch, you pull a physical lever and the physical lever moves, and I've taken
|
|
[02:11.40] that lesson of kind of human interface to software a ton, it's all about kind of the
|
|
[02:16.20] latency, things feeling really solid and robust, both the command lines and user interfaces
|
|
[02:21.80] as well.
|
|
[02:22.80] And how did you operationalize that for Fig or Docker?
|
|
[02:27.32] A lot of it's just low latency. Actually, we didn't do it very well for Fig, in the
|
|
[02:31.72] first place we used Python, which was a big mistake, because Python's really hard to get
|
|
[02:37.04] booting up fast because you have to load up the whole Python runtime before it can run
|
|
[02:40.12] anything.
|
|
[02:41.12] Okay.
|
|
[02:42.12] Go is much better at this, where like Go just instantly starts.
|
|
[02:45.36] You have to be under 500 milliseconds to start up.
|
|
[02:48.36] Yeah, effectively.
|
|
[02:49.36] I mean, human perception of things being immediate is something like 100 milliseconds, so anything
|
|
[02:55.44] like that is good enough.
|
|
[02:57.88] Yeah.
|
|
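The startup-cost point above is easy to see directly. A minimal sketch, timing a no-op Python process against a tiny compiled binary (`/bin/true`); the exact numbers vary by machine, but the interpreter's runtime-loading overhead shows up clearly.

```python
import subprocess
import time

def startup_ms(cmd: list[str]) -> float:
    """Wall-clock time, in milliseconds, to spawn a command that does nothing."""
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True)
    return (time.perf_counter() - t0) * 1000

# Python has to load its whole runtime before executing even a no-op,
# while a compiled binary like /bin/true starts almost immediately.
python_ms = startup_ms(["python3", "-c", "pass"])
true_ms = startup_ms(["/bin/true"])
print(f"python3 no-op: {python_ms:.1f} ms, /bin/true: {true_ms:.1f} ms")
```

On a typical machine the no-op interpreter run already lands in the tens of milliseconds, a large slice of the ~100 ms budget before the tool has done any actual work.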
[02:58.88] Also, I should mention since we're talking about your side projects, well one thing is
|
|
[03:01.56] I am maybe one of a few fellow people who have actually written something about CLI
|
|
[03:05.64] design principles, because I was in charge of the Netlify CLI back in the day, and had
|
|
[03:09.84] many thoughts.
|
|
[03:10.92] One of my fun thoughts, I'll just share in case you have thoughts, is I think CLIs are
|
|
[03:15.36] effectively starting points for scripts that are then run, and the moment one of the script's
|
|
[03:20.52] preconditions is not fulfilled, typically it ends, so the CLI developer will just exit
|
|
[03:27.40] the program.
|
|
[03:28.88] And the way that I designed the Netlify Dev workflow, what I really wanted was for
|
|
[03:32.60] it to be kind of a state machine that would resolve itself.
|
|
[03:36.32] If it detected a precondition wasn't fulfilled, it would actually delegate to a sub-program
|
|
[03:41.32] that would then fulfill that precondition, asking for more info or waiting until a condition
|
|
[03:45.60] is fulfilled, then it would go back to the original flow and continue that.
|
|
[03:49.32] I don't know if that was ever tried, or is there a more formal definition of it, because
|
|
[03:53.32] I just came up with it randomly.
|
|
[03:55.32] But it felt like the beginnings of AI, in the sense that when you run a CLI command,
|
|
[03:59.36] you have an intent to do something, and you may not have given the CLI all the things that
|
|
[04:03.60] it needs to do to execute that intent.
|
|
[04:07.04] So that was my two cents.
|
|
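The precondition-resolving idea described above can be sketched roughly like this. All the names here (`Precondition`, the config example, and so on) are made up for illustration, not taken from the actual Netlify CLI.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Precondition:
    check: Callable[[], bool]   # is the condition already satisfied?
    fix: Callable[[], None]     # sub-program that fulfills it (prompt, wait, ...)

def run(command: Callable[[], None], preconditions: list[Precondition]) -> None:
    """Instead of exiting on a failed precondition, delegate to a fixer,
    then return to the original flow and continue."""
    for pre in preconditions:
        if not pre.check():
            pre.fix()
    command()  # resume the user's original intent

# Hypothetical example: "deploy" needs a config; if it's missing, create
# one rather than printing an error and exiting.
state = {"config": None}
needs_config = Precondition(
    check=lambda: state["config"] is not None,
    fix=lambda: state.update(config={"site": "demo"}),
)
run(lambda: print("deploying", state["config"]), [needs_config])
```

Each `fix` is itself free to run the same loop over its own preconditions, which is what makes the whole thing feel like a state machine resolving toward the user's intent.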
[04:08.84] Yeah, that reminds me of a thing we sort of thought about when writing the CLI guidelines
|
|
[04:14.48] where CLIs were designed in a world where the CLI was really a programming environment,
|
|
[04:20.16] and it's primarily designed for machines to use all of these commands and scripts, whereas
|
|
[04:27.56] over time... it was built in a world where the primary way of using computers was writing
|
|
[04:36.80] shell scripts effectively, and we've transitioned to a world where humans are using CLI programs
|
|
[04:42.12] much more than they used to, but the current best practices are from how UNIX was designed.
|
|
[04:49.12] There's lots of design documents about UNIX from the '70s and '80s, where they say things
|
|
[04:54.44] like command line commands should not output anything on success, it should be completely
|
|
[05:00.40] silent, which makes sense if you're using it in a shell script.
|
|
[05:04.52] But if a user is using that, it just looks like it's broken.
|
|
[05:07.96] If you type copy and it just doesn't say anything, you assume that it didn't work as a new user.
|
|
[05:12.46] I think what's really interesting about the CLI is that it's actually a really good,
|
|
[05:19.08] to your point, it's a really good user interface where it can be like a conversation, where
|
|
[05:25.76] it feels like you're, instead of just like you telling the computer to do this thing
|
|
[05:29.60] and either silently succeeding or saying, "No, you failed," it can guide you in
|
|
[05:35.96] the right direction and tell you what your intent might be, and that kind of thing in
|
|
[05:40.68] a way that's actually, it's almost more natural to a CLI than it is in a graphical user interface
|
|
[05:45.46] because it feels like this back and forth with the computer.
|
|
[05:48.54] Almost funnily, like a language model, so I think there's some interesting intersection
|
|
[05:53.96] of CLIs and language models actually being very closely related and good fit for each
|
|
[05:59.36] other.
|
|
[06:00.36] Yeah, I'll say one of the surprises from last year, I worked on a coding agent, I think
|
|
[06:04.02] the most successful coding agent of my cohort was Open Interpreter, which was a CLI implementation.
|
|
[06:09.48] I have, even as a CLI person, I have chronically underestimated CLI as a useful interface.
|
|
[06:15.32] Yeah, totally.
|
|
[06:17.06] You also developed arXiv Vanity, which you recently retired after a glorious seven
|
|
[06:21.50] years.
|
|
[06:22.50] Something like that, yeah.
|
|
[06:23.50] Something like that, which is nice, I guess: HTML instead of PDFs.
|
|
[06:27.50] Yeah, that was actually the start of where Replicate came from.
|
|
[06:31.76] Okay, we can tell that story.
|
|
[06:33.26] So, when I quit Docker, I got really interested in science infrastructure, just as like a
|
|
[06:37.62] problem area because it is, like, science has created so much progress in the world,
|
|
[06:44.22] the fact that we, you know, can talk to each other on a podcast and we use computers
|
|
[06:49.12] and the fact that we're alive is probably thanks to medical research, you know.
|
|
[06:52.44] But science is just like completely archaic and broken and there's like 19th century processes
|
|
[06:56.88] that just happen to be copied to the internet rather than taking into account that, you know,
|
|
[07:00.68] we can transfer information at the speed of light now.
|
|
[07:02.76] And the whole way science is funded and all this kind of thing is all kind of very broken.
|
|
[07:06.04] And there's just so much potential for making science work better.
|
|
[07:09.04] And I realized that I wasn't a scientist and I didn't really have any time to go and
|
|
[07:12.48] get a PhD and become a researcher, but I'm a tool builder and I could make existing scientists
|
|
[07:16.94] better at their job.
|
|
[07:17.94] And if I could make, like, a bunch of scientists a little bit better at their job, maybe, you
|
|
[07:21.66] know, that's the kind of equivalent of being a researcher.
|
|
[07:24.14] So one particular thing I dialed in on is just how science is disseminated in that all
|
|
[07:28.86] of these PDFs quite often behind paywalls, you know, on the internet.
|
|
[07:33.50] Yeah, and that's a whole thing because it's funded by national grants, government grants
|
|
[07:38.70] that get put behind paywalls.
|
|
[07:40.42] Yeah, exactly.
|
|
[07:41.42] That's like a whole thing.
|
|
[07:42.42] Yeah, I could talk for hours about that, but the particular thing we dove in
|
|
[07:46.00] on was, interestingly, these PDFs are also, there's a bunch of open science that happens
|
|
[07:50.82] as well.
|
|
[07:51.82] So math, physics, computer science, machine learning, notably, is all published on the
|
|
[07:56.86] arXiv, which is actually a surprisingly old institution.
|
|
[08:00.32] Some random Cornell.
|
|
[08:01.32] Yeah, it was just like somebody in Cornell who started a mailing list in the 80s.
|
|
[08:05.04] And then when the web was invented, they built a web interface around it like it's super
|
|
[08:09.08] old.
|
|
[08:10.08] And it's like kind of like a user group thing, right?
|
|
[08:13.26] That's why they're all these numbers and stuff.
|
|
[08:15.18] Yeah, exactly.
|
|
[08:16.18] It's a bit like Usenet or something.
|
|
[08:19.54] That's where basically all of math, physics, and computer science happens, but it's still
|
|
[08:23.10] PDFs published to this thing, which is just so infuriating.
|
|
[08:27.50] The web was invented at CERN, a physics institution, to share academic writing.
|
|
[08:32.70] Like there are these, there are figure tags.
|
|
[08:35.02] There are like author tags, there are heading tags, there are cite tags, you know, hyperlinks
|
|
[08:38.94] are effectively citations.
|
|
[08:40.96] Because you want to link to another academic paper, I mean instead you have to like copy
|
|
[08:44.20] and paste these things and try and get around paywalls, like it's absurd, you know.
|
|
[08:47.24] And now we have like social media and things, but still like academic papers as PDFs, you
|
|
[08:52.92] know, it's just like, why?
|
|
[08:53.92] This is not what the web was for.
|
|
[08:55.42] So anyway, I got really frustrated with that.
|
|
[08:57.24] And I went on vacation with my old friend Andreas.
|
|
[08:59.80] So we used to work together in London, at somebody else's startup.
|
|
[09:04.60] And we were just on vacation in Greece for fun.
|
|
[09:07.52] And he was like, trying to read a machine learning paper on his phone, you know, like
|
|
[09:11.30] we had to like zoom in and like scroll line by line on the PDF and he was like, this is
|
|
[09:15.54] fucking stupid.
|
|
[09:16.54] So I was like, I know, like this is something, we discovered our mutual hatred for, for this,
|
|
[09:21.54] you know.
|
|
[09:22.70] And we spent our vacation sitting by the pool, like making LaTeX-to-HTML converters,
|
|
[09:29.62] making the first version of arXiv Vanity.
|
|
[09:31.06] Anyway, that then became a whole thing.
|
|
[09:33.10] And the story is, we shut it down recently because it caught the eye of arXiv, who were like,
|
|
[09:38.80] oh, this is great.
|
|
[09:39.80] We just haven't had the time to work on this.
|
|
[09:40.80] And what's tragic about arXiv is it's like this project of Cornell that's like,
|
|
[09:44.72] they can barely scrounge together enough money to survive.
|
|
[09:47.00] I think it might be better funded now than it was when we were collaborating
|
|
[09:49.92] with them.
|
|
[09:50.92] And compared to these like scientific journals, it's just, this is actually where the
|
|
[09:53.92] work happens.
|
|
[09:54.92] They just have a fraction of the money that like these big scientific journals have, which
|
|
[09:58.04] is just so tragic.
|
|
[09:59.04] But anyway, they were like, yeah, this is great.
|
|
[10:00.64] We can't afford to like do it, but do you want to like, as a volunteer, integrate arXiv
|
|
[10:04.30] Vanity into arXiv?
|
|
[10:05.30] Oh, you did the work.
|
|
[10:06.70] We didn't do the work.
|
|
[10:07.70] We started doing the work.
|
|
[10:08.70] We did some.
|
|
[10:09.70] I think we worked on this for like a few months to actually get it integrated into arXiv.
|
|
[10:13.22] And then we got like distracted by Replicate.
|
|
[10:15.90] So someone else picked up the work and made it happen, somebody who works
|
|
[10:21.94] on one of the libraries that powers arXiv Vanity.
|
|
[10:25.42] Okay.
|
|
[10:26.42] And the relationship with Arxiv Sanity?
|
|
[10:28.26] None.
|
|
[10:29.26] And did you predate them?
|
|
[10:30.72] But I actually don't know the lineage.
|
|
[10:32.36] We were after. We were both users of Arxiv Sanity, which is like a sort of arXiv
|
|
[10:36.08] front end.
|
|
[10:37.08] Like recs on top of arXiv.
|
|
[10:39.80] Yeah.
|
|
[10:40.80] Yeah.
|
|
[10:41.80] And we were both users of that.
|
|
[10:42.80] And I think we were trying to come up with a working name for it, and Andreas just
|
|
[10:45.48] like cracked a joke of like, oh, let's call it arXiv Vanity.
|
|
[10:48.12] Let's make the papers look nice.
|
|
[10:49.44] And that was the working name and it just stuck.
|
|
[10:54.56] And then from there, tell us more about why you got distracted, right?
|
|
[10:58.00] So Replicate maybe feels like an overnight success to a lot of people, but you've been
|
|
[11:02.18] building this since 2019.
|
|
[11:03.70] Yeah.
|
|
[11:04.70] So what prompted the start? Well, we've been collaborating for even longer.
|
|
[11:07.78] So we created arXiv Vanity in 2017.
|
|
[11:11.14] So in some sense, we've been doing this almost like six, seven years now, a classic seven
|
|
[11:15.10] year.
|
|
[11:16.10] Overnight success.
|
|
[11:17.10] Yeah.
|
|
[11:18.10] Yes.
|
|
[11:19.10] We did arXiv Vanity and then worked on a bunch of like surrounding projects.
|
|
[11:22.50] I was still like really interested in science publishing at that point.
|
|
[11:25.26] And I'm trying to remember, because I tell a lot of like the condensed story to people
|
|
[11:29.36] because I can't really tell like a seven-year history. I'm trying to figure out like the
|
|
[11:32.00] right length to tell it at. We want to nail the definitive Replicate
|
|
[11:36.80] story here.
|
|
[11:37.96] One thing that's really interesting about these machine learning papers is that these
|
|
[11:42.68] machine learning papers are published on arXiv.
|
|
[11:45.60] And a lot of them are actual fundamental research.
|
|
[11:47.80] So it should be like prose describing a theory, but a lot of them are just running pieces of
|
|
[11:54.24] software that like a machine learning researcher made that did something, you know, it's like
|
|
[11:58.16] an image classification model or something.
|
|
[12:00.78] And they managed to make an image classification model that was better than the
|
|
[12:04.98] existing state of the art.
|
|
[12:07.14] And they've made an actual running piece of software that does image segmentation.
|
|
[12:11.54] And then what they had to do is they then had to take that piece of software and write
|
|
[12:15.82] it up as prose and math in a PDF.
|
|
[12:18.74] And what's frustrating about that is, like, if you want to use it... So Andreas
|
|
[12:23.82] was a machine learning engineer at Spotify.
|
|
[12:27.56] And some of his job was like, he did pure research as well.
|
|
[12:31.06] Like he did a PhD and he was doing a lot of stuff internally, but part of his job was
|
|
[12:34.08] also being an engineer and taking some of these existing things that people have made
|
|
[12:38.96] and published and trying to apply them to actual problems at Spotify.
|
|
[12:43.88] And he was like, you know, you get given a paper, which like describes roughly how the
|
|
[12:48.36] model works.
|
|
[12:49.36] It's probably missing lots of crucial information.
|
|
[12:51.28] There's sometimes code on GitHub, more and more there's code on GitHub.
|
|
[12:54.26] But back then it was kind of relatively rare, but it was quite often just like scrappy research
|
|
[12:58.58] code and didn't actually run.
|
|
[13:00.74] And you know, there was maybe the weights that were on Google Drive, but they accidentally
|
|
[13:03.58] deleted the weights off Google Drive, you know, and it was like really hard to like take this
|
|
[13:07.70] stuff and actually use it for real things.
|
|
[13:10.14] We just started talking together about like his problems at Spotify.
|
|
[13:14.18] And I connected this back to my work at Docker as well, was like, oh, this is what we created
|
|
[13:19.82] containers for.
|
|
[13:20.82] You know, we solved this problem for normal software by putting the thing inside a container
|
|
[13:24.08] so that you could ship it around and it kept on running.
|
|
[13:26.76] So we were sort of hypothesizing about like, hmm, what if we put machine learning models
|
|
[13:30.48] inside containers so that they could actually be shipped around and they could be defined
|
|
[13:34.88] in like some production ready formats and other researchers could run them to generate
|
|
[13:38.96] baselines and you could, people who wanted to actually apply them to real problems in
|
|
[13:42.60] the world could just pick up the container and run it, you know.
|
|
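The hypothesis above, a standard production-ready shape for a model so that any container exposing it can be run interchangeably, might look something like this minimal sketch. This is an illustrative interface only, not Cog's actual API; `EchoClassifier` stands in for a real model.

```python
import abc
import json

class Model(abc.ABC):
    """The contract every packaged model would implement."""

    @abc.abstractmethod
    def setup(self) -> None:
        ...  # load weights once, when the container starts

    @abc.abstractmethod
    def predict(self, inputs: dict) -> dict:
        ...  # run one inference

class EchoClassifier(Model):
    """Stand-in for a real image classifier."""

    def setup(self) -> None:
        self.labels = ["cat", "dog"]

    def predict(self, inputs: dict) -> dict:
        # A real model would run inference; we pick a label deterministically.
        return {"label": self.labels[len(inputs.get("image", "")) % 2]}

def serve(model: Model, request: str) -> str:
    """What the container entrypoint would do for each JSON request."""
    model.setup()
    return json.dumps(model.predict(json.loads(request)))

print(serve(EchoClassifier(), '{"image": "photo.png"}'))
```

Because every model speaks the same `setup`/`predict` contract over JSON, a researcher or product engineer can swap one container for another without reading its training code.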
[13:46.40] And we then thought, this is like where, normally in this part of the story,
|
|
[13:50.60] I skip forward to be like, and then we created Cog, this container stuff for machine learning
|
|
[13:55.70] models, and we created Replicate, the place for people to publish these machine learning
|
|
[13:58.22] models.
|
|
[13:59.22] But there's actually like two or three years between that.
|
|
[14:02.00] The thing we then got dialed into was Andreas was like, what if there was a CI system for
|
|
[14:07.30] machine learning?
|
|
[14:08.30] Because like one of the things he really struggled with as a researcher is generating baselines.
|
|
[14:13.22] So when like he's writing a paper, he needs to like get like five other models that are
|
|
[14:17.42] existing work and get them running.
|
|
[14:21.16] On the same evals.
|
|
[14:22.16] On the, exactly on the same evals so you can compare apples to apples because you can't
|
|
[14:25.08] trust the numbers in the paper.
|
|
[14:26.76] Unless you're Google and can just publish them anyway.
|
|
[14:31.08] So I think this was coming from the thinking of like there should be containers for machine
|
|
[14:34.44] learning, but why are people going to use that?
|
|
[14:36.40] Okay, maybe we can create a supply of containers by like creating this useful tool for researchers.
|
|
[14:42.16] And the useful tool was like, let's get researchers to package up their models and push them to
|
|
[14:46.60] this central place where we run a standard set of benchmarks across the models so that
|
|
[14:52.14] you can trust those results and you can compare these models apples to apples.
|
|
[14:55.34] And for like a researcher like Andreas, like doing a new piece of research, he could trust
|
|
[14:59.30] those numbers and he could like pull down those models, confirm it on his machine, use
|
|
[15:03.74] the standard benchmark to then measure his model and you know, all this kind of stuff.
|
|
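The apples-to-apples idea in miniature, a sketch under the assumption that each packaged model can be called like a function (in the imagined system, each would be a container): one fixed eval set, one metric, every model run through the same loop.

```python
def accuracy(model, eval_set):
    """Fraction of eval examples the model labels correctly."""
    correct = sum(model(x) == y for x, y in eval_set)
    return correct / len(eval_set)

# One shared benchmark everyone is measured against.
eval_set = [(1, "odd"), (2, "even"), (3, "odd"), (4, "even")]

# Stand-ins for packaged baseline models.
baselines = {
    "always_odd": lambda x: "odd",
    "parity": lambda x: "even" if x % 2 == 0 else "odd",
}

for name, model in baselines.items():
    print(f"{name}: {accuracy(model, eval_set):.2f}")
```

The point is that the numbers become comparable and reproducible because the harness, not each paper, owns the eval set and the metric.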
[15:07.98] And so we started building that.
|
|
[15:09.82] That's what we applied to YC with, got into YC and we started sort of building a prototype
|
|
[15:13.94] of this.
|
|
[15:15.08] And then this is like where it all starts to fall apart.
|
|
[15:18.28] We were like, okay, that sounds great.
|
|
[15:19.76] And we talked to a bunch of researchers and they really wanted that and that sounds brilliant.
|
|
[15:22.12] That's a great way to create a supply of like models on this research platform.
|
|
[15:25.76] But how the hell is this a business, you know, like how are we even going to make any money
|
|
[15:29.08] out of this?
|
|
[15:30.08] And we're like, oh, shit.
|
|
[15:31.08] That's the real unknown here, of like what the business is.
|
|
[15:34.80] So we thought it would be a really good idea to like, okay, before we get too deep into
|
|
[15:40.12] this, let's try and like reduce the risk of this turning into a business.
|
|
[15:44.26] So let's try and like research what the business could be for this research tool effectively.
|
|
[15:49.02] So we went and talked to a bunch of companies trying to sell them something which didn't
|
|
[15:52.02] exist.
|
|
[15:53.02] So we're like, hey, do you want a way to share research inside your company?
|
|
[15:56.74] So the other researchers or say like the product manager can test out the machine learning
|
|
[16:00.22] model.
|
|
[16:01.22] And they're like, maybe.
|
|
[16:02.62] And we were like, do you want like a deployment platform for deploying models?
|
|
[16:09.22] Like, do you want like a central place for versioning models?
|
|
[16:12.32] Like we're trying to think of like lots of different like products we could sell that
|
|
[16:14.56] were like related to this thing.
|
|
[16:16.36] And terrible idea.
|
|
[16:17.96] Like we're not salespeople and like people don't want to buy something that doesn't exist.
|
|
[16:22.96] I think some people can pull this off, but we were just like, you know, a bunch of product
|
|
[16:26.60] people, product and engineering people, and we just like couldn't pull this off.
|
|
[16:30.40] So we then got halfway through our YC batch.
|
|
[16:32.32] We hadn't built a product.
|
|
[16:33.32] We had no users.
|
|
[16:35.32] We had no idea what our business was going to be because we couldn't get anybody to like
|
|
[16:38.08] buy something which doesn't exist.
|
|
[16:39.08] And actually we were quite a way through, I think it was like two-thirds of the
|
|
[16:42.62] way through our YC batch or something.
|
|
[16:43.62] We're like, okay, well, we're kind of screwed now because we don't have anything to show
|
|
[16:46.26] at demo day.
|
|
[16:47.82] And then we then like tried to figure out, okay, what can we build in like two weeks?
|
|
[16:51.98] That'll be something.
|
|
[16:53.42] So we like desperately tried to, I can't remember what we tried to build at that point.
|
|
[16:56.98] And then two weeks before demo day, I just remember it was all, we were going down to
|
|
[17:00.94] Mountain View every week for dinners and we got called onto like an all-hands Zoom call,
|
|
[17:04.38] which was super weird.
|
|
[17:05.38] We were like, what's going on?
|
|
[17:06.72] And they were like, don't come to dinner tomorrow.
|
|
[17:10.88] And we realized, we kind of looked at the news and we were like, oh, there's a pandemic
|
|
[17:14.64] going on.
|
|
[17:15.64] We were like so deep in our startup, we were just like completely oblivious to what was
|
|
[17:19.40] going on around us.
|
|
[17:20.40] Was this Jan or Feb?
|
|
[17:22.36] This was March, 2020.
|
|
[17:24.04] March, 2020.
|
|
[17:25.04] Yeah.
|
|
[17:26.04] Because I remember Silicon Valley at the time was early to COVID.
|
|
[17:28.72] Yeah.
|
|
[17:29.72] Like they started locking down a lot faster than the rest of the world.
|
|
[17:31.84] Yeah, exactly.
|
|
[17:32.84] Yeah.
|
|
[17:33.84] Soon after that, like there was the San Francisco lockdowns and then like the YC batch just
|
|
[17:38.14] like stopped.
|
|
[17:39.14] There wasn't a demo day and it was in a sense a blessing for us because we just kind of...
|
|
[17:46.78] In the normal course of events, you're actually allowed to defer to a future demo day.
|
|
[17:50.66] Yeah.
|
|
[17:51.66] So we didn't even have to defer because it just kind of didn't happen, you know?
|
|
[17:55.46] So was YC helpful?
|
|
[17:57.50] Yes.
|
|
[17:58.50] We completely screwed up the batch and that was our fault.
|
|
[18:00.58] I think the thing that YC has become incredibly valuable for us has been after YC.
|
|
[18:06.92] I think there was a reasonable argument that we didn't need to do YC to start
|
|
[18:12.00] with because we were quite experienced.
|
|
[18:14.52] We had done some startups before, we were kind of well connected with VCs, you know,
|
|
[18:18.92] it was relatively easy to raise money because we were like a known quantity.
|
|
[18:21.64] You know, if you go to a VC and be like, "Hey, I made this piece of..."
|
|
[18:24.90] It's docker compose for AI.
|
|
[18:26.40] Yeah.
|
|
[18:27.40] Exactly.
|
|
[18:28.40] And like, you know, people can pattern match like that and they can have some trust, you
|
|
[18:31.90] know what you're doing.
|
|
[18:32.90] Whereas it's much harder for people straight out of college, and that's where like YC's sweet
|
|
[18:36.46] spot is like helping people straight out of college who are super promising, like figure
|
|
[18:39.42] out how to do that.
|
|
[18:40.42] Yeah.
|
|
[18:41.42] No credentials.
|
|
[18:42.42] Yeah.
|
|
[18:43.42] Exactly.
|
|
[18:44.42] So in some sense, we didn't need that, but the thing that's been incredibly useful for us
|
|
[18:45.42] since YC has been this. So Docker was a YC company, and Solomon,
|
|
[18:51.08] the founder of Docker, I think told me this.
|
|
[18:52.50] He was like, "A lot of people underestimate the value of YC after you finish the batch."
|
|
[18:57.58] And his biggest regret was like, not staying in touch with YC.
|
|
[19:01.08] I might be misattributing this, but I think it was him.
|
|
[19:04.60] And so we made a point of that and we just stayed in touch with our batch partner,
|
|
[19:07.44] Jared at YC, who has been fantastic.
|
|
[19:09.32] Jared Harris.
|
|
[19:10.32] Jared Friedman.
|
|
[19:11.32] Friedman.
|
|
[19:12.32] And all of like the team at YC, like there was the growth team at YC when they were still
|
|
[19:16.16] there and they've been super helpful.
|
|
[19:18.36] And two things have been super helpful about that. Like raising money, they just know
|
|
[19:21.96] exactly how to raise money and they've been super helpful during that process in all of
|
|
[19:24.80] our rounds.
|
|
[19:25.80] We've done three rounds since we did YC and they've been super helpful during the whole
|
|
[19:28.42] process.
|
|
[19:29.42] And also just like reaching a ton of customers.
|
|
[19:32.02] So like the magic of YC is that you have all of, like there's thousands of YC companies,
|
|
[19:35.60] I think, like on the order of thousands anyway.
|
|
[19:38.98] And they're all of your first customers and they're like super helpful, super receptive,
|
|
[19:43.60] really want to like try out new things.
|
|
[19:46.12] You have like a warm intro to every one of them basically and there's this mailing list
|
|
[19:49.44] where you can post about updates to your products, which is like really receptive.
|
|
[19:54.12] And that's just been fantastic for us.
|
|
[19:55.48] Like we've just like got so many of our users and customers through YC.
|
|
[20:00.12] Yeah.
|
|
[20:01.12] Well, so the classic criticism or the sort of, you know, pushback is people don't buy
|
|
[20:05.92] from you just because you're both from YC, but at least they'll open the email.
|
|
[20:10.96] Yeah.
|
|
[20:11.96] Right.
|
|
[20:12.96] Like that's the, okay.
|
|
[20:13.96] Yeah.
|
|
[20:14.96] Yeah.
|
|
[20:15.96] So that's been a really, really positive experience.
|
|
[20:16.96] And sorry, I interrupted with the YC question.
|
|
[20:18.28] Like you just made it out of YC, survived the pandemic.
|
|
[20:22.36] I'll try and condense this a little bit.
|
|
[20:24.36] We started building tools for COVID, weirdly. We were like, okay, we don't have a startup.
|
|
[20:27.84] We haven't figured out anything.
|
|
[20:28.84] What's a useful thing we could be doing right now?
|
|
[20:32.60] Save lives.
|
|
[20:33.60] So yeah, let's try and save lives.
|
|
[20:35.48] I think we failed at that as well.
|
|
[20:36.48] We had a bunch of projects that never got anywhere.
|
|
[20:38.72] We kind of worked on, yeah, a bunch of stuff like contact tracing, which didn't
|
|
[20:42.88] really end up being a useful thing.
|
|
[20:45.52] Andreas worked on a sort of DoorDash for like people delivering food to people
|
|
[20:50.96] who are vulnerable.
|
|
[20:51.96] What else did we do?
|
|
[20:53.24] The meta problem of like helping people direct their efforts to what was most useful and
|
|
[20:57.44] a few other things like that.
|
|
[20:58.44] It didn't really go anywhere.
|
|
[20:59.44] So we're like, okay, this is not really working either.
|
|
[21:00.44] We were considering actually just like doing like work for COVID.
|
|
[21:03.52] We have this decision document early on in our company, which is like, should we become
|
|
[21:06.36] a like government app contracting shop, you know?
|
|
[21:10.84] We decided no-
|
|
[21:11.84] Because you also did work for GOV.UK.
|
|
[21:13.60] Yeah, exactly.
|
|
[21:14.60] We had experience like doing some like-
|
|
[21:17.28] And the Guardian and all that.
|
|
[21:18.28] Yeah.
|
|
[21:19.28] For like government stuff.
|
|
[21:20.28] And we were just like really good at building stuff.
|
|
[21:22.44] Like we were just like product people.
|
|
[21:23.80] Like I was like the front-end product side and Andreas was the back-end side.
|
|
[21:26.60] So we were just like a product and we were working with a designer at the time, a guy
|
|
[21:30.32] called Mark, who did our early designs for Replicate and we're like, hey, what if we
|
|
[21:33.88] just team up and like build stuff.
|
|
[21:36.56] But yeah, we gave up on that in the end for, can't remember the details.
|
|
[21:39.88] So we went back to machine learning and then we were like, well, we're not really sure if
|
|
[21:44.32] this is going to work and one of my most painful experiences from previous startups is shutting
|
|
[21:49.80] them down.
|
|
[21:50.80] Like when you realize it's not really working and having to shut it down, it's like a ton
|
|
[21:52.96] of work and it's, people hate you and it's just sort of, you know.
|
|
[21:57.60] So we were like, how can we make something we don't have to shut down?
|
|
[22:00.48] And even better, how can we make something that won't page us in the middle of the night?
|
|
[22:05.62] So we made an open source project.
|
|
[22:07.72] We made a thing which was an open-source Weights & Biases because we had this theory that
|
|
[22:11.88] like people want open source tools.
|
|
[22:13.48] There should be like an open source like version control experiment tracking like thing.
|
|
[22:17.76] And it was intuitive to us and we were like, oh, we're software developers and we like
|
|
[22:20.64] command line tools.
|
|
[22:21.64] Like everyone loves command line tools and open source stuff. But machine learning researchers
|
|
[22:25.16] just really didn't care.
|
|
[22:26.16] Like they just wanted to click on buttons.
|
|
[22:27.44] They didn't mind that it was a cloud service.
|
|
[22:29.12] Like it was all very visual as well, you needed a lot of graphs and charts and stuff
|
|
[22:33.92] like this.
|
|
[22:35.12] So it wasn't right.
|
|
[22:36.52] Like, it was right for us.
|
|
[22:37.52] We were actually rebuilding something that Andreas made at Spotify for just like saving
|
|
[22:40.48] experiments to cloud storage automatically, but other people didn't really want this.
|
|
[22:44.88] So we kind of gave up on that.
|
|
[22:47.12] And then we, that was actually originally called Replicate and we renamed that out of the
|
|
[22:50.08] way.
|
|
[22:51.08] So it's now called Keepsake.
|
|
[22:52.08] And I think some people still use it.
|
|
[22:53.60] Then we sort of came back, we looped back to our original idea.
|
|
[22:58.60] So we were like, oh, maybe there was a thing in that thing we were originally sort of thinking
|
|
[23:01.88] about of like researchers showing their work and containers for machine learning models.
|
|
[23:06.20] So we just built that.
|
|
[23:07.20] And at that point, we were kind of running out of the YC money.
|
|
[23:10.32] So we were like, okay, this like feels good though.
|
|
[23:12.24] Let's like give this a shot.
|
|
[23:13.24] So that was the point we raised the seed round.
|
|
[23:15.84] We raised the seed pre-launch.
|
|
[23:18.48] We raised pre-launch and pre-team.
|
|
[23:20.40] It was an idea basically.
|
|
[23:21.48] We had a little prototype.
|
|
[23:22.48] It was just an idea and a team, but we were like, okay, like, you know, when bootstrapping
|
|
[23:28.88] this thing is getting hard.
|
|
[23:29.88] So let's actually raise some money.
|
|
[23:31.80] Then we made cog and Replicate.
|
|
[23:35.08] It initially didn't have APIs, interestingly.
|
|
[23:37.64] It was just the bit that I was talking about before of helping researchers share their
|
|
[23:41.60] work.
|
|
[23:42.60] It was helping researchers to put their work on a webpage such that other people could
|
|
[23:47.30] try it out and so that you could download the Docker container.
|
|
[23:50.04] We cut the benchmarks thing out of it because we thought that was just like too complicated.
|
|
[23:53.60] But it had a Docker container that like, you know, Andreas in a past life could download
|
|
[23:57.76] and run with his benchmark and you could compare all these models apples to apples.
|
|
[24:01.76] So that was like the theory behind it.
|
|
[24:03.84] That kind of started to work.
|
|
[24:05.80] It was still, you know, a long time pre-AI hype, and there was lots
|
|
[24:11.76] of interesting stuff going on, but it was very much in like the classic deep learning
|
|
[24:15.92] era.
|
|
[24:16.92] So sort of image segmentation models and sentiment analysis and all these kind of things that
|
|
[24:22.36] people were using deep learning models for.
|
|
[24:25.48] And we were very much building for research because all of this stuff was happening in
|
|
[24:29.00] research institutions.
|
|
[24:30.00] You know, the sort of people who'd be publishing to archive.
|
|
[24:32.24] So we were creating an accompanying material for their models, basically.
|
|
[24:35.12] You know, they wanted a demo for their models and we were creating accompanying material
|
|
[24:38.80] for it.
|
|
[24:39.80] And what was funny about that is they were like not very good users.
|
|
[24:42.16] Like they were, they were doing great work, obviously, but the way that research worked
|
|
[24:46.92] is that they just made like one thing every six months and they just fired and forgot
|
|
[24:51.92] it.
|
|
[24:52.92] Like they published this piece of paper and like done, I've published it.
|
|
[24:56.12] So they like output it to replicate and then they just stopped using replicate.
|
|
[25:00.28] You know, they were like once every six monthly users.
|
|
[25:04.12] And that wasn't great for us, but we stumbled across this early community.
|
|
[25:08.76] This was early 2021, when OpenAI created CLIP and people started smushing
|
|
[25:15.76] CLIP and GANs together to produce image generation models.
|
|
[25:19.80] And this started with, you know, it's just a bunch of like tinkerers on Discord, basically.
|
|
[25:25.04] There was an early model called Big Sleep by advadnoun.
|
|
[25:30.00] And then there was VQGAN-CLIP, which was like a bit more popular, by Rivers Have Wings.
|
|
[25:34.88] And it was all just people like tinkering on stuff in Colabs and it was very dynamic
|
|
[25:37.68] and it was people just making copies of Colabs and playing around with things and forking.
|
|
[25:41.32] And to me, this, I saw this and I was like, oh, this feels like open source software.
|
|
[25:44.28] Like so much more than the research world where like people are publishing these papers.
|
|
[25:48.72] You don't know their real names and it's just like a Discord thing.
|
|
[25:51.24] Yeah, exactly.
|
|
[25:52.24] But crucially, it was like people were tinkering and forking and people were, things were moving
|
|
[25:55.68] really fast and it just felt like this creative, dynamic, collaborative community in a way
|
|
[26:03.08] that research wasn't really like it was still stuck in this kind of six month
|
|
[26:07.64] publication cycle.
|
|
[26:09.76] So we just kind of latched onto that and started building for this community.
|
|
[26:14.04] And a lot of those early models were published on Replicate.
|
|
[26:17.72] I think the first one that was really primarily on Replicate was one called Pixray, which
|
|
[26:22.92] was sort of mid 2021 and it had a really cool like pixel art output, but it also just like
|
|
[26:28.16] produced general images, they weren't like crisp images, but they were quite aesthetically pleasing
|
|
[26:33.88] like some of these early image generation models.
|
|
[26:36.92] And you know, that was like published primarily on Replicate and then a few other models
|
|
[26:40.04] around that were like published on Replicate.
|
|
[26:42.96] And that's where we really started to find early community and like where we really found
|
|
[26:45.96] like, oh, we've actually built a thing that people want.
|
|
[26:49.48] And they were great users as well.
|
|
[26:50.76] And people really want to try out these models.
|
|
[26:52.16] Lots of people were like running the models on replicate.
|
|
[26:55.08] We still didn't have APIs though.
|
|
[26:56.72] Interesting.
|
|
[26:57.72] And this is like another like really complicated part of the story.
|
|
[26:59.28] We had no idea what a business model was still at this point.
|
|
[27:01.28] I don't think people could even pay for it.
|
|
[27:03.20] You know, it's just like these web forms where people could run the model.
|
|
[27:06.24] Just for historical interest, which Discords were they and how did you find them?
|
|
[27:09.40] Was this the LAION discord?
|
|
[27:10.40] Yeah, LAION.
|
|
[27:11.40] And Eleuther.
|
|
[27:12.40] Yeah.
|
|
[27:13.40] It was the Eleuther one.
|
|
[27:14.40] These two, right?
|
|
[27:15.40] Eleuther, I particularly remember.
|
|
[27:16.40] There was a channel where, this was early 2021, VQGAN-CLIP was
|
|
[27:20.28] set up as a discord bot.
|
|
[27:23.08] I just remember being completely just like captivated by this thing.
|
|
[27:27.56] I was just like playing around with it all afternoon and that sort of thing.
|
|
[27:30.28] In discord.
|
|
[27:31.28] Shit, it's 2am, you know.
|
|
[27:32.28] Yeah.
|
|
[27:33.28] This was the beginnings of Midjourney.
|
|
[27:34.28] Yeah, exactly.
|
|
[27:35.28] It was the start of Midjourney and you know, it's where that kind of user interface came
|
|
[27:39.72] from.
|
|
[27:40.72] Like what's beautiful about the user interface is like you could see what other people are
|
|
[27:42.96] doing and that you could riff off other people's ideas.
|
|
[27:47.52] And it was just so much fun to just like play around with this in like a channel for over
|
|
[27:51.64] 100 people.
|
|
[27:52.88] And yeah, that just like completely captivated me and I was like, okay, this is something, you
|
|
[27:56.20] know, so like we should get these things on replicate.
|
|
[27:58.40] Yeah.
|
|
[27:59.40] That's where that all came from.
|
|
[28:00.60] And then you moved on to, so was it APIs next or was it stable diffusion next?
|
|
[28:04.12] It was APIs next.
|
|
[28:05.56] And the APIs happened because one of our users, our web form had like an internal API for
|
|
[28:11.52] making the web form work.
|
|
[28:12.68] Like with an API that was called from JavaScript.
|
|
[28:15.52] And somebody like reverse engineered that to start generating images with a script.
|
|
[28:19.96] You know, they did like, you know, web inspector, copy as cURL, like figure out what the API
|
|
[28:25.12] request was.
|
|
[28:26.12] Yeah.
|
|
[28:27.12] And it wasn't secured or anything.
|
|
[28:28.12] Of course not.
|
|
[28:29.12] They started generating a bunch of images and we got tons of traffic and we were like, what's
|
|
[28:32.32] going on.
|
|
[28:33.32] And I think like a sort of usual reaction to that would be like, Hey, you're abusing
|
|
[28:37.72] our API and to shut them down.
|
|
[28:40.24] And instead we're like, Oh, this is interesting.
|
|
[28:41.76] Like people want to run these models.
|
|
[28:43.60] So we documented the API in a Notion document, like our internal API in a Notion document
|
|
[28:49.48] and like message this person being like, Hey, you seem to have found our API.
|
|
[28:56.00] Here's the documentation.
|
|
[28:57.00] That'll be like a thousand bucks a month, with a Stripe form that we just clicked some
|
|
[29:01.60] buttons to make.
|
|
[29:02.60] And they were like, Sure, that sounds great.
|
|
[29:03.72] So that was our first customer a thousand bucks a month.
|
|
[29:07.04] It was, it was surprising.
|
|
[29:08.04] A lot of money.
|
|
[29:09.04] That's not nothing.
|
|
[29:11.04] It was on the order of a thousand bucks a month.
|
|
[29:12.04] So who was he, was it a business thing?
|
|
[29:13.52] It was the creator of Pixray.
|
|
[29:16.16] Like it was, he generated NFT art.
|
|
[29:19.96] And so he like made a bunch of art with these models and was selling these NFTs effectively.
|
|
[29:26.72] And I think lots of people in his community were doing similar things and like he then
|
|
[29:29.72] referred us to other people who were also generating NFTs using these models, which started
|
|
[29:33.12] our API business.
|
|
[29:34.12] Yeah.
|
|
[29:35.12] Then we like made an official API and actually like added some billing to it.
|
|
[29:37.84] So it wasn't just like a fixed fee.
|
|
[29:40.48] And now people think of you as the hosted models API business.
|
|
[29:43.48] Yeah, exactly.
|
|
[29:44.48] And that just turned out to be our business, you know, but what ended up being beautiful
|
|
[29:47.96] about this is it was really fulfilling like the original goal of what we wanted to do
|
|
[29:52.48] is that we wanted to make this research that people were making accessible to like other
|
|
[29:57.84] people and for it to be used in the real world.
|
|
[30:00.88] And this was like the just like ultimately the right way to do it because all of these
|
|
[30:05.32] people making these generate models could publish them to replicate and they wanted
|
|
[30:09.00] a place to publish it.
|
|
[30:10.84] And software engineers, you know, like myself, like I'm not a machine learning expert, but
|
|
[30:14.56] I want to use this stuff, could just run these models with a single line of code.
|
|
[30:18.40] And we thought maybe the Docker image is enough, but it's actually super hard to get the Docker
|
|
[30:21.44] image running on a GPU and stuff.
|
|
[30:23.08] So it really needed to be the hosted API for this to work and to make it accessible to
|
|
[30:27.00] software engineers.
|
|
[30:28.00] And we just like wound our way to this, by listening to the customer.
|
|
[30:31.88] Yeah, exactly.
|
|
[30:33.08] Did you ever think about becoming Midjourney during that time?
|
|
[30:36.64] You have like so much interest in image generation.
|
|
[30:39.04] I mean, you're doing fine for the record, but you know, it was right there.
|
|
[30:45.00] You were playing with it.
|
|
[30:46.00] I don't think it was our expertise.
|
|
[30:48.08] Like I think our expertise was DevTools, whereas Midjourney is almost like a consumer
|
|
[30:51.48] product, you know, so I don't think it was our expertise.
|
|
[30:55.44] Certainly occurred to us.
|
|
[30:56.44] I think at the time we were thinking about like, oh, maybe we could hire some of these
|
|
[30:59.08] people in this community and make great models and stuff like this, but we ended up more
|
|
[31:03.32] being on the tooling side.
|
|
[31:04.32] Like I think like before I was saying, like I'm not really a researcher, but I'm more
|
|
[31:06.48] like the tool builder behind the scenes.
|
|
[31:08.04] And I think both me and Andreas are like that.
|
|
[31:09.88] I think this is an illustration of the tool builder philosophy.
|
|
[31:12.92] Something you really latch onto in DevTools, which is when you see people behaving
|
|
[31:17.00] weird, it's not their fault.
|
|
[31:18.00] It's yours.
|
|
[31:19.00] And you want to pave the cow paths is what they say, right?
|
|
[31:20.52] Like the unofficial paths that people are making, like make it official and make it easy
|
|
[31:23.60] for them and then maybe charge a bit of money.
|
|
[31:25.84] And now fast forward a couple of years, you have 2 million developers using Replicate.
|
|
[31:30.56] Maybe more.
|
|
[31:31.56] That was the last public number that I found.
|
|
[31:33.84] It's 2 million users, not all those people are developers, but a lot of them are developers.
|
|
[31:38.32] And then 30,000 paying customers was the number.
|
|
[31:41.88] Latent Space runs on Replicate.
|
|
[31:43.44] So we had a small podcast there and we hosted a Whisper diarization model on
|
|
[31:47.92] Replicate.
|
|
[31:48.92] And we're paying.
|
|
[31:49.92] So Latent Space is in the 30,000.
|
|
[31:52.48] You raised a $40 million series B.
|
|
[31:54.52] I would say that maybe the stable diffusion time, August 22 was like really when the company
|
|
[32:00.08] started to break out.
|
|
[32:01.40] Tell us a bit about that and the community that came out and I know now you're expanding
|
|
[32:05.16] beyond just image generation.
|
|
[32:06.72] Yeah.
|
|
[32:07.72] Like I think we kind of saw, like, there was this really interesting
|
|
[32:10.72] generative image world going on.
|
|
[32:12.28] So we kind of, you know, like we're building the tools for that community already, really.
|
|
[32:16.84] And we knew stable diffusion was coming out.
|
|
[32:20.12] We knew it was a really exciting thing, you know, it was the best generative image model
|
|
[32:22.80] so far.
|
|
[32:23.80] I think the thing we didn't, we underestimated was just like what an inflection point it
|
|
[32:27.96] would be where it was, I think Simon Willison put it this way, where he said something along
|
|
[32:33.84] the lines of it was a model that was open source and tinkerable and like, you know, it's just
|
|
[32:39.92] good enough and open source and tinkerable such that it just kind of took off in a way
|
|
[32:43.36] that none of the models had before.
|
|
[32:46.28] And like what was really neat about stable diffusion is it was open source so you could
|
|
[32:50.76] like, tinker with it, compared to like DALL-E, for example, which was like sort of equivalent quality.
|
|
[32:55.04] And like the first week we saw like people making animation models out of it.
|
|
[32:58.42] We saw people make like game texture models that like use circular convolutions to make
|
|
[33:03.24] repeatable textures.
|
|
[33:04.32] We saw, you know, a few weeks later, like people were fine tuning it so you could make, put
|
|
[33:07.72] your face in these models, and all of these other things.
|
|
[33:10.60] Textual inversion.
|
|
[33:11.60] Yep.
|
|
[33:12.60] Yeah.
|
|
[33:13.60] Exactly.
|
|
[33:14.60] That happened a bit before that.
|
|
[33:15.60] And all of this sort of innovation was happening all of a sudden and people were publishing
|
|
[33:20.04] them on Replicate because you could just like publish arbitrary models on Replicate.
|
|
[33:22.80] So we had this sort of supply of like interesting stuff being built.
|
|
[33:25.56] But because it was a sufficiently good model, there was also just like a ton of people building
|
|
[33:31.48] with it.
|
|
[33:32.48] They were like, oh, we can build products with this thing.
|
|
[33:33.88] And this was like about the time where people were starting to get really interested in
|
|
[33:36.40] AI.
|
|
[33:37.40] So like tons of product builders wanted to build stuff with it and we were just like
|
|
[33:39.64] sitting in there in the middle is like the interface layer between like, all these people
|
|
[33:42.88] wanted to build and all these like machine learning experts who were building cool models.
|
|
[33:46.18] And that's like really where it took off.
|
|
[33:47.84] We just had credible supply and credible demand and we were just like in the middle.
|
|
[33:51.56] And then yes, since then we've just kind of grown and grown really.
|
|
[33:55.12] And we, you know, been building a lot for like the indie hacker community, these like
|
|
[33:58.08] individual tinkerers, but also startups and a lot of large companies as well who are sort
|
|
[34:01.92] of exploring and building AI things.
|
|
[34:05.14] Then kind of the same thing happened like middle of last year with language models and
|
|
[34:09.92] Llama 2, where the same kind of stable diffusion effect happened with Llama, and Llama 2 was like
|
|
[34:14.88] our biggest week of growth ever because like tons of people wanted to tinker with it and
|
|
[34:17.96] run it.
|
|
[34:19.52] And you know, since then we've just been seeing a ton of growth in language models as well
|
|
[34:22.36] as image models.
|
|
[34:23.36] Yeah.
|
|
[34:24.36] We're just kind of riding a lot of the interest that's going on in AI and all the people building
|
|
[34:27.72] an AI, you know.
|
|
[34:28.72] Yeah.
|
|
[34:29.72] Kudos in the right place, right time.
|
|
[34:30.72] But also, you know, took a while to position for the right place before the wave came.
|
|
[34:34.76] I'm curious if like you have any insights on these different markets.
|
|
[34:38.20] So Pieter Levels, notably very loud person, very picky about his tools.
|
|
[34:43.56] I wasn't sure actually if he used you.
|
|
[34:45.20] He does.
|
|
[34:46.20] He does.
|
|
[34:47.20] Because you cited him on your Series B blog post, and Danny Postma as well, his competitor
|
|
[34:49.12] all in that wave.
|
|
[34:50.36] What are their needs versus, you know, the more enterprise or B2B type needs?
|
|
[34:55.76] Did you come to a decision point where you're like, okay, you know, how serious are these
|
|
[34:58.70] indie hackers versus like the actual businesses that are bigger and perhaps better customers
|
|
[35:03.36] because they're less churny?
|
|
[35:04.76] They're surprisingly similar because I think a lot of people right now want to use and
|
|
[35:09.32] build with AI, but they're not AI experts.
|
|
[35:13.28] And they're not infrastructure experts either.
|
|
[35:14.92] So they want to be able to use this stuff without having to like figure out all the internals
|
|
[35:18.04] of the models and, you know, like touch PyTorch and whatever.
|
|
[35:23.12] And they also don't want to be like setting up and booting up servers.
|
|
[35:26.44] And that's the same all the way from like indie hackers just getting started because
|
|
[35:31.60] like obviously you just want to get started as quickly as possible all the way through
|
|
[35:35.20] to like large companies who want to be able to use this stuff, but don't have like all
|
|
[35:39.00] of the experts on staff, you know, big companies like Google and so on.
|
|
[35:43.24] They do actually have a lot of experts on staff, but the vast majority of companies
|
|
[35:45.96] don't.
|
|
[35:46.96] And they're all software engineers who want to be able to use this AI stuff, but they
|
|
[35:49.32] just don't know how to use it.
|
|
[35:51.36] And it's like, you really need to be an expert and it takes a long time to like learn the
|
|
[35:54.64] skills to be able to use that.
|
|
[35:55.64] So they're surprisingly similar in that sense.
|
|
[35:57.36] I think it's also kind of unfair on the indie community, like, surprisingly, they're not
|
|
[36:02.04] churning or spiky.
|
|
[36:05.24] They're building real established businesses, which is like kudos to them, like a building
|
|
[36:10.44] these really like large sustainable businesses, often just as solo developers.
|
|
[36:16.96] And it's kind of remarkable how they can do that, actually, and it's credit to a lot
|
|
[36:19.68] of their like product skills.
|
|
[36:21.84] And you know, we're just like there to help them being like their machine learning team
|
|
[36:24.96] effectively to help them use all of this stuff.
|
|
[36:27.28] A lot of these indie hackers are some of our largest customers, like alongside some of our
|
|
[36:31.48] biggest customers that you would think would be spending a lot more money than them.
|
|
[36:34.68] But yeah.
|
|
[36:35.68] And we should name some of these.
|
|
[36:36.68] You have them on your landing page.
|
|
[36:37.68] You have Unsplash, CharacterAI.
|
|
[36:40.88] What do they power?
|
|
[36:41.88] What can you say about their usage?
|
|
[36:43.50] Yeah, totally.
|
|
[36:44.50] It's kind of various things.
|
|
[36:46.84] Well, I mean, I'm naming them because they're on your landing page.
|
|
[36:50.00] So you have logo rights.
|
|
[36:51.92] It's useful for people to, like I'm not imaginative.
|
|
[36:54.52] I see, monkey see monkey do, right, like if I see someone doing something that I want
|
|
[36:58.28] to do, then I'm like, okay, replicates great for that.
|
|
[37:01.04] So that's what I think about case studies on company landing pages is it's just a way
|
|
[37:04.84] of explaining, like, yeah, this is something that we are good for.
|
|
[37:08.64] Yeah, totally.
|
|
[37:09.64] I mean, it's these companies are doing things all the way up and down the stack at different
|
|
[37:14.52] levels of sophistication.
|
|
[37:16.36] So like Unsplash, for example, they actually publicly posted this story on Twitter where
|
|
[37:22.00] they're using BLIP to annotate all of the images in their catalog.
|
|
[37:27.80] So you know, they have lots of images in the catalog and they want to create a text description
|
|
[37:30.64] of it so you can search for it.
|
|
[37:31.80] And they're annotating the images with, you know, off the shelf open source model, you
|
|
[37:34.88] know, we have this big library of open source models that you can run.
|
|
[37:38.32] And you know, we've got lots of people are running these open source models off the shelf.
|
|
[37:42.02] And then most of our larger customers are doing more sophisticated stuff so like fine
|
|
[37:46.72] tuning the models, they're running completely custom models on us.
|
|
[37:50.76] A lot of these larger companies are like using us for a lot of their, you know, inference,
|
|
[37:56.80] but it's like a lot of custom models and them like writing the Python themselves because
|
|
[38:01.32] they've got machine learning experts on the team and they're using us for like,
|
|
[38:05.68] you know, their inference infrastructure effectively.
|
|
[38:08.64] So it's like lots of different levels of sophistication where like some people using these off the
|
|
[38:12.00] shelf models, some people are fine tuning models.
|
|
[38:14.88] So like Pieter Levels is a great example, where a lot of his products are based off
|
|
[38:18.24] like fine tuning image models, for example.
|
|
[38:22.20] And then we've also got like larger customers who are just like using us as infrastructure
|
|
[38:25.68] effectively.
|
|
[38:26.68] So yeah, it's like all things up and down the stack.
|
|
[38:29.12] Let's talk a bit about cog and the technical layer.
|
|
[38:32.50] So there are a lot of GPU clouds.
|
|
[38:35.68] I think people have different pricing points and I think everybody tries to offer a different
|
|
[38:39.96] developer experience on top of it, which then lets you charge a premium.
|
|
[38:44.84] Why did you want to create cog?
|
|
[38:46.80] You worked at Docker.
|
|
[38:47.80] What were some of the issues with traditional container runtimes?
|
|
[38:50.28] And maybe yeah, what were you surprised with as you built it?
|
|
[38:54.12] Cog came right from the start actually when we were thinking about this, you know, evaluation
|
|
[38:58.52] and the sort of benchmarking system for machine learning researchers where we wanted researchers
|
|
[39:04.32] to publish their models in a standard format that was guaranteed to keep on running, that
|
|
[39:10.36] you could replicate the results of, like that's where the name came from.
|
|
[39:14.16] And we realized that we needed something like Docker to make that work, you know.
|
|
[39:18.72] And I think it was just like natural from my point of view of like obviously that should
|
|
[39:22.12] be open source that we should try and like create some kind of open standard here that
|
|
[39:25.16] people can share because if more people use this format, then that's great for everyone
|
|
[39:29.84] involved.
|
|
[39:30.84] I think the magic of Docker is not really in the software, it's just like the standard
|
|
[39:34.68] that people have agreed on like here are a bunch of keys for a JSON document basically.
|
|
[39:40.84] And you know, that was the magic of like the metaphor of real containerization as well.
|
|
[39:44.24] It's not the containers that are interesting, it's just like the size and shape of the damn
|
|
[39:47.52] box.
|
|
[39:48.52] Right.
|
|
[39:49.52] And it's similar thing here where really we just wanted to get people to agree on like
|
|
[39:52.88] this is what a machine learning model is.
|
|
[39:55.00] This is how a prediction works, this is what the inputs are, this is what the outputs are.
|
|
[39:59.76] So cog is really just a Docker container that attaches to a CUDA device if it needs a GPU
|
|
[40:06.04] that has an OpenAPI specification as a label on the Docker image, and the OpenAPI specification
|
|
[40:12.52] defines the interface for the machine learning model, like the inputs and outputs effectively
|
|
[40:19.04] or the params in machine learning terminology.
|
|
[40:21.92] And you know, we just wanted to get people to kind of agree on this thing.
|
|
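The interface idea described here, an OpenAPI document attached to the image that declares the model's inputs and outputs, can be sketched in a few lines. The field names below follow OpenAPI conventions, but the exact schema is illustrative rather than cog's precise output:

```python
import json

# A simplified OpenAPI-style document of the kind a cog image carries as a
# Docker image label. Illustrative, not cog's exact schema.
spec = {
    "openapi": "3.0.2",
    "components": {
        "schemas": {
            "Input": {
                "type": "object",
                "required": ["prompt"],
                "properties": {
                    "prompt": {"type": "string"},
                    "num_outputs": {"type": "integer", "default": 1},
                },
            },
            "Output": {
                "type": "array",
                "items": {"type": "string", "format": "uri"},
            },
        }
    },
}


def model_inputs(spec):
    """Return the model's input names mapped to their JSON schema types."""
    props = spec["components"]["schemas"]["Input"]["properties"]
    return {name: schema["type"] for name, schema in props.items()}


# Anything that can read the label (a web form generator, an API gateway,
# a CI system) can discover the interface without knowing model internals.
print(json.dumps(model_inputs(spec), sort_keys=True))
```

Because the interface lives in a standard document rather than in code, tools can work with the model, generate a form, bill a prediction, run a test, without understanding what is inside the container.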
[40:25.04] And it's like general purpose enough that we weren't saying like some of the existing
|
|
[40:28.08] things were like at the graph level, but we really wanted something general purpose enough
|
|
[40:32.20] that you could just put anything inside this and it was like future compatible and it was
|
|
[40:35.08] just like arbitrary software and you know, be future compatible with like future inference
|
|
[40:38.82] servers and future machine learning model formats and all this kind of stuff.
|
|
[40:42.20] So that was the intent behind it.
|
|
[40:43.80] It just came naturally that we wanted to define this format and that's been really working
|
|
[40:47.24] for us like a bunch of people have been using cog outside of replicates, which is kind of
|
|
[40:51.16] our original intention.
|
|
[40:52.16] Like this should be how machine learning models are packaged and how people should use them.
|
|
[40:55.64] Like it's common to use cog in situations where like maybe they can't use the SaaS service
|
|
[41:01.22] because I don't know, they're in a big company and they're not allowed to use a SaaS service,
|
|
[41:04.96] but they can use cog internally still and like they can download the models from Replicate
|
|
[41:08.36] and run them internally and in their org, which we've been seeing happen and that works
|
|
[41:12.00] really well.
|
|
[41:13.00] People who want to build like custom inference pipelines, but don't want to like reinvent
|
|
[41:16.08] the world, they can use cog off the shelf and use it as like a component in their inference
|
|
[41:19.80] pipelines.
|
|
[41:20.80] We've been doing tons of usage like that and it's just been kind of happening organically.
|
|
[41:23.80] We haven't really been trying, but it's like there if people want it and we've been seeing
|
|
[41:27.36] people use it.
|
|
[41:28.36] So that's great.
|
|
[41:29.36] Yeah.
|
|
[41:30.36] So a lot of it is just sort of philosophical of just like, this is how it should work from
|
|
[41:31.76] my experience at Docker, you know, and there's just a lot of value from like the core being
|
|
[41:35.36] open, I think, and other people can share it and it's like an integration point.
|
|
[41:38.24] So, you know, if Replicate, for example, wanted to work with a testing system like a CI system
|
|
[41:43.60] or whatever, we can just like interface at the cog level, like that system just needs
|
|
[41:47.56] to put cog models and then you can like test your models on that CI system before they
|
|
[41:51.80] get deployed to Replicate.
|
|
[41:52.80] And it's just like a format that everyone, we can get everyone to agree on.
|
|
[41:55.52] What do you think?
|
|
[41:57.08] I guess Docker got wrong because if I look at a Docker Compose and a cog definition, first
|
|
[42:01.56] of all, the cog is kind of like the Dockerfile plus the Compose, versus Docker Compose
|
|
[42:06.20] just exposing the services and also Docker Compose is very like ports driven versus you
|
|
[42:12.60] have like the actual, you know, predict this is what you have to run.
|
|
[42:17.04] Yeah.
|
|
[42:18.04] Any learning, some maybe tips for other people building container based runtimes, like how
|
|
[42:21.72] much should you separate the API services versus the image building or how much you
|
|
[42:27.96] want to build them together.
|
|
[42:28.96] I think it was coming from two sides.
|
|
[42:31.72] We were thinking about the design from the point of view of user needs, what are their
|
|
[42:37.12] problems and what problems can be solved for them, but also what the interface should
|
|
[42:41.96] be for a machine learning model.
|
|
[42:43.36] And it's sort of the combination of two things that led us to this design.
|
|
[42:47.64] So the thing I talked about before was a little bit of like the interface around the machine
|
|
[42:50.96] learning model.
|
|
[42:51.96] So we realized that we wanted to be general purpose.
|
|
[42:54.40] We wanted to be at the, like, JSON, human readable level rather than the tensor level.
|
|
[43:02.40] So it's like an open API specification that wrapped a Docker container.
|
|
[43:04.76] That's where that design came from.
|
|
[43:07.12] And it's really just a wrapper around Docker.
|
|
[43:08.76] So we're kind of building on, standing on shoulders there, but Docker's too low level.
|
|
[43:13.00] So it's just like arbitrary software.
|
|
[43:14.72] So we wanted to be able to like have an OpenAPI specification there that defined the function
|
|
[43:21.44] effectively that is the machine learning model, but also like how that function is written,
|
|
[43:27.68] how that function is run, which is all defined in code and stuff like that.
|
|
[43:30.16] So it's like a bunch of abstraction on top of Docker to make that work.
|
|
[43:34.04] And that's where that design came from.
|
|
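The design described above, an OpenAPI specification wrapping the containerized model, can be sketched like this. The schema below is hand-written for illustration, not Cog's actual generated output:

```python
import json

# Hypothetical sketch: describe a model's predict function as an
# OpenAPI-style schema, so any HTTP client can call it with plain JSON
# instead of raw tensors. Field names are illustrative, not Cog's real output.
predict_schema = {
    "openapi": "3.0.2",
    "paths": {
        "/predictions": {
            "post": {
                "requestBody": {
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "required": ["prompt"],
                                "properties": {
                                    "prompt": {"type": "string"},
                                    "num_outputs": {"type": "integer", "default": 1},
                                },
                            }
                        }
                    }
                },
                "responses": {"200": {"description": "Model output as JSON"}},
            }
        }
    },
}

# A client only needs the schema to know how to call the model.
body = predict_schema["paths"]["/predictions"]["post"]["requestBody"]
print(json.dumps(body["content"]["application/json"]["schema"], indent=2))
```

The point is that the same schema works for any model, whatever framework it uses internally.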
[43:36.28] But the core problems we were solving for users was that Docker's really hard to use
|
|
[43:42.08] and productionizing machine learning models is really hard.
|
|
[43:45.00] Right.
|
|
[43:46.00] So on the first part of that, we knew we couldn't use Dockerfiles.
|
|
[43:49.56] Like Dockerfiles are hard enough for software developers to write.
|
|
[43:52.12] I'm saying this with love as somebody who works on Docker and, like, worked on Dockerfiles,
|
|
[43:56.36] but it's really hard to use.
|
|
[43:57.36] And you need to know a bunch about Linux basically because you're running a bunch of CLI commands.
|
|
[44:01.16] You need to know a bunch of Linux and best practices, like how apt works and all this
|
|
[44:04.84] kind of stuff.
|
|
[44:05.84] So we're like, okay, we can't, we can't do it to that level.
|
|
[44:07.32] We need something that machine learning researchers will be able to understand like people who
|
|
[44:09.88] are used to, like, Colab notebooks.
|
|
[44:12.28] And what they understand is they're like, I need this version of Python.
|
|
[44:15.08] I need these Python packages.
|
|
[44:16.80] And somebody told me to apt-get install something.
|
|
[44:19.04] You know.
|
|
[44:20.04] It throws sudo in there when I don't really know what that means.
|
|
[44:24.16] So we tried to create a format that was at that level and that's what cog.yaml is.
|
|
[44:27.18] And we're really kind of trying to imagine, like, what is that machine learning researcher
|
|
[44:31.28] going to understand, you know, and trying to build for them.
|
|
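A minimal cog.yaml stays at exactly that level: declare the Python version, the Python packages, and the odd system package, and Cog builds the image. The versions below are illustrative, not a recommendation:

```yaml
# Minimal cog.yaml sketch: the researcher declares what they already know
# ("I need this Python, these packages, apt-get install this"), and Cog
# turns it into a Docker image. Versions here are made up for illustration.
build:
  gpu: true
  python_version: "3.10"
  python_packages:
    - "torch==2.0.1"
  system_packages:
    - "ffmpeg"
predict: "predict.py:Predictor"
```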
[44:34.16] Then the productionizing machine learning models thing is like, okay, how can we package
|
|
[44:39.16] up all of the complexity of like productionizing machine learning models?
|
|
[44:42.92] Like picking CUDA versions, like hooking it up to GPUs, writing an inference server, defining
|
|
[44:50.00] a schema, doing batching, all of these just like really gnarly things that everyone does
|
|
[44:55.16] again and again, and just like, you know, provide that as a tool.
|
|
[44:59.80] And that's where that side of it came from.
|
|
[45:01.40] So it's like combining those user needs with, you know, the sort of world need of needing
|
|
[45:06.32] something like a common standard for like what a machine learning model is.
|
|
[45:09.60] And that's, that's how we thought about the design.
|
|
[45:11.12] I don't know whether that answers the question.
|
|
[45:12.32] Yeah.
|
|
[45:13.32] So your idea was like, hey, you really want what Docker stands for in terms of standard,
|
|
[45:18.48] but you actually don't want people to do all the work that goes into Docker.
|
|
[45:22.56] It needs to be higher level, you know.
|
|
[45:25.12] So I want to, for the listener, you're not the only standard that is out there.
|
|
[45:29.12] As with any standard, there must be 14 of them.
|
|
[45:31.54] You are very surprisingly friendly with Ollama, who are your former colleagues from Docker, who
|
|
[45:35.92] came out with the Modelfile, Mozilla came out with the llamafile.
|
|
[45:40.26] And then I don't know if this is in the same category even, but I'm just going to throw
|
|
[45:42.80] it in there.
|
|
[45:43.80] Like Hugging Face has the transformers and diffusers libraries, which are a way of disseminating
|
|
[45:46.60] models that obviously people use.
|
|
[45:49.36] How would you compare and contrast your approach with Cog versus all of these?
|
|
[45:52.72] It's kind of complementary, actually, which is kind of neat, in that a lot of transformers,
|
|
[45:57.48] for example, is lower level than cog.
|
|
[45:59.24] So it's, you know, a Python library effectively, but you still need to, like...
|
|
[46:04.88] Expose them.
|
|
[46:05.88] You still need to turn that into an inference server, you still need to, like, install the
|
|
[46:08.36] Python packages and that kind of thing.
|
|
[46:09.84] So lots of Replicate models are transformers models and diffusers models inside Cog, you
|
|
[46:17.20] know.
|
|
[46:18.20] So that's like the level that that sits.
|
|
[46:19.20] So it's very complementary in some sense, and, you know, we're kind of working on integrations
|
|
[46:22.76] with Hugging Face, such that you can, like, deploy models from Hugging Face into Cog models
|
|
[46:26.62] and stuff like that, to Replicate.
|
|
[46:28.76] And some of these things, like llamafile and what Ollama are working on, are also very complementary
|
|
[46:34.40] in that they're doing a lot of the sort of running these things locally on laptops, which
|
|
[46:39.48] is not a thing that works very well with Cog, like Cog is really designed around servers
|
|
[46:43.80] and attaching to CUDA devices and NVIDIA GPUs and this kind of thing.
|
|
[46:48.08] So we're actually like, you know, figuring out ways that like we can, those things can
|
|
[46:52.80] be interoperable because, you know, they should be, and they are quite complementary, in that
|
|
[46:56.96] you should be able to, like, take a model on Replicate and run it on your local machine.
|
|
[46:59.64] You should be able to take a model on your machine and run it in the cloud.
|
|
[47:02.72] Is the base layer something like, is it at the, like, the gguf level, which, by the way,
|
|
[47:06.52] I need to get a primer on, like, the different formats that have emerged, or is it at the
|
|
[47:11.20] star-dot-file level, which is Modelfile, llamafile, whatever, whatever, or is it at the
|
|
[47:14.88] Cog level?
|
|
[47:15.88] I don't know, to be honest.
|
|
[47:16.88] And I think this is something we still have to figure out.
|
|
[47:18.60] There's a lot.
|
|
[47:19.60] Yeah.
|
|
[47:20.60] Like exactly where the lines are drawn.
|
|
[47:21.60] Don't know exactly.
|
|
[47:22.60] This is something we're trying to figure out ourselves, but I think there's a lot
|
|
[47:25.00] of promise in these systems interoperating.
|
|
[47:27.52] We just want things to work together.
|
|
[47:28.52] You know, we want to try and reduce the number of standards, so the more these things
|
|
[47:31.20] interoperate, you know, convert between each other and that kind of stuff, the better.
|
|
[47:34.96] Cool.
|
|
[47:35.96] Well, there's a foundation for that.
|
|
[47:36.96] Andreas comes out of Spotify.
|
|
[47:38.52] Erik from Modal also comes out of Spotify.
|
|
[47:42.32] You worked at Docker and the Ollama guys worked at Docker.
|
|
[47:45.64] Did both you and Andreas know that there was somebody else you worked with that had a kind
|
|
[47:49.56] of, like, similar, not similar idea, but, like, was interested in the same thing?
|
|
[47:53.12] Or did you then just see, oh, I know those people, they're doing something very similar.
|
|
[47:59.08] We learned about both early on actually, yeah, because we know them both quite well.
|
|
[48:04.20] And it's funny how I think we're all seeing the same problems and just like applying, you
|
|
[48:08.72] know, trying to fix the same problems that we're all seeing.
|
|
[48:10.84] I think the Ollama one's particularly funny because I joined Docker through my startup.
|
|
[48:18.16] Funnily, actually, the thing which worked from my startup was Compose, but we were actually
|
|
[48:22.24] working on another thing which was a bit like EC2 for Docker.
|
|
[48:25.56] So we were working on, like, productionizing Docker containers, and the Ollama guys were working on
|
|
[48:31.28] a thing called Kitematic, which was a bit like a desktop app for Docker.
|
|
[48:36.72] So and our companies both got bought by Docker at the same time.
|
|
[48:41.60] And you know, Kitematic turned into Docker desktop.
|
|
[48:44.56] And then, you know, our thing then turned into compose.
|
|
[48:47.60] And it's funny how we're both applying, like, the things we saw at Docker to the AI
|
|
[48:53.12] world, but they're building, like, the local environment for it, and we're building, like,
|
|
[48:56.84] the cloud for it.
|
|
[48:58.52] And yeah, so that's just like really pleasing.
|
|
[49:01.12] And I think, you know, we're collaborating closely because there's just so much opportunity
|
|
[49:04.76] of working there.
|
|
[49:05.76] When you have a hammer, everything's a nail.
|
|
[49:07.72] Yeah, exactly.
|
|
[49:08.72] Exactly.
|
|
[49:09.72] So I think a lot of where we're coming from with AI is, we're all kind of, on the
|
|
[49:13.68] Replicate team.
|
|
[49:14.68] We're all kind of people who have built developer tools in the past.
|
|
[49:17.32] So we've got a team, like I worked at Docker, I've got people who worked at Heroku and GitHub
|
|
[49:22.56] and like the iOS ecosystem and all this kind of thing.
|
|
[49:25.36] Like the previous generation of developer tools where we like figured out a bunch of
|
|
[49:30.84] stuff and then like AI's come along and we just don't yet have those tools and abstractions
|
|
[49:36.72] like to make it easy to use.
|
|
[49:39.08] So we're trying to like take the lessons that we learned from the previous generation
|
|
[49:42.52] of stuff and apply it to this new generation of stuff.
|
|
[49:46.04] And obviously there's a bit of nuance there because the trick is to take like the right
|
|
[49:48.76] lessons and do new stuff where it makes sense.
|
|
[49:51.92] You can't just like cut and paste, you know, but that's like how we're approaching this
|
|
[49:56.20] is we're trying to like as much as possible, like take some of those lessons we learned
|
|
[49:59.64] from like, you know, how Heroku and GitHub was built, for example, and apply them to
|
|
[50:04.64] AI.
|
|
[50:05.64] We should also talk a little bit about your compute availability.
|
|
[50:08.92] We're trying to ask this of all, you know, it's compute provider month.
|
|
[50:11.40] Do you own your own GPUs?
|
|
[50:12.84] How many do you have access to?
|
|
[50:14.48] What do you feel about the tightness of the GPU market?
|
|
[50:17.52] We don't own our own GPUs.
|
|
[50:18.88] We've got a few that we play around with, but not for production workloads.
|
|
[50:23.00] And we are primarily built on public clouds, so primarily GCP and CoreWeave, and, like,
|
|
[50:27.84] some smatterings elsewhere.
|
|
[50:29.36] Not from NVIDIA, which is your new investor.
|
|
[50:31.76] We work with NVIDIA.
|
|
[50:33.28] So, you know, they're kind of helping us get GPU availability, like GPUs are hard to get
|
|
[50:37.64] hold of.
|
|
[50:38.64] Like if you go to AWS and ask for one A100, they won't give you an A100.
|
|
[50:43.20] But if you go to AWS and say, I'd like a hundred A100s for two years, they're like, sure, we've
|
|
[50:46.80] got some.
|
|
[50:47.80] And I think the problem is, like that makes sense from their point of view.
|
|
[50:50.48] They want just like reliable sustained usage.
|
|
[50:53.20] They don't want like spiky usage and like wastage in their infrastructure, which makes
|
|
[50:56.00] total sense.
|
|
[50:57.00] But that makes it really hard for startups, you know, who are wanting to just like get
|
|
[51:00.60] hold of GPUs.
|
|
[51:02.08] I think we're in a fortunate position where we can aggregate demand so we can make commits
|
|
[51:06.52] to cloud providers.
|
|
[51:07.96] And then, you know, we actually have good availability, like, you know, we don't have
|
|
[51:11.88] infinite availability, obviously, but you know, if you want an A100 from Replicate, you
|
|
[51:14.92] can get it.
|
|
[51:15.92] You know, we're seeing other companies pop up as well, like SF Compute is a great example
|
|
[51:19.92] of this where they're doing the same idea for training almost where, you know, a lot
|
|
[51:24.20] of startups need to be able to train a model, but they can't get hold of GPUs from much
|
|
[51:27.36] cloud providers.
|
|
[51:28.36] So SF Compute is, like, letting people rent, you know, 10 H100s for two days, which is just
|
|
[51:33.32] impossible otherwise.
|
|
[51:34.32] And, you know, what they're effectively doing there is aggregating demand such that
|
|
[51:37.60] they can make a big commit to the cloud provider and then let people use smaller chunks of
|
|
[51:40.44] it.
|
|
[51:41.44] And that's kind of what we're doing at Replicate as well.
|
|
[51:42.88] So we're aggregating demand such that we make big commits to the cloud providers and, you
|
|
[51:46.76] know, then people can run like a 100 millisecond API request on an A100, you know.
|
|
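The aggregation model described here is easy to sketch numerically. All prices below are hypothetical, not Replicate's actual rates:

```python
# Toy model (all numbers hypothetical) of demand aggregation: the platform
# makes a long-term commit no single small customer could, then resells
# the committed GPU as tiny per-request slices.
hourly_rate = 2.50        # hypothetical $/hour for one committed A100
request_seconds = 0.1     # the "100 millisecond API request"

cost_per_request = hourly_rate / 3600 * request_seconds
requests_per_hour = 3600 / request_seconds  # requests needed to keep one GPU busy

print(f"${cost_per_request:.6f} per request; "
      f"{requests_per_hour:.0f} requests/hour fill one GPU")
```

As long as aggregated traffic keeps the committed GPUs busy, the platform can sell sub-second slices profitably.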
[51:51.36] Coming from a finance background, this sounds surprisingly similar to banks, where the job
|
|
[51:55.88] of a bank is maturity transformation is what you call it.
|
|
[51:59.28] You take short-term deposits, which technically can be withdrawn at any time, and you turn
|
|
[52:02.92] that into long-term loans for mortgages and stuff, and you pocket the difference in interest.
|
|
[52:07.92] And that's the bank.
|
|
[52:09.40] Yeah.
|
|
[52:10.40] That's exactly what we're doing.
|
|
[52:11.40] So you run a bank.
|
|
[52:12.40] You run a bank.
|
|
[52:13.40] Right, yeah.
|
|
[52:14.40] And it's very much a finance problem as well, because we have to make bets on the future.
|
|
[52:18.76] You have to do forecasting.
|
|
[52:19.76] On the value of GPUs.
|
|
[52:20.76] Yeah.
|
|
[52:21.76] What are you...
|
|
[52:22.76] Okay.
|
|
[52:23.76] I don't know how much you can disclose, but what are you forecasting? Down?
|
|
[52:28.00] Up a lot?
|
|
[52:29.00] Yeah.
|
|
[52:30.00] Up 10X?
|
|
[52:31.00] I can't really...
|
|
[52:32.00] We're projecting our growth with some educated guesses about what kind of models are going
|
|
[52:35.08] to come out and what kind of hardware these will run on, you know.
|
|
[52:38.08] We need to bet that, like, okay, maybe language models are getting larger.
|
|
[52:40.68] So we need to, like, have GPUs with a lot of RAM, or, like, multi-GPU nodes, or maybe models
|
|
[52:45.68] are getting smaller.
|
|
[52:46.68] We actually need smaller GPUs.
|
|
[52:47.68] We have to make some educated guesses about that kind of stuff.
|
|
[52:49.92] Yeah.
|
|
[52:50.92] Speaking of which, the mixture-of-experts models must be throwing a spanner into the
|
|
[52:55.08] planning.
|
|
[52:56.08] Not so much.
|
|
[52:57.08] We've got, like, multi-node A100 machines and multi-node H100 machines, which can run
|
|
[53:01.56] these no problem.
|
|
[53:02.56] So we're set up for that.
|
|
[53:03.56] Yeah.
|
|
[53:04.56] Okay.
|
|
[53:05.56] Right?
|
|
[53:06.56] I didn't expect it to be so easy.
|
|
[53:07.56] The question was that the amount of RAM per model is increasing a lot, especially on a
|
|
[53:12.04] sort of per-parameter basis, per-active-parameter basis, going from, like, Mixtral being
|
|
[53:16.56] eight experts to, like, the DeepSeek MoE models.
|
|
[53:19.56] I don't know if you saw them being, like, 30, 60 experts, and you can see it keep going
|
|
[53:25.36] up, I guess.
|
|
[53:26.36] Yeah.
|
|
[53:27.36] I think we might run into problems at some point, and, yeah, I don't know exactly what's
|
|
[53:31.00] going on there.
|
|
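To put rough numbers on the concern: a mixture-of-experts model must hold every expert's weights in GPU RAM, even though only a few experts fire per token. A back-of-envelope sketch, with publicly reported Mixtral figures rounded:

```python
# Back-of-envelope: fp16 weights take 2 bytes per parameter, and an MoE
# model keeps ALL experts resident even though only a few are active
# per token, so RAM per *active* parameter keeps climbing.
BYTES_PER_PARAM = 2  # fp16

def weight_ram_gb(params_billion):
    # 1e9 params * 2 bytes = 2 GB per billion parameters
    return params_billion * BYTES_PER_PARAM

# Mixtral 8x7B, rounded: ~47B total params, ~13B active per token.
total_b, active_b = 47, 13
resident = weight_ram_gb(total_b)
print(f"~{resident} GB resident for ~{active_b}B active params: "
      f"{resident / active_b:.1f} GB per active billion")
```

A dense 13B model at the same precision would need roughly a quarter of that, which is the planning problem being discussed.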
[53:32.00] I think something that we're finding which is kind of interesting, like, I don't know
|
|
[53:35.44] this in depth.
|
|
[53:36.44] You know, we're certainly seeing a lot of good results from lower-precision models.
|
|
[53:39.92] So, like, you know, 90% of the performance with just, like, much less RAM required.
|
|
[53:44.84] That means, like, we can run them on GPUs we have available, and it's good for customers
|
|
[53:48.04] as well, because it runs faster, and, like, they want that trade-off, you know, where
|
|
[53:52.62] it's just slightly worse, but, like, way faster and cheaper.
|
|
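A minimal sketch of why lower precision cuts RAM: symmetric int8 quantization maps each fp32 weight (4 bytes) to one byte plus a shared scale, at the cost of a small reconstruction error. This is a toy illustration, not a production quantization scheme:

```python
# Toy symmetric int8 quantization of a weight vector: 4 bytes/weight (fp32)
# becomes 1 byte/weight (int8) plus one shared scale, with a bounded
# per-weight reconstruction error.
weights = [0.8, -1.2, 0.05, 2.4, -0.33]

scale = max(abs(w) for w in weights) / 127   # map the largest weight to 127
q = [round(w / scale) for w in weights]      # int8 values, one byte each
dequant = [v * scale for v in q]             # approximate reconstruction

max_err = max(abs(a - b) for a, b in zip(weights, dequant))
print(f"int8 values: {q}, max reconstruction error: {max_err:.4f}")
```

Real schemes quantize per-channel or per-block and sometimes keep outliers in higher precision, but the RAM trade-off is the same idea.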
[53:55.92] Do you see a lot of GPU waste in terms of people running the thing on a GPU that is,
|
|
[54:00.88] like, too advanced?
|
|
[54:01.88] I think we use a T4 to run Whisper.
|
|
[54:03.88] So we were at the bottom end of it.
|
|
[54:05.80] Yeah.
|
|
[54:06.80] Any thoughts?
|
|
[54:07.80] I think one of the hackathons we were at, people were like, "Oh, how do I get access
|
|
[54:10.36] to, like, H100?"
|
|
[54:11.36] And it's like, you need to run, like, SIP on the future, and it's like, you don't need
|
|
[54:15.80] H100.
|
|
[54:16.80] Yeah.
|
|
[54:17.80] Yeah.
|
|
[54:18.80] Well, if you want low latency, like, sure, like, spend a lot of money on the H100, yeah,
|
|
[54:22.36] we see a ton of that kind of stuff, and it's surprisingly hard to optimize these models
|
|
[54:28.08] right now.
|
|
[54:29.08] So a lot of people are just running, like, really unoptimized models.
|
|
[54:31.72] We're doing the same, honestly.
|
|
[54:32.72] Like, a lot of models on Replicate
|
|
[54:34.20] have just been, like, not optimized very well.
|
|
[54:37.04] So something we want to, like, be able to help people with is optimizing those models.
|
|
[54:42.64] Like, either we, you know, show people how to with guides, or we make it easier to use
|
|
[54:47.40] some of these more optimized inference servers, or we show people how to compile the models,
|
|
[54:53.00] or we do that automatically, or something like that.
|
|
[54:55.92] But that's all something we're exploring, because there's so much wastage.
|
|
[54:58.64] Like, it's not just wasting the GPUs, it's also, like, a bad experience, and the models
|
|
[55:01.76] run slow, you know?
|
|
[55:02.88] Right.
|
|
[55:03.88] So the models on Replicate are almost all pushed by our community, like, people who have pushed
|
|
[55:07.44] those models themselves.
|
|
[55:08.96] But like, it's like a big-headed distribution where there's, like, a long tail of lots of
|
|
[55:12.52] models that people have pushed, and then, like, a big head of, like, the models most
|
|
[55:16.52] people run.
|
|
[55:17.52] So models like Llama 2, like Stable Diffusion, you know, we work with Meta and Stability
|
|
[55:23.00] to, like, maintain those models, and we've done a ton of optimizations to make those
|
|
[55:26.84] really fast.
|
|
[55:28.08] So those models are optimized, but the long tail is not, and there's, like, a lot of
|
|
[55:31.36] wastage there.
|
|
[55:32.36] Yeah.
|
|
[55:33.36] And going into the, well, it's already the new year, do you see the customer demand
|
|
[55:38.08] and the GPU, like, hardware demand kind of, like, sync together?
|
|
[55:41.56] Because I think a lot of people are saying, "Oh, there's, like, hundreds of thousands
|
|
[55:44.64] of GPUs being shipped this year, like, the crunch is going to be over," but you also
|
|
[55:48.28] have, like, millions of people that don't care about using AI.
|
|
[55:50.96] You know, how do you see the two lines progressing?
|
|
[55:53.08] Are you seeing customer demand is going to outpace the GPU growth?
|
|
[55:57.32] Do you see them together?
|
|
[55:58.52] Do you see, maybe, a lot of this, like, model improvement work kind of helping alleviate
|
|
[56:04.16] that?
|
|
[56:05.16] Yeah, that's a good question.
|
|
[56:06.32] From our point of view, demand is not outpacing supply of GPUs, like, we have enough, from our
|
|
[56:10.68] point of view, we have enough GPUs to go around, but that might change for sure.
|
|
[56:14.48] Yeah.
|
|
[56:15.48] That's a very nice way, as a startup founder, to respond, I think.
|
|
[56:21.52] So as you framed it more, it's, like, sort of picking the wrong box for the model, whereas yours
|
|
[56:24.80] is more about maybe the inference stack, if you can call it that.
|
|
[56:28.24] Were you referencing vLLM?
|
|
[56:30.60] What other sort of techniques are you referencing?
|
|
[56:33.04] And also keeping in mind that when I talk to your competitors, and I don't know if, we
|
|
[56:37.76] don't have to name any of them, but they are working on trying to optimize these kinds of
|
|
[56:42.00] models.
|
|
[56:43.00] Like, they basically, they'll quantize their models for you with their special stack.
|
|
[56:46.04] So you basically use their versions of Llama 2, you use their versions of Mistral.
|
|
[56:51.40] And that's one way to approach it.
|
|
[56:53.56] I don't see it as the Replicate DNA to do that, because that would be, like, sort of, you
|
|
[56:57.76] would have to slap the Replicate house brand on something, which, I mean, just comment
|
|
[57:02.36] on any of that.
|
|
[57:03.36] Like, what do you mean when you say optimize models?
|
|
[57:04.88] Yeah, things like quantizing the models, you can imagine a way that we could help people
|
|
[57:09.00] quantize their models if we want to.
|
|
[57:11.24] We've had success using inference servers like vLLM and TensorRT-LLM, and we're using those
|
|
[57:18.08] kind of things to serve language models.
|
|
[57:20.36] We've had success with things like AITemplate, which compiles the models, all of those kind
|
|
[57:25.52] of things.
|
|
[57:26.52] And there's like some, even really just boring things of just, like, making the code more
|
|
[57:29.92] efficient.
|
|
[57:30.92] Like some people, like when they're just writing some Python code, it's really easy to just
|
|
[57:34.08] write inefficient Python code, and there's, like, really boring things like that as well.
|
|
[57:38.08] But it's like a whole smash of things like that.
|
|
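A hypothetical example of that "boring" class of fix, moving per-request work into one-time setup (the kind of split Cog's setup/predict structure encourages; the vocabulary lookup here is made up for illustration):

```python
# Made-up example of "boring" inefficiency: per-request work that belongs
# in one-time setup. Both functions return the same result; the slow one
# rebuilds its lookup table on every single call.
VOCAB = ["cat", "dog", "fish"] * 1000

def encode_slow(word):
    index = {w: i for i, w in enumerate(VOCAB)}  # rebuilt on every call
    return index[word]

INDEX = {w: i for i, w in enumerate(VOCAB)}      # built once, at "setup" time

def encode_fast(word):
    return INDEX[word]

assert encode_slow("dog") == encode_fast("dog")  # identical output, far less work
```

The same pattern shows up in model serving as loading weights, compiling kernels, or building tokenizers inside the request handler instead of at startup.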
[57:40.16] You'll do that for a customer, like, you look at their code? And we, yeah, we've certainly
|
|
[57:43.96] helped some of our customers be able to do some of that stuff, yeah.
|
|
[57:47.20] And a lot of the models on, like the popular models on Replicate, we've rewritten them
|
|
[57:51.60] to use that stuff as well.
|
|
[57:53.88] And like the Stable Diffusion that we run, for example, is compiled with AITemplate
|
|
[57:57.40] to make it super fast.
|
|
[57:59.44] And it's all open source that you can see all of this stuff on GitHub if you want to
|
|
[58:02.68] see how we do it.
|
|
[58:03.68] But you can imagine ways that we could help people, you know, it's almost like built into
|
|
[58:07.28] the Cog layer, maybe, where we could help people, like, use these fast inference servers
|
|
[58:11.24] or use AITemplate to compile their models to make it faster, whether it's like manual,
|
|
[58:16.48] semi-manual or automatic.
|
|
[58:17.48] We're not really sure.
|
|
[58:18.48] You know, that's something we want to explore because, you know, that benefits everyone.
|
|
[58:21.24] And then on the competitive piece, there was a price war on Mixtral last year, this
|
|
[58:25.56] last December.
|
|
[58:26.56] As far as I can tell, you guys did not enter that war.
|
|
[58:29.20] You have Mixtral, but, you know, it's just regular pricing.
|
|
[58:32.12] I think also some of these players are probably losing money on their pricing.
|
|
[58:36.68] You know, you don't have to say anything, but it's, you know, it's somewhere, the break
|
|
[58:38.96] even is somewhere between 50 to 75 cents per million tokens served.
|
|
[58:43.28] How are you thinking about like the, just the overall competitiveness in the market?
|
|
[58:46.32] How should people choose when everyone's an API?
|
|
[58:50.36] So for Llama 2 and Mixtral, I think, not Mixtral, I can't remember exactly.
|
|
[58:55.52] We have, you know, similar performance and similar price to some of these other services.
|
|
[59:01.68] We're not, like, bargain basement, like, compared to some of the others, because to your point, like,
|
|
[59:06.16] we don't want to like burn tons of money, but we're, you know, pricing it sensibly and
|
|
[59:10.76] sustainably to a point where we think it's, we think, you know, it's competitive with
|
|
[59:14.72] other people such that we want developers using Replicate and we don't want to price
|
|
[59:18.68] it such that it's like only affordable by big companies.
|
|
[59:21.72] We want to make it cheap enough such that the developers can afford it, but we also
|
|
[59:24.28] don't like want the super cheap prices because then it's almost like then your customers
|
|
[59:29.12] are hostile, you know, and the more customers you get, the worse it gets, you know.
|
|
[59:32.96] So we're pricing it sensibly, but still to the point where, you know, hopefully it's
|
|
[59:37.48] cheap enough to build on and I think the thing we really care about, like we want to, obviously
|
|
[59:42.08] we want, you know, models and Replicate to be comparable to other people.
|
|
[59:45.84] But I think the really crucial thing about Replicate, and the way I think we think about
|
|
[59:49.68] it, is that it's not just the API for them, particularly in open source.
|
|
[59:54.20] It's not just the API for the model that is the important bit.
|
|
[59:57.80] It's because quite often with open source models, like the whole point of open source
|
|
[60:00.96] is that you can tinker on it and you can customize it and you can fine tune it and you can like
|
|
[60:04.16] smash it together with another model, like LLaVA, for example.
|
|
[60:08.08] And you can't do that if it's just like a hosted API because it's just like, you know,
|
|
[60:12.00] you can't touch the code.
|
|
[60:13.10] So what we want to do with Replicate is build a platform that's actually open.
|
|
[60:17.36] So like we've got all of these models where the performance and price is on par with everything
|
|
[60:22.56] else.
|
|
[60:23.56] But if you want to customize it, you can fine tune it.
|
|
[60:25.56] You can go to GitHub and get the source code for it and edit the source code and push up
|
|
[60:28.64] your own custom version and this kind of thing, because that's the crucial thing for open source
|
|
[60:32.96] machine learning, is being able to tinker on it and customize it.
|
|
[60:35.60] And we think, we think that's really important to make open source AI work.
|
|
[60:39.76] You mentioned open source.
|
|
[60:41.56] How do you think about levels of openness?
|
|
[60:43.64] When Llama 2 came out, I wrote a post about this, about how there's, like, open source, and there's
|
|
[60:48.20] open weights, then there's restricted weights.
|
|
[60:51.12] It was on the front page of Hacker News, so there were, like, all sorts of comments from people.
|
|
[60:55.32] So I'm always curious to hear your thoughts.
|
|
[60:57.64] Like what do you think is okay for people to license?
|
|
[61:01.88] What's okay for people to not release?
|
|
[61:04.00] Before, it was just, like, closed source, big models, open source, little models, purely
|
|
[61:10.36] open source stuff.
|
|
[61:12.52] And we're now seeing lots of variations where model companies are putting restrictive licenses
|
|
[61:18.04] on their models.
|
|
[61:19.56] That means they can only be used for non-commercial use, and a lot of the open source crowd is
|
|
[61:25.52] complaining it's not true open source and all this kind of thing.
|
|
[61:30.12] And I think a lot of that is coming from philosophy, the free software movement kind of philosophy.
|
|
[61:36.78] And I don't think it's necessarily a bad thing.
|
|
[61:38.56] I think it's good that model companies can make money out of their models.
|
|
[61:42.08] That, like, incentivizes people to make more models and this kind of thing.
|
|
[61:45.76] And I think it's totally fine if somebody made something to ask for some money in return
|
|
[61:50.40] if you're making money out of it.
|
|
[61:51.40] And I think that's totally okay.
|
|
[61:53.40] I think there's some really interesting midpoints as well, where people are releasing the code.
|
|
[61:56.52] You can still tinker on it, but the person who trained the model still wants to get a
|
|
[62:00.88] cut of it if you're making a bunch of money out of it.
|
|
[62:02.72] And I think that's good and that's going to make the ecosystem more sustainable.
|
|
[62:07.00] I don't think anybody's really figured it out yet.
|
|
[62:08.56] We're going to see more experimentation with this and more people try to figure out what
|
|
[62:13.68] are the business models around building models and how can I make money out of this.
|
|
[62:18.00] And we'll just see where it ends up.
|
|
[62:19.68] And I think it's something we want to support as a replicate as well because we believe
|
|
[62:24.08] in open source.
|
|
[62:25.08] That's great, but there's also going to be lots of models which are close source as well.
|
|
[62:30.88] And these companies might not be, there's probably going to be a long tail of a bunch
|
|
[62:34.96] of people building models that don't have the reach that OpenAI have.
|
|
[62:39.60] And hopefully as a replicate, we can help those people find developers and help them
|
|
[62:44.60] make money in that kind of thing.
|
|
[62:46.16] I think the computer requirements of AI kind of changed the thing.
|
|
[62:49.48] I started an open source company.
|
|
[62:51.04] I'm a big open source fan, and before, it was kind of, man hours were really all that went
|
|
[62:56.12] into open source.
|
|
[62:57.12] It wasn't much monetary investment.
|
|
[62:59.44] Well, not that man hours are not worth a lot.
|
|
[63:03.04] But if you think about Llama 2, it's like $25 million, you know, like all in.
|
|
[63:07.96] It's like, you can't just spin up a Discord and, like, spend $25 million.
|
|
[63:11.68] So I think it's net positive for everybody that Llama 2 is open source.
|
|
[63:15.88] And whether it's "open source," you know, the open source term,
|
|
[63:19.36] I think people, like you're saying, kind of argue on the semantics of it.
|
|
[63:24.36] But, like, all we care about is that Llama 2 is open, because if Llama 2 was not open source
|
|
[63:29.00] today, like, if Mistral was not open source, we would be in a bad spot, you know.
|
|
[63:33.44] So.
|
|
[63:34.44] And I think the nuance here is making sure that these models are still tinkerable because
|
|
[63:38.68] the beautiful thing about Llama 2 as a base model is that, yeah, it costs $25 million to
|
|
[63:43.60] train to start with, but then you can fine tune it for like $50.
|
|
[63:48.76] And that's what's so beautiful about the open source ecosystem and something I think is really
|
|
[63:53.00] surprising as well.
|
|
[63:54.00] Like, it completely surprised me. Like, I think a lot of people assumed that open
|
|
[63:58.48] source machine learning is just not going to be practical, because it's so
|
|
[64:01.88] expensive to train these models, but like fine tuning is unreasonably effective and
|
|
[64:06.42] people are getting really good results out of it and it's really cheap.
|
|
[64:09.16] So people can effectively create open source models really cheaply and there's going to
|
|
[64:14.16] be like this sort of ecosystem of tons of models being made and I think the risk there
|
|
[64:19.12] from a licensing point of view is we need to make sure that the licenses let people do
|
|
[64:22.68] that because if you release a big model under a non-commercial license and people can't
|
|
[64:27.52] fine tune it, you've lost the magic of it being open.
|
|
[64:30.96] And I'm sure there are ways to structure that such that the person paying $25 million feels
|
|
[64:35.68] like they're compensated somehow and they can feel like they can, you know, they should
|
|
[64:39.76] keep on training models and people can keep on fine tuning it, but I guess we just have
|
|
[64:43.76] to figure out exactly how that plays out.
|
|
[64:46.08] Excellent.
|
|
[64:47.08] So just wanted to round it out that you've been excellent and very open.
|
|
[64:51.04] I should have started my intro with this, but I feel like you found the sort of AI engineer
|
|
[64:55.04] crew before I did, and, you know, something I really resonated with in sort of the
|
|
[65:00.36] series B announcement was that you put in some stats here about how there are two orders
|
|
[65:04.80] of magnitude more software engineers than there are machine learning engineers, about
|
|
[65:07.68] 30 million software engineers and 500,000 machine learning engineers.
|
|
[65:11.24] You can maybe plus or minus one of those orders of magnitude, but it's around that ballpark.
|
|
[65:14.76] And so obviously there will be a lot more AI engineers than there will be ML engineers.
|
|
[65:19.24] How do you see this group?
|
|
[65:21.36] Like is it all software engineers?
|
|
[65:23.32] Are they going to specialize?
|
|
[65:25.80] What would you advise someone trying to become an AI engineer?
|
|
[65:29.16] Is this a legitimate career path?
|
|
[65:30.92] Yeah, absolutely.
|
|
[65:31.92] I mean, it's very clear that AI is going to be a large part of how we build software in
|
|
[65:37.04] the future now.
|
|
[65:38.64] It's a bit like being a software developer in the nineties and ignoring the internet,
|
|
[65:42.68] you know, you just need to, you need to learn about this stuff and you need to figure this
|
|
[65:46.24] stuff out.
|
|
[65:47.24] I don't think it needs to be super low level.
|
|
[65:50.80] You don't need to be like, you know, the metaphor here is like, you don't need to be digging
|
|
[65:55.08] down into like this sort of PyTorch level if you don't want to, in the same way as a
|
|
[66:01.24] software engineer in the nineties didn't need to understand how network
|
|
[66:04.24] stacks work to be able to build a website, you know, but you need to understand the shape
|
|
[66:07.12] of this thing and how to hold it and what it's good at and what it's not.
|
|
[66:10.68] And that's really important.
|
|
[66:12.84] So yeah, certainly just advise people to like just start playing around with it, get a feel
|
|
[66:17.20] of like how language models work, get a feel of like how these diffusion models work, get
|
|
[66:22.44] a feel of like what fine tuning is and how it works because some of your job might be
|
|
[66:28.08] building data sets, you know, get a feeling of how prompting works because some of your
|
|
[66:31.08] job might be writing a prompt.
|
|
[66:33.20] And those are just all really important skills to sort of figure out.
|
|
[66:36.98] Yeah.
|
|
[66:37.98] Well, thanks for building the definitive platform for doing all that.
|
|
[66:41.08] Yeah, of course.
|
|
[66:42.60] Any final call to actions, who should come work at Replicate anything for the audience?
|
|
[66:47.40] Yeah.
|
|
[66:48.40] Well, I mean, we're hiring a few roles, so click on jobs at the bottom of Replicate.com.
|
|
[66:53.44] There's some jobs.
|
|
[66:55.12] And I would just say, like, try out AI even if
|
|
[67:00.12] you think you're not smart enough.
|
|
[67:01.12] Like the whole reason I started this company is because I was looking at the cool stuff
|
|
[67:03.88] that Andreas was making.
|
|
[67:04.88] Like Andreas is like a proper machine learning person with a PhD, you know, and I was like
|
|
[67:08.44] just like, you know, a sort of lowly software engineer.
|
|
[67:11.72] I was like, you're doing really cool stuff and I want to be able to do that.
|
|
[67:15.08] And by us working together, you know, we've now made it accessible to dummies like me and
|
|
[67:19.92] just encourage anyone who's like wants to try this stuff out, just give it a try.
|
|
[67:24.24] I would also encourage people who are tool builders, like the limiting factor now on
|
|
[67:28.08] AI is not like the technology like technologies made incredible advances.
|
|
[67:32.40] And there's just so many incredible machine learning models that can do a ton of stuff.
|
|
[67:37.64] The limiting factor is just like making that accessible to people who build products because
|
|
[67:42.00] it's really hard to use this stuff right now.
|
|
[67:44.48] And obviously we're building some of that stuff as Replicate, but there's just like
|
|
a ton of other tooling and abstractions that need to be built out to make this stuff
|
|
[67:50.44] usable.
|
|
[67:51.44] So I just encourage people who like like building developer tools to just like get stuck into
|
|
[67:55.56] it as well.
|
|
[67:56.56] Because that's going to make this stuff accessible to everyone.
|
|
[67:58.84] Yeah.
|
|
[67:59.84] I especially want to highlight you have a hacker in residence job opening available, which
|
|
[68:03.32] not every company has, which means just join you and hack stuff.
|
|
[68:07.32] I think Charlie Halt is doing a fantastic job with that.
|
|
[68:09.68] Yep.
|
|
[68:10.68] Effectively, like most of our, a lot of our job is just like showing people how to use
|
|
[68:15.40] AI.
|
|
[68:16.40] So we've just got a team of like software developers, people who have kind of figured
|
|
[68:18.52] this stuff out who are writing about it, who are making videos about it or making example
|
|
[68:23.72] applications to like show people what you can do with this stuff.
|
|
[68:26.12] Yeah.
|
|
[68:27.12] In my world that used to be called DevRel, but now it's hacker in residence.
|
|
[68:31.28] This came from Zeke, who is another one of our hackers.
|
|
[68:38.32] Didn't this come from Chroma? Cause they also started that one.
|
|
[68:41.28] We developed it ourselves, though their answer actually was like, Hey, we came up with that first.
|
|
[68:45.44] But I think we came up with it independently because the story behind this is we originally
|
|
[68:51.68] called it the DevRel team and DevRel is cursed now.
|
|
[68:55.28] Zeke was like, that sounds so boring.
|
|
[68:58.92] I'm not going to say I'm a developer relations person or developer advocate or something.
|
|
[69:05.44] So we're like, okay, what's the like, the way we can make this sound the most fun?
|
|
[69:08.84] All right.
|
|
[69:09.84] You're right.
|
|
[69:10.84] I would say, like, that is consistently the vibe I get from Replicate, everyone on your
|
|
[69:14.12] team I interact with.
|
|
[69:15.12] When I go to your San Francisco office, like that's the vibe that you're
|
|
[69:18.84] generating.
|
|
[69:19.84] It's a hacker space more than an office, and you hold fantastic meetups
|
|
[69:23.32] there, and I think you're a really positive presence in our community.
|
|
[69:25.88] So thank you for doing all that.
|
|
[69:27.60] And it's instilling the hacker vibe and culture into AI.
|
|
[69:31.28] I'm really glad that's working.
|
|
[69:33.28] Cool.
|
|
[69:34.28] That's a wrap I think.
|
|
[69:35.28] Thank you so much for coming on man.
|
|
[69:36.28] Yeah, of course.
|
|
[69:37.28] Thank you.
|
|
[69:38.28] Bye.
|
|