[by:whisper.cpp] [00:00.00] Hey everyone, welcome to the Latent Space Podcast. [00:08.04] This is Alessio, Partner and CTO-in-Residence at Decibel Partners, and I'm joined by my [00:12.10] co-host Swyx, founder of Smol AI. [00:14.32] Hey, and today we have Ben Firshman in the studio. [00:17.52] Welcome, Ben. [00:18.52] Hey, good to be here. [00:19.52] Ben, you're the co-founder and CEO of Replicate. [00:22.04] Before that, you were most notably the founder of Fig, which became Docker Compose. [00:25.76] You also did a couple of other things before that, but that's what a lot of people know [00:29.16] you for. [00:30.16] What should people know about you outside of your LinkedIn profile? [00:35.16] Yeah, good question. [00:37.04] I think I'm a builder and tinkerer in a very broad sense, and I love using my hands to [00:41.56] make things, so I work on things maybe a bit closer to tech, like electronics. [00:47.50] I also build things out of wood, and I fix cars, and I fix my bike, and build bicycles, [00:55.08] and all this kind of stuff. There's so much I've learned from transferable [01:00.72] skills from working in the real world to building things in software, and so much [01:07.12] about being a builder both in real life and in software that crosses over. [01:11.60] Is there a real-world analogy that you use often when you think about a code architecture [01:15.92] or problem? [01:17.60] I like to build software tools as if they were something real.
[01:21.76] So I wrote this thing called the Command Line Interface Guidelines, which was a bit like [01:26.44] the Mac Human Interface Guidelines, but for command line interfaces. I did it [01:30.06] with the guy I created Docker Compose with and a few other people, and I think somewhere [01:35.32] in there I described that your command line interface should feel like a big iron [01:39.84] machine where you pull a lever and it goes clunk, and things should respond within like [01:44.92] 50 milliseconds as if it was a real-life thing. And another analogy here is that in [01:50.36] real life, when you press a button on an electronic device and it's a soft [01:54.90] switch, you press it and nothing happens, and there's no physical feedback about anything [01:59.28] happening, and then half a second later something happens. That's how a lot of [02:03.12] software feels. Instead, software should feel more like something that's real: [02:06.76] you pull a physical lever and the physical lever moves. I've taken [02:11.40] that lesson of human interfaces to software a ton. It's all about [02:16.20] latency, things feeling really solid and robust, both in command lines and user interfaces [02:21.80] as well. [02:22.80] And how did you operationalize that for Fig or Docker? [02:27.32] A lot of it's just low latency. Actually, we didn't do it very well for Fig: in the [02:31.72] first place we used Python, which was a big mistake, because Python's really hard to get [02:37.04] booting up fast, since you have to load the whole Python runtime before it can run [02:40.12] anything. [02:41.12] Okay. [02:42.12] Go is much better at this; Go just instantly starts. [02:45.36] You have to be under 500 milliseconds to start up. [02:48.36] Yeah, effectively.
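The interpreter startup cost being described here is easy to observe directly. A rough sketch, timing a bare Python interpreter launching and exiting (the exact number is machine-dependent; the point is that an interpreted CLI pays this cost on every invocation, while a compiled Go binary pays almost nothing):

```python
import subprocess
import sys
import time

# Time a full interpreter start-and-exit cycle: launch a subprocess that
# runs an empty program, so all we measure is runtime boot and teardown.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
elapsed_ms = (time.perf_counter() - start) * 1000

# Typically tens of milliseconds for CPython, more once a CLI imports
# its own dependencies; a compiled binary starts in low single digits.
print(f"interpreter startup: {elapsed_ms:.1f} ms")
```

Importing heavy libraries at the top of a Python CLI adds to this number on every run, which is why startup-sensitive tools often defer imports or move to a compiled language.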
[02:49.36] I mean, the threshold for humans perceiving something as immediate is something like 100 milliseconds, so anything [02:55.44] like that is good enough. [02:57.88] Yeah. [02:58.88] Also, I should mention, since we're talking about your side projects, one thing is [03:01.56] I am maybe one of the few people who have actually written something about CLI [03:05.64] design principles, because I was in charge of the Netlify CLI back in the day and had [03:09.84] many thoughts. [03:10.92] One of my fun thoughts, I'll just share in case you have thoughts, is I think CLIs are [03:15.36] effectively starting points for scripts that are then run, and the moment one of the script's [03:20.52] preconditions is not fulfilled, typically they end, so the CLI developer will just exit [03:27.40] the program. [03:28.88] And the way that I designed the Netlify Dev workflow, I really wanted it [03:32.60] to be kind of a state machine that would resolve itself. [03:36.32] If it detected that a precondition wasn't fulfilled, it would actually delegate to a sub-program [03:41.32] that would fulfill that precondition, asking for more info or waiting until a condition [03:45.60] was fulfilled, and then it would go back to the original flow and continue. [03:49.32] I don't know if that was ever tried, or if there's a more formal definition of it, because [03:53.32] I just came up with it randomly. [03:55.32] But it felt like the beginnings of AI, in the sense that when you run a CLI command, [03:59.36] you have an intent to do something, and you may not have given the CLI all the things that [04:03.60] it needs to execute that intent. [04:07.04] So that was my two cents.
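The "self-resolving" flow described here can be sketched in a few lines: each command declares preconditions, and instead of exiting when one fails, the runner delegates to a resolver that fulfills it, then resumes the original intent. All names below are hypothetical, not the actual Netlify Dev implementation:

```python
def run_command(command, state):
    """Resolve unmet preconditions, then execute the command's intent."""
    for precondition in command["preconditions"]:
        while not precondition["check"](state):
            # Delegate to a sub-program that fulfills the precondition
            # (in a real CLI: a login prompt, config wizard, or waiting
            # for a local server to come up).
            precondition["resolve"](state)
    # All preconditions hold; continue the original flow.
    return command["intent"](state)

# Example: a deploy command that needs an auth token before it can run.
state = {}
deploy = {
    "preconditions": [{
        "check": lambda s: "token" in s,
        "resolve": lambda s: s.update(token="abc123"),  # stands in for a login prompt
    }],
    "intent": lambda s: f"deployed with {s['token']}",
}

print(run_command(deploy, state))  # prints "deployed with abc123"
```

The loop per precondition is what makes it a state machine rather than a one-shot check: a resolver can itself fail or need repeating, and the flow only advances once the condition actually holds.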
[04:08.84] Yeah, that reminds me of a thing we thought about when writing the CLI guidelines. [04:14.48] CLIs were designed in a world where the CLI was really a programming environment, [04:20.16] primarily designed for machines to use in commands and scripts. [04:27.56] Back then, the primary way of using computers was [04:36.80] effectively writing shell scripts, and we've transitioned to a world where humans are using CLI programs [04:42.12] much more than they used to, but the current best practices still come from how UNIX was designed. [04:49.12] There are lots of design documents about UNIX from the '70s and '80s where they say things [04:54.44] like: command line commands should not output anything on success, they should be completely [05:00.40] silent, which makes sense if you're using them in a shell script. [05:04.52] But if a human is using them, it just looks like they're broken. [05:07.96] If you type copy and it just doesn't say anything, as a new user you assume it didn't work. [05:12.46] I think what's really interesting about the CLI is that, [05:19.08] to your point, it's a really good user interface where it can be like a conversation. [05:25.76] Instead of you just telling the computer to do a thing [05:29.60] and it either silently succeeding or saying, "No, you failed," it can guide you in [05:35.96] the right direction and tell you what your intent might be, and that kind of thing, in [05:40.68] a way that's almost more natural in a CLI than in a graphical user interface, [05:45.46] because it feels like a back and forth with the computer. [05:48.54] Almost, funnily, like a language model. So I think there's some interesting intersection [05:53.96] of CLIs and language models actually being very closely related and a good fit for each [05:59.36] other.
[06:00.36] Yeah, I'll say one of the surprises from last year: I worked on a coding agent, and I think [06:04.02] the most successful coding agent of my cohort was Open Interpreter, which was a CLI implementation. [06:09.48] Even as a CLI person, I have chronically underestimated the CLI as a useful interface. [06:15.32] Yeah, totally. [06:17.06] You also developed Arxiv Vanity, which you recently retired after a glorious seven [06:21.50] years. [06:22.50] Something like that, yeah. [06:23.50] Something like that, which is nice: arXiv papers as HTML instead of PDFs. [06:27.50] Yeah, that was actually the start of where Replicate came from. [06:31.76] Okay, we can tell that story. [06:33.26] So, when I quit Docker, I got really interested in science infrastructure, just as a [06:37.62] problem area, because science has created so much progress in the world. [06:44.22] The fact that we, you know, can talk to each other on a podcast and we use computers, [06:49.12] and the fact that we're alive, is probably thanks to medical research, you know. [06:52.44] But science is just completely archaic and broken, with 19th-century processes [06:56.88] that just happen to have been copied to the internet, rather than taking into account that [07:00.68] we can transfer information at the speed of light now. [07:02.76] And the whole way science is funded and all this kind of thing is all kind of broken. [07:06.04] And there's just so much potential for making science work better. [07:09.04] And I realized that I wasn't a scientist and I didn't really have time to go and [07:12.48] get a PhD and become a researcher, but I'm a tool builder and I could make existing scientists [07:16.94] better at their job. [07:17.94] And if I could make a bunch of scientists a little bit better at their job, maybe, you [07:21.66] know, that's kind of the equivalent of being a researcher.
[07:24.14] So one particular thing I dialed in on is how science is disseminated: all [07:28.86] of these PDFs, quite often behind paywalls, you know, on the internet. [07:33.50] Yeah, and that's a whole thing, because it's funded by national grants, government grants, [07:38.70] and then put behind paywalls. [07:40.42] Yeah, exactly. [07:41.42] That's a whole thing. [07:42.42] Yeah, I could talk for hours about that, but the particular thing we dove in [07:46.00] on was that, interestingly, there's a bunch of open science that happens [07:50.82] as well. [07:51.82] So math, physics, computer science, and machine learning, notably, are all published on [07:56.86] arXiv, which is actually a surprisingly old institution. [08:00.32] Some random Cornell thing. [08:01.32] Yeah, it was just somebody at Cornell who started a mailing list in the '80s. [08:05.04] And then when the web was invented, they built a web interface around it. It's super [08:09.08] old. [08:10.08] And it's kind of like a user group thing, right? [08:13.26] That's why there are all these numbers and stuff. [08:15.18] Yeah, exactly. [08:16.18] It's a bit like a Usenet thing. [08:19.54] That's where basically all of math, physics, and computer science happens, but it's still [08:23.10] PDFs published to this thing, which is just so infuriating. [08:27.50] The web was invented at CERN, a physics institution, to share academic writing. [08:32.70] There are figure tags. [08:35.02] There are author tags, there are heading tags, there are cite tags. Hyperlinks [08:38.94] are effectively citations, [08:40.96] because you want to link to another academic paper. But instead you have to copy [08:44.20] and paste these things and try to get around paywalls. It's absurd, you know.
[08:47.24] And now we have social media and things, but academic papers are still PDFs, you [08:52.92] know. It's just like, why? [08:53.92] This is not what the web was for. [08:55.42] So anyway, I got really frustrated with that. [08:57.24] And I went on vacation with my old friend Andreas. [08:59.80] We used to work together in London, at somebody else's startup. [09:04.60] And we were just on vacation in Greece for fun. [09:07.52] And he was trying to read a machine learning paper on his phone, you know, [09:11.30] having to zoom in and scroll line by line on the PDF, and he was like, this is [09:15.54] fucking stupid. [09:16.54] And I was like, I know. So that's something; we discovered our mutual hatred for this, [09:21.54] you know. [09:22.70] And we spent our vacation sitting by the pool making LaTeX-to-HTML converters, [09:29.62] making the first version of Arxiv Vanity. [09:31.06] Anyway, that then became a whole thing. [09:33.10] And we shut it down recently because it caught the eye of arXiv, who were like, [09:38.80] oh, this is great. [09:39.80] We just haven't had the time to work on this. [09:40.80] And what's tragic about arXiv is that it's this project of Cornell's where [09:44.72] they can barely scrounge together enough money to survive. [09:47.00] I think it might be better funded now than it was when we were collaborating [09:49.92] with them. [09:50.92] And compared to these big scientific journals, and this is actually where the [09:53.92] work happens, [09:54.92] they just have a fraction of the money that these big scientific journals have, which [09:58.04] is just so tragic. [09:59.04] But anyway, they were like, yeah, this is great. [10:00.64] We can't afford to do it, but do you want to, as a volunteer, integrate Arxiv [10:04.30] Vanity into arXiv? [10:05.30] Oh, you did the work.
[10:06.70] We didn't do the work. [10:07.70] We started doing the work. [10:08.70] We did some. [10:09.70] I think we worked on this for a few months to actually get it integrated into arXiv. [10:13.22] And then we got distracted by Replicate. [10:15.90] So somebody picked up the work and made it happen: somebody who works [10:21.94] on one of the libraries that powers Arxiv Vanity. [10:25.42] Okay. [10:26.42] And the relationship with Arxiv Sanity? [10:28.26] None. [10:29.26] And did you predate them? [10:30.72] I actually don't know the lineage. [10:32.36] We were after. We were both users of Arxiv Sanity, which is like a sort of, [10:36.08] you know, [10:37.08] like recs on top of arXiv. [10:39.80] Yeah. [10:40.80] Yeah. [10:41.80] And we were both users of that. [10:42.80] And I think we were trying to come up with a working name, and Andreas just [10:45.48] cracked a joke of, like, oh, call it Arxiv Vanity. [10:48.12] Let's make the papers look nice. [10:49.44] And that was the working name and it just stuck. [10:54.56] And then from there, tell us more about why you got distracted, right? [10:58.00] So Replicate maybe feels like an overnight success to a lot of people, but you've been [11:02.18] building this since 2019. [11:03.70] Yeah. [11:04.70] So what prompted the start? And we've been collaborating for even longer. [11:07.78] We created Arxiv Vanity in 2017. [11:11.14] So in some sense, we've been doing this for almost six, seven years now. A classic seven- [11:15.10] year... [11:16.10] Overnight success. [11:17.10] Yeah. [11:18.10] Yes. [11:19.10] We did Arxiv Vanity and then worked on a bunch of surrounding projects. [11:22.50] I was still really interested in science publishing at that point.
[11:25.26] And I'm trying to remember, because I tell a condensed version of the story to people, [11:29.36] since I can't really tell a seven-year history. I'm trying to figure out the [11:32.00] right length, because we want to nail the definitive Replicate [11:36.80] story here. [11:37.96] One thing that's really interesting about these machine learning papers is that they [11:42.68] are published on arXiv, [11:45.60] and a lot of them are actual fundamental research, [11:47.80] so they should be prose describing a theory. But a lot of them are just describing running pieces of [11:54.24] software that a machine learning researcher made that did something, you know. It's like [11:58.16] an image classification model or something. [12:00.78] And they managed to make an image classification model that was better than the [12:04.98] existing state of the art. [12:07.14] And they've made an actual running piece of software that does image segmentation. [12:11.54] And then what they had to do is take that piece of software and write [12:15.82] it up as prose and math in a PDF. [12:18.74] And what's frustrating about that is, well, so Andreas [12:23.82] was a machine learning engineer at Spotify. [12:27.56] And some of his job was pure research as well. [12:31.06] He did a PhD and he was doing a lot of stuff internally, but part of his job was [12:34.08] also being an engineer and taking some of these existing things that people had made [12:38.96] and published and trying to apply them to actual problems at Spotify. [12:43.88] And he was like, you know, you get given a paper which describes roughly how the [12:48.36] model works. [12:49.36] It's probably missing lots of crucial information. [12:51.28] There's sometimes code on GitHub, and more and more there's code on GitHub.
[12:54.26] But back then it was relatively rare, and it was quite often just scrappy research [12:58.58] code that didn't actually run. [13:00.74] And, you know, maybe the weights were on Google Drive, but they'd accidentally [13:03.58] deleted the weights off Google Drive, you know, and it was really hard to take this [13:07.70] stuff and actually use it for real things. [13:10.14] We just started talking together about his problems at Spotify, [13:14.18] and I connected this back to my work at Docker as well. I was like, oh, this is what we created [13:19.82] containers for. [13:20.82] You know, we solved this problem for normal software by putting the thing inside a container [13:24.08] so that you could ship it around and it kept on running. [13:26.76] So we were sort of hypothesizing: hmm, what if we put machine learning models [13:30.48] inside containers so that they could actually be shipped around, and they could be defined [13:34.88] in some production-ready format, and other researchers could run them to generate [13:38.96] baselines, and people who wanted to actually apply them to real problems in [13:42.60] the world could just pick up the container and run it, you know. [13:46.40] Normally in this part of the story, [13:50.60] I skip forward to: and then we created Cog, this container stuff for machine learning [13:55.70] models, and we created Replicate, the place for people to publish these machine learning [13:58.22] models. [13:59.22] But there's actually two or three years between those. [14:02.00] The thing we then got dialed into was Andreas asking: what if there was a CI system for [14:07.30] machine learning? [14:08.30] Because one of the things he really struggled with as a researcher was generating baselines.
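The core of the "models in containers" idea is a standard interface: every packaged model exposes the same load-once, predict-many shape, so anyone can run it without knowing the framework inside. A self-contained toy sketch of that shape (this mirrors the spirit of Cog's predictor interface, but is not Cog's actual API; the class and its weights here are purely illustrative):

```python
class Predictor:
    """A model wrapped in a standard interface: setup() loads weights
    once (e.g. when the container starts), predict() runs inference."""

    def setup(self):
        # Stands in for loading real weights from disk inside the container.
        self.weights = {"hello": "world"}

    def predict(self, text: str) -> str:
        # A single prediction; a real model would run a forward pass here.
        return self.weights.get(text, "unknown")


# Any tool that knows this interface can run any packaged model the
# same way, which is what makes apples-to-apples baselines possible.
predictor = Predictor()
predictor.setup()
print(predictor.predict("hello"))  # prints "world"
```

Separating setup() from predict() matters in practice: loading weights can take seconds or minutes, so it happens once per container, while predictions are served repeatedly against the loaded model.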
[14:13.22] So when he's writing a paper, he needs to get like five other models that are [14:17.42] existing work and get them running. [14:21.16] On the same evals. [14:22.16] Exactly, on the same evals, so you can compare apples to apples, because you can't [14:25.08] trust the numbers in the paper. [14:26.76] Unless you're Google and can just publish them anyway. [14:31.08] So I think this was coming from the thinking that there should be containers for machine [14:34.44] learning, but why are people going to use that? [14:36.40] Okay, maybe we can create a supply of containers by creating this useful tool for researchers. [14:42.16] And the useful tool was: let's get researchers to package up their models and push them to [14:46.60] this central place, where we run a standard set of benchmarks across the models so that [14:52.14] you can trust those results and you can compare these models apples to apples. [14:55.34] And a researcher like Andreas, doing a new piece of research, could trust [14:59.30] those numbers, and he could pull down those models, confirm them on his machine, use [15:03.74] the standard benchmark to then measure his own model, and, you know, all this kind of stuff. [15:07.98] And so we started building that. [15:09.82] That's what we applied to YC with. We got into YC and started building a prototype [15:13.94] of this. [15:15.08] And then this is where it all starts to fall apart. [15:18.28] We were like, okay, that sounds great. [15:19.76] And we talked to a bunch of researchers, and they really wanted it, and it sounded brilliant. [15:22.12] It's a great way to create a supply of models on this research platform. [15:25.76] But how the hell is this a business, you know? How are we even going to make any money [15:29.08] out of this? [15:30.08] And we're like, oh, shit. [15:31.08] That's the real unknown here: what the business is.
[15:34.80] So we thought, okay, before we get too deep into [15:40.12] this, let's try to reduce the risk around this turning into a business. [15:44.26] So let's try to research what the business could be for this research tool, effectively. [15:49.02] So we went and talked to a bunch of companies, trying to sell them something which didn't [15:52.02] exist. [15:53.02] We were like, hey, do you want a way to share research inside your company, [15:56.74] so that other researchers, or say the product manager, can test out the machine learning [16:00.22] model? [16:01.22] And they're like, maybe. [16:02.62] And we were like, do you want a deployment platform for deploying models? [16:09.22] Do you want a central place for versioning models? [16:12.32] We were trying to think of lots of different products we could sell that [16:14.56] were related to this thing. [16:16.36] And it was a terrible idea. [16:17.96] We're not salespeople, and people don't want to buy something that doesn't exist. [16:22.96] I think some people can pull this off, but we were just, you know, a bunch of [16:26.60] product and engineering people, and we just couldn't pull it off. [16:30.40] So we got halfway through our YC batch. [16:32.32] We hadn't built a product. [16:33.32] We had no users. [16:35.32] We had no idea what our business was going to be, because we couldn't get anybody to [16:38.08] buy something which doesn't exist. [16:39.08] Actually, we were quite a way through our YC batch, I think two-thirds of the [16:42.62] way or something. [16:43.62] We were like, okay, well, we're kind of screwed now, because we don't have anything to show [16:46.26] at demo day. [16:47.82] And then we tried to figure out, okay, what can we build in like two weeks? [16:51.98] That'll be something.
[16:53.42] So we desperately tried to... I can't remember what we tried to build at that point. [16:56.98] And then two weeks before demo day, I just remember, we were going down to [17:00.94] Mountain View every week for dinners, and we got called onto an all-hands Zoom call, [17:04.38] which was super weird. [17:05.38] We were like, what's going on? [17:06.72] And they were like, don't come to dinner tomorrow. [17:10.88] And we kind of looked at the news and realized, oh, there's a pandemic [17:14.64] going on. [17:15.64] We were so deep in our startup, we were just completely oblivious to what was [17:19.40] going on around us. [17:20.40] Was this Jan or Feb? [17:22.36] This was March 2020. [17:24.04] March 2020. [17:25.04] Yeah. [17:26.04] Because I remember Silicon Valley at the time was early to COVID. [17:28.72] Yeah. [17:29.72] They started locking down a lot faster than the rest of the world. [17:31.84] Yeah, exactly. [17:32.84] Yeah. [17:33.84] Soon after that, there were the San Francisco lockdowns, and then the YC batch just [17:38.14] stopped. [17:39.14] There wasn't a demo day, and it was in a sense a blessing for us, because we just kind of... [17:46.78] In the normal course of events, you're actually allowed to defer to a future demo day. [17:50.66] Yeah. [17:51.66] So we didn't even have to defer, because it just kind of didn't happen, you know? [17:55.46] So was YC helpful? [17:57.50] Yes. [17:58.50] We completely screwed up the batch, and that was our fault. [18:00.58] I think the way YC has become incredibly valuable for us has been after YC. [18:06.92] I think there was a reasonable argument that we didn't need to do YC to start [18:12.00] with, because we were quite experienced. [18:14.52] We had done some startups before, we were kind of well connected with VCs, you know; [18:18.92] it was relatively easy to raise money because we were a known quantity.
[18:21.64] You know, if you go to a VC and say, "Hey, I made this piece of..." [18:24.90] It's Docker Compose for AI. [18:26.40] Yeah. [18:27.40] Exactly. [18:28.40] And, you know, people can pattern-match like that, and they can have some trust that you [18:31.90] know what you're doing. [18:32.90] Whereas it's much harder for people straight out of college, and that's where YC's sweet [18:36.46] spot is: helping people straight out of college who are super promising figure [18:39.42] out how to do that. [18:40.42] Yeah. [18:41.42] No credentials. [18:42.42] Yeah. [18:43.42] Exactly. [18:44.42] So in some sense we didn't need that, but YC has been incredibly useful for us [18:45.42] since the batch. This was actually, I think... so Docker was a YC company, and Solomon, [18:51.08] the founder of Docker, I think told me this. [18:52.50] He was like, "A lot of people underestimate the value of YC after you finish the batch." [18:57.58] And his biggest regret was not staying in touch with YC. [19:01.08] I might be misattributing this, but I think it was him. [19:04.60] And so we made a point of that, and we just stayed in touch with our batch partner, [19:07.44] Jared at YC, who has been fantastic. [19:09.32] Jared Harris? [19:10.32] Jared Friedman. [19:11.32] Friedman. [19:12.32] And all of the team at YC. There was the growth team at YC when they were still [19:16.16] there, and they've been super helpful. [19:18.36] And two things have been super helpful about that. One is raising money: they just know [19:21.96] exactly how to raise money, and they've been super helpful during that process in all of [19:24.80] our rounds. [19:25.80] We've done three rounds since we did YC, and they've been super helpful during the whole [19:28.42] process. [19:29.42] And also just reaching a ton of customers.
[19:32.02] So the magic of YC is that there are thousands of YC companies, [19:35.60] I think, on the order of thousands. [19:38.98] And they're all your first customers, and they're super helpful, super receptive, [19:43.60] really want to try out new things. [19:46.12] You have a warm intro to basically every one of them, and there's this mailing list [19:49.44] where you can post updates about your products, which is really receptive. [19:54.12] And that's just been fantastic for us. [19:55.48] We've got so many of our users and customers through YC. [20:00.12] Yeah. [20:01.12] Well, so the classic criticism, or the sort of, you know, pushback, is that people don't buy [20:05.92] from you just because you're both from YC, but at least they'll open the email. [20:10.96] Yeah. [20:11.96] Right. [20:12.96] Like that's the, okay. [20:13.96] Yeah. [20:14.96] Yeah. [20:15.96] So that's been a really, really positive experience. [20:16.96] And sorry, I interrupted with the YC question. [20:18.28] You just made it out of YC, survived the pandemic. [20:22.36] I'll try to condense this a little bit. [20:24.36] We started building tools for COVID, weirdly. We were like, okay, we don't have a startup, [20:27.84] we haven't figured out anything, [20:28.84] so what's a useful thing we could be doing right now? [20:32.60] Save lives. [20:33.60] So yeah, let's try to save lives. [20:35.48] I think we failed at that as well. [20:36.48] We had a bunch of projects that didn't go anywhere. [20:38.72] We worked on, yeah, a bunch of stuff like contact tracing, which didn't [20:42.88] really turn out to be a useful thing. [20:45.52] Andreas worked on, like, a DoorDash for people delivering food to people [20:50.96] who were vulnerable. [20:51.96] What else did we do?
[20:53.24] The meta-problem of helping people direct their efforts to what was most useful, and [20:57.44] a few other things like that. [20:58.44] It didn't really go anywhere. [20:59.44] So we're like, okay, this is not really working either. [21:00.44] We were actually considering just doing work for COVID. [21:03.52] We have this decision document early on in our company, which is like: should we become [21:06.36] a government app contracting shop, you know? [21:10.84] We decided no. [21:11.84] Because you also did work for GOV.UK. [21:13.60] Yeah, exactly. [21:14.60] We had experience doing some... [21:17.28] And The Guardian and all that. [21:18.28] Yeah. [21:19.28] Government stuff. [21:20.28] And we were just really good at building stuff. [21:22.44] We were just product people. [21:23.80] I was the front-end product side and Andreas was the back-end side. [21:26.60] So we were just a product team, and we were working with a designer at the time, a guy [21:30.32] called Mark, who did our early designs for Replicate, and we were like, hey, what if we [21:33.88] just team up and become an agency and build stuff? [21:36.56] But yeah, we gave up on that in the end; I can't remember the details. [21:39.88] So we went back to machine learning, and then we were like, well, we're not really sure if [21:44.32] this is going to work, and one of my most painful experiences from previous startups is shutting [21:49.80] them down. [21:50.80] When you realize it's not really working and have to shut it down, it's a ton [21:52.96] of work, and people hate you, and it's just sort of, you know. [21:57.60] So we were like, how can we make something we don't have to shut down? [22:00.48] And even better, how can we make something that won't page us in the middle of the night? [22:05.62] So we made an open source project.
[22:07.72] We made a thing which was an open-source Weights and Biases, because we had this theory that [22:11.88] people want open source tools. [22:13.48] There should be an open source version control and experiment tracking thing. [22:17.76] And it was intuitive to us, because we were like, oh, we're software developers and we like [22:20.64] command line tools. [22:21.64] Everyone loves command line tools and open source stuff. But machine learning researchers [22:25.16] just really didn't care. [22:26.16] They just wanted to click on buttons. [22:27.44] They didn't mind that it was a cloud service. [22:29.12] It was all very visual as well; you need to look at graphs and charts and stuff [22:33.92] like that. [22:35.12] So it wasn't right. [22:36.52] Well, it was right for us. [22:37.52] We were actually rebuilding something that Andreas made at Spotify for just saving [22:40.48] experiments to cloud storage automatically, but other people didn't really want it. [22:44.88] So we kind of gave up on that. [22:47.12] And that was actually originally called Replicate, and we renamed it out of the [22:50.08] way. [22:51.08] So it's now called Keepsake. [22:52.08] And I think some people still use it. [22:53.60] Then we looped back to our original idea. [22:58.60] We were like, oh, maybe there was something in that thing we were originally thinking [23:01.88] about: researchers showing their work, and containers for machine learning models. [23:06.20] So we just built that. [23:07.20] And at that point, we were kind of running out of the YC money. [23:10.32] So we were like, okay, this feels good though. [23:12.24] Let's give this a shot. [23:13.24] So that was the point we raised the seed round. [23:15.84] We raised it pre-launch [23:18.48] and pre-product. [23:20.40] It was an idea, basically. [23:21.48] We had a little prototype.
[23:22.48] It was just an idea and a team, but we were like, okay, you know, bootstrapping [23:28.88] this thing is getting hard. [23:29.88] So let's actually raise some money. [23:31.80] Then we made Cog and Replicate. [23:35.08] It initially didn't have APIs, interestingly. [23:37.64] It was just the bit that I was talking about before, of helping researchers share their [23:41.60] work. [23:42.60] It was helping researchers put their work on a webpage such that other people could [23:47.30] try it out, and so that you could download the Docker container. [23:50.04] We cut the benchmarks part of it because we thought that was just too complicated. [23:53.60] But it had a Docker container that, you know, Andreas in a past life could have downloaded [23:57.76] and run with his benchmark, and you could compare all these models apples to apples. [24:01.76] So that was the theory behind it. [24:03.84] That kind of started to work. [24:05.80] This was still, you know, a long time pre-AI hype, and there was lots [24:11.76] of interesting stuff going on, but it was very much in the classic deep learning [24:15.92] era. [24:16.92] So image segmentation models and sentiment analysis and all these kinds of things that [24:22.36] people were using deep learning models for. [24:25.48] And we were very much building for research, because all of this stuff was happening in [24:29.00] research institutions. [24:30.00] You know, the sort of people who'd be publishing to arXiv. [24:32.24] So we were creating accompanying material for their models, basically. [24:35.12] They wanted a demo for their models, and we were creating accompanying material [24:38.80] for it. [24:39.80] And what was funny about that is they were not very good users.
[24:42.16] They were doing great work, obviously, but the way that research worked [24:46.92] is that they just made one thing every six months and they just fired and forgot [24:51.92] it. [24:52.92] They published this piece of paper and, done, I've published it. [24:56.12] So they'd output it to Replicate and then they'd just stop using Replicate. [25:00.28] You know, they were once-every-six-months users. [25:04.12] And that wasn't great for us, but we stumbled across this early community. [25:08.76] This was early 2021, when OpenAI created CLIP, and people started smushing [25:15.76] CLIP and GANs together to produce image generation models. [25:19.80] And this started with, you know, just a bunch of tinkerers on Discord, basically. [25:25.04] There was an early model called Big Sleep by advadnoun. [25:30.00] And then there was VQGAN-CLIP, which was a bit more popular, by Rivers Have Wings. [25:34.88] And it was all just people tinkering on stuff in Colabs, and it was very dynamic, [25:37.68] and it was people just making copies of Colabs and playing around with things and forking. [25:41.32] And I saw this and I was like, oh, this feels like open source software. [25:44.28] So much more than the research world, where people are publishing these papers. [25:48.72] You don't know their real names and it's just like a Discord thing. [25:51.24] Yeah, exactly. [25:52.24] But crucially, people were tinkering and forking, and things were moving [25:55.68] really fast, and it just felt like this creative, dynamic, collaborative community in a way [26:03.08] that research wasn't. It was still stuck in this kind of six-month [26:07.64] publication cycle. [26:09.76] So we just latched onto that and started building for this community. [26:14.04] And a lot of those early models were published on Replicate.
[26:17.72] I think the first one that was really primarily on Replicate was one called Pixray, which [26:22.92] was around mid-2021, and it had a really cool pixel art output. But it also just [26:28.16] produced general... they weren't crisp images, but they were quite aesthetically pleasing, [26:33.88] like some of these early image generation models. [26:36.92] And, you know, that was published primarily on Replicate, and then a few other models [26:40.04] around then were published on Replicate. [26:42.96] And that's where we really started to find an early community, and where we really found, [26:45.96] oh, we've actually built a thing that people want. [26:49.48] And they were great users as well. [26:50.76] People really wanted to try out these models. [26:52.16] Lots of people were running the models on Replicate. [26:55.08] We still didn't have APIs though. [26:56.72] Interesting. [26:57.72] And this is another really complicated part of the story. [26:59.28] We had no idea what our business model was, still, at this point. [27:01.28] I don't think people could even pay for it. [27:03.20] It was just these web forms where people could run the model. [27:06.24] Just for historical interest, which Discords were they, and how did you find them? [27:09.40] Was this the LAION Discord? [27:10.40] Yeah, LAION. [27:11.40] And Eleuther. [27:12.40] Yeah. [27:13.40] It was the Eleuther one. [27:14.40] These two, right? [27:15.40] Eleuther, I particularly remember. [27:16.40] There was a channel where VQGAN-CLIP, this was early 2021, was [27:20.28] set up as a Discord bot. [27:23.08] I just remember being completely captivated by this thing. [27:27.56] I was just playing around with it all afternoon and that sort of thing. [27:30.28] In Discord. [27:31.28] Shit, it's 2am, you know. [27:32.28] Yeah. [27:33.28] This is the beginnings of Midjourney. [27:34.28] Yeah, exactly.
[27:35.28] It was the start of Midjourney, and you know, it's where that kind of user interface came [27:39.72] from. [27:40.72] What's beautiful about the user interface is you could see what other people were [27:42.96] doing, and you could riff off other people's ideas. [27:47.52] And it was just so much fun to play around with this in a channel with over [27:51.64] 100 people. [27:52.88] And yeah, that just completely captivated me, and I was like, okay, this is something, you [27:56.20] know, so we should get these things on Replicate. [27:58.40] Yeah. [27:59.40] That's where that all came from. [28:00.60] And then you moved on to, so was it APIs next or was it Stable Diffusion next? [28:04.12] It was APIs next. [28:05.56] And the APIs happened because of one of our users. Our web form had an internal API for [28:11.52] making the web form work, [28:12.68] an API that was called from JavaScript. [28:15.52] And somebody reverse engineered that to start generating images with a script. [28:19.96] You know, they did the web inspector, "copy as cURL" thing, to figure out what the API [28:25.12] request was. [28:26.12] Yeah. [28:27.12] And it wasn't secured or anything. [28:28.12] Of course not. [28:29.12] They started generating a bunch of images, and we got tons of traffic, and we're like, what's [28:32.32] going on? [28:33.32] And I think the usual reaction to that would be like, hey, you're abusing [28:37.72] our API, and to shut them down. [28:40.24] Instead we were like, oh, this is interesting. [28:41.76] People want to run these models. [28:43.60] So we documented the API, our internal API, in a Notion document [28:49.48] and messaged this person, being like, hey, you seem to have found our API. [28:56.00] Here's the documentation. [28:57.00] That'll be a thousand bucks a month, with a Stripe form that we just clicked some [29:01.60] buttons to make.
[29:02.60] And they were like, sure, that sounds great. [29:03.72] So that was our first customer: a thousand bucks a month. [29:07.04] It was surprising. [29:08.04] A lot of money. [29:09.04] That's not... [29:10.04] It was on the... [29:11.04] It was on the order of a thousand bucks a month. [29:12.04] So was it a business thing? [29:13.52] It was the creator of Pixray. [29:16.16] He generated NFT art. [29:19.96] So he made a bunch of art with these models and was selling these NFTs, effectively. [29:26.72] And I think lots of people in his community were doing similar things, and he then [29:29.72] referred us to other people who were also generating NFTs using the generative models. That started [29:33.12] our API business. [29:34.12] Yeah. [29:35.12] Then we made an official API and actually added some billing to it. [29:37.84] So it wasn't just a fixed fee. [29:40.48] And now people think of you as the hosted models API business. [29:43.48] Yeah, exactly. [29:44.48] And that just turned out to be our business, you know. But what ended up being beautiful [29:47.96] about this is it was really fulfilling the original goal of what we wanted to do, [29:52.48] which was that we wanted to make this research that people were making accessible to other [29:57.84] people, and for it to be used in the real world. [30:00.88] And this was just ultimately the right way to do it, because all of these [30:05.32] people making these generative models could publish them to Replicate, and they wanted [30:09.00] a place to publish them. [30:10.84] And software engineers like myself (I'm not a machine learning expert, but [30:14.56] I want to use this stuff) could just run these models with a single line of code. [30:18.40] And we thought maybe the Docker image was enough, but it's actually super hard to get a Docker [30:21.44] image running on a GPU and stuff.
[30:23.08] So it really needed to be the hosted API for this to work and to make it accessible to [30:27.00] software engineers. [30:28.00] And we just wound our way to this, listening to the customer. [30:31.88] Yeah, exactly. [30:33.08] Did you ever think about becoming Midjourney during that time? [30:36.64] You had so much interest in image generation. [30:39.04] I mean, you're doing fine, for the record, but you know, it was right there. [30:45.00] You were playing with it. [30:46.00] I don't think it was our expertise. [30:48.08] I think our expertise was dev tools, whereas Midjourney's almost a consumer [30:51.48] product, you know. So I don't think it was our expertise. [30:55.44] It certainly occurred to us. [30:56.44] I think at the time we were thinking about, oh, maybe we could hire some of these [30:59.08] people in this community and make great models and stuff like this, but we ended up more [31:03.32] being at the tooling layer. [31:04.32] Like I was saying before, I'm not really a researcher, I'm more [31:06.48] the tool builder behind the scenes. [31:08.04] And I think both me and Andreas are like that. [31:09.88] I think this is an illustration of the tool builder philosophy. [31:12.92] Something you latch onto in dev tools, which is when you see people behaving [31:17.00] weird, it's not their fault. [31:18.00] It's yours. [31:19.00] And you want to pave the cow paths, is what they say, right? [31:20.52] The unofficial paths that people are making, make them official and make them easy [31:23.60] for people, and then maybe charge a bit of money. [31:25.84] And now, fast forward a couple of years, you have 2 million developers using Replicate. [31:30.56] Maybe more. [31:31.56] That was the last public number that I found. [31:33.84] It's 2 million users. Not all those people are developers, but a lot of them are developers.
[31:38.32] And then 30,000 paying customers was the number. [31:41.88] Latent Space runs on Replicate. [31:43.44] So we have a small podcast there, and we host our transcription, whisper diarization, on [31:47.92] Replicate. [31:48.92] And we're paying. [31:49.92] So Latent Space is in the 30,000. [31:52.48] You raised a $40 million Series B. [31:54.52] I would say that maybe the Stable Diffusion time, August '22, was really when the company [32:00.08] started to break out. [32:01.40] Tell us a bit about that and the community that came out of it, and I know now you're expanding [32:05.16] beyond just image generation. [32:06.72] Yeah. [32:07.72] I think we saw there was this really interesting [32:10.72] generative image world going on. [32:12.28] So we were building the tools for that community already, really. [32:16.84] And we knew Stable Diffusion was coming out. [32:20.12] We knew it was a really exciting thing; it was the best generative image model [32:22.80] so far. [32:23.80] I think the thing we underestimated was just what an inflection point it [32:27.96] would be. I think Simon Willison put it this way, where he said something along [32:33.84] the lines of: it was a model that was open source and tinkerable and just [32:39.92] good enough, such that it just took off in a way [32:43.36] that none of the models had before. [32:46.28] And what was really neat about Stable Diffusion is it was open source, [32:50.76] compared to DALL-E, for example, which was sort of equivalent quality. [32:55.04] In the first week we saw people making animation models out of it. [32:58.42] We saw people make game texture models that used circular convolutions to make [33:03.24] repeatable textures.
[33:04.32] We saw, a few weeks later, people fine-tuning it so you could put [33:07.72] your face in these models, and all of these other things. [33:10.60] Textual inversion. [33:11.60] Yep. [33:12.60] Yeah. [33:13.60] Exactly. [33:14.60] That happened a bit before that. [33:15.60] And all of this innovation was happening all of a sudden, and people were publishing [33:20.04] it on Replicate, because you could just publish arbitrary models on Replicate. [33:22.80] So we had this supply of interesting stuff being built. [33:25.56] But because it was a sufficiently good model, there was also just a ton of people building [33:31.48] with it. [33:32.48] They were like, oh, we can build products with this thing. [33:33.88] And this was about the time when people were starting to get really interested in [33:36.40] AI. [33:37.40] So tons of product builders wanted to build stuff with it, and we were just [33:39.64] sitting there in the middle as the interface layer between all these people [33:42.88] who wanted to build and all these machine learning experts who were building cool models. [33:46.18] And that's really where it took off. [33:47.84] We had credible supply and credible demand, and we were just in the middle. [33:51.56] And then, yes, since then we've just kind of grown and grown, really. [33:55.12] And we've, you know, been building a lot for the indie hacker community, these [33:58.08] individual tinkerers, but also startups and a lot of large companies as well who are [34:01.92] exploring and building AI things. [34:05.14] Then kind of the same thing happened in the middle of last year with language models and [34:09.92] Llama 2, where the same kind of Stable Diffusion effect happened with Llama. Llama 2 was [34:14.88] our biggest week of growth ever, because tons of people wanted to tinker with it and [34:17.96] run it.
[34:19.52] And you know, since then we've just been seeing a ton of growth in language models as well [34:22.36] as image models. [34:23.36] Yeah. [34:24.36] We're just kind of riding a lot of the interest that's going on in AI and all the people building [34:27.72] in AI, you know. [34:28.72] Yeah. [34:29.72] Kudos, right place, right time. [34:30.72] But also, you know, it took a while to position for the right place before the wave came. [34:34.76] I'm curious if you have any insights on these different markets. [34:38.20] So Pieter Levels, notably a very loud person, very picky about his tools. [34:43.56] I wasn't sure actually if he used you. [34:45.20] He does. [34:46.20] He does. [34:47.20] Because you cited him on your Series B blog post, and Danny Postma as well, his competitor, [34:49.12] all in that wave. [34:50.36] What are their needs versus, you know, the more enterprise or B2B type needs? [34:55.76] Did you come to a decision point where you're like, okay, how serious are these [34:58.70] indie hackers versus the actual businesses that are bigger, and perhaps better customers [35:03.36] because they're less churny? [35:04.76] They're surprisingly similar, because I think a lot of people right now want to use and [35:09.32] build with AI, but they're not AI experts. [35:13.28] And they're not infrastructure experts either. [35:14.92] So they want to be able to use this stuff without having to figure out all the internals [35:18.04] of the models and, you know, touch PyTorch and whatever. [35:23.12] And they also don't want to be setting up and booting up servers.
[35:26.44] And that's the same all the way from indie hackers just getting started, because [35:31.60] obviously you just want to get started as quickly as possible, all the way through [35:35.20] to large companies who want to be able to use this stuff but don't have all [35:39.00] of the experts on staff. You know, big companies like Google and so on [35:43.24] do actually have a lot of experts on staff, but the vast majority of companies [35:45.96] don't. [35:46.96] And they're all software engineers who want to be able to use this AI stuff, but they [35:49.32] just don't know how to use it. [35:51.36] You really need to be an expert, and it takes a long time to learn the [35:54.64] skills to be able to use that. [35:55.64] So they're surprisingly similar in that sense. [35:57.36] I think it's also kind of unfair to the indie community. They're not [36:02.04] churning surprisingly, or spiky surprisingly. [36:05.24] They're building real, established businesses, which is kudos to them, building [36:10.44] these really large, sustainable businesses, often just as solo developers. [36:16.96] And it's kind of remarkable how they can do that, actually, and it's credit to a lot [36:19.68] of their product skills. [36:21.84] And you know, we're just there to help them, being their machine learning team, [36:24.96] effectively, to help them use all of this stuff. [36:27.28] A lot of these indie hackers are some of our largest customers, alongside some of our [36:31.48] biggest customers that you would think would be spending a lot more money than them. [36:34.68] But yeah. [36:35.68] And we should name some of these. [36:36.68] You have them on your landing page. [36:37.68] You have Unsplash, Character.ai. [36:40.88] What do they power? [36:41.88] What can you say about their usage? [36:43.50] Yeah, totally. [36:44.50] It's various things.
[36:46.84] Well, I mean, I'm naming them because they're on your landing page. [36:50.00] So you have logo rights. [36:51.92] It's useful for people too. Like, I'm not imaginative. [36:54.52] Monkey see, monkey do, right? If I see someone doing something that I want [36:58.28] to do, then I'm like, okay, Replicate's great for that. [37:01.04] So that's what I think about case studies on company landing pages: it's just a way [37:04.84] of explaining, like, yeah, this is something that we are good for. [37:08.64] Yeah, totally. [37:09.64] I mean, these companies are doing things all the way up and down the stack at different [37:14.52] levels of sophistication. [37:16.36] So Unsplash, for example, they actually publicly posted this story on Twitter, where [37:22.00] they're using BLIP to annotate all of the images in their catalog. [37:27.80] So, you know, they have lots of images in the catalog and they want to create a text description [37:30.64] of each so you can search for it. [37:31.80] And they're annotating the images with, you know, an off-the-shelf open source model. You [37:34.88] know, we have this big library of open source models that you can run. [37:38.32] And, you know, we've got lots of people running these open source models off the shelf. [37:42.02] And then most of our larger customers are doing more sophisticated stuff, like fine-tuning [37:46.72] the models, or running completely custom models on us. [37:50.76] A lot of these larger companies are using us for a lot of their, you know, inference, [37:56.80] but it's a lot of custom models, and them writing the Python themselves, because [38:01.32] they've got machine learning experts on the team, and they're using us as, [38:05.68] you know, their inference infrastructure, effectively.
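To make the Unsplash-style usage concrete, here is a rough sketch of calling a hosted captioning model like BLIP over Replicate's public HTTP predictions endpoint. The payload shape (`{"version": ..., "input": ...}`) follows Replicate's API as I recall it; the model version hash is a placeholder you would look up on the model's page, and nothing is sent over the network until `caption_image` is actually called with a real token:

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"

def build_prediction_request(version: str, model_input: dict, token: str):
    """Assemble the JSON body and headers for a prediction request.
    Pure function: nothing is sent here, so it is easy to test."""
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    headers = {
        "Authorization": f"Token {token}",
        "Content-Type": "application/json",
    }
    return body, headers

def caption_image(image_url: str, token: str) -> dict:
    """Kick off a captioning prediction for one image.
    "<blip-version-hash>" is a placeholder, not a real model version."""
    body, headers = build_prediction_request(
        "<blip-version-hash>", {"image": image_url}, token
    )
    req = urllib.request.Request(API_URL, data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:  # requires a valid API token
        return json.load(resp)
```

Predictions are asynchronous, so the response carries an id and status that you poll until the output (here, the caption text) is ready.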
[38:08.64] So it's lots of different levels of sophistication, where some people are using these off-the-shelf [38:12.00] models, some people are fine-tuning models. [38:14.88] Pieter Levels is a great example, where a lot of his products are based off [38:18.24] fine-tuning image models, for example. [38:22.20] And then we've also got larger customers who are just using us as infrastructure, [38:25.68] effectively. [38:26.68] So yeah, it's all things up and down the stack. [38:29.12] Let's talk a bit about Cog and the technical layer. [38:32.50] So there are a lot of GPU clouds. [38:35.68] I think people have different pricing points, and I think everybody tries to offer a different [38:39.96] developer experience on top of it, which then lets you charge a premium. [38:44.84] Why did you want to create Cog? [38:46.80] You worked at Docker. [38:47.80] What were some of the issues with traditional container runtimes? [38:50.28] And maybe, yeah, what were you surprised by as you built it? [38:54.12] Cog came right from the start, actually, when we were thinking about this, you know, evaluation [38:58.52] and benchmarking system for machine learning researchers, where we wanted researchers [39:04.32] to publish their models in a standard format that was guaranteed to keep on running, that [39:10.36] you could replicate the results of. That's where the name came from. [39:14.16] And we realized that we needed something like Docker to make that work, you know. [39:18.72] And I think it was just natural, from my point of view, that obviously that should [39:22.12] be open source, and that we should try and create some kind of open standard here that [39:25.16] people can share, because if more people use this format, then that's great for everyone [39:29.84] involved.
[39:30.84] I think the magic of Docker is not really in the software. It's just the standard [39:34.68] that people have agreed on: here are a bunch of keys for a JSON document, basically. [39:40.84] And, you know, that was the magic of the metaphor of real containerization as well. [39:44.24] It's not the containers that are interesting, it's just the size and shape of the damn [39:47.52] box. [39:48.52] Right. [39:49.52] And it's a similar thing here, where really we just wanted to get people to agree on: [39:52.88] this is what a machine learning model is. [39:55.00] This is how a prediction works, this is what the inputs are, this is what the outputs are. [39:59.76] So Cog is really just a Docker container that attaches to a CUDA device if it needs a GPU, [40:06.04] that has an OpenAPI specification as a label on the Docker image, and the OpenAPI specification [40:12.52] defines the interface for the machine learning model: the inputs and outputs, effectively, [40:19.04] or the parameters, in machine learning terminology. [40:21.92] And, you know, we just wanted to get people to agree on this thing. [40:25.04] And it's general purpose enough. We weren't going, like some of the existing [40:28.08] things were, to the graph level. We really wanted something general purpose enough [40:32.20] that you could just put anything inside it, and it was future compatible. It was [40:35.08] just arbitrary software, and, you know, future compatible with future inference [40:38.82] servers and future machine learning model formats and all this kind of stuff. [40:42.20] So that was the intent behind it. [40:43.80] It just came naturally that we wanted to define this format, and that's been really working [40:47.24] for us. A bunch of people have been using Cog outside of Replicate, which was kind of [40:51.16] our original intention.
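The core idea Ben describes, treating a model as a function whose inputs and outputs are described by a JSON-level schema, can be sketched as a toy. This is purely illustrative (it is not Cog's implementation, and the schema layout here is a simplified assumption, not Cog's real OpenAPI document):

```python
import inspect

def openapi_style_schema(predict):
    """Derive a minimal OpenAPI-flavored schema from a predict
    function's type hints. Illustrative toy, not Cog's real format."""
    py_to_json = {str: "string", int: "integer", float: "number", bool: "boolean"}
    sig = inspect.signature(predict)
    inputs = {
        name: {"type": py_to_json.get(param.annotation, "string")}
        for name, param in sig.parameters.items()
    }
    output = {"type": py_to_json.get(sig.return_annotation, "string")}
    return {"openapi": "3.0.0", "components": {"Input": inputs, "Output": output}}

def predict(prompt: str, steps: int) -> str:
    """A stand-in model: inputs and output are plain JSON-level types,
    not tensors, which is exactly the level of agreement described above."""
    return f"{prompt} ({steps} steps)"

schema = openapi_style_schema(predict)
```

In Cog itself, a schema like this travels as metadata on the Docker image, so any tool that understands the format can discover the model's interface without running it.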
[40:52.16] Like, this should be how machine learning models are packaged and how people should use them. [40:55.64] It's common to use Cog in situations where maybe people can't use the SaaS service, [41:01.22] because, I don't know, they're in a big company and they're not allowed to use a SaaS service, [41:04.96] but they can still use Cog internally, and they can download the models from Replicate [41:08.36] and run them internally in their org, which we've been seeing happen, and that works [41:12.00] really well. [41:13.00] People who want to build custom inference pipelines, but don't want to reinvent [41:16.08] the world, can use Cog off the shelf and use it as a component in their inference [41:19.80] pipelines. [41:20.80] We've been seeing tons of usage like that, and it's just been happening organically. [41:23.80] We haven't really been trying, but it's there if people want it, and we've been seeing [41:27.36] people use it. [41:28.36] So that's great. [41:29.36] Yeah. [41:30.36] So a lot of it is just philosophical, of like, this is how it should work, from [41:31.76] my experience at Docker, you know. And there's just a lot of value in the core being [41:35.36] open, I think, and other people can share it, and it's an integration point. [41:38.24] So, you know, if Replicate, for example, wanted to work with a testing system, like a CI system [41:43.60] or whatever, we can just interface at the Cog level. That system just needs [41:47.56] to output Cog models, and then you can test your models on that CI system before they [41:51.80] get deployed to Replicate. [41:52.80] And it's just a format that we can get everyone to agree on. [41:55.52] What do you think...
[41:57.08] ...I guess Docker got wrong? Because if I look at a Docker Compose and a Cog definition, first [42:01.56] of all, the Cog one is kind of like the Dockerfile plus the Compose, versus in Docker Compose [42:06.20] you're just exposing the services. And also Docker Compose is very, like, ports driven, versus you [42:12.60] have the actual, you know, predict: this is what you have to run. [42:17.04] Yeah. [42:18.04] Any learnings, maybe tips for other people building container-based runtimes, like how [42:21.72] much should you separate the API services versus the image building, or how much do you [42:27.96] want to build them together? [42:28.96] I think it was coming from two sides. [42:31.72] We were thinking about the design from the point of view of user needs, what their [42:37.12] problems are and what problems can be solved for them, but also what the interface should [42:41.96] be for a machine learning model. [42:43.36] And it's the combination of those two things that led us to this design. [42:47.64] So the thing I talked about before was a little bit about the interface around the machine [42:50.96] learning model. [42:51.96] We realized that we wanted it to be general purpose. [42:54.40] We wanted it to be at the JSON level, like human-readable things, rather than the tensor level. [43:02.40] So it's an OpenAPI specification that wraps a Docker container. [43:04.76] That's where that design came from. [43:07.12] And it's really just a wrapper around Docker. [43:08.76] So we're kind of building on, standing on shoulders there. But Docker's too low level; [43:13.00] it's just arbitrary software. [43:14.72] So we wanted to be able to have an OpenAPI specification there that defined the function, [43:21.44] effectively, that is the machine learning model, but also how that function is written, [43:27.68] how that function is run, which is all defined in code and stuff like that.
[43:30.16] So it's a bunch of abstraction on top of Docker to make that work. [43:34.04] And that's where that design came from. [43:36.28] But the core problems we were solving for users were that Docker's really hard to use, [43:42.08] and productionizing machine learning models is really hard. [43:45.00] Right. [43:46.00] So on the first part of that, we knew we couldn't use Dockerfiles. [43:49.56] Dockerfiles are hard enough for software developers to write. [43:52.12] I'm saying this with love as somebody who worked on Docker and worked on Dockerfiles, [43:56.36] but they're really hard to use. [43:57.36] And you need to know a bunch about Linux, basically, because you're running a bunch of CLI commands. [44:01.16] You need to know a bunch of Linux and best practices, like how apt works and all this [44:04.84] kind of stuff. [44:05.84] So we were like, okay, we can't go down to that level. [44:07.32] We need something that machine learning researchers will be able to understand, people who [44:09.88] are used to Colab notebooks. [44:12.28] And what they understand is: I need this version of Python. [44:15.08] I need these Python packages. [44:16.80] And somebody told me to apt-get install something. [44:19.04] You know. [44:20.04] It throws sudo in there when I don't really know what that means. [44:24.16] So we tried to create a format that was at that level, and that's what cog.yaml is. [44:27.18] And we were really trying to imagine, what is that machine learning researcher [44:31.28] going to understand, you know, and trying to build for them. [44:34.16] Then the productionizing machine learning models thing is like, okay, how can we package [44:39.16] up all of the complexity of productionizing machine learning models?
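A cog.yaml at exactly that "I need this Python version, these packages, this apt-get install" level looks roughly like this. The keys follow Cog's published configuration format as I understand it, but the specific package names and versions here are illustrative, not from the interview:

```yaml
# cog.yaml: build config at the level a Colab-native researcher thinks in
build:
  gpu: true                  # Cog picks a compatible CUDA base image
  python_version: "3.10"     # "I need this version of Python"
  python_packages:           # "I need these Python packages"
    - "torch==2.0.1"
  system_packages:           # "somebody told me to apt-get install something"
    - "ffmpeg"
predict: "predict.py:Predictor"   # the function that is the model
```

Everything below this (base image selection, CUDA wiring, the inference server) is generated by the tool rather than written by the researcher.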
[44:42.92] Picking CUDA versions, hooking it up to GPUs, writing an inference server, defining [44:50.00] a schema, doing batching, all of these really gnarly things that everyone does [44:55.16] again and again. And just, you know, provide that as a tool. [44:59.80] That's where that side of it came from. [45:01.40] So it's combining those user needs with, you know, the sort of world need of [45:06.32] something like a common standard for what a machine learning model is. [45:09.60] And that's how we thought about the design. [45:11.12] I don't know whether that answers the question. [45:12.32] Yeah. [45:13.32] So your idea was like, hey, you really want what Docker stands for in terms of a standard, [45:18.48] but you actually don't want people to do all the work that goes into Docker. [45:22.56] It needs to be higher level, you know. [45:25.12] So I want to note, for the listener: you're not the only standard that is out there. [45:29.12] As with any standard, there must be 14 of them. [45:31.54] You are, very surprisingly, friendly with Ollama, who are your former colleagues from Docker, who [45:35.92] came out with the Modelfile. Mozilla came out with the llamafile. [45:40.26] And then, I don't know if this is in the same category even, but I'm just going to throw [45:42.80] it in there: [45:43.80] Hugging Face has the transformers and diffusers libraries, which are a way of disseminating [45:46.60] models that obviously people use. [45:49.36] How would you compare and contrast your approach of Cog versus all these? [45:52.72] It's kind of complementary, actually, which is kind of neat, in that a lot of transformers, [45:57.48] for example, is lower level than Cog. [45:59.24] So it's, you know, a Python library, effectively, but you still need to, like... [46:04.88] Expose them.
[46:05.88] You still need to turn that into an inference server, you still need to like install the [46:08.36] Python packages and that kind of thing. [46:09.84] So lots of Replicate models are transformers models and diffusers models inside Cog, you [46:17.20] know. [46:18.20] So that's like the level that that sits at. [46:19.20] So it's very complementary in some sense, and, you know, we're kind of working on integrations [46:22.76] with Hugging Face, such that you can like deploy models from Hugging Face into Cog models [46:26.62] and stuff like that on Replicate. [46:28.76] And some of these things like llamafile and what Ollama are working on are also very complementary [46:34.40] in that they're doing a lot of the sort of running these things locally on laptops, which [46:39.48] is not a thing that works very well with Cog, like Cog is really designed around servers [46:43.80] and attaching to CUDA devices and NVIDIA GPUs and this kind of thing. [46:48.08] So we're actually like, you know, figuring out ways that those things can [46:52.80] be interoperable, because, you know, they should be, and they are quite complementary in that [46:56.96] you should be able to like take a model on Replicate and run it on your machine. [46:59.64] You should be able to take a model on your machine and run it in the cloud. [47:02.72] Is the base layer something like, is it at like the GGUF level, which, by the way, [47:06.52] I need to get a primer on like the different formats that have emerged, or is it at the [47:11.20] *.file level, which is Modelfile, llamafile, whatever, or is it at the [47:14.88] Cog level? [47:15.88] I don't know, to be honest. [47:16.88] And I think this is something we still have to figure out. [47:18.60] There's a lot... [47:19.60] Yeah. [47:20.60] Like exactly where those lines are drawn, I [47:21.60] don't know exactly.
[47:22.60] This is something we're trying to figure out ourselves, but I think there's a lot [47:25.00] of promise in these systems interoperating. [47:27.52] We just want things to work together. [47:28.52] You know, we want to try and reduce the number of standards, so the more these things [47:31.20] interoperate, you know, convert between each other and that kind of stuff, the better. [47:34.96] Cool. [47:35.96] Well, there's a foundation for that. [47:36.96] Andreas comes out of Spotify. [47:38.52] Erik from Modal also comes out of Spotify. [47:42.32] You worked at Docker and the Ollama guys worked at Docker. [47:45.64] Did both you and Andreas know that there was somebody else you worked with that had a kind [47:49.56] of, like, similar, not similar idea, but like was interested in the same thing? [47:53.12] Or did you then just see, oh, I know those people, they're doing something very similar? [47:59.08] We learned about both early on actually, yeah, because we know them both quite well. [48:04.20] And it's funny how I think we're all seeing the same problems and just like applying, you [48:08.72] know, trying to fix the same problems that we're all seeing. [48:10.84] I think the Ollama one's particularly funny, because I joined Docker through my startup. [48:18.16] Funnily, actually, the thing which worked from my startup was Compose, but we were actually [48:22.24] working on another thing which was a bit like EC2 for Docker. [48:25.56] So we were working on like productionizing Docker containers, and the Ollama guys were working on [48:31.28] a thing called Kitematic, which was a bit like a desktop app for Docker. [48:36.72] And our companies both got bought by Docker at the same time. [48:41.60] And you know, Kitematic turned into Docker Desktop. [48:44.56] And then, you know, our thing then turned into Compose.
[48:47.60] And it's funny how we're both applying like the things we saw at Docker to the AI [48:53.12] world, but they're building like the local environment for it and we're building like [48:56.84] the cloud for it. [48:58.52] And yeah, so that's just like really pleasing. [49:01.12] And I think, you know, we're collaborating closely because there's just so much opportunity [49:04.76] of working together there. [49:05.76] When you have a hammer, everything's a nail. [49:07.72] Yeah, exactly. [49:08.72] Exactly. [49:09.72] So I think a lot of where we're coming from with AI is, we're all kind of, on the [49:13.68] Replicate team, [49:14.68] we're all kind of people who have built developer tools in the past. [49:17.32] So we've got a team, like I worked at Docker, I've got people who worked at Heroku and GitHub [49:22.56] and like the iOS ecosystem and all this kind of thing. [49:25.36] Like the previous generation of developer tools, where we like figured out a bunch of [49:30.84] stuff, and then like AI's come along and we just don't yet have those tools and abstractions [49:36.72] like to make it easy to use. [49:39.08] So we're trying to like take the lessons that we learned from the previous generation [49:42.52] of stuff and apply it to this new generation of stuff. [49:46.04] And obviously there's a bit of nuance there, because the trick is to take like the right [49:48.76] lessons and do new stuff where it makes sense. [49:51.92] You can't just like cut and paste, you know, but that's like how we're approaching this: [49:56.20] we're trying to, as much as possible, like take some of those lessons we learned [49:59.64] from like, you know, how Heroku and GitHub were built, for example, and apply them to [50:04.64] AI. [50:05.64] We should also talk a little bit about your compute availability. [50:08.92] We're trying to ask this of everyone, you know, it's compute provider month. [50:11.40] Do you own your own GPUs? [50:12.84] How many do you have access to?
[50:14.48] What do you feel about the tightness of the GPU market? [50:17.52] We don't own our own GPUs. [50:18.88] We've got a few that we play around with, but not for production workloads. [50:23.00] And we are primarily built on public clouds, so primarily GCP and CoreWeave and [50:27.84] some smatterings elsewhere. [50:29.36] Not from NVIDIA, which is your new investor? [50:31.76] We work with NVIDIA. [50:33.28] So, you know, they're kind of helping us get GPU availability, like GPUs are hard to get [50:37.64] hold of. [50:38.64] Like if you go to AWS and ask for one A100, they won't give you an A100. [50:43.20] But if you go to AWS and say I'd like a hundred A100s for two years, they're like, sure, we've [50:46.80] got some. [50:47.80] And I think the problem is, like, that makes sense from their point of view. [50:50.48] They want just like reliable sustained usage. [50:53.20] They don't want like spiky usage and like wastage in their infrastructure, which makes [50:56.00] total sense. [50:57.00] But that makes it really hard for startups, you know, who are wanting to just like get [51:00.60] hold of GPUs. [51:02.08] I think we're in a fortunate position where we can aggregate demand so we can make commits [51:06.52] to cloud providers. [51:07.96] And then, you know, we actually have good availability, like, you know, we don't have [51:11.88] infinite availability, obviously, but you know, if you want an A100 from Replicate, you [51:14.92] can get it. [51:15.92] You know, we're seeing other companies pop up as well, like SF Compute is a great example [51:19.92] of this, where they're doing the same idea for training almost, where, you know, a lot [51:24.20] of startups need to be able to train a model, but they can't get hold of GPUs from most [51:27.36] cloud providers. [51:28.36] So SF Compute is like letting people rent, you know, ten H100s for two days, which is just [51:33.32] impossible otherwise.
[51:34.32] And, you know, what they're effectively doing there is aggregating demand such that [51:37.60] they can make a big commit to the cloud provider and then let people use smaller chunks of [51:40.44] it. [51:41.44] And that's kind of what we're doing at Replicate as well. [51:42.88] So we're aggregating demand such that we make big commits to the cloud providers and, you [51:46.76] know, then people can run like a 100 millisecond API request on an A100, you know. [51:51.36] Coming from a finance background, this sounds surprisingly similar to banks, where the job [51:55.88] of a bank is maturity transformation, is what you call it. [51:59.28] You take short-term deposits, which technically can be withdrawn at any time, and you turn [52:02.92] that into long-term loans for mortgages and stuff, and you pocket the difference in interest. [52:07.92] And that's the bank. [52:09.40] Yeah. [52:10.40] That's exactly what we're doing. [52:11.40] So you run a bank. [52:12.40] You run a bank. [52:13.40] Right, yeah. [52:14.40] And it's very much a finance problem as well, because we have to make bets on the future. [52:18.76] You have to do forecasting. [52:19.76] On the value of GPUs. [52:20.76] Yeah. [52:21.76] What are you... [52:22.76] Okay. [52:23.76] I don't know how much you can disclose, but what are you forecasting? Down? [52:28.00] Up a lot? [52:29.00] Yeah. [52:30.00] Up 10X? [52:31.00] I can't really... [52:32.00] We're projecting our growth with some educated guesses about what kind of models are going [52:35.08] to come out and what kind of hardware those will run on, you know. [52:38.08] We need to bet that, like, okay, maybe language models are getting larger, [52:40.68] so we need to, like, have GPUs with more RAM, or, like, multi-GPU nodes, or maybe models [52:45.68] are getting smaller [52:46.68] and we actually need smaller GPUs. [52:47.68] We have to make some educated guesses about that kind of stuff. [52:49.92] Yeah.
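[Editor's note: the demand-aggregation economics described here can be sketched with toy numbers. The rates below are entirely illustrative, not Replicate's or any cloud's actual pricing.]

```python
# Hypothetical rates -- illustrative only, not any provider's real pricing.
on_demand_per_hr = 4.00   # $/hr for a single on-demand A100
committed_per_hr = 1.60   # $/hr effective rate on a multi-year commit

# An aggregator renting committed GPUs and reselling spiky per-request usage
# breaks even once fleet utilization exceeds the ratio of the two prices:
break_even_utilization = committed_per_hr / on_demand_per_hr
print(f"{break_even_utilization:.0%}")  # 40%
```

Below that utilization the commit loses money; above it, pooling many users' spiky workloads is cheaper than everyone renting on-demand, which is the "maturity transformation" being joked about.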
[52:50.92] Speaking of which, the mixture of experts models must be throwing a spanner into the [52:55.08] planning. [52:56.08] Not so much. [52:57.08] We've got, like, multi-node A100 machines, and multi-node H100 machines, which can run [53:01.56] those, no problem. [53:02.56] So we're set up for that. [53:03.56] Yeah. [53:04.56] Okay. [53:05.56] Right? [53:06.56] I didn't expect it to be so easy. [53:07.56] The question was that the amount of RAM per model is increasing a lot, especially on a [53:12.04] sort of per parameter basis, per active parameter basis, going from, like, Mixtral being [53:16.56] eight experts to, like, the DeepSeek MoE models. [53:19.56] I don't know if you saw them, being, like, 30, 60 experts, and you can see it keep going [53:25.36] up, I guess. [53:26.36] Yeah. [53:27.36] I think we might run into problems at some point, and, yeah, I don't know exactly what's [53:31.00] going on there. [53:32.00] I think something that we're finding which is kind of interesting, like, I don't know [53:35.44] this in depth. [53:36.44] You know, we're certainly seeing a lot of good results from lower-precision models. [53:39.92] So, like, you know, 90% of the performance with just, like, much less RAM required. [53:44.84] That means, like, we can run them on GPUs we have available, and it's good for customers [53:48.04] as well, because it runs faster, and, like, they want that trade-off, you know, where [53:52.62] it's just slightly worse, but, like, way faster and cheaper. [53:55.92] Do you see a lot of GPU waste in terms of people running the thing on a GPU that is, [54:00.88] like, too advanced? [54:01.88] I think we used a T4 to run Whisper, [54:03.88] so we were at the bottom end of it. [54:05.80] Yeah. [54:06.80] Any thoughts? [54:07.80] I think one of the hackathons we were at, people were like, "Oh, how do I get access [54:10.36] to, like, an H100?"
[54:11.36] And it's like, you need to run, like, SIP on the future, and it's like, you don't need [54:15.80] an H100. [54:16.80] Yeah. [54:17.80] Yeah. [54:18.80] Well, if you want low latency, like, sure, like, spend a lot of money on the H100, yeah, [54:22.36] we see a ton of that kind of stuff, and it's surprisingly hard to optimize these models [54:28.08] right now. [54:29.08] So a lot of people are just running, like, really unoptimized models. [54:31.72] We're doing the same, honestly. [54:32.72] Like, a lot of models on Replicate [54:34.20] have just, like, not been optimized very well. [54:37.04] So something we want to, like, be able to help people with is optimizing those models. [54:42.64] Like, either we, you know, show people how to with guides, or we make it easier to use [54:47.40] some of these more optimized inference servers, or we show people how to compile the models, [54:53.00] or we do that automatically, or something like that. [54:55.92] But that's all something we're exploring, because there's so much wastage. [54:58.64] Like, it's not just wasting the GPUs, it's also, like, a bad experience, and the models [55:01.76] run slow, you know? [55:02.88] Right. [55:03.88] So the models on Replicate are almost all pushed by our community, like, people who have pushed [55:07.44] those models themselves. [55:08.96] But it's like a big-head distribution, where there's, like, a long tail of lots of [55:12.52] models that people have pushed, and then, like, a big head of, like, the models most [55:16.52] people run. [55:17.52] So models like Llama 2, like Stable Diffusion, you know, we work with Meta and Stability [55:23.00] to, like, maintain those models, and we've done a ton of optimizations to make those [55:26.84] really fast. [55:28.08] So those models are optimized, but the long tail is not, and there's, like, a lot of [55:31.36] wastage there. [55:32.36] Yeah.
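[Editor's note: the multi-GPU and lower-precision points above come down to simple weight-memory arithmetic. Rough napkin math with approximate public figures for Mixtral-8x7B; the parameter counts are my assumption, not numbers from this conversation.]

```python
def weight_gib(n_params_billion: float, bits_per_weight: int) -> float:
    """GiB needed just to hold the model weights (ignores KV cache etc.)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

# A mixture-of-experts model keeps ALL experts resident: Mixtral-8x7B has
# roughly 47B total parameters even though only ~13B are active per token.
print(f"fp16: {weight_gib(47, 16):.0f} GiB")  # ~88 GiB -> needs multi-GPU
print(f"int4: {weight_gib(47, 4):.0f} GiB")   # ~22 GiB -> fits one 40GB A100
```

This is why RAM per *active* parameter balloons for MoE models, and why a lower-precision quantization changes which GPUs can serve them at all.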
[55:33.36] And going into the, well, it's already the new year, do you see the customer demand [55:38.08] and the GPU, like, hardware demand kind of, like, sync together? [55:41.56] Because I think a lot of people are saying, "Oh, there's, like, hundreds of thousands [55:44.64] of GPUs being shipped this year, like, the crunch is going to be over," but you also [55:48.28] have, like, millions of people that now care about using AI. [55:50.96] You know, how do you see the two lines progressing? [55:53.08] Are you seeing customer demand is going to outpace the GPU growth? [55:57.32] Do you see them growing together? [55:58.52] Do you see, maybe, a lot of this, like, model improvement work kind of helping alleviate [56:04.16] that? [56:05.16] Yeah, that's a good question. [56:06.32] From our point of view, demand is not outpacing the supply of GPUs, like, we have enough, from our [56:10.68] point of view, we have enough GPUs to go around, but that might change for sure. [56:14.48] Yeah. [56:15.48] That's a very nicely parried answer from a startup founder, I think. [56:21.52] So the thing you framed before is more, like, sort of picking the wrong box for a model, whereas yours [56:24.80] is more about maybe the inference stack, if you can call it that. [56:28.24] Were you referencing vLLM? [56:30.60] What other sort of techniques are you referencing? [56:33.04] And also keeping in mind that when I talk to your competitors, and we [56:37.76] don't have to name any of them, but they are working on trying to optimize the [56:42.00] models themselves. [56:43.00] Like, they basically, they'll quantize the models for you with their special stack, [56:46.04] so you basically use their versions of Llama 2, you use their versions of Mistral. [56:51.40] And that's one way to approach it.
[56:53.56] I don't see it as the Replicate DNA to do that, because that would be, like, sort of, you [56:57.76] would have to slap the Replicate house brand on something, which, I mean, just comment [57:02.36] on any of that. [57:03.36] Like, what do you mean when you say optimize models? [57:04.88] Yeah, things like quantizing the models, you can imagine a way that we could help people [57:09.00] quantize their models if we want to. [57:11.24] We've had success using inference servers like vLLM and TensorRT-LLM, and we're using those [57:18.08] kinds of things to serve language models. [57:20.36] We've had success with things like AITemplate, which compiles the models, all of those kinds [57:25.52] of things. [57:26.52] And there's like some, even really just boring things of just, like, making the code more [57:29.92] efficient. [57:30.92] Like some people, like when they're just writing some Python code, it's really easy to just [57:34.08] write inefficient Python code, and there's, like, really boring things like that as well. [57:38.08] But it's like a whole mishmash of things like that. [57:40.16] You'll do that for a customer? Like, you look at their code? [57:43.96] Yeah, we've certainly helped some of our customers do some of that stuff, yeah. [57:47.20] And a lot of the models on, like the popular models on Replicate, we've rewritten them [57:51.60] to use that stuff as well. [57:53.88] And like the Stable Diffusion that we run, for example, is compiled with AITemplate [57:57.40] to make it super fast. [57:59.44] And it's all open source, so you can see all of this stuff on GitHub if you want to [58:02.68] see how we do it.
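[Editor's note: one concrete instance of the "boring" Python inefficiency mentioned above is paying model-load cost on every request instead of once at startup. The `load_model` below is a hypothetical stand-in, not Replicate code.]

```python
import time

def load_model():
    """Stand-in for an expensive model load (seconds-to-minutes in real life)."""
    time.sleep(0.01)
    return lambda text: text[::-1]

# Slow: pays the full load cost on every single request.
def predict_slow(text: str) -> str:
    return load_model()(text)

# Fast: load once at startup (the pattern Cog's setup() hook encourages),
# then reuse the loaded model for every request.
_MODEL = load_model()

def predict_fast(text: str) -> str:
    return _MODEL(text)
```

Both return the same result; the difference only shows up as wasted GPU-seconds under load, which is exactly the kind of wastage being described.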
[58:03.68] But you can imagine ways that we could help people, you know, it's almost like built into [58:07.28] the Cog layer, maybe, where we could help people, like, use these fast inference servers [58:11.24] or use AITemplate to compile their models to make it faster, whether it's like manual, [58:16.48] semi-manual or automatic. [58:17.48] We're not really sure. [58:18.48] You know, that's something we want to explore, because, you know, that benefits everyone. [58:21.24] And then on the competitive piece, there was a price war on Mixtral last year, this [58:25.56] last December. [58:26.56] As far as I can tell, you guys did not enter that war. [58:29.20] You have Mixtral, but, you know, it's just regular pricing. [58:32.12] I think also some of these players are probably losing money on their pricing. [58:36.68] You know, you don't have to say anything, but, you know, the break [58:38.96] even is somewhere between 50 to 75 cents per million tokens served. [58:43.28] How are you thinking about like the, just the overall competitiveness in the market? [58:46.32] How should people choose when everyone's an API? [58:50.36] So for Llama 2 and Mistral, I think, not Mixtral, I can't remember exactly, [58:55.52] we have, you know, similar performance and similar price to some of these other services. [59:01.68] We're not like bargain basement compared to some of the others, because to your point, like, [59:06.16] we don't want to like burn tons of money, but we're, you know, pricing it sensibly and [59:10.76] sustainably to a point where we think it's, we think, you know, it's competitive with [59:14.72] other people, such that we want developers using Replicate and we don't want to price [59:18.68] it such that it's like only affordable by big companies.
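[Editor's note: a back-of-the-envelope for where a 50-75 cents/million-tokens break-even could come from. Every number here is an illustrative guess, not anyone's real costs or throughput.]

```python
# Illustrative serving economics -- hypothetical numbers only.
gpu_cost_per_hr = 4.00        # assumed all-in $/hr for one A100
tokens_per_second = 2000      # assumed batched serving throughput

tokens_per_hr = tokens_per_second * 3600
cost_per_million_tokens = gpu_cost_per_hr / tokens_per_hr * 1_000_000
print(f"${cost_per_million_tokens:.2f} / 1M tokens")  # $0.56
```

With these assumed numbers the break-even lands at about 56 cents per million tokens, inside the 50-75 cent range quoted; better batching or cheaper GPUs pushes it down, which is why pricing below that is likely money-losing.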
[59:21.72] We want to make it cheap enough such that the developers can afford it, but we also [59:24.28] don't like want the super cheap prices, because then it's almost like your customers [59:29.12] are hostile, you know, and the more customers you get, the worse it gets, you know. [59:32.96] So we're pricing it sensibly, but still to the point where, you know, hopefully it's [59:37.48] cheap enough to build on, and I think the thing we really care about, like, obviously [59:42.08] we want, you know, models on Replicate to be comparable to other people. [59:45.84] But I think the really crucial thing about Replicate, and the way I think we think about [59:49.68] it, is that it's not just the API for the model, particularly in open source. [59:54.20] It's not just the API for the model that is the important bit. [59:57.80] It's because quite often with open source models, like the whole point of open source [60:00.96] is that you can tinker on it and you can customize it and you can fine tune it and you can like [60:04.16] smash it together with another model like LLaVA, for example. [60:08.08] And you can't do that if it's just like a hosted API, because it's just like, you know, [60:12.00] you can't touch the code. [60:13.10] So what we want to do with Replicate is build a platform that's actually open. [60:17.36] So like we've got all of these models where the performance and price is on par with everything [60:22.56] else. [60:23.56] But if you want to customize it, you can fine tune it. [60:25.56] You can go to GitHub and get the source code for it and edit the source code and push up [60:28.64] your own custom version and this kind of thing, because that's the crucial thing for open source [60:32.96] machine learning: being able to tinker on it and customize it. [60:35.60] And we think that's really important to make open source AI work. [60:39.76] You mentioned open source. [60:41.56] How do you think about levels of openness?
[60:43.64] When Llama 2 came out, I wrote a post about this, about how there's open source, and there's [60:48.20] open weights, then there's restricted weights. [60:51.12] It was on the front page of Hacker News, so there were like all sorts of comments from people. [60:55.32] So I'm always curious to hear your thoughts. [60:57.64] Like what do you think is okay for people to license? [61:01.88] What's okay for people to not release? [61:04.00] Before, it was just like closed source big models, open source little models, purely [61:10.36] open source stuff. [61:12.52] And we're now seeing lots of variations, where model companies are putting restrictive licenses [61:18.04] on their models. [61:19.56] That means it can only be used for non-commercial use, and a lot of the open source crowd is [61:25.52] complaining it's not true open source and all this kind of thing. [61:30.12] And I think a lot of that is coming from philosophy, the free software movement kind of philosophy. [61:36.78] And I don't think it's necessarily a bad thing. [61:38.56] I think it's good that model companies can make money out of their models. [61:42.08] That, like, incentivizes people to make more models and this kind of thing. [61:45.76] And I think it's totally fine, if somebody made something, to ask for some money in return [61:50.40] if you're making money out of it. [61:51.40] And I think that's totally okay. [61:53.40] I think there's some really interesting midpoints as well, where people are releasing the code, [61:56.52] so you can still tinker on it, but the person who trained the model still wants to get a [62:00.88] cut of it if you're making a bunch of money out of it. [62:02.72] And I think that's good and that's going to make the ecosystem more sustainable. [62:07.00] I don't think anybody's really figured it out yet.
[62:08.56] We're going to see more experimentation with this and more people trying to figure out what [62:13.68] are the business models around building models and how can I make money out of this. [62:18.00] And we'll just see where it ends up. [62:19.68] And I think it's something we want to support at Replicate as well, because we believe [62:24.08] in open source. [62:25.08] That's great, but there's also going to be lots of models which are closed source as well. [62:30.88] And these companies might not be, there's probably going to be a long tail of a bunch [62:34.96] of people building models that don't have the reach that OpenAI has. [62:39.60] And hopefully at Replicate, we can help those people find developers and help them [62:44.60] make money and that kind of thing. [62:46.16] I think the compute requirements of AI kind of change the thing. [62:49.48] I started an open source company. [62:51.04] I'm a big open source fan, and before, it was kind of, man hours were really all that went [62:56.12] into open source. [62:57.12] There wasn't much monetary investment. [62:59.44] Well, not that man hours are not worth a lot. [63:03.04] But if you think about Llama 2, it's like $25 million, you know, like all in. [63:07.96] It's like, you can't just spin up a Discord and like spend $25 million. [63:11.68] So I think it's net positive for everybody that Llama 2 is open source. [63:15.88] And whether it's open source, you know, the open source term, [63:19.36] I think people, like you're saying, kind of argue on the semantics of it. [63:24.36] But like all we care about is that Llama 2 is open, because if Llama 2 was not open source [63:29.00] today, like, if Mistral was not open source, we would be in a bad spot, you know. [63:33.44] So.
[63:34.44] And I think the nuance here is making sure that these models are still tinkerable, because [63:38.68] the beautiful thing about Llama 2 as a base model is that, yeah, it costs $25 million to [63:43.60] train to start with, but then you can fine tune it for like $50. [63:48.76] And that's what's so beautiful about the open source ecosystem, and something I think is really [63:53.00] surprising as well. [63:54.00] Like it completely surprised me, like I think a lot of people assumed that open [63:58.48] source machine learning is just not going to be practical, because it's so [64:01.88] expensive to train these models, but like fine tuning is unreasonably effective, and [64:06.42] people are getting really good results out of it, and it's really cheap. [64:09.16] So people can effectively create open source models really cheaply, and there's going to [64:14.16] be like this sort of ecosystem of tons of models being made, and I think the risk there [64:19.12] from a licensing point of view is we need to make sure that the licenses let people do [64:22.68] that, because if you release a big model under a non-commercial license and people can't [64:27.52] fine tune it, you've lost the magic of it being open. [64:30.96] And I'm sure there are ways to structure that such that the person paying $25 million feels [64:35.68] like they're compensated somehow, and they feel like they should [64:39.76] keep on training models and people can keep on fine tuning it, but I guess we just have [64:43.76] to figure out exactly how that plays out. [64:46.08] Excellent. [64:47.08] So just wanted to round it out: you've been excellent and very open.
[64:51.04] I should have started my intro with this, but I feel like you found the sort of AI engineer [64:55.04] crew before I did, and, you know, something I really resonated with in your [65:00.36] Series B announcement was that you put in some stats about how there are two orders [65:04.80] of magnitude more software engineers than there are machine learning engineers, about [65:07.68] 30 million software engineers and 500,000 machine learning engineers. [65:11.24] You can maybe plus or minus one of those orders of magnitude, but it's around that ballpark. [65:14.76] And so obviously there will be a lot more AI engineers than there will be ML engineers. [65:19.24] How do you see this group? [65:21.36] Like is it all software engineers? [65:23.32] Are they going to specialize? [65:25.80] What would you advise someone trying to become an AI engineer? [65:29.16] Is this a legitimate career path? [65:30.92] Yeah, absolutely. [65:31.92] I mean, it's very clear that AI is going to be a large part of how we build software in [65:37.04] the future now. [65:38.64] It's a bit like being a software developer in the nineties and ignoring the internet, [65:42.68] you know, you just need to, you need to learn about this stuff and you need to figure this [65:46.24] stuff out. [65:47.24] I don't think it needs to be super low level. [65:50.80] You don't need to be like, you know, the metaphor here is like, you don't need to be digging [65:55.08] down into like this sort of PyTorch level if you don't want to, in the same way as a [66:01.24] software engineer in the nineties, you didn't need to understand how network [66:04.24] stacks work to be able to build a website, you know, but you need to understand the shape [66:07.12] of this thing and how to hold it and what it's good at and what it's not. [66:10.68] And that's really important.
[66:12.84] So yeah, certainly just advise people to like just start playing around with it, get a feel [66:17.20] of like how language models work, get a feel of like how these diffusion models work, get [66:22.44] a feel of like what fine tuning is and how it works, because some of your job might be [66:28.08] building data sets, you know, get a feeling of how prompting works, because some of your [66:31.08] job might be writing a prompt. [66:33.20] And those are just all really important skills to sort of figure out. [66:36.98] Yeah. [66:37.98] Well, thanks for building the definitive platform for doing all that. [66:41.08] Yeah, of course. [66:42.60] Any final calls to action? Who should come work at Replicate, anything for the audience? [66:47.40] Yeah. [66:48.40] Well, I mean, we're hiring. If you click on Jobs at the bottom of Replicate.com, [66:53.44] there's some jobs. [66:55.12] And I'd also just say, like, try out AI, even if [67:00.12] you think you're not smart enough. [67:01.12] Like the whole reason I started this company is because I was looking at the cool stuff [67:03.88] that Andreas was making. [67:04.88] Like Andreas is like a proper machine learning person with a PhD, you know, and I was like [67:08.44] just like, you know, a sort of lowly software engineer. [67:11.72] I was like, you're doing really cool stuff and I want to be able to do that. [67:15.08] And by us working together, you know, we've now made it accessible to dummies like me, and [67:19.92] I'd just encourage anyone who like wants to try this stuff out, just give it a try. [67:24.24] I would also encourage people who are tool builders. Like the limiting factor now on [67:28.08] AI is not like the technology, like the technology has made incredible advances. [67:32.40] And there's just so many incredible machine learning models that can do a ton of stuff.
[67:37.64] The limiting factor is just like making that accessible to people who build products, because [67:42.00] it's really hard to use this stuff right now. [67:44.48] And obviously we're building some of that stuff at Replicate, but there's just like [67:46.80] a ton of other tooling and abstractions that need to be built out to make this stuff [67:50.44] usable. [67:51.44] So I'd just encourage people who, like, like building developer tools to just like get stuck into [67:55.56] it as well. [67:56.56] Because that's going to make this stuff accessible to everyone. [67:58.84] Yeah. [67:59.84] I especially want to highlight you have a hacker in residence job opening available, which [68:03.32] not every company has, which means you just join and hack stuff. [68:07.32] I think Charlie Holtz is doing a fantastic job with that. [68:09.68] Yep. [68:10.68] Effectively, like most of our, a lot of our job is just like showing people how to use [68:15.40] AI. [68:16.40] So we've just got a team of like software developers and people who have kind of figured [68:18.52] this stuff out, who are writing about it, who are making videos about it, or making example [68:23.72] applications to like show people what you can do with this stuff. [68:26.12] Yeah. [68:27.12] In my world that used to be called DevRel, but now it's hacker in residence. [68:31.28] This came from Zeke, who is another one of our hackers. [68:38.32] Don't tell me this came from Chroma, 'cause they'll just say they started that one. [68:41.28] We developed it... like, they actually were like, hey, we came up with that first. [68:45.44] But I think we came up with it independently, because the story behind this is we originally [68:51.68] called it the DevRel team, and DevRel is cursed now. [68:55.28] Zeke was like, that sounds so boring. [68:58.92] I'm not going to say I'm a developer relations person or developer advocate or something. [69:05.44] So we're like, okay, what's like, the way we can make this sound the most fun?
[69:08.84] All right. [69:09.84] You're right. [69:10.84] I would say like that, that is consistently the vibe I get from Replicate, everyone on your [69:14.12] team [69:15.12] I interact with. When I go to your San Francisco office, like, that's the vibe that you're [69:18.84] generating. [69:19.84] It's a hacker space more than an office, and you host fantastic meetups [69:23.32] there, and I think you're a really positive presence in our community. [69:25.88] So thank you for doing all that, [69:27.60] and instilling the hacker vibe and culture into AI. [69:31.28] I'm really glad that... [69:32.28] I'm really glad that's working. [69:33.28] Cool. [69:34.28] That's a wrap, I think. [69:35.28] Thank you so much for coming on, man. [69:36.28] Yeah, of course. [69:37.28] Thank you. [69:38.28] Bye.