[by:whisper.cpp] [00:00.00](Music) [00:06.00]Hello everyone, welcome to the Latent Space Podcast [00:08.40]This is Alessio, partner and CTO-in-residence at Decibel Partners [00:11.80]And I'm joined by my co-host Swyx, founder of Smol AI [00:15.00]Today we're back in the studio [00:17.20]With Andreas and Jungwon, welcome [00:20.20]Thank you, great, thank you [00:22.40]I'll introduce you each separately, but I'm also hoping to learn more [00:27.40]So Andreas, it looks like you started Elicit first and Jungwon joined later [00:32.40]That's right [00:33.00]For all intents and purposes, the Elicit and also the Ought that existed before then were very different from what I started [00:39.60]So I think it's like fair to say that you co-founded it [00:42.60]Got it [00:43.00]And Jungwon, you're a co-founder and COO of Elicit now [00:46.20]Yeah, that's right [00:47.00]So there's a little bit of a history to this [00:48.80]I'm not super aware of like the sort of journey [00:51.80]I was aware of Ought and Elicit as sort of a non-profit type situation [00:55.80]And recently you turned into like a public benefit corporation [00:59.40]So yeah, maybe if you want you could take us through that journey of finding the problem [01:04.00]You know, obviously you're working together now [01:06.20]So like how do you get together to decide to leave your startup career to join him [01:11.20]Yeah, it's truly a very long journey [01:12.80]I guess truly it kind of started in Germany when I was born [01:17.20]So even as a kid I was always interested in AI [01:20.00]Like I kind of went to the library [01:21.40]There were books about how to write programs in QBasic [01:24.20]And like some of them talked about how to implement chatbots [01:27.20]And to be clear [01:28.80]He grew up in like a tiny village on the outskirts of Munich called Dinkelscherben [01:33.20]Where it's like a very, very idyllic German village [01:36.20]Yeah, important to the story [01:38.40]So basically the main thing is I've kind of always been thinking about AI my entire life [01:42.80]And been thinking about, at some point this is going to be a huge deal [01:46.00]It's going to be transformative [01:47.00]How can I work on it [01:48.20]And 
was thinking about it from when I was a teenager [01:51.60]After high school did a year where I started a startup with the intention to become rich [01:56.80]And then once I'm rich I can affect the trajectory of AI [02:00.40]Did not become rich [02:01.40]Decided to go back to college [02:03.00]And study cognitive science there [02:05.00]Which was like the closest thing I could find at the time to AI [02:08.00]In the last year of college moved to the US to do a PhD at MIT [02:12.60]Working on broadly kind of new programming languages for AI [02:15.00]Because it kind of seemed like the existing languages were not great at expressing [02:19.60]World models and learning world models doing Bayesian inference [02:22.60]Was obviously thinking about ultimately the goal is to actually build tools that help people reason more clearly [02:27.60]Ask and answer better questions and make better decisions [02:31.60]But for a long time it seemed like the technology to put reasoning in machines just wasn't there [02:35.60]Initially at the end of my postdoc at Stanford was thinking about well what to do [02:39.60]I think the standard path is you become an academic and do research [02:43.60]But it's really hard to actually build interesting tools as an academic [02:48.60]You can't really hire great engineers [02:50.60]Everything is kind of on a paper-to-paper timeline [02:53.60]And so I was like well maybe I should start a startup [02:56.60]Pursued that for a little bit [02:57.60]But it seemed like it was too early because you could have tried to do an AI startup [03:01.60]But probably would not have been this kind of AI startup we're seeing now [03:05.60]So then decided to just start a non-profit research lab [03:08.60]That's going to do research for a while until we better figure out how to do thinking in machines [03:13.60]And that was Ought [03:14.60]And then over time it became clear how to actually build actual tools for reasoning [03:19.60]Then only over time we developed a 
better way to [03:23.60]I'll let you fill in some of the details here [03:25.60]Yeah so I guess my story maybe starts around 2015 [03:29.60]I kind of wanted to be a founder for a long time [03:31.60]And I wanted to work on an idea that stood the test of time for me [03:34.60]Like an idea that stuck with me for a long time [03:37.60]And starting in 2015 [03:38.60]Actually originally I became interested in AI based tools from the perspective of mental health [03:43.60]So there were a bunch of people around me who were really struggling [03:45.60]One really close friend in particular was really struggling with mental health [03:48.60]And didn't have any support [03:50.60]And it didn't feel like there was anything before kind of like getting hospitalized [03:54.60]That could just help her [03:56.60]And so luckily she came and stayed with me for a while [03:58.60]And we were just able to talk through some things [04:00.60]But it seemed like you know lots of people might not have that resource [04:04.60]And something maybe AI enabled could be much more scalable [04:07.60]I didn't feel ready to start a company then [04:10.60]That's 2015 [04:11.60]And I also didn't feel like the technology was ready [04:13.60]So then I went into fintech [04:15.60]And like kind of learned how to do the tech thing [04:17.60]And then in 2019 [04:18.60]I felt like it was time for me to just jump in [04:21.60]And build something on my own [04:22.60]I really wanted to create [04:24.60]And at the time I looked around at tech [04:26.60]And felt like not super inspired by the options [04:28.60]I just I didn't want to have a tech career ladder [04:31.60]Or like I didn't want to like climb the career ladder [04:33.60]There were two kind of interesting technologies at the time [04:35.60]There was AI and there was crypto [04:37.60]And I was like well the AI people seemed like a little bit more nice [04:41.60]And maybe like slightly more trustworthy [04:44.60]Both super exciting [04:45.60]But threw my bet on the AI side [04:47.60]And then I got connected to Andreas [04:49.60]And actually the way he was thinking about [04:51.60]Pursuing the research agenda at Ought [04:53.60]Was really compatible with what I had envisioned [04:56.60]For an ideal AI product [04:58.60]Something that helps kind of take down [05:00.60]Really complex thinking [05:01.60]Overwhelming thoughts [05:02.60]And breaks it down into small pieces [05:04.60]And then this kind of mission [05:05.60]We need AI to help us figure out [05:07.60]What we ought to do [05:08.60]It was really inspiring, right? [05:10.60]Yeah, because I think it was clear [05:12.60]That we were building the most powerful [05:14.60]Optimizer of our time [05:16.60]But as a society [05:17.60]We hadn't figured out [05:18.60]How to direct that optimization potential [05:21.60]And if you kind of direct tremendous [05:23.60]Optimization potential at the wrong thing [05:25.60]That's really disastrous [05:26.60]So the goal of Ought was [05:28.60]Make sure that if we build [05:29.60]The most transformative technology of our lifetime [05:31.60]It can be used for something really impactful [05:34.60]And that's really good reasoning [05:35.60]Like not just generating ads [05:37.60]My background was in marketing [05:38.60]But like so [05:39.60]It's like I want to do [05:40.60]More than generate ads with this [05:42.60]And also if these AI systems [05:44.60]Get to be super intelligent enough [05:46.60]That they are doing this [05:47.60]Really complex reasoning [05:48.60]That we can trust them [05:49.60]That they are aligned with us [05:51.60]And we have ways of evaluating [05:53.60]That they are doing the right thing [05:54.60]So that's what Ought did [05:55.60]We did a lot of experiments [05:56.60]You know, like Andreas said [05:57.60]Before foundation models [05:59.60]Really like took off [06:00.60]A lot of the issues we were seeing [06:01.60]Were more in reinforcement learning [06:03.60]But we saw a future [06:04.60]Where AI would be able to 
do [06:06.60]More kind of logical reasoning [06:08.60]Not just kind of extrapolate [06:09.60]From numerical trends [06:10.60]We actually kind of [06:11.60]Set up experiments with people [06:13.60]Where kind of people stood in [06:14.60]As super intelligent systems [06:16.60]And we effectively gave them [06:17.60]Context windows [06:18.60]So they would have to [06:19.60]Like read a bunch of text [06:20.60]And one person would get less text [06:23.60]And one person would get all the text [06:24.60]And the person with less text [06:26.60]Would have to evaluate the work [06:28.60]Of the person who could read much more [06:30.60]So like in the world [06:31.60]We were basically simulating [06:32.60]Like in, you know, 2018-2019 [06:34.60]A world where an AI system [06:36.60]Could read significantly more than you [06:38.60]And you as the person [06:39.60]Who couldn't read that much [06:40.60]Had to evaluate the work [06:41.60]Of the AI system [06:42.60]So that's a lot of the work we did [06:44.60]And from that we kind of [06:45.60]Iterated on the idea [06:46.60]Of breaking complex tasks down [06:47.60]Into smaller tasks [06:48.60]Like complex tasks [06:49.60]Like open-ended reasoning [06:51.60]Logical reasoning [06:52.60]Into smaller tasks [06:53.60]So that it's easier [06:54.60]To train AI systems on them [06:55.60]And also so that it's easier [06:57.60]To evaluate the work of the AI system [06:59.60]When it's done [07:00.60]And then also kind of [07:01.60]We really pioneered this idea [07:02.60]The importance of supervising [07:03.60]The process of AI systems [07:05.60]Not just the outcomes [07:06.60]And so a big part [07:07.60]Of how Elicit is built [07:08.60]Is we're very intentional [07:10.60]About not just throwing [07:11.60]A ton of data into a model [07:13.60]And training it [07:14.60]And then saying cool [07:15.60]Here's like scientific output [07:16.60]Like that's not at all [07:17.60]What we do [07:18.60]Our approach is very much [07:19.60]Like what are the 
steps [07:20.60]That an expert human does [07:21.60]Or what is like an ideal process [07:23.60]As granularly as possible [07:25.60]Let's break that down [07:26.60]And then train AI systems [07:27.60]To perform each of those steps [07:29.60]Very robustly [07:30.60]When you train like that [07:32.60]From the start [07:33.60]After the fact [07:34.60]It's much easier to evaluate [07:35.60]It's much easier to troubleshoot [07:36.60]At each point [07:37.60]Like where did something break down [07:38.60]So yeah [07:39.60]We were working on those experiments [07:40.60]For a while [07:41.60]And then at the start of 2021 [07:43.60]Decided to build a product [07:44.60]Do you mind if I [07:45.60]Because I think you're about [07:46.60]To go into more modern [07:47.60]Ought and Elicit [07:49.60]And I just wanted to [07:50.60]Because I think a lot of people [07:51.60]Are in where you were [07:53.60]Like sort of 2018-19 [07:55.60]Where you chose a partner [07:57.60]To work with [07:58.60]And you didn't know him [07:59.60]Yeah yeah [08:00.60]You were just kind of cold introduced [08:01.60]Yep [08:02.60]A lot of people are cold introduced [08:03.60]I've been cold introduced [08:04.60]To tons of people [08:05.60]And I never work with them [08:06.60]I assume you had a lot [08:07.60]A lot of other options [08:08.60]Like how do you advise [08:09.60]People to make those choices [08:10.60]We were not totally cold introduced [08:12.60]So one of our closest friends [08:13.60]Introduced us [08:14.60]And then Andreas had written a lot [08:16.60]On the website [08:17.60]A lot of blog posts [08:18.60]A lot of publications [08:19.60]And I just read it [08:20.60]And I was like, wow [08:21.60]This sounds like my writing [08:22.60]And even other people [08:23.60]Some of my closest friends [08:24.60]I asked for advice from [08:25.60]They were like, oh [08:26.60]This sounds like your writing [08:28.60]But I think [08:29.60]I also had some kind of [08:30.60]Like things I was looking for [08:31.60]I 
wanted someone [08:32.60]With a complementary skill set [08:33.60]I want someone [08:34.60]Who was very values aligned [08:36.60]And yeah [08:37.60]That was all a good fit [08:38.60]We also did a pretty [08:40.60]Lengthy mutual evaluation process [08:42.60]Where we had a Google doc [08:43.60]Where we had all kinds of questions [08:45.60]For each other [08:46.60]And I think it ended up being [08:48.60]Around 50 pages or so [08:49.60]Of like various questions [08:51.60]Was it the YC list? [08:53.60]There's some lists going around [08:54.60]For co-founder questions [08:55.60]No, we just made our own [08:57.60]But I guess it's probably related [08:59.60]In that you ask yourself [09:00.60]What are the values you care about [09:01.60]How would you approach [09:02.60]Various decisions [09:03.60]And things like that [09:04.60]I shared like all of my past [09:05.60]Performance reviews [09:06.60]Yeah [09:07.60]Yeah [09:08.60]And he never had any [09:09.60]No [09:10.60]Yeah, sorry [09:14.60]I just had to [09:15.60]A lot of people are going through [09:16.60]That phase [09:17.60]And you kind of skipped over it [09:18.60]I was like, no, no, no [09:19.60]There's like an interesting story [09:20.60]Yeah [09:21.60]Before we jump into what it is [09:22.60]It is today [09:23.60]The history is a bit [09:24.60]Counterintuitive [09:25.60]So you started [09:26.60]From, oh, if we had [09:27.60]A super powerful model [09:29.60]How do we align it [09:30.60]How do we use it [09:31.60]But then you were actually [09:32.60]Like, well, let's just build [09:33.60]The product so that people [09:34.60]Can actually leverage it [09:35.60]And I think there are [09:36.60]A lot of folks today [09:37.60]That are now back [09:38.60]To where you were [09:39.60]Maybe five years ago [09:40.60]They're like, oh, what if [09:41.60]This happens rather than [09:42.60]Focusing on actually building [09:43.60]Something useful with it [09:45.60]What clicked for you [09:46.60]To like move into Elicit [09:47.60]And then we 
can cover [09:48.60]That story too [09:49.60]I think in many ways [09:50.60]The approach is still the same [09:51.60]Because the way we're [09:52.60]Building Elicit is not [09:54.60]Let's train a foundation model [09:55.60]To do more stuff [09:56.60]It's like [09:57.60]Let's build a scaffolding [09:58.60]Such that we can [09:59.60]Deploy powerful models [10:00.60]To good ends [10:01.60]I think it's different [10:02.60]Now in that [10:03.60]We actually have [10:04.60]Like some of the models to plug in [10:05.60]But if in 2017 [10:06.60]We had had the models [10:08.60]We could have run [10:09.60]The same experiments [10:10.60]We did run with humans [10:11.60]Back then [10:12.60]Just with models [10:13.60]And so in many ways [10:14.60]Our philosophy is always [10:15.60]Let's think ahead to the future [10:16.60]What models are going to exist [10:17.60]In one, two years [10:19.60]Or longer [10:20.60]And how can we make it [10:22.60]So that they can [10:23.60]Actually be deployed [10:24.60]In more transparent [10:25.60]Controllable ways [10:26.60]Yeah, I think [10:27.60]Motivationally we both [10:28.60]Are kind of [10:29.60]Product people at heart [10:30.60]The research was [10:31.60]Really important [10:32.60]And it didn't [10:33.60]Make sense to build [10:34.60]A product at that time [10:35.60]But at the end of the day [10:36.60]The thing that always [10:37.60]Motivated us is [10:38.60]Imagining a world [10:39.60]Where high quality [10:40.60]Reasoning is really abundant [10:41.60]And AI is a technology [10:43.60]That's going to get us there [10:44.60]And there's a way [10:45.60]To guide that technology [10:46.60]With research [10:47.60]But you can have [10:48.60]A more direct effect [10:49.60]Through product [10:50.60]Because with research [10:51.60]You publish the research [10:52.60]And someone else has to pick it up [10:53.60]Product felt [10:54.60]Like a more direct path [10:55.60]And we wanted to [10:56.60]Concretely have an impact [10:57.60]On people's lives [10:58.60]Yeah, I think 
[10:59.60]The kind of personally [11:00.60]The motivation was [11:01.60]We want to build [11:02.60]For people [11:03.60]Yep, and then [11:04.60]Just to recap as well [11:05.60]Like the models [11:06.60]You're using back then were [11:07.60]Like, I don't know [11:08.60]With the like BERT type stuff [11:10.60]Or T5 or [11:12.60]I don't know what time frame [11:13.60]We're talking about here [11:14.60]I guess to be clear [11:15.60]At the very beginning [11:16.60]We had humans do the work [11:18.60]And then I think [11:19.60]The first models [11:20.60]That kind of made sense [11:21.60]Were GPT-2 [11:22.60]And TNLG [11:23.60]And early generative models [11:25.60]We do [11:26.60]We also use [11:27.60]Like T5 based models [11:28.60]Even now [11:29.60]Started with GPT-2 [11:30.60]Yeah, cool [11:31.60]I'm just kind of curious about [11:32.60]Like how do you [11:33.60]Start so early [11:34.60]Like now it's obvious [11:35.60]Where to start [11:36.60]But back then it wasn't [11:37.60]Yeah, I used to [11:38.60]Nag Andreas a lot [11:39.60]I was like [11:40.60]Why are you [11:41.60]Talking to this? 
[11:42.60]I don't know [11:43.60]I felt like [11:44.60]GPT-2 is like [11:45.60]Clearly can't do anything [11:46.60]And I was like [11:47.60]Andreas, you're wasting your time [11:48.60]Like playing with this toy [11:49.60]But yeah, he was right [11:50.60]So what's the history [11:51.60]Of what Elicit [11:52.60]Actually does as a product [11:53.60]You recently announced that [11:55.60]After four months [11:56.60]You got to a million in revenue [11:57.60]Obviously a lot of people [11:58.60]Use it, get a lot of value [11:59.60]But it was [12:00.60]Initially kind of like [12:01.60]Structured data [12:02.60]Extraction from papers [12:03.60]Then you had [12:04.60]Kind of like concept grouping [12:05.60]And today it's maybe [12:06.60]Like a more full stack [12:07.60]Research enabler [12:09.60]Kind of like paper [12:10.60]Understanding platform [12:11.60]What's the definitive definition [12:13.60]Of what Elicit is [12:14.60]And how did you get here [12:15.60]Yeah, we say Elicit [12:16.60]Is an AI research assistant [12:17.60]I think it will continue [12:18.60]To evolve [12:19.60]You know, we're so excited [12:20.60]About building in research [12:21.60]Because there's just so much space [12:22.60]I think the current phase [12:23.60]We're in right now [12:24.60]We talk about it [12:25.60]As really trying to make Elicit [12:27.60]The best place to understand [12:28.60]What is known [12:29.60]So it's all a lot about like [12:31.60]Literature summarization [12:32.60]There's a ton of information [12:33.60]That the world already knows [12:34.60]It's really hard to navigate [12:35.60]Hard to make it relevant [12:37.60]So a lot of it is around [12:38.60]Document discovery [12:39.60]And processing and analysis [12:41.60]I really kind of want to [12:42.60]Import some of the incredible [12:44.60]Productivity improvements [12:45.60]We've seen in software engineering [12:47.60]And data science [12:48.60]Into research [12:49.60]So it's like [12:50.60]How can we make researchers 
[12:51.60]Like data scientists of text [12:53.60]That's why we're launching [12:54.60]This new set of features [12:55.60]Called notebooks [12:56.60]It's very much inspired [12:57.60]By computational notebooks [12:58.60]Like Jupyter notebooks [12:59.60]Deepnote or Colab [13:01.60]Because they're so powerful [13:02.60]And so flexible [13:03.60]And ultimately [13:04.60]When people are trying [13:05.60]To get to an answer [13:07.60]Or understand insight [13:08.60]They're kind of like [13:09.60]Manipulating evidence [13:10.60]And information [13:11.60]Today that's all packaged [13:12.60]In PDFs [13:13.60]Which are super brittle [13:14.60]But with language models [13:15.60]We can decompose [13:16.60]These PDFs [13:17.60]And then we can [13:18.60]Interlink claims [13:19.60]And evidence [13:20.60]And insights [13:21.60]And then let researchers [13:22.60]Mash them up together [13:23.60]Remix them [13:24.60]And analyze them together [13:25.60]So yeah [13:26.60]I would say quite simply [13:27.60]Overall, Elicit is [13:28.60]An AI research assistant [13:29.60]Right now we're focused [13:30.60]On text based workflows [13:32.60]But long term [13:33.60]Really want to kind of [13:34.60]Go further and further [13:35.60]Into reasoning [13:36.60]And decision making [13:37.60]And when you say [13:38.60]AI research assistant [13:39.60]This is kind of [13:40.60]Meta research [13:41.60]So researchers [13:42.60]Use Elicit [13:43.60]As a research assistant [13:44.60]It's not a generic [13:45.60]You can research [13:46.60]Or it could be [13:47.60]But what are people [13:48.60]Using it for today [13:49.60]So specifically in science [13:51.60]A lot of people use [13:52.60]Human research assistants [13:53.60]To do things [13:54.60]You tell your grad student [13:56.60]Here are a couple of papers [13:57.60]Can you look at [13:58.60]All of these [13:59.60]See which of these [14:00.60]Have kind of sufficiently [14:01.60]Large populations [14:02.60]And actually study [14:03.60]The disease that 
[14:04.60]I'm interested in [14:05.60]And then write out [14:06.60]Like what are the experiments [14:07.60]They did [14:08.60]What are the interventions [14:09.60]They did [14:10.60]What are the outcomes [14:11.60]And kind of organize [14:12.60]That for me [14:13.60]And the first phase [14:14.60]Of understanding [14:15.60]Is focused on [14:16.60]Automating that workflow [14:17.60]Because a lot of that work [14:18.60]Is pretty rote work [14:19.60]I think it's not [14:20.60]The kind of thing [14:21.60]That we need humans to do [14:22.60]Language models can do it [14:23.60]And then if [14:24.60]Language models can do it [14:25.60]Then you can obviously [14:26.60]Scale it up [14:27.60]Much more than a grad student [14:28.60]Or undergrad [14:29.60]Research assistant [14:30.60]Would be able to do [14:31.60]Yeah the use cases [14:32.60]Are pretty broad [14:33.60]So we do have [14:34.60]A very large [14:35.60]Percent of our users [14:36.60]Are just using it personally [14:37.60]Or for a mix [14:38.60]Of personal and professional [14:39.60]Things [14:40.60]People who care a lot [14:41.60]About health [14:42.60]Or biohacking [14:43.60]Or parents [14:44.60]Or disease [14:45.60]Or want to understand [14:46.60]The literature directly [14:47.60]So there is an [14:48.60]Individual consumer use [14:49.60]Case [14:50.60]We're most focused [14:51.60]On the power users [14:52.60]So that's where [14:53.60]We're really excited [14:54.60]To build [14:55.60]So Elicit was [14:56.60]Very much inspired [14:57.60]By this workflow [14:58.60]In literature [14:59.60]Called systematic reviews [15:00.60]Or meta analysis [15:01.60]Which is basically [15:02.60]The human state [15:03.60]Of the art [15:04.60]For summarizing [15:05.60]Scientific literature [15:06.60]It typically involves [15:07.60]Like five people [15:08.60]Working together [15:09.60]For over a year [15:10.60]And they kind of [15:11.60]First start by trying [15:12.60]To find the maximal [15:13.60]Set of papers possible [15:14.60]So it's like 
[15:15.60]Ten thousand papers [15:16.60]And they kind of [15:17.60]Systematically narrow [15:18.60]That down to like [15:19.60]Hundreds or fifty [15:20.60]Extract key details [15:22.60]From every single paper [15:23.60]Usually have two people [15:24.60]Doing it [15:25.60]Like a third person [15:26.60]Reviewing it [15:27.60]So it's like [15:28.60]Incredibly laborious [15:29.60]Time-consuming process [15:30.60]But you see it [15:31.60]In every single domain [15:32.60]So in science [15:33.60]In machine learning [15:34.60]In policy [15:35.60]Because it's so structured [15:36.60]And designed to be reproducible [15:37.60]It's really amenable [15:38.60]To automation [15:39.60]So it's kind of [15:40.60]The workflow that we want [15:41.60]To automate first [15:42.60]Make it accessible [15:43.60]For any question [15:44.60]And make [15:45.60]You know kind of [15:46.60]These really robust [15:47.60]Living summaries of science [15:48.60]So yeah [15:48.60]It's one of the [15:49.60]Workflows that we're [15:50.60]Starting with [15:51.60]Our previous guest [15:52.60]Mike Conover [15:53.60]He's building a new [15:54.60]Company called BrightWave [15:55.60]Which is an AI [15:56.60]Research assistant [15:57.60]For financial research [15:58.60]How do you see [15:59.60]The future of these tools [16:00.60]Like does everything [16:01.60]Converge [16:02.60]Into like a god researcher [16:03.60]Assistant [16:04.60]Or is every domain [16:05.60]Going to have its own thing [16:06.60]I think that's a good [16:07.60]And mostly open question [16:09.60]I do think there are [16:10.60]Some differences [16:11.60]Data analysis [16:12.60]And other research [16:13.60]Is more high-level [16:15.60]Cross-domain thinking [16:16.60]And we definitely [16:17.60]Want to contribute to [16:18.60]The broad [16:19.60]Generalist reasoning type [16:20.60]Space like if [16:21.60]Researchers are [16:22.60]Making discoveries often [16:23.60]It's like hey [16:24.60]This thing in biology [16:25.60]Is actually analogous to 
[16:26.60]Like these equations [16:27.60]In economics or something [16:28.60]And that's just [16:29.60]Fundamentally a thing [16:30.60]Where you need [16:31.60]To reason across domains [16:32.60]At least within research [16:33.60]I think there will be [16:34.60]Like one best platform [16:36.60]More or less [16:37.60]For this type of [16:38.60]Generalist research [16:39.60]I think there may still be [16:40.60]Tools like for genomics [16:41.60]Like particular types [16:42.60]Of modules [16:43.60]Of genes [16:44.60]And proteins [16:45.60]And whatnot [16:46.60]But for a lot of [16:47.60]The kind of high-level reasoning [16:48.60]That humans do [16:49.60]I think that is [16:50.60]A more winner-take-all [16:51.60]Type thing [16:52.60]I wanted to ask [16:53.60]A little bit deeper about [16:54.60]I guess the workflow [16:55.60]That you mentioned [16:56.60]I like that phrase [16:57.60]I see that [16:58.60]In your UI now [16:59.60]But that's [17:00.60]As it is today [17:01.60]And I think you were [17:02.60]About to tell us about [17:03.60]How it was in 2021 [17:04.60]And how it maybe progressed [17:05.60]How has this workflow [17:06.60]Evolved over time [17:07.60]So the very first [17:08.60]Version of Elicit [17:09.60]Wasn't the research assistant [17:10.60]It was a forecasting assistant [17:12.60]So we set out [17:13.60]And we were thinking about [17:14.60]What are some of the most [17:15.60]Impactful types of reasoning [17:16.60]That if we could scale up [17:17.60]AI would really transform [17:18.60]The world [17:19.60]And we actually started [17:20.60]With literature review [17:21.60]But we're like [17:22.60]So many people are going to build [17:23.60]Literature review tools [17:24.60]So let's not start there [17:25.60]So then we focused [17:26.60]On geopolitical forecasting [17:27.60]So I don't know [17:28.60]If you're familiar [17:29.60]With like Manifold or [17:30.60]Manifold Markets [17:31.60]Yeah, that kind of stuff [17:32.60]Before Manifold [17:33.60]Yeah, yeah 
[17:34.60]So not predicting relationships [17:35.60]We're predicting like [17:36.60]Is China going to invade Taiwan? [17:38.60]Yeah [17:39.60]That is a relationship [17:40.60]Yeah, that's fair [17:41.60]Yeah, it's true [17:42.60]And then we worked [17:43.60]On that for a while [17:44.60]And then after GPT-3 [17:45.60]came out [17:46.60]I think by that time [17:47.60]We realized that [17:48.60]Originally we were trying [17:49.60]To help people convert [17:50.60]Their beliefs into [17:51.60]Probability distributions [17:53.60]So take fuzzy beliefs [17:54.60]But like model them [17:55.60]More concretely [17:56.60]And then after a few months [17:57.60]Of iterating on that [17:58.60]Just realized the thing [17:59.60]That's blocking people [18:00.60]From making [18:01.60]Interesting predictions [18:02.60]About important events [18:03.60]In the world [18:04.60]Is less kind of [18:05.60]On the probabilistic side [18:06.60]And much more [18:07.60]On the research side [18:08.60]And so that kind [18:09.60]Of combined with [18:10.60]The very generalist [18:11.60]Capabilities of GPT-3 [18:12.60]Prompted us to [18:13.60]Make a more general [18:14.60]Research assistant [18:15.60]Then we spent [18:16.60]A few months iterating [18:17.60]On what even is [18:18.60]A research assistant [18:19.60]So we would embed [18:20.60]With different researchers [18:21.60]We built data labeling [18:23.60]Workflows in the beginning [18:24.60]Kind of right off the bat [18:25.60]We built ways to find [18:27.60]Experts in a field [18:29.60]And like ways to ask [18:30.60]Good research questions [18:31.60]We just kind of [18:32.60]Iterated through a lot [18:33.60]Of workflows and no one else [18:34.60]Was really building at this [18:35.60]Time and it was like [18:36.60]Let's do some prompt [18:37.60]Engineering and see [18:38.60]Like what is a task [18:39.60]That is at the [18:40.60]Intersection of what's [18:41.60]Technologically capable [18:42.60]And like important [18:43.60]For researchers [18:44.60]And 
we had like [18:45.60]A very nondescript [18:46.60]Landing page [18:47.60]It said nothing [18:48.60]But somehow people were [18:49.60]Signing up and we had [18:50.60]The sign-up form [18:51.60]That was like [18:52.60]Why are you here [18:53.60]And everyone was like [18:54.60]I need help [18:55.60]With literature review [18:56.60]And we're like [18:57.60]A literature review [18:58.60]That sounds so hard [18:59.60]I don't even know [19:00.60]What that means [19:01.60]We don't want to work on it [19:02.60]But then eventually [19:03.60]We're like [19:04.60]Everyone is saying it [19:05.60]Yeah [19:06.60]And we also kind of [19:07.60]Personally knew literature [19:08.60]Review was hard [19:09.60]And if you look at the graphs [19:10.60]For academic literature [19:11.60]Being published every [19:12.60]Single month you guys [19:13.60]Know this in machine learning [19:14.60]It's like up and to the right [19:15.60]Like superhuman amounts [19:16.60]Of papers [19:17.60]So we're like [19:18.60]All right, let's just try it [19:19.60]I was really nervous [19:20.60]But Andreas was like [19:21.60]This is kind of like [19:22.60]The right problem space [19:23.60]To jump into [19:24.60]Even if we don't [19:25.60]Know what we're doing [19:26.60]So my take was like [19:27.60]Fine [19:28.60]This feels really scary [19:29.60]But let's just launch [19:30.60]A feature every single week [19:31.60]And double our user [19:32.60]Numbers every month [19:33.60]And if we can do that [19:34.60]We will find something [19:35.60]I was worried about like [19:36.60]Getting lost [19:37.60]In the kind of academic white [19:38.60]Space [19:39.60]So the very first version [19:40.60]Was actually a weekend prototype [19:41.60]That Andreas made [19:42.60]Do you want to explain [19:43.60]How that worked [19:44.60]I mostly remember [19:45.60]That it was really bad [19:47.60]So the thing I remember [19:48.60]Is you entered a question [19:50.60]And it would give you back [19:51.60]A list of claims [19:52.60]So your 
question could be [19:53.60]I don't know [19:54.60]How does creatine affect cognition [19:56.60]And it would give you back [19:57.60]Some claims [19:58.60]That are to some extent [19:59.60]Based on papers [20:00.60]But they were often irrelevant [20:02.60]The papers were often irrelevant too [20:03.60]And so we ended up [20:04.60]Soon just printing out [20:05.60]A bunch of examples [20:06.60]Of results [20:07.60]And putting them up [20:08.60]On the wall [20:09.60]So that we would [20:10.60]Kind of feel the constant [20:11.60]Shame of having [20:12.60]Such a bad product [20:13.60]And would be incentivized [20:14.60]To make it better [20:15.60]And I think over time [20:16.60]It has gotten a lot better [20:17.60]But I think [20:18.60]The initial version [20:19.60]Was like really very bad [20:20.60]But it was basically [20:21.60]Like a natural language [20:22.60]Summary of an abstract [20:23.60]Like kind of a one-sentence [20:24.60]Summary [20:25.60]Which we still have [20:26.60]And then as we learned [20:27.60]Kind of more about this [20:28.60]Systematic review workflow [20:29.60]We started expanding [20:30.60]The capabilities so that [20:31.60]You could extract a lot [20:32.60]More with it [20:33.60]And were you using [20:34.60]Like embeddings [20:35.60]And cosine similarity [20:36.60]That kind of stuff [20:37.60]For retrieval [20:38.60]Or was it keyword based [20:39.60]Or [20:40.60]I think the very first version [20:42.60]Didn't even have [20:43.60]Its own search engine [20:44.60]I think the very first version [20:45.60]Probably used [20:46.60]The Semantic Scholar API [20:48.60]Or something similar [20:49.60]And only later when we discovered [20:51.60]That the API is not very semantic [20:53.60]Did we build our own [20:55.60]Search and that has helped a lot [20:57.60]And then we're going to go into [20:59.60]Like more recent product stuff [21:01.60]But like you know [21:02.60]I think you seem more so [21:03.60]The startup oriented [21:04.60]Business person 
and you seem sort of more ideologically interested in research, obviously, because of your PhD. What kind of market sizing were you guys thinking, right? Because you're here saying, like, we have to double every month, and I'm like, I don't know how you make that conclusion from this, right? Especially also as a nonprofit at the time.

[21:22.60] I mean, market-size-wise, I felt like in this space, where so much was changing and it was very unclear what of today would actually be true tomorrow, we just really rested a lot on very, very simple fundamental principles. Which is like: if you can understand the truth, that is very economically beneficial, like, valuable. If you know the truth, on principle, that's enough. Research is the key to many breakthroughs that are very commercially valuable.

[21:47.60] Because my version of it is, students are poor and they don't pay for anything, right?
[21:52.60] But that's obviously not true, as you guys have found out. But you had to have some market insight for me to have believed that — but you skipped that.

[21:58.60] We did encounter, talking to VCs for our seed round, a lot of VCs who were like, you know, researchers, they don't have any money, why don't you build a legal assistant? I think in some short-sighted way maybe that's true, but I think in the long run, R&D is such a big part of the economy. If you can substantially improve how quickly people find new discoveries, or avoid controlled trials that don't go anywhere, I think that's just huge amounts of money. And there are a lot of questions, obviously, about getting between here and there, but as long as the fundamental principle is there, we were okay with that. And I guess we found some investors who also were.

[22:35.60] Yeah, congrats. I'm sure we can cover the sort of flip later. I think you were about to start us on GPT-3 and how that changed things for you. It's funny, I guess every major GPT version you have some big insight.

[22:49.60] Yeah. I mean, what do you think? I think it's a little bit less true for us than for others, because we always believed that there would basically be human-level machine work. And so it is definitely true that, in practice, for your product, as new models come out your product starts working better, and you can add some features that you couldn't add before. But I don't think we really ever had the moment where we were like, oh wow, that is super unanticipated, we need to do something entirely different now from what was on the roadmap. I think GPT-3 was a big change, because it kind of said, oh, now is the time to build these tools. And then GPT-4 was maybe a little bit more of an extension of GPT-3. GPT-3 over GPT-2 was like a qualitative level shift. Then GPT-4 was like, okay, great, now we're more accurate on these things, we can answer harder questions, but the shape of the product had already taken place by that time.

[23:44.60] I kind of want to ask you about this sort of pivot that you made. But I guess that was just a way to sell what you were doing, which is you're adding extra features on grouping by concepts. The GPT-4 "pivot", quote-unquote pivot.

[23:55.60] Yeah, yeah, exactly. When we launched this workflow, now that GPT-4 was available, basically Elicit was at a place where we have very tabular interfaces. So given a table of papers, you can extract data across all the tables. But you kind of want to take the analysis a step further. Sometimes what you'd care about is not having a list of papers but a list of arguments, a list of effects, a list of interventions, a list of techniques. And so one of the things we're working on is, now that you've extracted this information
anyway, can you pivot it, or group by whatever information you extracted, to have more insightful information, still supported by the academic literature?

[24:33.60] Yeah, there was a big revelation when I saw it. Basically, I'm very impressed by how first-principles your ideas around the workflow are. And I think that's why you're not as reliant on the LLM improving — because it's actually just about improving the workflow that you recommend to people. Today we might call it, like, this is the way that Elicit does research, and this is what we think is most effective, based on talking to our users.

[25:01.60] The problem space is still huge. Like, if it's this big, we're all still operating at this tiny bit of it. So, you know, I think about this a lot in the context of moats. People are like, oh, what's your moat, what happens if GPT-5 comes out? It's like, if GPT-5 comes out, there's still all of this other space that we can go into. And so I think being really obsessed with the problem is robust — you just kind of directly incorporate model improvements and keep going.

[25:27.60] And then I first encountered you guys with Charlie — you can tell us about that project. Basically, yeah, like, how much did cost become a concern as you're working more and more with OpenAI?
[25:37.60] How do you manage that relationship?

[25:39.60] Let me talk about who Charlie is, and then you can talk about that. Charlie is a special character. So Charlie, when we found him, had just finished his freshman year at the University of Warwick. I think he had heard about us on some Discord, and then he applied, and we just saw that he had done so many incredible side projects. We were actually on a team retreat in Barcelona, visiting our head of engineering at that time, and everyone was talking about this wunderkind. They're like, this kid! And then on our take-home project he had done the best of anyone to that point. And so people were just so excited to hire him. So we hired him as an intern, and then we were like, Charlie, what if you just dropped out of school? And so then we convinced him to take a year off. And he's just incredibly productive. And I think the thing you're referring to is, when the constitutional AI paper launched, within a few days — I think four days — he had basically implemented it in production, and then we had it in the app a week or so after that. And he has since contributed to major improvements, like cutting costs down to a tenth of what they were. Really large-scale. But yeah, you can talk about the technical stuff.

[26:39.60] Yeah. On the constitutional AI project, this was for abstract summarization, where in Elicit, if you run a query
it'll return papers to you, and then it will summarize each paper with respect to the query for you, on the fly. And that's a really important part of Elicit, because Elicit does it so much — if we run a few searches, it'll have done it a few hundred times for you. And so we cared a lot about this being both fast and cheap, and also very low on hallucination. I think if Elicit hallucinates something about the abstract, that's really not good. And so what Charlie did in that project was create a constitution that expressed what the attributes of a good summary are — everything in the summary is reflected in the actual abstract, it's very concise, etc. — and then used RLHF, with a model that was trained on the constitution, to basically fine-tune a better summarizer on an open-source model. And I think that might still be in use.

[27:34.60] Yeah, yeah, definitely. I think at the time the models hadn't been trained at all to be faithful to a text, so they were just generating. So then, when you asked them a question, they tried too hard to answer the question, and didn't try hard enough to answer the question given the text, or answer what the text said about the question. So we had to basically teach the models to do that specific task.

[27:54.60] How do you monitor the ongoing performance of your models? Not to get too LLM-Ops-y, but you are one of the larger, more well-known
operations doing NLP at scale, I guess, effectively. Like, you have to monitor these things, and nobody I've talked to has a good answer.

[28:10.60] Yeah, I don't think we have a good answer yet. I think the answers are actually a little bit clearer on the basic business side, where you can import ideas from normal software engineering and normal DevOps: you need to monitor latencies and response times and uptime and whatnot. Performance is more like hallucination rate, and for things like hallucination rate, I think the really important thing is training time. So we care a lot about having our own internal benchmarks for model development that reflect production, so that we can know ahead of time how well the model is going to perform on different types of tasks — the tasks being summarization, question answering given a paper, and ranking. And for each of those, we want to know the distribution of things the model is going to see, so that we can have well-calibrated predictions on how well the model is going to do in production. And, yeah, there's some chance of distribution shift — the things users enter are actually going to be different than training, right? So it's about having very high quality, well-vetted datasets at training time.

[29:19.60] I think we also end up effectively monitoring by trying to evaluate new models as they come out. That kind of prompts us to go through our eval suite every couple of months — every time a new model comes out, we have to see how it's performing relative to production and what we currently have.

[29:35.60] Yeah. I mean, since we're on this topic: any new models that really caught your eye this year? Like, Claude came out.

[29:41.60] Yeah, I think Claude is a pretty good point on the kind of Pareto frontier. It's neither the cheapest model nor the most accurate, most high-quality model, but it's just a really good trade-off between cost and accuracy.

[29:56.60] You apparently have to ten-shot it to make it good. I tried using Haiku for summarization, but zero-shot was not great. Then they were like, you know, it's a skill issue, you have to try harder.

[30:06.60] Interesting.

[30:07.60] I think GPT-4 unlocked tables for us — processing data from tables — which was huge. GPT-4 Vision.

[30:14.60] Did you try Fuyu? I guess you can't try Fuyu, because it's noncommercial. That's the Adept model.

[30:18.60] Yeah, we haven't tried that one. But Claude is multimodal as well. I think the interesting insight that we got from talking to David Luan, who is CEO of Adept, was that multimodality has effectively two different flavors. Like, one is
recognizing images from a camera in the outside natural world, and actually the more important multimodality for knowledge work is screenshots, and, you know, PDFs and charts and graphs. So we need a new term for that kind of multimodality.

[30:45.60] But is the claim that current models are good at one or the other?

[30:49.60] Yeah, they're over-indexed, because the history of computer vision is COCO, right? So now we're like, oh, actually, screens are more important — OCR, handwriting.

[30:58.60] You mentioned a lot of closed-model-lab stuff, and then you also have this open-source model fine-tuning stuff. What is your workload now between closed and open? It's a good question — I think — is it half and half? Is that even a relevant question, or is that a nonsensical question?

[31:12.60] It depends a little bit on how you index — whether you index by compute cost or by number of queries. In terms of number of queries, it's maybe similar. In terms of cost in compute, I think the closed models make up more of the budget, since the main cases where you want to use closed models are cases where they're just smarter — where no existing open-source models are quite smart enough.

[31:36.60] Yeah. We have a lot of interesting technical questions to go into, but just to wrap the kind of UX evolution: now you have the notebooks. We talked a lot about how chatbots are not the final frontier. You
know, how did you decide to get into notebooks, which is a very iterative, kind of interactive interface? And, yeah, maybe learnings from that.

[31:55.60] Yeah, this is actually our fourth time trying to make this work. I think the first time was probably in early 2021. We've always been obsessed with this idea of task decomposition and branching — we always wanted a tool that could be kind of unbounded, where you could keep going, could do a lot of branching, where you could apply language-model operations or computations on other tasks. So in 2021 we had this thing called composite tasks, where you could use GPT-3 to brainstorm a bunch of research questions, and then take each research question and decompose those further into subquestions. This, again — that task-decomposition tree type thing — was always very exciting to us, but it was kind of overwhelming.

[32:37.60] Then at the end of '22, I think, we tried again. At that point we were thinking, okay, we've done a lot with this literature review thing, we also want to start helping with adjacent domains and different workflows. Like, we want to help more with machine learning — what does that look like? And as we were thinking about it, we were like, well, there are so many research workflows. How do we not just build three new workflows into Elicit, but make Elicit really generic to lots of workflows? What is a generic, composable system with nice abstractions that can scale to all these workflows? So we iterated on that a bunch and didn't quite narrow the problem space enough, or get to what we wanted.

[33:13.60] And then I think it was at the beginning of 2023 that we were like, wow, computational notebooks kind of enable this, where they have a lot of flexibility but, you know, robust primitives, such that you can extend the workflow — it's not limited, it's not like you ask a query, you get an answer, you're done. You can just constantly keep building on top of that, and each little step seems like really good work for the language model. And it was also just really helpful to have a bit more preexisting work to emulate. Yeah, that's kind of how we ended up at computational notebooks for Elicit.

[33:44.60] Maybe one thing that's worth making explicit is the difference between computational notebooks and chat, because on the surface they seem pretty similar. It's this iterative interaction where you add stuff — in both cases you have a back and forth, where you enter stuff and then you get some output, and then you enter stuff. But the important difference, in our minds, is that with notebooks you can define a process. So in data science, you know, here's my data analysis process that takes in a CSV, and then
does some extraction, and then generates a figure at the end. And you can prototype it using a small CSV, and then you can run it over a much larger CSV later. And similarly, the vision for notebooks in our case is to not make it this one-off chat interaction, but to allow you to say — if you start, and first you're like, okay, let me just analyze a few papers and see, do I get to the correct conclusions for those few papers — can you then later go back and say, now let me run this over 10,000 papers, now that I've debugged the process using a few papers? And that's an interaction that doesn't fit quite as well into the chat framework, because chat is more for quick back-and-forth interaction.

[34:49.60] Do you think of notebooks as kind of like structured, editable chain of thought, basically, step by step? And then, are people going to reuse notebooks as, like, templates? Maybe like in traditional notebooks with cookbooks, right — you share a cookbook, you can start from there. Is that similar in Elicit?

[35:07.60] Yeah, that's exactly right. That's our hope — that people will build templates and share them with other people. I think chain of thought is maybe still kind of one level lower on the abstraction hierarchy than we would think of notebooks. We'll probably want to think about more semantic pieces: a building block is more like a paper search, or an extraction, or a list of concepts, and then the models and the reasoning will probably often be one level down. You always want to be able to see it, but you don't always want it to be front and center.

[35:37.60] Yeah. What's the difference between a notebook and an agent? Since everybody always asks me, what's an agent — like, how do you think about where the line is?

[35:45.60] Yeah, it's an interesting question. In the notebook world, I would generally think of the human as the agent in the first iteration. So you have the notebook, and the human adds little action steps. And then the next point on this kind of progression is, okay, now you can use language models to predict which action you would take as a human. And at some point you're probably going to be very good at this — you'll be like, okay, in some cases I can, with 99.9% accuracy, predict what you do, and then you might as well just execute it. Like, why wait for the human? And eventually, as you get better at this, that will just look more and more like agents taking actions, as opposed to you doing the thing. I think templates are a specific case of this — like, okay, well, there are just particular sequences of actions that you often want to chunk and have available, just like in normal programming. And those, you can view them as action sequences of agents, or you can view
them as a more normal programming-language abstraction thing. And I think those are two valid views.

[36:40.60] How does this change as — like you said — the models get better and you need less and less actual human interfacing with the model, you just get the results? How does the UX, and the way people perceive it, change?

[36:52.60] Yeah, I think this kind of interaction paradigm for evaluation is not really something the internet has encountered yet, because up to now the internet has all been about getting data and work from people. So, increasingly, I really want evaluation — both from an interface perspective and from a technical, operations perspective — to be a superpower for Elicit, because I think over time models will do more and more of the work, and people will have to do more and more of the evaluation. So, in terms of the interface, some of the things we have today: for every language-model generation there's some citation back, and we try to highlight the ground truth in the paper for whatever Elicit said, and make it super easy so you can click on it and quickly see it in context and validate whether the text actually supports the answer that Elicit gave. So I think we'd probably want to scale up things like that — the ability to spot-check the model's work super quickly — scale up interfaces like that.
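The spot-checking idea described here — link each generated claim back to the best supporting span in the source, so a human can validate it quickly — can be sketched with a toy grounding check. This is an illustration, not Elicit's actual implementation; the word-overlap heuristic, the threshold, and all function names are my own assumptions:

```python
# Toy sketch of citation spot-checking: for each generated claim, find the
# source sentence that best supports it, and flag weak matches for review.
# The Jaccard-overlap heuristic and the 0.2 threshold are illustrative assumptions.

def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def ground_claim(claim: str, abstract: str, threshold: float = 0.2):
    """Return (best supporting sentence, support score, needs_review flag)."""
    sentences = [s.strip() for s in abstract.split(".") if s.strip()]
    best = max(sentences, key=lambda s: word_overlap(claim, s))
    score = word_overlap(claim, best)
    return best, score, score < threshold

abstract = ("Creatine supplementation improved working memory in adults. "
            "No effect was observed on attention.")
claim = "Creatine improved working memory in adults"
sentence, score, needs_review = ground_claim(claim, abstract)
```

In a real system the overlap heuristic would be replaced by an entailment or retrieval model, but the interface contract is the same: every claim carries a pointer to its best evidence plus a flag that prioritizes human attention.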
[37:42.60] And who would spot-check it — the user?

[37:45.60] Yeah, to start, it would be the user. One of the other things we do is also flag the model's uncertainty. So we have models report out how confident they are — how confident are you that this was the sample size of this study? If the model's not sure, we throw a flag, and so the user knows to prioritize checking that. So again, we can scale that up: when the model's like, well, I searched this on Google, I'm not sure if that was the right thing, it has an uncertainty flag, and the user can go and be like, okay, that was actually the right thing to do, or not.

[38:10.60] I've tried to do uncertainty ratings from models — I don't know if you have this live — but I just didn't find them reliable, because they just hallucinated their own uncertainty. I would love to do it based on logprobs or something more native within the model, better than generated. But it sounds like it scales properly for you.

[38:29.60] Yeah, we found it to be pretty well calibrated. It varies by model. I think in some cases we also used two different models — one for the confidence estimates and one for the question answering. So one model would say, here's my chain of thought, here's my answer — let's say the first model is Llama, and the second model is GPT-3.5 — and then the second model just looks over the result and is like, okay, how confident are you in this?
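The two-model setup described here — one model answers with its chain of thought, a second, different model scores confidence in that answer, and low-confidence results get flagged for the user — can be sketched with stubbed models standing in for the actual LLM calls. The stubs, the threshold, and all names below are illustrative assumptions, not Elicit's implementation:

```python
# Sketch of answer-then-verify with two different models. answer_model and
# verify_model are stand-ins for real LLM calls (e.g. a Llama generator and
# a GPT-3.5 verifier); both stubs and the 0.7 threshold are assumptions.

def answer_model(question: str, text: str) -> dict:
    """Stand-in generator: returns an answer plus its chain of thought."""
    return {"chain_of_thought": f"The text addresses: {question}",
            "answer": "n=42"}

def verify_model(question: str, text: str, result: dict) -> float:
    """Stand-in verifier: scores confidence that the answer is supported."""
    return 0.9 if result["answer"].lstrip("n=").isdigit() else 0.3

def extract_with_flag(question: str, text: str, min_conf: float = 0.7) -> dict:
    result = answer_model(question, text)
    conf = verify_model(question, text, result)
    result["confidence"] = conf
    result["flagged"] = conf < min_conf  # surfaced to the user for checking
    return result

out = extract_with_flag("What was the sample size?", "We recruited 42 adults.")
```

The design point is the separation: because the verifier is a different model than the generator, it is less likely to inherit the generator's blind spots when rating confidence.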
[38:53.60] And I think sometimes using a different model can be better than using the same model.

[38:58.60] Yeah. You know, on the topic of models evaluating models — obviously you can do that all day long — what's your budget like? Because your queries fan out a lot, and then you have models evaluating models. One person typing in a question can lead to a thousand calls.

[39:13.60] It depends on the project. If the project is basically a systematic review that human research assistants would otherwise do, then the project can get quite large — for those projects, I don't know, let's say a hundred thousand dollars. So in those cases you're happier to spend compute than in the kind of shallow-search case, where someone just enters a question because, I don't know, maybe it's like, I heard about creatine, what's it about? You probably don't want to spend a lot of compute on that. This sort of being able to invest more or less compute into getting more or less accurate answers is one of the core things we care about, and that I think is currently undervalued in the AI space. You can't choose which model you want. And you can sometimes — I don't know — tip it and it'll try harder, or you can try various things to get it to work harder, but you don't have great ways of converting willingness to spend into better answers.
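The idea of converting willingness to spend into better answers could look something like a budget-aware router: pick the most thorough processing tier the user's budget covers for the papers in scope. This is a toy sketch — the tier names, per-paper costs, and function names are all assumptions for illustration, not Elicit's actual pricing or code:

```python
# Toy sketch of routing a query to more or less compute based on budget.
# Tiers, per-paper credit costs, and names are illustrative assumptions.

TIERS = [
    # (name, per-paper cost in credits, description)
    ("abstract_summary", 1, "open-source model, zero-shot"),
    ("high_accuracy",    5, "frontier model + table parsing"),
    ("double_checked",  12, "frontier model + separate verifier pass"),
]

def plan_run(n_papers: int, budget_credits: int) -> tuple[str, int]:
    """Pick the most thorough tier the budget allows for n_papers."""
    affordable = [(name, cost) for name, cost, _ in TIERS
                  if cost * n_papers <= budget_credits]
    if not affordable:
        raise ValueError("budget too small for even the cheapest tier")
    name, cost = max(affordable, key=lambda t: t[1])  # priciest affordable tier
    return name, cost * n_papers

# A shallow question over 100 papers with 600 credits gets the middle tier;
# a funded systematic review would simply raise budget_credits.
tier, total = plan_run(n_papers=100, budget_credits=600)
```

The point of the sketch is the knob itself: the same pipeline runs at every tier, and spending more buys a smarter model and extra verification rather than a different product.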
[40:04.60] And we really want to build a product that has this sort of unbounded flavor, where if you care about it a lot, you should be able to get really high-quality answers, really double-checked in every way.

[40:14.60] And you have credit-based pricing, so unlike most products it's not a fixed monthly fee.

[40:19.60] Right, exactly. Some of the higher costs are tiered. So most casual users will just get the abstract summary, which is from kind of an open-source model. Then you can add more columns, which have more extractions and these uncertainty features, and then you can also add the same columns in high-accuracy mode, which also parses the tables. So we kind of stack the complexity and the cost.

[40:41.60] You know the fun thing you can do with a credit system — which is data for data — basically, you can give people more credits if they give data back to you. I don't know: you don't have money, but you have time — how do you exchange that?

[40:53.60] It's a fair trade. I think it's interesting; we haven't quite operationalized it. And then, you know, there's been some kind of adverse selection. Like, for example, it would be really valuable to get feedback on our model, so maybe if you were willing to give more robust feedback on our results, we could give you credits or something like that. But then there's this question of, will people take it seriously?

[41:08.60] And you want the good people.

[41:10.60] Exactly. Can
you tell who are the good people [41:12.60]Not right now [41:13.60]But yeah maybe [41:14.60]At the point where we can [41:15.60]We can offer it [41:16.60]We can offer it up to them [41:17.60]The perplexity of questions asked [41:18.60]If it's higher perplexity [41:19.60]These are smarter people [41:20.60]Yeah maybe [41:21.60]And if you make a lot of typos [41:22.60]In your queries [41:23.60]You're not going to get off [41:24.60]How does that change [41:25.60]Negative social credit [41:28.60]It's very topical right now [41:29.60]To think about [41:30.60]The threat of long context windows [41:32.60]All these models [41:34.60]We're talking about these days [41:35.60]All like a million tokens plus [41:36.60]Is that relevant for you [41:38.60]Can you make use of that [41:39.60]Is that just prohibitively expensive [41:41.60]Because you're just paying [41:42.60]For all those tokens [41:43.60]Or you're just doing right [41:44.60]It's definitely relevant [41:45.60]And when we think about search [41:46.60]As many people do [41:47.60]We think about kind of [41:48.60]A staged pipeline [41:49.60]Of retrieval [41:50.60]Where first you use [41:51.60]Semitic search database [41:53.60]With embeddings [41:54.60]Get like the [41:55.60]In our case maybe 400 [41:56.60]Or so most relevant papers [41:57.60]And then [41:58.60]You still need to rank those [41:59.60]And I think at that point [42:01.60]It becomes pretty interesting [42:02.60]To use larger models [42:04.60]So specifically in the past [42:06.60]I think a lot of ranking [42:07.60]Was kind of per item ranking [42:09.60]Where you would score [42:10.60]Each individual item [42:11.60]Maybe using increasingly [42:12.60]Expensive scoring methods [42:13.60]And then rank based on the scores [42:15.60]But I think list wise [42:16.60]We ranking where [42:17.60]You have a model [42:18.60]That can see [42:19.60]All the elements [42:20.60]Is a lot more powerful [42:21.60]Because often you can [42:22.60]Only really tell [42:23.60]How good a 
thing is [42:24.60]In comparison to other things [42:26.60]And what things should come first [42:28.60]It really depends on [42:29.60]Like well what other things [42:30.60]Are available [42:31.60]Maybe you even care about [42:32.60]Diversity and your results [42:33.60]You don't want to show [42:34.60]Ten very similar papers [42:35.60]As the first 10 results [42:36.60]So I think along context models [42:38.60]Are quite interesting there [42:40.60]And especially for our case [42:41.60]Where we care more about [42:43.60]Power users who are perhaps [42:45.60]A little bit more [42:46.60]Welling to wait a little bit longer [42:47.60]To get higher quality results [42:48.60]Relative to people who just [42:50.60]Quickly check out things [42:51.60]Because why not [42:52.60]I think being able to spend [42:53.60]More on longer context [42:54.60]Is quite valuable [42:55.60]Yeah I think one thing [42:56.60]The longer context models [42:57.60]Changed for us [42:58.60]Is maybe a focus from [43:00.60]Breaking down tasks [43:01.60]To breaking down the evaluation [43:03.60]So before you know [43:05.60]If we wanted to answer [43:06.60]A question from the full text [43:08.60]Of a paper [43:09.60]We had to figure out [43:10.60]How to chunk it and like [43:11.60]Find the relevant chunk [43:12.60]And then answer [43:13.60]Based on that chunk [43:14.60]Then you know [43:15.60]Which chunk the model [43:16.60]Used to answer the question [43:17.60]So if you want to help [43:18.60]The user to check it [43:19.60]Yeah you can be like [43:20.60]Well this was the chunk [43:21.60]That the model got [43:22.60]And now if you put the whole [43:23.60]Text in the paper [43:24.60]You have to kind of [43:25.60]Find the chunk [43:26.60]Like more retroactively [43:27.60]Basically and so you need [43:28.60]Kind of like a different [43:29.60]Set of abilities [43:30.60]And obviously like [43:31.60]A different technology [43:32.60]To figure out [43:33.60]You still want to point [43:34.60]The user to the 
supporting quotes in the text, but then the interaction is a little different.
[43:38.60] You scan through and find some ROUGE score...
[43:41.60] Yeah. I think there's an interesting space of almost-research problems here, because you would ideally make causal claims, like: if this hadn't been in the text, the model wouldn't have said this thing. And maybe you can do expensive approximations to that, where, I don't know, you just throw out a chunk of the paper, re-answer, and see what happens. But hopefully there are better ways of doing that, where you just get that kind of counterfactual information for free from the model.
[44:06.60] Do you think at all about the cost of maintaining RAG versus just putting more tokens in the window? In software development, a lot of the time people buy developer productivity things so that they don't have to worry about it. Context windows are kind of the same, right? You have to maintain chunking and RAG retrieval and re-ranking and all of this, versus: I just shove everything into the context, and it costs a little more, but at least I don't have to do all of that. Is that something you've thought about?
[44:32.60] I think we still hit up against context limits enough that it's not really a question of whether we still want to keep this RAG around; we do still need it for the scale of the work we're doing.
[44:41.60] I think there are different kinds of maintainability. In one sense, I think you're right that the throw-everything-into-the-context-window thing is easier to maintain, because you can just swap out a model. In another sense, if things go wrong, it's harder to debug. If you know, here's the process that we go through to get from 200 million papers to an answer, and there are little steps, and you understand, okay, this is the step that finds the relevant paragraph or whatever, then maybe you'll know which step breaks. If it's just that a new model version came out and now it suddenly doesn't find your needle in a haystack anymore, then what can you do? You're kind of at a loss.
[45:20.60] Yeah, let's talk a bit about needle in a haystack, and maybe the opposite of it, which is hard grounding; I don't know if that's the best way to think about it. But I was using one of these chat-with-your-documents features, and I put in the AMD MI300 specs and the Blackwell chips from NVIDIA, and I was asking questions, and the response was like, oh, it doesn't say in the specs. But if you ask GPT-4 without the docs, it would tell you no, because NVLink is an NVIDIA technology.
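The expensive approximation Andreas mentions, re-answering with a chunk removed to see whether the answer changes, could be sketched roughly like this; `counterfactual_chunks` and the toy `answer_fn` are hypothetical stand-ins, not Elicit's implementation:

```python
# Sketch of leave-one-chunk-out counterfactual attribution:
# re-answer the question with each chunk removed, and credit the
# chunks whose removal changes the answer. `answer_fn` is a
# placeholder for an actual model call.

from typing import Callable, List

def counterfactual_chunks(
    question: str,
    chunks: List[str],
    answer_fn: Callable[[str, List[str]], str],
) -> List[int]:
    """Return indices of chunks whose removal changes the answer."""
    baseline = answer_fn(question, chunks)
    influential = []
    for i in range(len(chunks)):
        ablated = chunks[:i] + chunks[i + 1:]
        if answer_fn(question, ablated) != baseline:
            influential.append(i)
    return influential

# Toy stand-in model: answers "yes" iff any chunk mentions the keyword.
def toy_answer(question: str, chunks: List[str]) -> str:
    return "yes" if any("creatine" in c for c in chunks) else "no"

chunks = ["intro text", "creatine improves strength", "methods text"]
print(counterfactual_chunks("does creatine help?", chunks, toy_answer))
# -> [1]: only removing the creatine chunk flips the answer
```

Each candidate chunk costs one extra model call, which is why this is the "expensive" version; the hope expressed above is to get the same counterfactual signal from the model for free.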
[45:50.60] Yeah, it just says it in the thing. How do you think about that, the context sometimes suppressing the knowledge that the model has?
[45:57.60] It really depends on the task, because I think sometimes that is exactly what you want. Imagine you're a researcher, you're writing the background section of your paper, and you're trying to describe what these other papers say; you really don't want extra information to be introduced there. In other cases, where you're just trying to figure out the truth, and you're giving it the documents because you think they will help the model figure out what the truth is, then if the model has a hunch that there might be something that's not in the papers, you do want to surface that. I think ideally you still don't want the model to just tell you; probably the ideal thing looks a bit more like agent control, where the model can issue a query that is intended to surface the documents that substantiate its hunch. That's maybe a reasonable middle ground between just telling you and being fully limited to the papers you give it.
[46:45.60] Yeah, I would say they're just kind of different tasks right now. The task Elicit is mostly focused on is: what do these papers say? But there is another task, which is: just give me the best possible answer. And that "give me the best possible answer"
sometimes depends on what these papers say, but it can also depend on other stuff that's not in the papers. So ideally we can do both, and then we can offer more of that going forward.
[47:08.60] We've gone into a lot of details, but just to zoom back out a little bit: what are maybe the most underrated features of Elicit, and what is one thing where users surprised you the most by how they use it?
[47:21.60] I think the most powerful feature of Elicit is the ability to add columns to this table, which effectively extracts data from all of your papers at once. It's well used, but there are many different extensions of that. We let you give a description of the column, we let you give instructions for a column, we let you create custom columns. We have 30-plus predefined fields that users can extract, like what were the methods, what were the main findings, how many people were studied. And we actually show you basically the prompts that we're using to extract that for our predefined fields, and then you can fork them. You can say, oh, actually I don't care about the population of people, I only care about the population of rats; you can change the instructions. So I think users are still discovering that this predefined, easy-to-use default can be extended to be much more specific to them.
[48:09.60] And then they can also ask custom questions. One use case of that is that you can start to create different column types that you might not expect. Rather than just creating generative answers, like a description of the methodology, you can say: classify the methodology into a prospective study, a retrospective study, or a case study, and then you can filter based on that. It's all using the same kind of technology and the same interface, but it unlocks a lot. So I think the ability to ask custom questions, give instructions, and specifically use that to create different types of columns, like classification columns, is still pretty underrated.
[48:44.60] In terms of use cases, I spoke to someone who works in medical affairs at a genomic sequencing company recently. Doctors order these genomic tests, these sequencing tests, to identify whether a patient has a particular disease, and this company helps process them. This person basically interacts with all the doctors if the doctors have any questions; my understanding is that medical affairs is kind of like customer support or customer success in pharma. So this person talks to doctors all day long, and one of the things they started using Elicit for is putting the results of their tests in as a query, like: this test showed this percentage presence of this, and 40% of that, and whatever, you know, what genes are present within this sample, and getting a list of academic papers that would support their findings, and using this to help the doctors interpret their tests. So we talked about, okay, cool, what if we built... He's pretty interested in doing a survey of infectious disease specialists and having them write up their answers, comparing those to Elicit's answers, and trying to see: can Elicit start being used to interpret the results of these diagnostic tests? Because the way they ship these tests to doctors is they report on a really wide array of things. He was saying that at a large, well-resourced hospital, like a city hospital, there might be a team of infectious disease specialists who can help interpret these results, but at under-resourced hospitals, or more rural hospitals, the primary care physician can't interpret the test results, so then they can't order them, they can't use them, they can't help the patients with them. So thinking about an evidence-backed way of interpreting these tests is definitely an extension of the product that I hadn't considered before. But yeah, the idea of using that to bring more access to physicians in all different parts of the country and helping them interpret complicated results...
[50:28.60] We had Kanjun from Imbue on the
podcast, and we talked about better allocating scientific resources. How do you think about these use cases, and maybe how Elicit can help drive more research? And do you see a world in which maybe the models actually do some of the research before suggesting it to us?
[50:46.60] Yeah, I think that's very close to what we care about. Our product values are systematic, transparent, and unbounded, and I think you make research especially more systematic and unbounded. And here's the thing that's at stake here. For example, I was recently talking to people in longevity, and I think there isn't really one field of longevity; there are different scientific subdomains that are surfacing various things that are related to longevity. And I think if you could more systematically say: look, here are all the different interventions we could do, here's the expected ROI of these experiments, here's the evidence so far that supports them, that would be so much more systematic than science is today. I'd guess in 10 or 20 years we'll look back, and it will be incredible how unsystematic science was back in the day.
[51:35.60] Our view is kind of: have models catch up to expert humans today. Start with novice humans, and then increasingly expert humans, but we really want the models to earn their right to the expertise. That's why we do things in this very step-by-step way; that's why we don't just throw a bunch of data and apply a bunch of compute and hope we get good results. But obviously, at some point, once it's earned its stripes, it can surpass human researchers. And I think that's where making sure that the model's processes are really explicit and transparent, and that it's really easy to evaluate, is important, because if it does surpass human understanding, people will still need to be able to audit its work somehow, or spot-check its work somehow, to be able to reliably trust it and use it. So yeah, that's why the process-based approach is really important.
[52:21.60] And on the question of whether models will do their own research: one feature that models currently don't have, and that will need to get better there, is world models. I think currently models are just not great at representing what's going on in a particular situation or domain in a way that allows them to come to interesting, surprising conclusions. I think they're very good at coming to conclusions that are nearby to conclusions that people have come to, not as good at reasoning and making surprising connections, maybe. And so having deeper models of what the underlying domains are, and how they are related or not related, I think will be an important ingredient for models to actually be able to make novel contributions.
[53:01.60] On the topic of hiring more expert humans: you've hired some very expert humans. My friend Maggie Appleton joined you guys, I think maybe a year ago-ish. In fact, I think you were doing an offsite, and we were actually organizing our big AI UX meetup around whenever she's in town in San Francisco. How big is the team? How have you transitioned your company into this PBC, and what's the plan for the future?
[53:22.60] About half of us are in the Bay Area, and then we're distributed across the US and Europe, a mix of mostly roles in engineering and product. And I think the transition to a PBC was really not that eventful, because even as a nonprofit we were already shipping every week, so we were very much operating as a product. The PBC component was to very explicitly state that we have a mission that we care a lot about. There are a lot of ways to make money that would make us a lot of money, but we are going to be opinionated about how we make money; we're going to take the version of making a lot of money that's in line with our mission. But it's all very convergent: Elicit is not going to make any money if it's a bad product, if it doesn't actually help you discover truth and do
research more rigorously. So I think for us the mission and the success of the company are very intertwined.
[54:15.60] We're hoping to grow the team quite a lot this year; probably some of our highest priority roles are in marketing and go-to-market.
[54:22.60] Do you want to talk about those roles?
[54:24.60] Yeah, broadly we're just looking for senior software engineers, who don't need any particular AI expertise. A lot of it is just: how do you build good orchestration for complex tasks? We talked earlier about these notebooks, scaling up task orchestration, and I think a lot of this looks more like traditional software engineering than it looks like machine learning research. I think the people who are really good at building good abstractions, building applications that survive even if some of their pieces break, making reliable components out of unreliable pieces, those are the people we're looking for.
[54:57.60] You know, that's exactly what I used to do. Have you explored the existing orchestration frameworks: Temporal, Airflow, Dagster, Prefect?
[55:05.60] We've looked into them a little bit. I think we have some specific requirements around being able to stream work back very quickly to our users. Those could definitely be relevant.
[55:15.60] Okay, well, you're hiring; I'm sure we'll plug all the links. Any parting words? Any words of wisdom, mottos you live by?
[55:22.60] I think it's a really important time for humanity, so I hope everyone listening to this podcast can think hard about exactly how they want to participate in this story. There's so much to build, and we can be really intentional about what we align ourselves with. There are a lot of applications that are going to be really good for the world, and a lot that are not. So yeah, I hope people can take that seriously and kind of seize the moment.
[55:45.60] Yeah, I love how intentional you guys have been. Thank you for sharing.
[55:48.60] Thank you.
[55:49.60] Thank you for coming on.
[55:50.60] (music)