[by:whisper.cpp]
|
|
[00:00.00](music)
|
|
[00:06.00]Hey everyone, welcome to the Latent Space podcast
|
|
[00:08.40]This is Alessio, partner and CTO-in-Residence at Decibel Partners
|
|
[00:11.80]And I'm joined by my co-host Swyx, founder of Smol AI
|
|
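[00:15.00]Today we're back in the studio
|
|
[00:17.20]With Andreas and Jungwon, welcome
|
|
[00:20.20]Thank you, great, thank you
|
|
[00:22.40]I'll introduce each of you separately, but I also hope to learn more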
|
|
[00:27.40]So Andreas, it looks like you started Elicit first and Jungwon joined later
|
|
[00:32.40]That's right
|
|
[00:33.00]For all intents and purposes, the Elicit and also the Ought that existed before then were very different from what I started
|
|
[00:39.60]So I think it's like fair to say that you co-founded it
|
|
[00:42.60]Got it
|
|
[00:43.00]And Jungwon, you're a co-founder and COO of Elicit now
|
|
[00:46.20]Yeah that's right
|
|
[00:47.00]So there's a little bit of a history to this
|
|
[00:48.80]I'm not super aware of like the sort of journey
|
|
[00:51.80]I was aware of Ought and Elicit as sort of a non-profit type situation
|
|
[00:55.80]And recently you turned into like a public benefit corporation
|
|
[00:59.40]So yeah maybe if you want you could take us through that journey of finding the problem
|
|
[01:04.00]You know obviously you're working together now
|
|
[01:06.20]So like how did you get together and decide to leave your startup career to join him
|
|
[01:11.20]Yeah it's truly a very long journey
|
|
[01:12.80]I guess truly it kind of started in Germany when I was born
|
|
[01:17.20]So even as a kid I was always interested in AI
|
|
[01:20.00]Like I kind of went to the library
|
|
[01:21.40]There were books about how to write programs in QBasic
|
|
[01:24.20]And like some of them talked about how to implement chatbots
|
|
[01:27.20]And to be clear
|
|
[01:28.80]He grew up in like a tiny village on the outskirts of Munich called Dinkelscherben
|
|
[01:33.20]Where it's like a very very idyllic German village
|
|
[01:36.20]Yeah important to the story
|
|
[01:38.40]So basically the main thing is I've kind of always been thinking about AI my entire life
|
|
[01:42.80]And been thinking about at some point this is going to be a huge deal
|
|
[01:46.00]It's going to be transformative
|
|
[01:47.00]How can I work on it
|
|
[01:48.20]And was thinking about it from when I was a teenager
|
|
[01:51.60]After high school did a year where I started a startup with the intention to become rich
|
|
[01:56.80]And then once I'm rich I can affect the trajectory of AI
|
|
[02:00.40]Did not become rich
|
|
[02:01.40]Decided to go back to college
|
|
[02:03.00]And study cognitive science there
|
|
[02:05.00]Which was like the closest thing I could find at the time to AI
|
|
[02:08.00]In the last year of college moved to the US to do a PhD at MIT
|
|
[02:12.60]Working on broadly kind of new programming languages for AI
|
|
[02:15.00]Because it kind of seemed like the existing languages were not great at expressing
|
|
[02:19.60]World models and learning world models, doing Bayesian inference
|
|
[02:22.60]Was obviously thinking about ultimately the goal is to actually build tools that help people reason more clearly
|
|
[02:27.60]Ask and answer better questions and make better decisions
|
|
[02:31.60]But for a long time it seemed like the technology to put reasoning in machines just wasn't there
|
|
[02:35.60]Initially at the end of my postdoc at Stanford was thinking about well what to do
|
|
[02:39.60]I think the standard path is you become an academic and do research
|
|
[02:43.60]But it's really hard to actually build interesting tools as an academic
|
|
[02:48.60]You can't really hire great engineers
|
|
[02:50.60]Everything is kind of on a paper-to-paper timeline
|
|
[02:53.60]And so I was like well maybe I should start a startup
|
|
[02:56.60]Pursued that for a little bit
|
|
[02:57.60]But it seemed like it was too early because you could have tried to do an AI startup
|
|
[03:01.60]But probably would not have been this kind of AI startup we're seeing now
|
|
[03:05.60]So then decided to just start a non-profit research lab
|
|
[03:08.60]That's going to do research for a while until we better figure out how to do thinking in machines
|
|
[03:13.60]And that was Ought
|
|
[03:14.60]And then over time it became clear how to actually build actual tools for reasoning
|
|
[03:19.60]Then only over time we developed a better way to
|
|
[03:23.60]I'll let you fill in some of the details here
|
|
[03:25.60]Yeah so I guess my story maybe starts around 2015
|
|
[03:29.60]I kind of wanted to be a founder for a long time
|
|
[03:31.60]And I wanted to work on an idea that stood the test of time for me
|
|
[03:34.60]Like an idea that stuck with me for a long time
|
|
[03:37.60]And starting in 2015
|
|
[03:38.60]Actually originally I became interested in AI based tools from the perspective of mental health
|
|
[03:43.60]So there are a bunch of people around me who are really struggling
|
|
[03:45.60]One really close friend in particular is really struggling with mental health
|
|
[03:48.60]And didn't have any support
|
|
[03:50.60]And it didn't feel like there was anything before kind of like getting hospitalized
|
|
[03:54.60]That could just help her
|
|
[03:56.60]And so luckily she came and stayed with me for a while
|
|
[03:58.60]And we were just able to talk through some things
|
|
[04:00.60]But it seemed like you know lots of people might not have that resource
|
|
[04:04.60]And something maybe AI enabled could be much more scalable
|
|
[04:07.60]I didn't feel ready to start a company then
|
|
[04:10.60]That's 2015
|
|
[04:11.60]And I also didn't feel like the technology was ready
|
|
[04:13.60]So then I went into fintech
|
|
[04:15.60]And like kind of learned how to do the tech thing
|
|
[04:17.60]And then in 2019
|
|
[04:18.60]I felt like it was time for me to just jump in
|
|
[04:21.60]And build something on my own
|
|
[04:22.60]I really wanted to create
|
|
[04:24.60]And at the time I looked around at tech
|
|
[04:26.60]And felt like not super inspired by the options
|
|
[04:28.60]I just I didn't want to have a tech career ladder
|
|
[04:31.60]Or like I didn't want to like climb the career ladder
|
|
[04:33.60]There are two kind of interesting technologies at the time
|
|
[04:35.60]There was AI and there was crypto
|
|
[04:37.60]And I was like well the AI people seemed like a little bit more nice
|
|
[04:41.60]And maybe like slightly more trustworthy
|
|
[04:44.60]Both super exciting
|
|
[04:45.60]But threw my bet in on the AI side
|
|
[04:47.60]And then I got connected to Andreas
|
|
[04:49.60]And actually the way he was thinking about
|
|
[04:51.60]Pursuing the research agenda at Ought
|
|
[04:53.60]Was really compatible with what I had envisioned
|
|
[04:56.60]For an ideal AI product
|
|
[04:58.60]Something that helps kind of take down
|
|
[05:00.60]Really complex thinking
|
|
[05:01.60]Overwhelming thoughts
|
|
[05:02.60]And breaks it down into small pieces
|
|
[05:04.60]And then this kind of mission
|
|
[05:05.60]We need AI to help us figure out
|
|
[05:07.60]What we ought to do
|
|
[05:08.60]It was really inspiring, right?
|
|
[05:10.60]Yeah, because I think it was clear
|
|
[05:12.60]That we were building the most powerful
|
|
[05:14.60]Optimizer of our time
|
|
[05:16.60]But as a society
|
|
[05:17.60]We hadn't figured out
|
|
[05:18.60]How to direct that optimization potential
|
|
[05:21.60]And if you kind of direct tremendous
|
|
[05:23.60]Optimization potential at the wrong thing
|
|
[05:25.60]That's really disastrous
|
|
[05:26.60]So the goal of Ought was
|
|
[05:28.60]Make sure that if we build
|
|
[05:29.60]The most transformative technology of our lifetime
|
|
[05:31.60]It can be used for something really impactful
|
|
[05:34.60]And that's really good reasoning
|
|
[05:35.60]Like not just generating ads
|
|
[05:37.60]My background was in marketing
|
|
[05:38.60]But like so
|
|
[05:39.60]It's like I want to do
|
|
[05:40.60]More than generate ads with this
|
|
[05:42.60]And also if these AI systems
|
|
[05:44.60]Get to be super intelligent enough
|
|
[05:46.60]That they are doing this
|
|
[05:47.60]Really complex reasoning
|
|
[05:48.60]That we can trust them
|
|
[05:49.60]That they are aligned with us
|
|
[05:51.60]And we have ways of evaluating
|
|
[05:53.60]That they are doing the right thing
|
|
[05:54.60]So that's what Ought did
|
|
[05:55.60]We did a lot of experiments
|
|
[05:56.60]You know, like Andreas said
|
|
[05:57.60]Before foundation models
|
|
[05:59.60]Really like took off
|
|
[06:00.60]A lot of the issues we were seeing
|
|
[06:01.60]Were more in reinforcement learning
|
|
[06:03.60]But we saw a future
|
|
[06:04.60]Where AI would be able to do
|
|
[06:06.60]More kind of logical reasoning
|
|
[06:08.60]Not just kind of extrapolate
|
|
[06:09.60]From numerical trends
|
|
[06:10.60]We actually kind of
|
|
[06:11.60]Set up experiments with people
|
|
[06:13.60]Where kind of people stood in
|
|
[06:14.60]As super intelligent systems
|
|
[06:16.60]And we effectively gave them
|
|
[06:17.60]Context windows
|
|
[06:18.60]So they would have to
|
|
[06:19.60]Like read a bunch of text
|
|
[06:20.60]And one person would get less text
|
|
[06:23.60]And one person would get all the text
|
|
[06:24.60]And the person with less text
|
|
[06:26.60]Would have to evaluate the work
|
|
[06:28.60]Of the person who could read much more
|
|
[06:30.60]So like in the world
|
|
[06:31.60]We were basically simulating
|
|
[06:32.60]Like in, you know, 2018-2019
|
|
[06:34.60]A world where an AI system
|
|
[06:36.60]Could read significantly more than you
|
|
[06:38.60]And you as the person
|
|
[06:39.60]Who couldn't read that much
|
|
[06:40.60]Had to evaluate the work
|
|
[06:41.60]Of the AI system
|
|
[06:42.60]So that's a lot of the work we did
|
|
[06:44.60]And from that we kind of
|
|
[06:45.60]Iterated on the idea
|
|
[06:46.60]Of breaking complex tasks down
|
|
[06:47.60]Into smaller tasks
|
|
[06:48.60]Like complex tasks
|
|
[06:49.60]Like open-ended reasoning
|
|
[06:51.60]Logical reasoning
|
|
[06:52.60]Into smaller tasks
|
|
[06:53.60]So that it's easier
|
|
[06:54.60]To train AI systems on them
|
|
[06:55.60]And also so that it's easier
|
|
[06:57.60]To evaluate the work of the AI system
|
|
[06:59.60]When it's done
|
|
[07:00.60]And then also kind of
|
|
[07:01.60]We really pioneered this idea
|
|
[07:02.60]The importance of supervising
|
|
[07:03.60]The process of AI systems
|
|
[07:05.60]Not just the outcomes
|
|
[07:06.60]And so a big part
|
|
[07:07.60]Of how Elicit is built
|
|
[07:08.60]Is we're very intentional
|
|
[07:10.60]About not just throwing
|
|
[07:11.60]A ton of data into a model
|
|
[07:13.60]And training it
|
|
[07:14.60]And then saying cool
|
|
[07:15.60]Here's like scientific output
|
|
[07:16.60]Like that's not at all
|
|
[07:17.60]What we do
|
|
[07:18.60]Our approach is very much
|
|
[07:19.60]Like what are the steps
|
|
[07:20.60]That an expert human does
|
|
[07:21.60]Or what is like an ideal process
|
|
[07:23.60]As granularly as possible
|
|
[07:25.60]Let's break that down
|
|
[07:26.60]And then train AI systems
|
|
[07:27.60]To perform each of those steps
|
|
[07:29.60]Very robustly
|
|
[07:30.60]When you train like that
|
|
[07:32.60]From the start
|
|
[07:33.60]After the fact
|
|
[07:34.60]It's much easier to evaluate
|
|
[07:35.60]It's much easier to troubleshoot
|
|
[07:36.60]At each point
|
|
[07:37.60]Like where did something break down
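The decomposition idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Elicit's or Ought's actual code: the `Step` type, the `run_pipeline` helper, and the three toy string functions standing in for model calls are all hypothetical names invented here. The point it shows is that when each small step has its own check, a failure can be traced to the step where it happened rather than only being visible in the final output.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    name: str
    run: Callable[[str], str]     # performs one small sub-task
    check: Callable[[str], bool]  # evaluates that sub-task's output on its own


def run_pipeline(
    steps: List[Step], task_input: str
) -> Tuple[Optional[str], List[Tuple[str, str, bool]], Optional[str]]:
    """Run steps in order; return (final output, trace, name of first failed step)."""
    trace = []
    value = task_input
    for step in steps:
        value = step.run(value)
        ok = step.check(value)
        trace.append((step.name, value, ok))
        if not ok:
            # Stop at the first step whose output fails its own check.
            return None, trace, step.name
    return value, trace, None


# Hypothetical stand-ins for model calls: plain string functions.
steps = [
    Step("find_claim", lambda text: text.split(".")[0].strip(), lambda out: len(out) > 0),
    Step("normalize", lambda claim: claim.lower(), lambda out: out == out.lower()),
    Step("shorten", lambda claim: " ".join(claim.split()[:5]), lambda out: len(out.split()) <= 5),
]

result, trace, failed_at = run_pipeline(
    steps, "Sleep improves memory consolidation in adults. More text."
)
print(result, failed_at)
```

The `trace` list records every intermediate output together with whether it passed, which is the part that makes troubleshooting "at each point" possible in this toy version.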
|
|
[07:38.60]So yeah
|
|
[07:39.60]We were working on those experiments
|
|
[07:40.60]For a while
|
|
[07:41.60]And then at the start of 2021
|
|
[07:43.60]Decided to build a product
|
|
[07:44.60]Do you mind if I
|
|
[07:45.60]Because I think you're about
|
|
[07:46.60]To go into more modern
|
|
[07:47.60]Ought and Elicit
|
|
[07:49.60]And I just wanted to
|
|
[07:50.60]Because I think a lot of people
|
|
[07:51.60]Are in where you were
|
|
[07:53.60]Like sort of 2018-19
|
|
[07:55.60]Where you chose a partner
|
|
[07:57.60]To work with
|
|
[07:58.60]And you didn't know him
|
|
[07:59.60]Yeah yeah
|
|
[08:00.60]You were just kind of cold introduced
|
|
[08:01.60]Yep
|
|
[08:02.60]A lot of people are cold introduced
|
|
[08:03.60]I've been cold introduced
|
|
[08:04.60]To tons of people
|
|
[08:05.60]And I never work with them
|
|
[08:06.60]I assume you had a lot
|
|
[08:07.60]A lot of other options
|
|
[08:08.60]Like how do you advise
|
|
[08:09.60]People to make those choices
|
|
[08:10.60]We were not totally cold introduced
|
|
[08:12.60]So one of our closest friends
|
|
[08:13.60]Introduced us
|
|
[08:14.60]And then Andreas had written a lot
|
|
[08:16.60]On the website
|
|
[08:17.60]A lot of blog posts
|
|
[08:18.60]A lot of publications
|
|
[08:19.60]And I just read it
|
|
[08:20.60]And I was like, wow
|
|
[08:21.60]This sounds like my writing
|
|
[08:22.60]And even other people
|
|
[08:23.60]Some of my closest friends
|
|
[08:24.60]I asked for advice from
|
|
[08:25.60]They were like, oh
|
|
[08:26.60]This sounds like your writing
|
|
[08:28.60]But I think
|
|
[08:29.60]I also had some kind of
|
|
[08:30.60]Like things I was looking for
|
|
[08:31.60]I wanted someone
|
|
[08:32.60]With a complementary skill set
|
|
[08:33.60]I wanted someone
|
|
[08:34.60]Who was very values aligned
|
|
[08:36.60]And yeah
|
|
[08:37.60]That was all a good fit
|
|
[08:38.60]We also did a pretty
|
|
[08:40.60]Lengthy mutual evaluation process
|
|
[08:42.60]Where we had a Google doc
|
|
[08:43.60]Where we had all kinds of questions
|
|
[08:45.60]For each other
|
|
[08:46.60]And I think it ended up being
|
|
[08:48.60]Around 50 pages or so
|
|
[08:49.60]Of like various questions
|
|
[08:51.60]Was it the YC list?
|
|
[08:53.60]There's some lists going around
|
|
[08:54.60]For co-founder questions
|
|
[08:55.60]No, we just made our own
|
|
[08:57.60]But I guess it's probably related
|
|
[08:59.60]In that you ask yourself
|
|
[09:00.60]What are the values you care about
|
|
[09:01.60]How would you approach
|
|
[09:02.60]Various decisions
|
|
[09:03.60]And things like that
|
|
[09:04.60]I shared like all of my past
|
|
[09:05.60]Performance reviews
|
|
[09:06.60]Yeah
|
|
[09:07.60]Yeah
|
|
[09:08.60]And he never had any
|
|
[09:09.60]No
|
|
[09:10.60]Yeah, sorry
|
|
[09:14.60]I just had to
|
|
[09:15.60]A lot of people are going through
|
|
[09:16.60]That phase
|
|
[09:17.60]And you kind of skipped over it
|
|
[09:18.60]I was like, no, no, no
|
|
[09:19.60]There's like an interesting story
|
|
[09:20.60]Yeah
|
|
[09:21.60]Before we jump into what it is
|
|
[09:22.60]It is today
|
|
[09:23.60]The history is a bit
|
|
[09:24.60]Counterintuitive
|
|
[09:25.60]So you start
|
|
[09:26.60]Now, oh, if we had
|
|
[09:27.60]A super powerful model
|
|
[09:29.60]How we align it
|
|
[09:30.60]How we use it
|
|
[09:31.60]But then you were actually
|
|
[09:32.60]Like, well, let's just build
|
|
[09:33.60]The product so that people
|
|
[09:34.60]Can actually leverage it
|
|
[09:35.60]And I think there are
|
|
[09:36.60]A lot of folks today
|
|
[09:37.60]That are now back
|
|
[09:38.60]To where you were
|
|
[09:39.60]Maybe five years ago
|
|
[09:40.60]They're like, oh, what if
|
|
[09:41.60]This happens rather than
|
|
[09:42.60]Focusing on actually building
|
|
[09:43.60]Something useful with it
|
|
[09:45.60]What clicked for you
|
|
[09:46.60]To like move into Elicit
|
|
[09:47.60]And then we can cover
|
|
[09:48.60]That story too
|
|
[09:49.60]I think in many ways
|
|
[09:50.60]The approach is still the same
|
|
[09:51.60]Because the way we're
|
|
[09:52.60]Building Elicit is not
|
|
[09:54.60]Let's train a foundation model
|
|
[09:55.60]To do more stuff
|
|
[09:56.60]It's like
|
|
[09:57.60]Let's build a scaffolding
|
|
[09:58.60]Such that we can
|
|
[09:59.60]Deploy powerful models
|
|
[10:00.60]To good ends
|
|
[10:01.60]I think it's different
|
|
[10:02.60]Now in that
|
|
[10:03.60]We actually have
|
|
[10:04.60]Like some of the models to plug in
|
|
[10:05.60]But if in 2017
|
|
[10:06.60]We had had the models
|
|
[10:08.60]We could have run
|
|
[10:09.60]The same experiments
|
|
[10:10.60]We did run with humans
|
|
[10:11.60]Back then
|
|
[10:12.60]Just with models
|
|
[10:13.60]And so in many ways
|
|
[10:14.60]Our philosophy is always
|
|
[10:15.60]Let's think ahead to the future
|
|
[10:16.60]What models are going to exist
|
|
[10:17.60]In one, two years
|
|
[10:19.60]Or longer
|
|
[10:20.60]And how can we make it
|
|
[10:22.60]So that they can
|
|
[10:23.60]Actually be deployed
|
|
[10:24.60]In many transparent
|
|
[10:25.60]Controllable ways
|
|
[10:26.60]Yeah, I think
|
|
[10:27.60]Motivationally we both
|
|
[10:28.60]Are kind of
|
|
[10:29.60]Product people at heart
|
|
[10:30.60]The research was
|
|
[10:31.60]Really important
|
|
[10:32.60]And it didn't
|
|
[10:33.60]Make sense to build
|
|
[10:34.60]A product at that time
|
|
[10:35.60]But at the end of the day
|
|
[10:36.60]The thing that always
|
|
[10:37.60]Motivated us is
|
|
[10:38.60]Imagining a world
|
|
[10:39.60]Where high quality
|
|
[10:40.60]Reasoning is really abundant
|
|
[10:41.60]And AI is a technology
|
|
[10:43.60]That's going to get us there
|
|
[10:44.60]And there's a way
|
|
[10:45.60]To guide that technology
|
|
[10:46.60]With research
|
|
[10:47.60]But you can have
|
|
[10:48.60]A more direct effect
|
|
[10:49.60]Through product
|
|
[10:50.60]Because with research
|
|
[10:51.60]You publish the research
|
|
[10:52.60]And someone else
|
|
[10:53.60]Product felt
|
|
[10:54.60]Like a more direct path
|
|
[10:55.60]And we wanted to
|
|
[10:56.60]Concretely have an impact
|
|
[10:57.60]On people's lives
|
|
[10:58.60]Yeah, I think
|
|
[10:59.60]The kind of personally
|
|
[11:00.60]The motivation was
|
|
[11:01.60]We want to build
|
|
[11:02.60]For people
|
|
[11:03.60]Yep, and then
|
|
[11:04.60]Just to recap as well
|
|
[11:05.60]Like the models
|
|
[11:06.60]You're using back then were
|
|
[11:07.60]Like, I don't know
|
|
[11:08.60]Was it like BERT type stuff
|
|
[11:10.60]Or T5 or
|
|
[11:12.60]I don't know what time frame
|
|
[11:13.60]We're talking about here
|
|
[11:14.60]I guess to be clear
|
|
[11:15.60]At the very beginning
|
|
[11:16.60]We had humans do the work
|
|
[11:18.60]And then I think
|
|
[11:19.60]The first models
|
|
[11:20.60]That kind of made sense
|
|
[11:21.60]Were GPT-2
|
|
[11:22.60]And T-NLG
|
|
[11:23.60]And early generative models
|
|
[11:25.60]We do
|
|
[11:26.60]We also use
|
|
[11:27.60]Like T5 based models
|
|
[11:28.60]Even now
|
|
[11:29.60]Started with GPT-2
|
|
[11:30.60]Yeah, cool
|
|
[11:31.60]I'm just kind of curious about
|
|
[11:32.60]Like how do you
|
|
[11:33.60]Start so early
|
|
[11:34.60]Like now it's obvious
|
|
[11:35.60]Where to start
|
|
[11:36.60]But back then it wasn't
|
|
[11:37.60]Yeah, I used to
|
|
[11:38.60]Nag Andreas a lot
|
|
[11:39.60]I was like
|
|
[11:40.60]Why are you
|
|
[11:41.60]Talking to this?
|
|
[11:42.60]I don't know
|
|
[11:43.60]I felt like
|
|
[11:44.60]GPT-2 is like
|
|
[11:45.60]Clearly can't do anything
|
|
[11:46.60]And I was like
|
|
[11:47.60]Andreas, you're wasting your time
|
|
[11:48.60]Like playing with this toy
|
|
[11:49.60]But yeah, he was right
|
|
[11:50.60]So what's the history
|
|
[11:51.60]Of what Elicit
|
|
[11:52.60]Actually does as a product
|
|
[11:53.60]You recently announced that
|
|
[11:55.60]After four months
|
|
[11:56.60]You got to a million in revenue
|
|
[11:57.60]Obviously a lot of people
|
|
[11:58.60]Use it, get a lot of value
|
|
[11:59.60]But it was
|
|
[12:00.60]Initially kind of like
|
|
[12:01.60]Structured data
|
|
[12:02.60]Extraction from papers
|
|
[12:03.60]Then you had
|
|
[12:04.60]Kind of like concept grouping
|
|
[12:05.60]And today it's maybe
|
|
[12:06.60]Like a more full stack
|
|
[12:07.60]Research enabler
|
|
[12:09.60]Kind of like paper
|
|
[12:10.60]Understander platform
|
|
[12:11.60]What's the definitive definition
|
|
[12:13.60]Of what Elicit is
|
|
[12:14.60]And how did you get here
|
|
[12:15.60]Yeah, we say Elicit
|
|
[12:16.60]Is an AI research assistant
|
|
[12:17.60]I think it will continue
|
|
[12:18.60]To evolve
|
|
[12:19.60]You know, we're so excited
|
|
[12:20.60]About building in research
|
|
[12:21.60]Because there's just so much space
|
|
[12:22.60]I think the current phase
|
|
[12:23.60]We're in right now
|
|
[12:24.60]We talk about it
|
|
[12:25.60]As really trying to make Elicit
|
|
[12:27.60]The best place to understand
|
|
[12:28.60]What is known
|
|
[12:29.60]So it's all a lot about like
|
|
[12:31.60]Literature summarization
|
|
[12:32.60]There's a ton of information
|
|
[12:33.60]That the world already knows
|
|
[12:34.60]It's really hard to navigate
|
|
[12:35.60]Hard to make it relevant
|
|
[12:37.60]So a lot of it is around
|
|
[12:38.60]Document discovery
|
|
[12:39.60]And processing and analysis
|
|
[12:41.60]I really kind of want to
|
|
[12:42.60]Import some of the incredible
|
|
[12:44.60]Productivity improvements
|
|
[12:45.60]We've seen in software engineering
|
|
[12:47.60]And data science
|
|
[12:48.60]And into research
|
|
[12:49.60]So it's like
|
|
[12:50.60]How can we make researchers
|
|
[12:51.60]Like data scientists of text
|
|
[12:53.60]That's why we're launching
|
|
[12:54.60]This new set of features
|
|
[12:55.60]Called notebooks
|
|
[12:56.60]It's very much inspired
|
|
[12:57.60]By computational notebooks
|
|
[12:58.60]Like Jupyter notebooks
|
|
[12:59.60]Deepnote or Colab
|
|
[13:01.60]Because they're so powerful
|
|
[13:02.60]And so flexible
|
|
[13:03.60]And ultimately
|
|
[13:04.60]When people are trying
|
|
[13:05.60]To get to an answer
|
|
[13:07.60]Or understand insight
|
|
[13:08.60]They're kind of like
|
|
[13:09.60]Manipulating evidence
|
|
[13:10.60]And information
|
|
[13:11.60]Today that's all packaged
|
|
[13:12.60]In PDFs
|
|
[13:13.60]Which are super brittle
|
|
[13:14.60]But with language models
|
|
[13:15.60]We can decompose
|
|
[13:16.60]These PDFs
|
|
[13:17.60]And then we can
|
|
[13:18.60]Extract the underlying claims
|
|
[13:19.60]And evidence
|
|
[13:20.60]And insights
|
|
[13:21.60]And then let researchers
|
|
[13:22.60]Mash them up together
|
|
[13:23.60]Remix them
|
|
[13:24.60]And analyze them together
|
|
[13:25.60]So yeah
|
|
[13:26.60]I would say quite simply
|
|
[13:27.60]Overall Elicit
|
|
[13:28.60]Is an AI research assistant
|
|
[13:29.60]Right now we're focused
|
|
[13:30.60]On text based workflows
|
|
[13:32.60]But long term
|
|
[13:33.60]Really want to kind of
|
|
[13:34.60]Go further and further
|
|
[13:35.60]Into reasoning
|
|
[13:36.60]And decision making
|
|
[13:37.60]And when you say
|
|
[13:38.60]AI research assistant
|
|
[13:39.60]This is kind of
|
|
[13:40.60]Meta research
|
|
[13:41.60]So researchers
|
|
[13:42.60]Use Elicit
|
|
[13:43.60]As a research assistant
|
|
[13:44.60]It's not a generic
|
|
[13:45.60]You can research
|
|
[13:46.60]Or it could be
|
|
[13:47.60]But what are people
|
|
[13:48.60]Using it for today
|
|
[13:49.60]So specifically in science
|
|
[13:51.60]A lot of people use
|
|
[13:52.60]Human research assistants
|
|
[13:53.60]To do things
|
|
[13:54.60]You tell your grad student
|
|
[13:56.60]Here are a couple of papers
|
|
[13:57.60]Can you look at
|
|
[13:58.60]All of these
|
|
[13:59.60]See which of these
|
|
[14:00.60]Have kind of sufficiently
|
|
[14:01.60]Large populations
|
|
[14:02.60]And actually study
|
|
[14:03.60]The disease that
|
|
[14:04.60]I'm interested in
|
|
[14:05.60]And then write out
|
|
[14:06.60]Like what are the experiments
|
|
[14:07.60]They did
|
|
[14:08.60]What are the interventions
|
|
[14:09.60]They did
|
|
[14:10.60]What are the outcomes
|
|
[14:11.60]And kind of organize
|
|
[14:12.60]That for me
|
|
[14:13.60]And the first phase
|
|
[14:14.60]Of understanding
|
|
[14:15.60]This is on
|
|
[14:16.60]Automating that workflow
|
|
[14:17.60]Because a lot of that work
|
|
[14:18.60]Is pretty rote work
|
|
[14:19.60]I think it's not
|
|
[14:20.60]The kind of thing
|
|
[14:21.60]That we need humans to do
|
|
[14:22.60]Language models can do it
|
|
[14:23.60]And then if
|
|
[14:24.60]Language models can do it
|
|
[14:25.60]Then you can obviously
|
|
[14:26.60]Scale it up
|
|
[14:27.60]Much more than a grad student
|
|
[14:28.60]Or undergrad
|
|
[14:29.60]Research assistant
|
|
[14:30.60]Would be able to do
|
|
[14:31.60]Yeah the use cases
|
|
[14:32.60]Are pretty broad
|
|
[14:33.60]So we do have
|
|
[14:34.60]A very large
|
|
[14:35.60]Percent of our users
|
|
[14:36.60]Are just using it personally
|
|
[14:37.60]Or for a mix
|
|
[14:38.60]Of personal and professional
|
|
[14:39.60]Things
|
|
[14:40.60]People who care a lot
|
|
[14:41.60]About health
|
|
[14:42.60]Or biohacking
|
|
[14:43.60]Or parents
|
|
[14:44.60]Or disease
|
|
[14:45.60]Or want to understand
|
|
[14:46.60]The literature directly
|
|
[14:47.60]So there is an
|
|
[14:48.60]Individual consumer use
|
|
[14:49.60]Case
|
|
[14:50.60]We're most focused
|
|
[14:51.60]On the power users
|
|
[14:52.60]So that's where
|
|
[14:53.60]We're really excited
|
|
[14:54.60]To build
|
|
[14:55.60]So Elicit was
|
|
[14:56.60]Very much inspired
|
|
[14:57.60]By this workflow
|
|
[14:58.60]In literature
|
|
[14:59.60]Called systematic reviews
|
|
[15:00.60]Or meta-analysis
|
|
[15:01.60]Which is basically
|
|
[15:02.60]The human state
|
|
[15:03.60]Of the art
|
|
[15:04.60]For summarizing
|
|
[15:05.60]Scientific literature
|
|
[15:06.60]It typically involves
|
|
[15:07.60]Like five people
|
|
[15:08.60]Working together
|
|
[15:09.60]For over a year
|
|
[15:10.60]And they kind of
|
|
[15:11.60]First start by trying
|
|
[15:12.60]To find the maximally
|
|
[15:13.60]First possible
|
|
[15:14.60]So it's like
|
|
[15:15.60]Ten thousand papers
|
|
[15:16.60]And they kind of
|
|
[15:17.60]Systematically narrow
|
|
[15:18.60]That down to like
|
|
[15:19.60]Hundreds or fifty
|
|
[15:20.60]Extract key details
|
|
[15:22.60]From every single paper
|
|
[15:23.60]Usually have two people
|
|
[15:24.60]Doing it
|
|
[15:25.60]Like a third person
|
|
[15:26.60]Reviewing it
|
|
[15:27.60]So it's like
|
|
[15:28.60]Incredibly laborious
|
|
[15:29.60]Time-consuming process
|
|
[15:30.60]But you see it
|
|
[15:31.60]In every single domain
|
|
[15:32.60]So in science
|
|
[15:33.60]In machine learning
|
|
[15:34.60]In policy
|
|
[15:35.60]Because it's so structured
|
|
[15:36.60]And designed to be reproducible
|
|
[15:37.60]It's really amenable
|
|
[15:38.60]To automation
|
|
[15:39.60]So it's kind of
|
|
[15:40.60]The workflow that we want
|
|
[15:41.60]To automate first
|
|
[15:42.60]It's accessible
|
|
[15:43.60]For any question
|
|
[15:44.60]And make
|
|
[15:45.60]You know kind of
|
|
[15:46.60]These really robust
|
|
[15:47.60]Living summaries of science
|
|
[15:48.60]So yeah
|
|
[15:48.60]It's one of the
|
|
[15:49.60]Workflows that we're
|
|
[15:50.60]Starting with
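The systematic review workflow described above (start from a broad candidate set, narrow it with explicit criteria, have two people extract details independently, and send disagreements to a third reviewer) could be sketched along these lines. Everything here is an assumption made for illustration: the `Paper` fields, the `screen` and `extract_twice` helpers, the keyword and sample-size criteria, and the two toy extractors are hypothetical and do not describe how Elicit implements this.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Paper:
    title: str
    abstract: str
    sample_size: int


Extractor = Callable[[Paper], Dict[str, int]]


def screen(papers: List[Paper], keyword: str, min_sample_size: int) -> List[Paper]:
    """Narrow the broad candidate set using explicit, reproducible criteria."""
    return [
        p for p in papers
        if keyword.lower() in p.abstract.lower() and p.sample_size >= min_sample_size
    ]


def extract_twice(paper: Paper, first: Extractor, second: Extractor) -> dict:
    """Run two independent extractions and flag the paper when they disagree."""
    a, b = first(paper), second(paper)
    return {"title": paper.title, "first": a, "second": b, "needs_review": a != b}


# Toy candidate set; a real review would start from thousands of papers.
candidates = [
    Paper("A", "Sleep and memory in adults", 300),
    Paper("B", "Sleep duration and mood", 120),
    Paper("C", "Sleep pilot study", 20),
    Paper("D", "Diet and memory", 500),
]

# Hypothetical extractors standing in for two independent reviewers (or models).
exact: Extractor = lambda p: {"n": p.sample_size}
rounded: Extractor = lambda p: {"n": round(p.sample_size, -2)}

included = screen(candidates, keyword="sleep", min_sample_size=50)
rows = [extract_twice(p, exact, rounded) for p in included]
to_review = [r["title"] for r in rows if r["needs_review"]]
print([p.title for p in included], to_review)
```

Keeping the screening criteria as plain function arguments mirrors the point made in the conversation: because the process is structured and designed to be reproducible, each stage is a natural candidate for automation.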
|
|
[15:51.60]Our previous guest
|
|
[15:52.60]Mike Conover
|
|
[15:53.60]He's building a new
|
|
[15:54.60]Company called Brightwave
|
|
[15:55.60]Which is an AI
|
|
[15:56.60]Research assistant
|
|
[15:57.60]For financial research
|
|
[15:58.60]How do you see
|
|
[15:59.60]The future of these tools
|
|
[16:00.60]Like does everything
|
|
[16:01.60]Converge
|
|
[16:02.60]To like a God researcher
|
|
[16:03.60]Assistant
|
|
[16:04.60]Or is every domain
|
|
[16:05.60]Going to have its own thing
|
|
[16:06.60]I think that's a good
|
|
[16:07.60]And mostly open question
|
|
[16:09.60]I do think there are
|
|
[16:10.60]Some differences
|
|
[16:11.60]Data analysis
|
|
[16:12.60]And other research
|
|
[16:13.60]Is more high-level
|
|
[16:15.60]Cross-domain thinking
|
|
[16:16.60]And we definitely
|
|
[16:17.60]Want to contribute to
|
|
[16:18.60]The broad
|
|
[16:19.60]Generalist reasoning type
|
|
[16:20.60]Space like if
|
|
[16:21.60]Researchers are
|
|
[16:22.60]Making discoveries often
|
|
[16:23.60]It's like hey
|
|
[16:24.60]This thing in biology
|
|
[16:25.60]Is actually analogous to
|
|
[16:26.60]Like these equations
|
|
[16:27.60]In economics or something
|
|
[16:28.60]And that's just
|
|
[16:29.60]Fundamentally a thing
|
|
[16:30.60]That where you need
|
|
[16:31.60]To reason across domains
|
|
[16:32.60]At least within research
|
|
[16:33.60]I think there will be
|
|
[16:34.60]Like one best platform
|
|
[16:36.60]More or less
|
|
[16:37.60]For this type of
|
|
[16:38.60]Generalist research
|
|
[16:39.60]I think there may still be
|
|
[16:40.60]Tools like for genomics
|
|
[16:41.60]Like particular types
|
|
[16:42.60]Of modules
|
|
[16:43.60]Of genes
|
|
[16:44.60]And proteins
|
|
[16:45.60]And whatnot
|
|
[16:46.60]But for a lot of
|
|
[16:47.60]The kind of high-level reasoning
|
|
[16:48.60]That humans do
|
|
[16:49.60]I think that is
|
|
[16:50.60]A more winner-take-all
|
|
[16:51.60]Type thing
|
|
[16:52.60]I wanted to ask
|
|
[16:53.60]A little bit deeper about
|
|
[16:54.60]I guess the workflow
|
|
[16:55.60]That you mentioned
|
|
[16:56.60]I like that phrase
|
|
[16:57.60]I see that
|
|
[16:58.60]In your UI now
|
|
[16:59.60]But that's
|
|
[17:00.60]As it is today
|
|
[17:01.60]And I think you were
|
|
[17:02.60]About to tell us about
|
|
[17:03.60]How it was in 2021
|
|
[17:04.60]And how it maybe progressed
|
|
[17:05.60]How has this workflow
|
|
[17:06.60]Evolved over time
|
|
[17:07.60]So the very first
|
|
[17:08.60]Version of Elicit
|
|
[17:09.60]Wasn't even a research assistant
|
|
[17:10.60]It was a forecasting assistant
|
|
[17:12.60]So we set out
|
|
[17:13.60]And we were thinking about
|
|
[17:14.60]What are some of the most
|
|
[17:15.60]Impactful types of reasoning
|
|
[17:16.60]That if we could scale up
|
|
[17:17.60]AI would really transform
|
|
[17:18.60]The world
|
|
[17:19.60]And we actually started
|
|
[17:20.60]With literature review
|
|
[17:21.60]But we're like
|
|
[17:22.60]So many people are going to build
|
|
[17:23.60]Literature review tools
|
|
[17:24.60]So let's not start there
|
|
[17:25.60]So then we focused
|
|
[17:26.60]On geopolitical forecasting
|
|
[17:27.60]So I don't know
|
|
[17:28.60]If you're familiar
|
|
[17:29.60]With like Manifold or
|
|
[17:30.60]Manifold Markets
|
|
[17:31.60]Yeah, that kind of stuff
|
|
[17:32.60]Before Manifold
|
|
[17:33.60]Yeah, yeah
|
|
[17:34.60]We're not predicting relationships
|
|
[17:35.60]We're predicting like
|
|
[17:36.60]Is China going to invade Taiwan?
|
|
[17:38.60]Yeah
|
|
[17:39.60]That's in a relationship
|
|
[17:40.60]Yeah, that's fair
|
|
[17:41.60]Yeah, it's true
|
|
[17:42.60]And then we worked
|
|
[17:43.60]On that for a while
|
|
[17:44.60]And then after GPT-3
|
|
[17:45.60]Came out
|
|
[17:46.60]I think by that time
|
|
[17:47.60]We realized that
|
|
[17:48.60]Originally we were trying
|
|
[17:49.60]To help people convert
|
|
[17:50.60]Their beliefs into
|
|
[17:51.60]Probability distributions
|
|
[17:53.60]So take fuzzy beliefs
|
|
[17:54.60]But like model them
|
|
[17:55.60]More concretely
|
|
[17:56.60]And then after a few months
|
|
[17:57.60]Of iterating on that
|
|
[17:58.60]Just realized the thing
|
|
[17:59.60]That's blocking people
|
|
[18:00.60]From making
|
|
[18:01.60]Interesting predictions
|
|
[18:02.60]About important events
|
|
[18:03.60]In the world
|
|
[18:04.60]Is less kind of
|
|
[18:05.60]On the probabilistic side
|
|
[18:06.60]And much more
|
|
[18:07.60]On the research side
|
|
[18:08.60]And so that kind
|
|
[18:09.60]Of combined with
|
|
[18:10.60]The very generalist
|
|
[18:11.60]Capabilities of GPT-3
|
|
[18:12.60]Prompted us to
|
|
[18:13.60]Make a more general
|
|
[18:14.60]Research assistant
|
|
[18:15.60]Then we spent
|
|
[18:16.60]A few months iterating
|
|
[18:17.60]On what even is
|
|
[18:18.60]A research assistant
|
|
[18:19.60]So we would embed
|
|
[18:20.60]With different researchers
|
|
[18:21.60]We built data labeling
|
|
[18:23.60]Workflows in the beginning
|
|
[18:24.60]Kind of right off the bat
|
|
[18:25.60]We built ways to find
|
|
[18:27.60]Experts in a field
|
|
[18:29.60]And like ways to ask
|
|
[18:30.60]Good research questions
|
|
[18:31.60]We just kind of
|
|
[18:32.60]Iterated through a lot
|
|
[18:33.60]Of workflows and no one else
|
|
[18:34.60]Was really building at this
|
|
[18:35.60]Time and it was like
|
|
[18:36.60]Let's do some prompt
|
|
[18:37.60]Engineering and see
|
|
[18:38.60]Like what is a task
|
|
[18:39.60]That is at the
|
|
[18:40.60]Intersection of what's
|
|
[18:41.60]Technologically capable
|
|
[18:42.60]And like important
|
|
[18:43.60]For researchers
|
|
[18:44.60]And we had like
|
|
[18:45.60]A very nondescript
|
|
[18:46.60]Landing page
|
|
[18:47.60]It said nothing
|
|
[18:48.60]But somehow people were
|
|
[18:49.60]Signing up and we had
|
|
[18:50.60]A sign-up form
|
|
[18:51.60]That was like
|
|
[18:52.60]Why are you here
|
|
[18:53.60]And everyone was like
|
|
[18:54.60]I need help
|
|
[18:55.60]With literature review
|
|
[18:56.60]And we're like
|
|
[18:57.60]A literature review
|
|
[18:58.60]That sounds so hard
|
|
[18:59.60]I don't even know
|
|
[19:00.60]What that means
|
|
[19:01.60]We don't want to work on it
|
|
[19:02.60]But then eventually
|
|
[19:03.60]We're like
|
|
[19:04.60]Everyone is saying
|
|
[19:05.60]Yeah
|
|
[19:06.60]And we also kind of
|
|
[19:07.60]Personally knew literature
|
|
[19:08.60]Review was hard
|
|
[19:09.60]And if you look at the graphs
|
|
[19:10.60]For academic literature
|
|
[19:11.60]Being published every
|
|
[19:12.60]Single month you guys
|
|
[19:13.60]Know this in machine learning
|
|
[19:14.60]It's like up and to the right
|
|
[19:15.60]Like superhuman amounts
|
|
[19:16.60]Of papers
|
|
[19:17.60]So we're like
|
|
[19:18.60]All right, let's just try it
|
|
[19:19.60]I was really nervous
|
|
[19:20.60]But Andreas was like
|
|
[19:21.60]This is kind of like
|
|
[19:22.60]The right problem space
|
|
[19:23.60]To jump into
|
|
[19:24.60]Even if we don't
|
|
[19:25.60]Know what we're doing
|
|
[19:26.60]So my take was like
|
|
[19:27.60]Fine
|
|
[19:28.60]This feels really scary
|
|
[19:29.60]But let's just launch
|
|
[19:30.60]A feature every single week
|
|
[19:31.60]And double our user
|
|
[19:32.60]Numbers every month
|
|
[19:33.60]And if we can do that
|
|
[19:34.60]We will find something
|
|
[19:35.60]I was worried about like
|
|
[19:36.60]Getting lost
|
|
[19:37.60]In the kind of academic white
|
|
[19:38.60]Space
|
|
[19:39.60]So the very first version
|
|
[19:40.60]Was actually a weekend prototype
|
|
[19:41.60]That Andreas made
|
|
[19:42.60]Do you want to explain
|
|
[19:43.60]How that worked
|
|
[19:44.60]I mostly remember
|
|
[19:45.60]That it was really bad
|
|
[19:47.60]So the thing I remember
|
|
[19:48.60]Is you entered a question
|
|
[19:50.60]And it would give you back
|
|
[19:51.60]A list of claims
|
|
[19:52.60]So your question could be
|
|
[19:53.60]I don't know
|
|
[19:54.60]How does creatine affect cognition
|
|
[19:56.60]And it would give you back
|
|
[19:57.60]Some claims
|
|
[19:58.60]That are to some extent
|
|
[19:59.60]Based on papers
|
|
[20:00.60]But they were often irrelevant
|
|
[20:02.60]The papers were often
|
|
[20:03.60]And so we ended up
|
|
[20:04.60]Soon just printing out
|
|
[20:05.60]A bunch of examples
|
|
[20:06.60]Of results
|
|
[20:07.60]And putting them up
|
|
[20:08.60]On the wall
|
|
[20:09.60]So that we would
|
|
[20:10.60]Kind of feel the constant
|
|
[20:11.60]Shame of having
|
|
[20:12.60]Such a bad product
|
|
[20:13.60]And would be incentivized
|
|
[20:14.60]To make it better
|
|
[20:15.60]And I think over time
|
|
[20:16.60]It has gotten a lot better
|
|
[20:17.60]But I think
|
|
[20:18.60]The initial version
|
|
[20:19.60]Was like really very bad
|
|
[20:20.60]But it was basically
|
|
[20:21.60]Like a natural language
|
|
[20:22.60]Summary of an abstract
|
|
[20:23.60]Like kind of a one-sentence
|
|
[20:24.60]Summary
|
|
[20:25.60]And which we still have
|
|
[20:26.60]And then as we learned
|
|
[20:27.60]Kind of more about this
|
|
[20:28.60]Systematic review workflow
|
|
[20:29.60]We started expanding
|
|
[20:30.60]The capability so that
|
|
[20:31.60]You could extract a lot more
|
|
[20:32.60]And do more with that
|
|
[20:33.60]And were you using
|
|
[20:34.60]Like embeddings
|
|
[20:35.60]And cosine similarity
|
|
[20:36.60]That kind of stuff
|
|
[20:37.60]For retrieval
|
|
[20:38.60]Or was it keyword based
|
|
[20:39.60]Or
|
|
[20:40.60]I think the very first version
|
|
[20:42.60]Didn't even have
|
|
[20:43.60]Its own search engine
|
|
[20:44.60]I think the very first version
|
|
[20:45.60]Probably used
|
|
[20:46.60]The Semantic Scholar API
|
|
[20:48.60]Or something similar
|
|
[20:49.60]And only later when we discovered
|
|
[20:51.60]That the API is not very semantic
|
|
[20:53.60]We then built our own
|
|
[20:55.60]Search, and that has helped a lot
|
|
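Editor's sketch: the question above contrasts embedding-based retrieval (cosine similarity) with keyword search. The snippet below illustrates only the cosine-similarity ranking step. It is not Elicit's search engine; the three-dimensional vectors and the names `paper_a`, `paper_b`, `paper_c` are hypothetical stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real ones come from an embedding model.
docs = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.1, 0.8, 0.1],
    "paper_c": [0.7, 0.3, 0.0],
}
query = [1.0, 0.0, 0.0]
ranking = rank_by_similarity(query, docs)
print([doc_id for doc_id, _ in ranking])
```

A keyword-based system would instead match query terms against document text, which is why it can miss papers that phrase the same idea differently.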
[20:57.60]And then we're going to go into
|
|
[20:59.60]Like more recent product stuff
|
|
[21:01.60]But like you know
|
|
[21:02.60]I think you seem the more
|
|
[21:03.60]Sort of startup-oriented
|
|
[21:04.60]Business person
|
|
[21:05.60]And you seem sort of more
|
|
[21:06.60]Ideologically like interested
|
|
[21:08.60]In research obviously
|
|
[21:09.60]Because of your PhD
|
|
[21:10.60]What kind of market sizing
|
|
[21:11.60]Were you guys thinking
|
|
[21:12.60]Right?
|
|
[21:13.60]Because you're here saying
|
|
[21:14.60]Like we have to double every month
|
|
[21:15.60]And I'm like
|
|
[21:16.60]I don't know how you make
|
|
[21:17.60]That conclusion from this
|
|
[21:19.60]Right?
|
|
[21:20.60]Especially also as a nonprofit
|
|
[21:21.60]At the time
|
|
[21:22.60]I mean market size wise
|
|
[21:23.60]I felt like in this space
|
|
[21:25.60]Where so much was changing
|
|
[21:27.60]And it was very unclear
|
|
[21:29.60]What of today
|
|
[21:30.60]Would actually be true tomorrow
|
|
[21:31.60]We just like
|
|
[21:32.60]Really rested a lot
|
|
[21:33.60]On very very simple
|
|
[21:34.60]Fundamental principles
|
|
[21:35.60]Which is like
|
|
[21:36.60]If you can understand
|
|
[21:37.60]The truth that is
|
|
[21:38.60]Very economically beneficial
|
|
[21:40.60]Like valuable
|
|
[21:41.60]If you like know the truth
|
|
[21:42.60]On principle
|
|
[21:43.60]That's enough for you
|
|
[21:44.60]Research is the key to many
|
|
[21:45.60]Breakthroughs that are
|
|
[21:46.60]Very commercially valuable
|
|
[21:47.60]Because my version of it
|
|
[21:48.60]Is students are poor
|
|
[21:49.60]And they don't pay
|
|
[21:50.60]For anything
|
|
[21:51.60]Right?
|
|
[21:52.60]But that's obviously not true
|
|
[21:53.60]As you guys have found out
|
|
[21:54.60]But you had to have
|
|
[21:55.60]Some market insight
|
|
[21:56.60]For me to have believed that
|
|
[21:57.60]But you skipped that
|
|
[21:58.60]We did encounter
|
|
[21:59.60]Talking to VCs
|
|
[22:00.60]For our seed round
|
|
[22:01.60]A lot of VCs were like
|
|
[22:02.60]You know researchers
|
|
[22:03.60]They don't have any money
|
|
[22:04.60]Why don't you build
|
|
[22:05.60]A legal assistant
|
|
[22:07.60]I think in some
|
|
[22:09.60]Short-sighted way
|
|
[22:10.60]Maybe that's true
|
|
[22:11.60]But I think in the long run
|
|
[22:12.60]R&D is such a big space
|
|
[22:13.60]Of the economy
|
|
[22:14.60]I think if you can
|
|
[22:15.60]Substantially improve
|
|
[22:17.60]How quickly people find
|
|
[22:19.60]New discoveries
|
|
[22:20.60]Or avoid controlled trials
|
|
[22:22.60]That don't go anywhere
|
|
[22:23.60]I think that's just
|
|
[22:24.60]Huge amounts of money
|
|
[22:25.60]And there are a lot
|
|
[22:26.60]Of questions obviously
|
|
[22:27.60]About between here and there
|
|
[22:28.60]But I think as long as
|
|
[22:29.60]The fundamental principle is there
|
|
[22:31.60]We were okay with that
|
|
[22:32.60]And I guess we found
|
|
[22:33.60]Some investors who also were
|
|
[22:34.60]Yeah congrats
|
|
[22:35.60]I'm sure we can cover
|
|
[22:37.60]The sort of flip later
|
|
[22:39.60]I think you were about to start
|
|
[22:40.60]Us on like GPT-3
|
|
[22:41.60]And how like that
|
|
[22:42.60]Changed things for you
|
|
[22:43.60]It's funny like I guess
|
|
[22:44.60]Every major GPT version
|
|
[22:45.60]You have like some big insight
|
|
[22:47.60]Yeah I mean
|
|
[22:49.60]What do you think
|
|
[22:50.60]I think it's a little bit
|
|
[22:52.60]Less true for us than for others
|
|
[22:54.60]Because we always believed
|
|
[22:55.60]That there will basically be
|
|
[22:57.60]Human-level machine work
|
|
[23:00.60]And so
|
|
[23:01.60]It is definitely true
|
|
[23:02.60]That in practice
|
|
[23:03.60]For your product
|
|
[23:04.60]As new models come out
|
|
[23:06.60]Your product starts working better
|
|
[23:07.60]You can add some features
|
|
[23:08.60]That you couldn't add before
|
|
[23:09.60]But I don't think
|
|
[23:11.60]We really ever had the
|
|
[23:13.60]Moment where we were like
|
|
[23:14.60]Oh wow
|
|
[23:15.60]That is super unanticipated
|
|
[23:17.60]We need to do something
|
|
[23:18.60]Entirely different now
|
|
[23:19.60]From what was on the roadmap
|
|
[23:21.60]I think GPT-3
|
|
[23:22.60]Was a big change
|
|
[23:23.60]Because it kind of said
|
|
[23:25.60]Oh now is the time
|
|
[23:26.60]To build these tools
|
|
[23:27.60]And then GPT-4
|
|
[23:28.60]Was maybe a little bit
|
|
[23:29.60]More of an extension
|
|
[23:30.60]Of GPT-3
|
|
[23:31.60]GPT-3 over GPT-2
|
|
[23:32.60]Was like a qualitative level
|
|
[23:34.60]Shift
|
|
[23:35.60]Then GPT-4 was like
|
|
[23:36.60]Okay great
|
|
[23:37.60]Now it's like more accurate
|
|
[23:38.60]We're more accurate
|
|
[23:39.60]On these things
|
|
[23:40.60]We can answer harder questions
|
|
[23:41.60]But the shape of the product
|
|
[23:42.60]Had already taken place
|
|
[23:43.60]By that time
|
|
[23:44.60]I kind of want to ask you
|
|
[23:45.60]About this sort of pivot
|
|
[23:46.60]That you made
|
|
[23:47.60]But I guess that was just
|
|
[23:48.60]A way to sell
|
|
[23:49.60]What you were doing
|
|
[23:50.60]Which is you're adding
|
|
[23:51.60]Extra features on grouping
|
|
[23:52.60]By concepts
|
|
[23:53.60]The GPT-4 pivot
|
|
[23:54.60]Quote unquote pivot
|
|
[23:55.60]Yeah yeah
|
|
[23:56.60]Exactly
|
|
[23:57.60]Yeah yeah
|
|
[23:58.60]When we launched
|
|
[23:59.60]This workflow
|
|
[24:00.60]Now that GPT-4
|
|
[24:01.60]Was available
|
|
[24:02.60]Basically
|
|
[24:03.60]Elicit was at a place
|
|
[24:04.60]Where we have very tabular
|
|
[24:05.60]Interfaces
|
|
[24:06.60]So given a table of papers
|
|
[24:07.60]You can extract data
|
|
[24:08.60]Across all the tables
|
|
[24:09.60]But you kind of want
|
|
[24:10.60]To take the analysis
|
|
[24:11.60]A step further
|
|
[24:12.60]Sometimes what you'd care
|
|
[24:13.60]About is not having
|
|
[24:14.60]A list of papers
|
|
[24:15.60]But a list of arguments
|
|
[24:17.60]A list of effects
|
|
[24:18.60]A list of interventions
|
|
[24:19.60]A list of techniques
|
|
[24:20.60]And so that's
|
|
[24:21.60]One of the things we're
|
|
[24:22.60]Working on is now that
|
|
[24:23.60]You've extracted this information
|
|
[24:24.60]A way
|
|
[24:25.60]Can you pivot it
|
|
[24:26.60]Or group by
|
|
[24:27.60]Whatever the information
|
|
[24:28.60]That you extracted
|
|
[24:29.60]To have more insight
|
|
[24:30.60]First information
|
|
[24:31.60]Still supported
|
|
[24:32.60]By the academic literature
|
|
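Editor's sketch: the pivot described above takes a table with one row per paper and regroups it by an extracted column, so the unit of analysis becomes the intervention (or argument, effect, technique) rather than the paper. The rows, the column names, and the `pivot_by` helper below are hypothetical placeholders, not Elicit's data model.

```python
from collections import defaultdict

# Placeholder rows: one per paper, with a column already extracted from it.
rows = [
    {"paper": "paper_1", "intervention": "intervention_x"},
    {"paper": "paper_2", "intervention": "intervention_y"},
    {"paper": "paper_3", "intervention": "intervention_x"},
]


def pivot_by(rows, key):
    """Group paper names under each distinct value of the extracted column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["paper"])
    return dict(groups)


by_intervention = pivot_by(rows, "intervention")
print(by_intervention)
```

Because every group still lists its source papers, each grouped item stays traceable to the literature it came from.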
[24:33.60]Yeah
|
|
[24:34.60]There was a big revelation
|
|
[24:35.60]When I saw it
|
|
[24:36.60]Basically I think
|
|
[24:37.60]I'm very just impressed
|
|
[24:38.60]By how first-principles
|
|
[24:39.60]Your ideas
|
|
[24:40.60]Around the workflow are
|
|
[24:42.60]And I think
|
|
[24:43.60]That's why
|
|
[24:44.60]You're not as reliant
|
|
[24:45.60]On like the LLM
|
|
[24:46.60]Improving
|
|
[24:47.60]Because it's actually
|
|
[24:48.60]Just about improving
|
|
[24:49.60]The workflow
|
|
[24:50.60]That you will recommend
|
|
[24:51.60]To people
|
|
[24:52.60]Today we might call
|
|
[24:53.60]It's rely on
|
|
[24:54.60]This is the way
|
|
[24:55.60]That Elicit
|
|
[24:56.60]Does research
|
|
[24:57.60]And this is
|
|
[24:58.60]What we think
|
|
[24:59.60]Is most effective
|
|
[25:00.60]Based on talking to our users
|
|
[25:01.60]The problem space
|
|
[25:02.60]Is still huge
|
|
[25:03.60]Like if it's
|
|
[25:04.60]Like this big
|
|
[25:05.60]We're all still operating
|
|
[25:06.60]At this tiny part
|
|
[25:07.60]Bit of it
|
|
[25:08.60]So you know
|
|
[25:09.60]I think about this a lot
|
|
[25:10.60]In the context of moats
|
|
[25:11.60]People are like
|
|
[25:12.60]Oh what's your moat
|
|
[25:13.60]What happens
|
|
[25:14.60]If GPT-5 comes out
|
|
[25:15.60]It's like if GPT-5 comes out
|
|
[25:16.60]There's still like
|
|
[25:17.60]All of this other space
|
|
[25:18.60]That we can go into
|
|
[25:19.60]And so I think being
|
|
[25:20.60]Really obsessed
|
|
[25:21.60]With the problem
|
|
[25:22.60]Helps us stay robust
|
|
[25:23.60]And just kind of
|
|
[25:24.60]Directly incorporate
|
|
[25:25.60]Model improvements
|
|
[25:26.60]And keep going
|
|
[25:27.60]And then I first encountered
|
|
[25:28.60]You guys with Charlie
|
|
[25:29.60]You can tell us
|
|
[25:30.60]About that project
|
|
[25:31.60]Basically yeah
|
|
[25:32.60]Like how much did cost
|
|
[25:34.60]Become a concern
|
|
[25:35.60]As you're working more
|
|
[25:36.60]And more with OpenAI
|
|
[25:37.60]How do you manage
|
|
[25:38.60]That relationship
|
|
[25:39.60]Let me talk about
|
|
[25:40.60]Who Charlie is
|
|
[25:41.60]You can talk about that
|
|
[25:42.60]Charlie is a special character
|
|
[25:43.60]So Charlie
|
|
[25:44.60]When we found him
|
|
[25:45.60]Had just finished
|
|
[25:46.60]His freshman year
|
|
[25:47.60]At the University of Warwick
|
|
[25:48.60]I think he had heard
|
|
[25:49.60]About us on some Discord
|
|
[25:50.60]And then he applied
|
|
[25:51.60]And then we just saw
|
|
[25:52.60]That he had done so many
|
|
[25:53.60]Incredible side projects
|
|
[25:54.60]And we were actually
|
|
[25:55.60]On a team retreat
|
|
[25:56.60]In Barcelona
|
|
[25:57.60]Visiting our head of engineering
|
|
[25:58.60]At that time
|
|
[25:59.60]And everyone was talking
|
|
[26:00.60]About this wunderkind
|
|
[26:01.60]They're like this kid
|
|
[26:02.60]And then on our take home
|
|
[26:03.60]Project he had done
|
|
[26:04.60]Like the best of anyone
|
|
[26:05.60]To that point
|
|
[26:06.60]And so people were
|
|
[26:07.60]Just like so excited
|
|
[26:08.60]To hire him
|
|
[26:09.60]So we hired him
|
|
[26:10.60]As an intern
|
|
[26:11.60]And then we're like Charlie
|
|
[26:12.60]What if you just dropped
|
|
[26:13.60]Out of school
|
|
[26:14.60]And so then we convinced
|
|
[26:15.60]Him to take a year off
|
|
[26:16.60]And he's just
|
|
[26:17.60]Incredibly productive
|
|
[26:18.60]And I think the thing
|
|
[26:19.60]You're referring to
|
|
[26:20.60]Anthropic kind of launched
|
|
[26:21.60]Their constitutional AI paper
|
|
[26:23.60]And within a few days
|
|
[26:24.60]I think four days
|
|
[26:25.60]He had basically implemented
|
|
[26:26.60]That in production
|
|
[26:27.60]And then we had it
|
|
[26:28.60]In the app a week or so after that
|
|
[26:30.60]And he has since kind of
|
|
[26:31.60]Contributed to major improvements
|
|
[26:33.60]Like cutting costs down
|
|
[26:34.60]To a tenth of what they were
|
|
[26:36.60]Really large scale
|
|
[26:37.60]But yeah, you can talk
|
|
[26:38.60]About the technical stuff
|
|
[26:39.60]Yeah, on the
|
|
[26:40.60]Constitutional AI project
|
|
[26:41.60]This was for abstract summarization
|
|
[26:43.60]Where in Elicit
|
|
[26:44.60]If you run a query
|
|
[26:45.60]It'll return papers to you
|
|
[26:47.60]And then it will summarize
|
|
[26:48.60]Each paper
|
|
[26:49.60]With respect to the query for you
|
|
[26:50.60]On the fly
|
|
[26:51.60]And that's a really
|
|
[26:52.60]Important part of Elicit
|
|
[26:53.60]Because Elicit does it so much
|
|
[26:55.60]If you run a few searches
|
|
[26:56.60]It'll have done it
|
|
[26:57.60]A few hundred times for you
|
|
[26:58.60]And so we cared a lot
|
|
[26:59.60]About this both
|
|
[27:00.60]Being like fast, cheap
|
|
[27:02.60]And also very low on hallucination
|
|
[27:04.60]I think if Elicit
|
|
[27:05.60]Hallucinates something
|
|
[27:06.60]About the abstract
|
|
[27:07.60]That's really not good
|
|
[27:08.60]And so what Charlie did
|
|
[27:09.60]In that project was
|
|
[27:11.60]Created a constitution
|
|
[27:12.60]That expressed
|
|
[27:13.60]What are the attributes
|
|
[27:14.60]Of a good summary
|
|
[27:15.60]Everything in the summary
|
|
[27:16.60]Is reflected in the actual abstract
|
|
[27:18.60]It was like
|
|
[27:19.60]Very concise
|
|
[27:20.60]Etc.
|
|
[27:21.60]And then
|
|
[27:22.60]Used RLHF
|
|
[27:24.60]With a model
|
|
[27:25.60]That was trained
|
|
[27:26.60]On the constitution
|
|
[27:27.60]To basically
|
|
[27:29.60]Fine-tune a better
|
|
[27:30.60]Summarizer
|
|
[27:31.60]On an open source model
|
|
[27:32.60]Yeah, I think
|
|
[27:33.60]That might still be in use
|
|
[27:34.60]Yeah, yeah, definitely
|
|
[27:35.60]Yeah, I think
|
|
[27:36.60]At the time
|
|
[27:37.60]The models hadn't been
|
|
[27:38.60]Trained at all
|
|
[27:39.60]To be faithful to a text
|
|
[27:41.60]So they were just generating
|
|
[27:42.60]So then when you
|
|
[27:43.60]Ask them a question
|
|
[27:44.60]They tried too hard
|
|
[27:45.60]To answer the question
|
|
[27:46.60]And didn't try hard enough
|
|
[27:47.60]To answer the question
|
|
[27:48.60]Given the text
|
|
[27:49.60]Or answer what the text
|
|
[27:50.60]Said about the question
|
|
[27:51.60]So we had to
|
|
[27:52.60]Basically teach the models
|
|
[27:53.60]To do that specific task
|
|
[27:54.60]How do you monitor
|
|
[27:55.60]The ongoing performance
|
|
[27:57.60]Of your models
|
|
[27:58.60]Not to get
|
|
[27:59.60]Too LLMOps-y
|
|
[28:00.60]But you are one of the
|
|
[28:01.60]Larger more well-known
|
|
[28:02.60]Operations
|
|
[28:03.60]Doing NLP at scale
|
|
[28:04.60]I guess effectively
|
|
[28:06.60]Like you have to monitor
|
|
[28:07.60]These things and nobody
|
|
[28:08.60]Has a good answer
|
|
[28:09.60]That I've talked to
|
|
[28:10.60]Yeah, I don't think
|
|
[28:11.60]We have a good answer yet
|
|
[28:12.60]I think the answers
|
|
[28:13.60]Are actually a little bit
|
|
[28:14.60]Clearer on the
|
|
[28:15.60]Just kind of basic
|
|
[28:16.60]The business side
|
|
[28:17.60]Of where you can
|
|
[28:18.60]Import ideas
|
|
[28:19.60]From normal
|
|
[28:20.60]Software engineering
|
|
[28:21.60]And normal kind
|
|
[28:22.60]Of DevOps
|
|
[28:23.60]You're like
|
|
[28:24.60]Well, you need to
|
|
[28:25.60]Monitor kind
|
|
[28:26.60]Of latencies
|
|
[28:27.60]And response times
|
|
[28:28.60]And uptime and whatnot
|
|
[28:29.60]Performance is more
|
|
[28:30.60]Of hallucination rate
|
|
[28:31.60]And then things
|
|
[28:32.60]Like hallucination rate
|
|
[28:33.60]Where I think there
|
|
[28:34.60]The really
|
|
[28:35.60]Important thing
|
|
[28:36.60]Is training time
|
|
[28:37.60]So we care a lot
|
|
[28:38.60]About having
|
|
[28:39.60]Our own internal
|
|
[28:41.60]Benchmarks
|
|
[28:42.60]For model development
|
|
[28:44.60]That reflect
|
|
[28:45.60]So that we can
|
|
[28:46.60]Know ahead of time
|
|
[28:47.60]How well
|
|
[28:48.60]Is the model
|
|
[28:49.60]Gonna perform
|
|
[28:50.60]On different types
|
|
[28:51.60]Of tasks
|
|
[28:52.60]So the tasks being
|
|
[28:53.60]Summarization
|
|
[28:54.60]Question answering
|
|
[28:55.60]Given a paper
|
|
[28:56.60]Ranking
|
|
[28:57.60]And for each of those
|
|
[28:58.60]We wanna know
|
|
[28:59.60]What's the distribution
|
|
[29:00.60]Of things the model
|
|
[29:01.60]Is gonna see
|
|
[29:02.60]So that we can
|
|
[29:03.60]Have well-calibrated
|
|
[29:04.60]Predictions on
|
|
[29:05.60]How well the model
|
|
[29:06.60]Is gonna do in production
|
|
[29:07.60]And I think, yeah,
|
|
[29:08.60]There's like
|
|
[29:09.60]Some chance
|
|
[29:10.60]That there's distribution
|
|
[29:11.60]Shift and actually
|
|
[29:12.60]The things users enter
|
|
[29:13.60]Are gonna be different
|
|
[29:14.60]Trainings right
|
|
[29:15.60]And having
|
|
[29:16.60]Very high quality
|
|
[29:17.60]Well-vetted data
|
|
[29:18.60]Sets at training time
|
|
[29:19.60]I think we also
|
|
[29:20.60]End up effectively
|
|
[29:21.60]Monitoring by trying
|
|
[29:22.60]To evaluate new models
|
|
[29:23.60]As they come out
|
|
[29:24.60]And so that like
|
|
[29:25.60]Kind of prompts us
|
|
[29:26.60]To go through
|
|
[29:27.60]Our eval suite
|
|
[29:28.60]Every couple of months
|
|
[29:29.60]And so every time
|
|
[29:30.60]A new model comes out
|
|
[29:31.60]We have to see
|
|
[29:32.60]Like how is this performing
|
|
[29:33.60]Relative to production
|
|
[29:34.60]And what we currently have
|
|
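Editor's sketch: the evaluation loop described above (internal benchmarks per task, rerun whenever a new model appears and compared against what is in production) might look roughly like this. The task names come from the conversation. The exact-match scoring, the tiny benchmarks, and both stand-in "models" are assumptions made purely for illustration.

```python
def evaluate(model_fn, benchmark):
    """Fraction of benchmark examples where the model output matches exactly."""
    correct = sum(1 for ex in benchmark if model_fn(ex["input"]) == ex["expected"])
    return correct / len(benchmark)


def compare_models(production_fn, candidate_fn, suite):
    """Score the production model and a candidate on every task in the suite."""
    return {
        task: {
            "production": evaluate(production_fn, benchmark),
            "candidate": evaluate(candidate_fn, benchmark),
        }
        for task, benchmark in suite.items()
    }


# Placeholder suite; a real one would sample the inputs users actually send.
suite = {
    "summarization": [
        {"input": "a", "expected": "A"},
        {"input": "b", "expected": "B"},
    ],
    "ranking": [
        {"input": "c", "expected": "c"},
    ],
}

production_model = str.upper        # stand-in for the deployed model
candidate_model = lambda text: text  # stand-in for a newly released model

report = compare_models(production_model, candidate_model, suite)
print(report)
```

In practice the scoring function for summarization or question answering would be something richer than exact match, such as a hallucination check against the source abstract.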
[29:35.60]Yeah, I mean
|
|
[29:36.60]Since we're on this topic
|
|
[29:37.60]Any new models
|
|
[29:38.60]That really caught
|
|
[29:39.60]Your eye this year
|
|
[29:40.60]Like Claude came out
|
|
[29:41.60]Yeah, I think Claude
|
|
[29:42.60]Is pretty pretty
|
|
[29:43.60]Like a good point
|
|
[29:44.60]On the kind of
|
|
[29:45.60]Pareto frontier
|
|
[29:46.60]It's neither
|
|
[29:47.60]The cheapest model
|
|
[29:48.60]Nor is it
|
|
[29:49.60]The most accurate
|
|
[29:51.60]Most high quality model
|
|
[29:52.60]But it's just
|
|
[29:53.60]Like a really good tradeoff
|
|
[29:54.60]Between cost and accuracy
|
|
[29:56.60]You apparently
|
|
[29:57.60]Have to 10-shot it
|
|
[29:58.60]To make it good
|
|
[29:59.60]I tried using
|
|
[30:00.60]Haiku for summarization
|
|
[30:01.60]But zero-shot
|
|
[30:02.60]Was not great
|
|
[30:03.60]Then they were like
|
|
[30:04.60]You know, it's a skill issue
|
|
[30:05.60]You have to try it harder
|
|
[30:06.60]Interesting
|
|
[30:07.60]I think GPT-4
|
|
[30:08.60]Unlocked tables for us
|
|
[30:10.60]Processing data from tables
|
|
[30:11.60]Which was huge
|
|
[30:12.60]GPT-4 vision
|
|
[30:13.60]Yeah
|
|
[30:14.60]Did you try Fuyu
|
|
[30:15.60]I guess you can't try Fuyu
|
|
[30:16.60]Because it's noncommercial
|
|
[30:17.60]That's the Adept model
|
|
[30:18.60]Yeah, we haven't tried that one
|
|
[30:19.60]Yeah
|
|
[30:20.60]Yeah, but Claude is multimodal as well
|
|
[30:22.60]Yeah
|
|
[30:23.60]I think the interesting insight
|
|
[30:24.60]That we got from talking to David Luan
|
|
[30:25.60]Who is CEO of Adept
|
|
[30:26.60]Was that multimodality
|
|
[30:28.60]Has effectively two different flavors
|
|
[30:30.60]Like one is
|
|
[30:31.60]Recognizing images from a camera
|
|
[30:33.60]In the outside natural world
|
|
[30:35.60]And actually the more important
|
|
[30:37.60]Multimodality for knowledge work
|
|
[30:38.60]Is screenshots
|
|
[30:39.60]And you know
|
|
[30:40.60]PDFs and charts and graphs
|
|
[30:42.60]So we need a new term
|
|
[30:43.60]For that kind of multimodality
|
|
[30:45.60]But is the claim
|
|
[30:46.60]That current models
|
|
[30:47.60]Are good at one or the other
|
|
[30:49.60]Yeah, they're over-indexed
|
|
[30:50.60]Because the history of computer vision
|
|
[30:51.60]Is COCO, right?
|
|
[30:53.60]So now we're like
|
|
[30:54.60]Oh, actually, you know
|
|
[30:55.60]Screens are more important
|
|
[30:56.60]OCR, handwriting
|
|
[30:58.60]You mentioned a lot of
|
|
[30:59.60]Closed model lab stuff
|
|
[31:01.60]And then you also have
|
|
[31:02.60]Like this open source model
|
|
[31:03.60]Fine-tuning stuff
|
|
[31:04.60]Like what is your workload
|
|
[31:05.60]Now between closed and open
|
|
[31:06.60]It's a good question
|
|
[31:07.60]I think
|
|
[31:08.60]It's half and half
|
|
[31:09.60]Is that even a relevant question
|
|
[31:10.60]Or not
|
|
[31:11.60]This is a nonsensical question
|
|
[31:12.60]It depends a little bit on
|
|
[31:13.60]Like how you index
|
|
[31:14.60]Whether you index by
|
|
[31:15.60]Like compute cost
|
|
[31:16.60]Or the number of queries
|
|
[31:17.60]I'd say like
|
|
[31:18.60]In terms of number of queries
|
|
[31:19.60]It's maybe similar
|
|
[31:20.60]In terms of like cost and compute
|
|
[31:22.60]I think the closed models
|
|
[31:23.60]Make up more of the budget
|
|
[31:25.60]Since the main cases
|
|
[31:26.60]Where you want to use closed models
|
|
[31:28.60]Are cases where
|
|
[31:29.60]They're just smarter
|
|
[31:31.60]Where no existing
|
|
[31:33.60]Open source models
|
|
[31:34.60]Are quite smart enough
|
|
[31:35.60]Yeah
|
|
[31:36.60]We have a lot of
|
|
[31:37.60]Interesting technical questions
|
|
[31:38.60]To go in
|
|
[31:39.60]But just to wrap
|
|
[31:40.60]The kind of like
|
|
[31:41.60]UX evolution
|
|
[31:42.60]Now you have the notebooks
|
|
[31:43.60]We talked a lot
|
|
[31:44.60]About how chatbots
|
|
[31:45.60]Are not the final frontier
|
|
[31:47.60]You know
|
|
[31:48.60]How did you decide
|
|
[31:49.60]To get into notebooks
|
|
[31:50.60]Which is a very iterative
|
|
[31:51.60]Kind of like interactive
|
|
[31:52.60]Interface
|
|
[31:53.60]And yeah
|
|
[31:54.60]Maybe learnings from that
|
|
[31:55.60]Yeah this is actually
|
|
[31:56.60]Our fourth time
|
|
[31:57.60]Trying to make this work
|
|
[31:59.60]I think the first time
|
|
[32:00.60]Was probably in early 2021
|
|
[32:03.60]I think because
|
|
[32:04.60]We've always been obsessed
|
|
[32:05.60]With this idea of task
|
|
[32:06.60]Decomposition
|
|
[32:07.60]And like branching
|
|
[32:08.60]We always wanted a tool
|
|
[32:10.60]That could be kind of
|
|
[32:11.60]Unbounded
|
|
[32:12.60]Where you could keep going
|
|
[32:13.60]Could do a lot of branching
|
|
[32:14.60]Where you could kind of apply
|
|
[32:15.60]Language model operations
|
|
[32:17.60]Or computations on other tasks
|
|
[32:19.60]So in 2021
|
|
[32:20.60]We had this thing called
|
|
[32:21.60]Composite tasks
|
|
[32:22.60]Where you could use GPT-3
|
|
[32:23.60]To brainstorm
|
|
[32:24.60]A bunch of research questions
|
|
[32:25.60]And then take
|
|
[32:26.60]Each research question
|
|
[32:27.60]And decompose those
|
|
[32:28.60]Further into subquestions
|
|
[32:30.60]This kind of again
|
|
[32:31.60]That like task decomposition
|
|
[32:32.60]Tree type thing
|
|
[32:33.60]Was always very exciting to us
|
|
[32:35.60]But that was like
|
|
[32:36.60]It was kind of overwhelming
|
|
[32:37.60]Then at the end of 22
|
|
[32:39.60]I think we tried again
|
|
[32:40.60]And at that point
|
|
[32:41.60]We were thinking
|
|
[32:42.60]Okay we've done a lot
|
|
[32:43.60]With this literature review thing
|
|
[32:44.60]We also want to start helping
|
|
[32:45.60]With kind of adjacent domains
|
|
[32:47.60]And different workflows
|
|
[32:48.60]Like we want to help more
|
|
[32:49.60]With machine learning
|
|
[32:50.60]What does that look like
|
|
[32:51.60]And as we were thinking
|
|
[32:52.60]About it we're like
|
|
[32:53.60]Well there are so many
|
|
[32:54.60]Research workflows
|
|
[32:55.60]How do we not just build
|
|
[32:56.60]Three new workflows
|
|
[32:57.60]Into Elicit
|
|
[32:58.60]But make Elicit
|
|
[32:59.60]Really generic
|
|
[33:00.60]To lots of workflows
|
|
[33:01.60]What is like a generic
|
|
[33:02.60]Composable system
|
|
[33:03.60]With nice abstractions
|
|
[33:04.60]That can like
|
|
[33:05.60]Scale to all these workflows
|
|
[33:06.60]So we like
|
|
[33:07.60]Iterated on that a bunch
|
|
[33:08.60]And like
|
|
[33:09.60]Didn't quite narrow
|
|
[33:10.60]The problem space enough
|
|
[33:11.60]Or like
|
|
[33:12.60]Get to what we wanted
|
|
[33:13.60]And then I think it was
|
|
[33:14.60]At the beginning of 2023
|
|
[33:16.60]We were like
|
|
[33:17.60]Wow computational notebooks
|
|
[33:18.60]Kind of enable this
|
|
[33:19.60]Where they have a lot
|
|
[33:20.60]Of flexibility
|
|
[33:21.60]But you know
|
|
[33:22.60]Kind of robust primitives
|
|
[33:23.60]Such that you can extend
|
|
[33:24.60]The workflow
|
|
[33:25.60]And it's not limited
|
|
[33:26.60]It's not like
|
|
[33:27.60]You ask a query
|
|
[33:28.60]You get an answer
|
|
[33:29.60]You're done
|
|
[33:30.60]You can just constantly
|
|
[33:31.60]Keep building on top of that
|
|
[33:32.60]And each little step
|
|
[33:33.60]Seems like a really good
|
|
[33:34.60]Unit of work for the language model
|
|
[33:35.60]And also it was just
|
|
[33:36.60]Like really helpful
|
|
[33:37.60]To have a bit more
|
|
[33:38.60]Preexisting work to emulate
|
|
[33:40.60]Yeah, that's kind of
|
|
[33:41.60]How we ended up at
|
|
[33:42.60]Computational notebooks
|
|
[33:43.60]For Elicit
|
|
[33:44.60]Maybe one thing
|
|
[33:45.60]That's worth making explicit
|
|
[33:46.60]Is the difference between
|
|
[33:47.60]Computational notebooks
|
|
[33:48.60]And chat because
|
|
[33:49.60]On the surface
|
|
[33:50.60]They seem pretty similar
|
|
[33:51.60]It's kind of this iterative
|
|
[33:52.60]Interaction where you add stuff
|
|
[33:53.60]In both cases
|
|
[33:54.60]You have a back and forth
|
|
[33:55.60]Between you enter stuff
|
|
[33:56.60]And then you get some output
|
|
[33:57.60]And then you enter stuff
|
|
[33:58.60]But the important difference
|
|
[33:59.60]In our minds is
|
|
[34:00.60]With notebooks
|
|
[34:01.60]You can define a process
|
|
[34:03.60]So in data science
|
|
[34:04.60]You know like
|
|
[34:05.60]Here's like my data analysis
|
|
[34:06.60]Process that takes in a CSV
|
|
[34:08.60]And then does some extraction
|
|
[34:09.60]And then generates a figure
|
|
[34:10.60]At the end
|
|
[34:11.60]And you can prototype it
|
|
[34:13.60]Using a small CSV
|
|
[34:14.60]And then you can run it
|
|
[34:15.60]Over a much larger CSV
|
|
[34:16.60]Later
|
|
[34:17.60]And similarly
|
|
[34:18.60]The vision for notebooks
|
|
[34:19.60]In our case
|
|
[34:20.60]Is to not make it this
|
|
[34:22.60]Like one-off chat interaction
|
|
[34:23.60]But to allow you to then
|
|
[34:25.60]Say if you start
|
|
[34:27.60]And first you're like
|
|
[34:28.60]Okay, let me just
|
|
[34:29.60]Analyze a few papers
|
|
[34:30.60]And see do I get to
|
|
[34:31.60]The correct conclusions
|
|
[34:32.60]For those few papers
|
|
[34:33.60]Can I then later
|
|
[34:34.60]Go back and say
|
|
[34:35.60]Now let me run this
|
|
[34:36.60]Over 10,000 papers
|
|
[34:38.60]Now that I've debugged
|
|
[34:39.60]The process
|
|
[34:40.60]Using a few papers
|
|
[34:41.60]And that's an interaction
|
|
[34:42.60]That doesn't fit
|
|
[34:43.60]Quite as well
|
|
[34:44.60]Into the chat framework
|
|
[34:45.60]Because that's more
|
|
[34:46.60]For kind of quick
|
|
[34:47.60]Back and forth
|
|
[34:48.60]Interaction
|
|
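Editor's sketch: the distinction drawn above is that a notebook defines a reusable process, which can be debugged on a handful of papers and then rerun unchanged over thousands, whereas a chat is a one-off exchange. The step functions and paper records below are hypothetical; a real step would typically wrap a language-model call.

```python
def make_process(steps):
    """Bundle an ordered list of per-item steps into one reusable process."""
    def run(items):
        results = list(items)
        for step in steps:
            results = [step(item) for item in results]
        return results
    return run


# Hypothetical steps standing in for model-backed operations.
def extract_title(paper):
    return paper["title"].strip()


def to_claim(title):
    return f"Claim drawn from: {title}"


process = make_process([extract_title, to_claim])

# 1. Prototype and inspect the process on a small sample.
sample = [{"title": " Paper A "}, {"title": "Paper B"}]
preview = process(sample)
print(preview)

# 2. Once the steps look right, run the same process over the full set.
full_corpus = [{"title": f"Paper {i}"} for i in range(10_000)]
all_results = process(full_corpus)
print(len(all_results))
```

The same `process` object is used in both runs, which is the property a chat transcript lacks: there is nothing to re-execute once the conversation is over.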
[34:49.60]Do you think in notebooks
|
|
[34:50.60]That's kind of like
|
|
[34:51.60]Structured, editable
|
|
[34:52.60]Chain of thought
|
|
[34:53.60]Basically step by step
|
|
[34:54.60]Like is that kind of
|
|
[34:55.60]Where you see this going
|
|
[34:56.60]And then are people
|
|
[34:57.60]Gonna reuse notebooks
|
|
[34:59.60]As like templates
|
|
[35:00.60]And maybe in traditional
|
|
[35:01.60]Notebooks
|
|
[35:02.60]As like cookbooks
|
|
[35:03.60]Right, you share a cookbook
|
|
[35:04.60]You can start from there
|
|
[35:05.60]Is that similar
|
|
[35:06.60]In Elicit
|
|
[35:07.60]Yeah, that's exactly right
|
|
[35:08.60]So that's our hope
|
|
[35:09.60]That people will build templates
|
|
[35:10.60]Share them with other people
|
|
[35:12.60]I think chain of thought
|
|
[35:13.60]Is maybe still like
|
|
[35:14.60]Kind of one level
|
|
[35:15.60]Lower on the abstraction hierarchy
|
|
[35:17.60]Than we would think of notebooks
|
|
[35:19.60]I think we'll probably
|
|
[35:20.60]Want to think about
|
|
[35:21.60]More semantic pieces
|
|
[35:22.60]Like a building block
|
|
[35:23.60]Is more like a paper search
|
|
[35:25.60]Or an extraction
|
|
[35:26.60]Or a list of concepts
|
|
[35:28.60]And then the models
|
|
[35:30.60]And the reasoning
|
|
[35:31.60]Will probably often be
|
|
[35:32.60]One level down
|
|
[35:33.60]You always want to
|
|
[35:34.60]Be able to see it
|
|
[35:35.60]But you don't always
|
|
[35:36.60]Want it to be front and center
|
|
[35:37.60]Yeah, what's the difference
|
|
[35:38.60]Between a notebook
|
|
[35:39.60]And an agent
|
|
[35:40.60]Since everybody always
|
|
[35:41.60]Asks me what's an agent
|
|
[35:42.60]Like how do you think
|
|
[35:43.60]About where the line is
|
|
[35:45.60]Yeah, it's an interesting
|
|
[35:46.60]Question
|
|
[35:47.60]In the notebook world
|
|
[35:48.60]I would
|
|
[35:49.60]Generally think of
|
|
[35:50.60]The human as the agent
|
|
[35:51.60]In the first iteration
|
|
[35:52.60]So you have the notebook
|
|
[35:53.60]And the human kind of
|
|
[35:54.60]Adds little action steps
|
|
[35:56.60]And then the next point
|
|
[35:58.60]On this kind of progress
|
|
[35:59.60]Okay, now you can use
|
|
[36:00.60]Language models to predict
|
|
[36:01.60]Which action
|
|
[36:02.60]Would you take as a human
|
|
[36:03.60]And at some point
|
|
[36:04.60]You're probably going to
|
|
[36:05.60]Be very good at this
|
|
[36:06.60]You'll be like, okay
|
|
[36:07.60]In some cases, I can
|
|
[36:08.60]With 99.9% accuracy
|
|
[36:09.60]Predict what you do
|
|
[36:10.60]And then you might
|
|
[36:11.60]As well just execute it
|
|
[36:12.60]Like why wait for the human
|
|
[36:13.60]And eventually
|
|
[36:14.60]As you get better at this
|
|
[36:15.60]That will just look
|
|
[36:16.60]More and more like agents
|
|
[36:18.60]Taking actions
|
|
[36:19.60]As opposed to you
|
|
[36:20.60]Doing the thing
|
|
[36:21.60]I think templates
|
|
[36:22.60]Are a specific case of this
|
|
[36:23.60]Where it's like, okay, well
|
|
[36:24.60]There's just particular
|
|
[36:25.60]Sequences of actions
|
|
[36:26.60]That you often want to chunk
|
|
[36:27.60]And have available
|
|
[36:28.60]Just like in normal
|
|
[36:29.60]Programming
|
|
[36:30.60]And those
|
|
[36:31.60]You can view them as
|
|
[36:32.60]Action sequences of agents
|
|
[36:33.60]Or you can view them as
|
|
[36:34.60]More normal programming
|
|
[36:36.60]Language abstraction thing
|
|
[36:37.60]And I think those
|
|
[36:38.60]Are two valid views
|
|
[36:40.60]How do you see this
|
|
[36:41.60]Changing
|
|
[36:42.60]Like you said, the models
|
|
[36:43.60]Get better and you need
|
|
[36:44.60]Less and less human
|
|
[36:45.60]Actual interfacing
|
|
[36:47.60]With the model
|
|
[36:48.60]You just get the results
|
|
[36:49.60]Like how does the UX
|
|
[36:50.60]And the way people
|
|
[36:51.60]Perceive it change
|
|
[36:52.60]Yeah, I think this
|
|
[36:53.60]Kind of interaction
|
|
[36:54.60]Paradigm for evaluation
|
|
[36:55.60]Is not really something
|
|
[36:56.60]The internet has encountered
|
|
[36:57.60]Yet because up to now
|
|
[36:58.60]The internet has all been
|
|
[36:59.60]About getting data
|
|
[37:00.60]And work from people
|
|
[37:02.60]So increasingly
|
|
[37:03.60]I really want kind of
|
|
[37:04.60]Evaluation both from
|
|
[37:05.60]An interface perspective
|
|
[37:06.60]And from like a
|
|
[37:07.60]Technical perspective
|
|
[37:08.60]Operation perspective
|
|
[37:09.60]To be a superpower
|
|
[37:10.60]For Elicit because I think
|
|
[37:11.60]Over time models will do
|
|
[37:12.60]More and more of the work
|
|
[37:13.60]And people will have
|
|
[37:14.60]To do more and more
|
|
[37:15.60]Of the evaluation
|
|
[37:16.60]So I think yeah
|
|
[37:17.60]In terms of the interface
|
|
[37:18.60]Some of the things we have
|
|
[37:19.60]Today, you know
|
|
[37:20.60]For every kind of
|
|
[37:21.60]Language model generation
|
|
[37:22.60]There's some citation back
|
|
[37:23.60]And we kind of try to
|
|
[37:24.60]Highlight the ground truth
|
|
[37:25.60]In the paper
|
|
[37:26.60]To whatever Elicit said
|
|
[37:27.60]And make it super easy
|
|
[37:28.60]So you can click on it
|
|
[37:29.60]And quickly see
|
|
[37:30.60]In context and validate
|
|
[37:31.60]Whether the text
|
|
[37:32.60]Actually supports
|
|
[37:33.60]The answer that Elicit gave
|
|
[37:34.60]So I think we'd probably
|
|
[37:35.60]Want to scale things up
|
|
[37:37.60]Like that, like the ability
|
|
[37:38.60]To kind of spot check
|
|
[37:39.60]The model's work super
|
|
[37:40.60]Quickly scale up
|
|
[37:41.60]Interfaces like that
|
|
[37:42.60]And who would spot check
|
|
[37:44.60]The user
|
|
[37:45.60]Yeah, to start
|
|
[37:46.60]It would be the user
|
|
[37:47.60]One of the other things
|
|
[37:48.60]We do is also kind of flag
|
|
[37:49.60]The model's uncertainty
|
|
[37:50.60]So we have models report
|
|
[37:52.60]Out how confident are you
|
|
[37:53.60]That this was the
|
|
[37:54.60]Sample size of this study
|
|
[37:55.60]If the model's not sure
|
|
[37:56.60]We throw a flag
|
|
[37:57.60]And so the user knows
|
|
[37:58.60]To prioritize checking that
|
|
[37:59.60]So again, we can kind of
|
|
[38:00.60]Scale that up
|
|
[38:01.60]So when the model's like
|
|
[38:02.60]Well, I searched this
|
|
[38:03.60]On Google, I'm not sure
|
|
[38:04.60]If that was the right thing
|
|
[38:05.60]I have an uncertainty flag
|
|
[38:06.60]And the user can go
|
|
[38:07.60]And be like, okay
|
|
[38:08.60]That was actually
|
|
[38:09.60]The right thing to do or not
|
|
[38:10.60]I've tried to do
|
|
[38:11.60]Uncertainty ratings
|
|
[38:12.60]From models
|
|
[38:13.60]I don't know
|
|
[38:14.60]If you have this live
|
|
[38:15.60]Because I just
|
|
[38:16.60]Didn't find them reliable
|
|
[38:17.60]Because they just hallucinated
|
|
[38:18.60]Their own uncertainty
|
|
[38:19.60]I would love to
|
|
[38:20.60]Base it on logprobs
|
|
[38:22.60]Or something more
|
|
[38:23.60]Native within the model
|
|
[38:24.60]Rather than generated
|
|
[38:25.60]But it sounds like
|
|
[38:27.60]It scaled properly for you
|
|
[38:29.60]Yeah, we found it
|
|
[38:30.60]To be pretty calibrated
|
|
[38:31.60]It varies by model
|
|
[38:32.60]I think in some cases
|
|
[38:33.60]We also used
|
|
[38:34.60]Two different models
|
|
[38:35.60]For the answer estimates
|
|
[38:36.60]Than for the question
|
|
[38:37.60]Answering
|
|
[38:38.60]So one model would say
|
|
[38:39.60]Here's my chain of thought
|
|
[38:40.60]Here's my answer
|
|
[38:41.60]And then a different
|
|
[38:42.60]Type of model
|
|
[38:43.60]Let's say the first model
|
|
[38:44.60]Is Llama
|
|
[38:45.60]And let's say the second
|
|
[38:46.60]Model is GPT-3.5
|
|
[38:47.60]And then the second model
|
|
[38:49.60]Just looks over
|
|
[38:50.60]The results and like
|
|
[38:51.60]Okay, how confident
|
|
[38:52.60]Are you in this
|
|
[38:53.60]And I think
|
|
[38:54.60]Sometimes using
|
|
[38:55.60]A different model
|
|
[38:56.60]Can be better than
|
|
[38:57.60]Using the same model
|
|
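Editor's note: the two-model setup described above (one model answers, a different model rates confidence, and low-confidence answers get flagged) can be sketched roughly as follows. This is a minimal illustration, not Elicit's implementation. The `Extraction` type, `extract_with_flag`, the 0.8 threshold, and both toy models are hypothetical stand-ins for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extraction:
    field: str
    answer: str
    confidence: float  # checker's estimate in [0, 1]
    flagged: bool      # True means "ask the user to check this first"


def extract_with_flag(
    field: str,
    paper_text: str,
    answer_model: Callable[[str, str], str],
    checker_model: Callable[[str, str, str], float],
    threshold: float = 0.8,  # assumed cutoff, not a value from the episode
) -> Extraction:
    """Answer with one model, then let a different model rate the answer."""
    answer = answer_model(field, paper_text)
    confidence = checker_model(field, paper_text, answer)
    return Extraction(field, answer, confidence, confidence < threshold)


# Hypothetical stand-ins for two different LLMs.
def toy_answer_model(field: str, paper_text: str) -> str:
    return "120" if "120 participants" in paper_text else "unknown"


def toy_checker_model(field: str, paper_text: str, answer: str) -> float:
    # High confidence only when the answer literally appears in the text.
    return 0.95 if answer in paper_text else 0.2


supported = extract_with_flag(
    "sample size", "We enrolled 120 participants.",
    toy_answer_model, toy_checker_model,
)
unsupported = extract_with_flag(
    "sample size", "We recruited adults from two clinics.",
    toy_answer_model, toy_checker_model,
)
```

Keeping the checker as a separate callable makes it easy to swap in a different model family for the confidence estimate, which is the design choice discussed here.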
[38:58.60]Yeah, you know
|
|
[38:59.60]On the topic of models
|
|
[39:00.60]Evaluating models
|
|
[39:01.60]Obviously you can
|
|
[39:02.60]Do that all day long
|
|
[39:03.60]Like what's your budget
|
|
[39:04.60]Like because
|
|
[39:05.60]Your queries
|
|
[39:06.60]Fan out a lot
|
|
[39:07.60]And then you have
|
|
[39:08.60]Models evaluating models
|
|
[39:09.60]One person typing
|
|
[39:10.60]In a question
|
|
[39:11.60]Can lead to
|
|
[39:12.60]A thousand calls
|
|
[39:13.60]It depends on the project
|
|
[39:14.60]So if the project
|
|
[39:15.60]Is basically
|
|
[39:16.60]A systematic review
|
|
[39:17.60]That otherwise
|
|
[39:18.60]Human research assistants
|
|
[39:19.60]Would do
|
|
[39:20.60]Then the project
|
|
[39:21.60]Is basically
|
|
[39:22.60]Can get quite large
|
|
[39:23.60]For those projects
|
|
[39:24.60]I don't know
|
|
[39:25.60]Let's say
|
|
[39:26.60]A hundred thousand dollars
|
|
[39:27.60]So in those cases
|
|
[39:28.60]You're happier
|
|
[39:29.60]To spend compute
|
|
[39:30.60]Than in the
|
|
[39:31.60]Kind of shallow search case
|
|
[39:32.60]Where someone
|
|
[39:33.60]Just enters a question
|
|
[39:34.60]Because I don't know
|
|
[39:35.60]Maybe like it
|
|
[39:36.60]I heard about creatine
|
|
[39:37.60]What's it about
|
|
[39:38.60]Probably don't want
|
|
[39:39.60]To spend a lot of compute
|
|
[39:40.60]On that
|
|
[39:41.60]This sort of
|
|
[39:42.60]Being able to invest
|
|
[39:43.60]More or less compute
|
|
[39:44.60]Into getting
|
|
[39:45.60]More or less accurate answers
|
|
[39:46.60]I think one of the
|
|
[39:47.60]Core things we care about
|
|
[39:48.60]And that I think
|
|
[39:49.60]Is currently undervalued
|
|
[39:50.60]In the AI space
|
|
[39:51.60]You can choose
|
|
[39:52.60]Which model you want
|
|
[39:53.60]And you can sometimes
|
|
[39:54.60]I don't know
|
|
[39:55.60]You'll tip it
|
|
[39:56.60]It'll try harder
|
|
[39:57.60]Or you can try various
|
|
[39:58.60]Things to get it to work harder
|
|
[40:00.60]But you don't have great
|
|
[40:01.60]Ways of converting
|
|
[40:02.60]Willingness to spend
|
|
[40:03.60]Into better answers
|
|
[40:04.60]And we really
|
|
[40:05.60]Want to build a product
|
|
[40:06.60]That has this sort of
|
|
[40:07.60]Unbounded flavor
|
|
[40:08.60]Where like if you care
|
|
[40:09.60]About it a lot
|
|
[40:10.60]You should be able to get
|
|
[40:11.60]Really high quality answers
|
|
[40:12.60]Really double-checked
|
|
[40:13.60]In every way
|
|
[40:14.60]And you have a
|
|
[40:15.60]Credit-based pricing
|
|
[40:16.60]So unlike most products
|
|
[40:17.60]It's not a fixed monthly
|
|
[40:19.60]Right exactly
|
|
[40:20.60]Some of the
|
|
[40:21.60]Higher costs are
|
|
[40:22.60]Tiered
|
|
[40:23.60]So for most casual users
|
|
[40:25.60]They'll just get
|
|
[40:26.60]The abstract summary
|
|
[40:27.60]Which is kind of
|
|
[40:28.60]An open source model
|
|
[40:29.60]Then you can
|
|
[40:30.60]Add more columns
|
|
[40:31.60]Which have more
|
|
[40:32.60]Extractions
|
|
[40:33.60]And these uncertainty features
|
|
[40:34.60]And then you can also
|
|
[40:35.60]Add the same columns
|
|
[40:36.60]In high accuracy mode
|
|
[40:37.60]Which also parses the table
|
|
[40:38.60]So we kind of
|
|
[40:39.60]Stack the complexity
|
|
[40:40.60]And the cost
|
|
[40:41.60]You know the fun thing
|
|
[40:42.60]You can do with a credit system
|
|
[40:43.60]Which is data for data
|
|
[40:44.60]Basically you can
|
|
[40:45.60]Give people more credit
|
|
[40:46.60]If they give
|
|
[40:47.60]Data back to you
|
|
[40:48.60]I don't know
|
|
[40:49.60]You don't have money
|
|
[40:50.60]But you have time
|
|
[40:51.60]How do you exchange that
|
|
[40:53.60]It's a fair trade
|
|
[40:54.60]I think it's interesting
|
|
[40:55.60]We haven't quite operationalized it
|
|
[40:56.60]And then you know
|
|
[40:57.60]There's been some kind of like
|
|
[40:58.60]Adverse selection
|
|
[40:59.60]Like you know for example
|
|
[41:00.60]It would be really valuable
|
|
[41:01.60]To get feedback on our model
|
|
[41:02.60]So maybe if you were willing
|
|
[41:03.60]To give more robust feedback
|
|
[41:04.60]On our results
|
|
[41:05.60]We could give you credits
|
|
[41:06.60]Or something like that
|
|
[41:07.60]But then there's kind of this
|
|
[41:08.60]Will people take it seriously
|
|
[41:09.60]And you want the good people
|
|
[41:10.60]Exactly
|
|
[41:11.60]Can you tell who are the good people
|
|
[41:12.60]Not right now
|
|
[41:13.60]But yeah maybe
|
|
[41:14.60]At the point where we can
|
|
[41:15.60]We can offer it
|
|
[41:16.60]We can offer it up to them
|
|
[41:17.60]The perplexity of questions asked
|
|
[41:18.60]If it's higher perplexity
|
|
[41:19.60]These are smarter people
|
|
[41:20.60]Yeah maybe
|
|
[41:21.60]And if you make a lot of typos
|
|
[41:22.60]In your queries
|
|
[41:23.60]You're not going to get off
|
|
[41:24.60]How does that change
|
|
[41:25.60]Negative social credit
|
|
[41:28.60]It's very topical right now
|
|
[41:29.60]To think about
|
|
[41:30.60]The threat of long context windows
|
|
[41:32.60]All these models
|
|
[41:34.60]We're talking about these days
|
|
[41:35.60]Are like a million tokens plus
|
|
[41:36.60]Is that relevant for you
|
|
[41:38.60]Can you make use of that
|
|
[41:39.60]Is that just prohibitively expensive
|
|
[41:41.60]Because you're just paying
|
|
[41:42.60]For all those tokens
|
|
[41:43.60]Or are you just doing RAG
|
|
[41:44.60]It's definitely relevant
|
|
[41:45.60]And when we think about search
|
|
[41:46.60]As many people do
|
|
[41:47.60]We think about kind of
|
|
[41:48.60]A staged pipeline
|
|
[41:49.60]Of retrieval
|
|
[41:50.60]Where first you use
|
|
[41:51.60]Semantic search database
|
|
[41:53.60]With embeddings
|
|
[41:54.60]Get like the
|
|
[41:55.60]In our case maybe 400
|
|
[41:56.60]Or so most relevant papers
|
|
[41:57.60]And then
|
|
[41:58.60]You still need to rank those
|
|
[41:59.60]And I think at that point
|
|
[42:01.60]It becomes pretty interesting
|
|
[42:02.60]To use larger models
|
|
[42:04.60]So specifically in the past
|
|
[42:06.60]I think a lot of ranking
|
|
[42:07.60]Was kind of per item ranking
|
|
[42:09.60]Where you would score
|
|
[42:10.60]Each individual item
|
|
[42:11.60]Maybe using increasingly
|
|
[42:12.60]Expensive scoring methods
|
|
[42:13.60]And then rank based on the scores
|
|
[42:15.60]But I think listwise
|
|
[42:16.60]Re-ranking where
|
|
[42:17.60]You have a model
|
|
[42:18.60]That can see
|
|
[42:19.60]All the elements
|
|
[42:20.60]Is a lot more powerful
|
|
[42:21.60]Because often you can
|
|
[42:22.60]Only really tell
|
|
[42:23.60]How good a thing is
|
|
[42:24.60]In comparison to other things
|
|
[42:26.60]And what things should come first
|
|
[42:28.60]It really depends on
|
|
[42:29.60]Like well what other things
|
|
[42:30.60]Are available
|
|
[42:31.60]Maybe you even care about
|
|
[42:32.60]Diversity in your results
|
|
[42:33.60]You don't want to show
|
|
[42:34.60]Ten very similar papers
|
|
[42:35.60]As the first 10 results
|
|
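Editor's note: the contrast between per-item scoring and ranking that looks at the whole list can be sketched with a toy example. As a stand-in for a long-context model that sees every candidate, this sketch uses maximal marginal relevance (MMR), a classic greedy heuristic. It is not what the speakers describe building. The embeddings, paper names, and `lam` weight are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pointwise_rank(query, docs):
    """Score each item on its own, then sort by score."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)


def mmr_rank(query, docs, k, lam=0.5):
    """Greedy list-aware ranking: relevance minus similarity to earlier picks."""
    selected = []
    remaining = set(docs)
    while remaining and len(selected) < k:
        def score(d):
            relevance = cosine(query, docs[d])
            redundancy = max(
                (cosine(docs[d], docs[s]) for s in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-d embeddings: two near-duplicate papers and one distinct one.
query = [1.0, 0.0]
papers = {
    "paper_a": [1.0, 0.1],
    "paper_a_dup": [1.0, 0.12],
    "paper_b": [0.7, -0.7],
}

top2_pointwise = pointwise_rank(query, papers)[:2]
top2_listwise = mmr_rank(query, papers, k=2)
```

In this toy data the per-item ranking puts the two near-duplicates first, while the list-aware ranking replaces the duplicate with the distinct paper, which is the diversity point made above.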
[42:36.60]So I think long context models
|
|
[42:38.60]Are quite interesting there
|
|
[42:40.60]And especially for our case
|
|
[42:41.60]Where we care more about
|
|
[42:43.60]Power users who are perhaps
|
|
[42:45.60]A little bit more
|
|
[42:46.60]Willing to wait a little bit longer
|
|
[42:47.60]To get higher quality results
|
|
[42:48.60]Relative to people who just
|
|
[42:50.60]Quickly check out things
|
|
[42:51.60]Because why not
|
|
[42:52.60]I think being able to spend
|
|
[42:53.60]More on longer context
|
|
[42:54.60]Is quite valuable
|
|
[42:55.60]Yeah I think one thing
|
|
[42:56.60]The longer context models
|
|
[42:57.60]Changed for us
|
|
[42:58.60]Is maybe a focus from
|
|
[43:00.60]Breaking down tasks
|
|
[43:01.60]To breaking down the evaluation
|
|
[43:03.60]So before you know
|
|
[43:05.60]If we wanted to answer
|
|
[43:06.60]A question from the full text
|
|
[43:08.60]Of a paper
|
|
[43:09.60]We had to figure out
|
|
[43:10.60]How to chunk it and like
|
|
[43:11.60]Find the relevant chunk
|
|
[43:12.60]And then answer
|
|
[43:13.60]Based on that chunk
|
|
[43:14.60]Then you know
|
|
[43:15.60]Which chunk the model
|
|
[43:16.60]Used to answer the question
|
|
[43:17.60]So if you want to help
|
|
[43:18.60]The user to check it
|
|
[43:19.60]Yeah you can be like
|
|
[43:20.60]Well this was the chunk
|
|
[43:21.60]That the model got
|
|
[43:22.60]And now if you put the whole
|
|
[43:23.60]Text in the paper
|
|
[43:24.60]You have to kind of
|
|
[43:25.60]Find the chunk
|
|
[43:26.60]Like more retroactively
|
|
[43:27.60]Basically and so you need
|
|
[43:28.60]Kind of like a different
|
|
[43:29.60]Set of abilities
|
|
[43:30.60]And obviously like
|
|
[43:31.60]A different technology
|
|
[43:32.60]To figure out
|
|
[43:33.60]You still want to point
|
|
[43:34.60]The user to the supporting
|
|
[43:35.60]Quotes in the text
|
|
[43:36.60]But then the interaction
|
|
[43:37.60]Is a little different
|
|
[43:38.60]You like scan through
|
|
[43:39.60]And find some ROUGE score
|
|
[43:40.60]Yeah the floor
|
|
[43:41.60]I think there's an
|
|
[43:42.60]Interesting space of
|
|
[43:43.60]Almost research problems
|
|
[43:44.60]Here because
|
|
[43:45.60]You would ideally
|
|
[43:46.60]Make causal claims
|
|
[43:47.60]Like if this
|
|
[43:48.60]Hadn't been in the text
|
|
[43:49.60]The model wouldn't
|
|
[43:50.60]Have said this thing
|
|
[43:51.60]And maybe you can do
|
|
[43:52.60]Expensive approximations
|
|
[43:53.60]To that where like
|
|
[43:54.60]I don't know you just
|
|
[43:55.60]Throw out a chunk of the paper
|
|
[43:56.60]And re-answer
|
|
[43:57.60]And see what happens
|
|
[43:58.60]But hopefully
|
|
[43:59.60]There are better
|
|
[44:00.60]Ways of doing that
|
|
[44:01.60]Where you just get
|
|
[44:03.60]That kind of counterfactual
|
|
[44:04.60]Information for free
|
|
[44:05.60]From the model
|
|
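Editor's note: the "expensive approximation" mentioned above (drop a chunk of the paper, re-answer, and see whether the answer changes) can be sketched as a leave-one-chunk-out loop. This is an illustrative sketch under assumptions. `attribute_answer` and `toy_answer` are hypothetical names, and the toy answerer is a regex standing in for a real long-context model call.

```python
import re


def attribute_answer(chunks, question, answer_fn):
    """Return the full answer and the indices of chunks whose removal changes it.

    Costs one extra model call per chunk, which is why this is the
    expensive route.
    """
    full_answer = answer_fn(question, chunks)
    influential = []
    for i in range(len(chunks)):
        ablated = chunks[:i] + chunks[i + 1:]
        if answer_fn(question, ablated) != full_answer:
            influential.append(i)
    return full_answer, influential


def toy_answer(question, chunks):
    """Hypothetical stand-in for a model: report the first 'n=<number>' it sees."""
    for chunk in chunks:
        match = re.search(r"n\s*=\s*(\d+)", chunk)
        if match:
            return match.group(1)
    return "unknown"


paper_chunks = [
    "Introduction and background.",
    "We enrolled n=120 patients across two sites.",
    "Discussion and limitations.",
]
answer, supporting = attribute_answer(
    paper_chunks, "What was the sample size?", toy_answer
)
```

The indices in `supporting` are the chunks a user could be pointed to when checking the answer. Note that this only detects chunks that are individually necessary, so redundant evidence in two chunks would not be flagged by either ablation.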
[44:06.60]Do you think at all
|
|
[44:07.60]About the cost of maintaining
|
|
[44:09.60]RAG versus just putting
|
|
[44:10.60]More tokens in the window
|
|
[44:12.60]I think in software
|
|
[44:13.60]Development a lot of
|
|
[44:14.60]Times people buy
|
|
[44:15.60]Developer productivity
|
|
[44:16.60]Things so that
|
|
[44:17.60]We don't have to worry
|
|
[44:18.60]About it context windows
|
|
[44:19.60]Kinda the same right
|
|
[44:20.60]You have to maintain
|
|
[44:21.60]Chunking and like
|
|
[44:22.60]RAG retrieval and like
|
|
[44:23.60]Re-ranking and all of this
|
|
[44:24.60]Versus I just shove
|
|
[44:25.60]Everything into the context
|
|
[44:26.60]And like it costs
|
|
[44:27.60]A little more
|
|
[44:28.60]But at least I don't
|
|
[44:29.60]Have to do all of that
|
|
[44:30.60]Is that something
|
|
[44:31.60]You thought about
|
|
[44:32.60]I think we still
|
|
[44:33.60]Like hit up against
|
|
[44:34.60]Context limits enough
|
|
[44:35.60]That it's not really
|
|
[44:36.60]Do we still want
|
|
[44:37.60]To keep this RAG around
|
|
[44:38.60]It's like we do still
|
|
[44:39.60]Need it for the scale
|
|
[44:40.60]Of work we're doing
|
|
[44:41.60]I think there are
|
|
[44:42.60]Different kinds of
|
|
[44:43.60]Maintainability in
|
|
[44:44.60]One sense I think
|
|
[44:45.60]You're right that
|
|
[44:46.60]Throw everything into
|
|
[44:47.60]The context window thing
|
|
[44:48.60]Is easier to maintain
|
|
[44:49.60]Because you just
|
|
[44:50.60]Can swap out a model
|
|
[44:52.60]In another sense
|
|
[44:53.60]If things go wrong
|
|
[44:54.60]It's harder to debug
|
|
[44:55.60]Like if you know
|
|
[44:56.60]Here's the process
|
|
[44:57.60]That we go through
|
|
[44:58.60]To go from
|
|
[45:00.60]200 million papers
|
|
[45:01.60]To an answer
|
|
[45:02.60]And there are like
|
|
[45:03.60]Little steps
|
|
[45:04.60]And you understand
|
|
[45:05.60]Okay this is the step
|
|
[45:06.60]That finds the relevant
|
|
[45:07.60]Paragraph or whatever
|
|
[45:08.60]Maybe you'll know
|
|
[45:09.60]Which step breaks
|
|
[45:10.60]If it's just like
|
|
[45:11.60]A new model
|
|
[45:12.60]Version came out
|
|
[45:13.60]And now it suddenly
|
|
[45:14.60]Doesn't find your needle
|
|
[45:15.60]In a haystack anymore
|
|
[45:16.60]Then you're like
|
|
[45:17.60]Okay what can you do
|
|
[45:18.60]You're kind of at a loss
|
|
[45:20.60]Yeah let's talk
|
|
[45:21.60]A bit about needle
|
|
[45:22.60]In a haystack
|
|
[45:23.60]And like maybe
|
|
[45:24.60]The opposite of it
|
|
[45:25.60]Which is like hard
|
|
[45:26.60]Grounding I don't know
|
|
[45:27.60]That's like the best thing
|
|
[45:28.60]To think about it
|
|
[45:29.60]But I was using
|
|
[45:30.60]One of these
|
|
[45:31.60]Chat-with-your-documents
|
|
[45:32.60]Features
|
|
[45:33.60]And I put the
|
|
[45:34.60]AMD MI300
|
|
[45:35.60]Specs and the
|
|
[45:36.60]Blackwell chips
|
|
[45:37.60]From NVIDIA
|
|
[45:38.60]And I was asking questions
|
|
[45:39.60]And we like
|
|
[45:40.60]And the response was like
|
|
[45:41.60]Oh it doesn't say
|
|
[45:42.60]In the specs
|
|
[45:43.60]But if you ask
|
|
[45:44.60]GPT-4 without the docs
|
|
[45:45.60]It would tell you no
|
|
[45:46.60]Because NVLink
|
|
[45:47.60]It's an NVIDIA
|
|
[45:48.60]It's technology
|
|
[45:49.60]Just as your N.V.
|
|
[45:50.60]Yeah hey man
|
|
[45:51.60]It just says in the thing
|
|
[45:52.60]How do you think about
|
|
[45:53.60]That having the context
|
|
[45:54.60]Sometimes suppress
|
|
[45:55.60]The knowledge
|
|
[45:56.60]That the model has
|
|
[45:57.60]It really depends on the task
|
|
[45:58.60]Because I think
|
|
[45:59.60]Sometimes that is
|
|
[46:00.60]Exactly what you want
|
|
[46:01.60]So imagine you're a researcher
|
|
[46:02.60]You're writing the background
|
|
[46:03.60]Section of your paper
|
|
[46:04.60]And you're trying to describe
|
|
[46:05.60]What these other papers say
|
|
[46:06.60]You really don't want
|
|
[46:07.60]Extra information
|
|
[46:08.60]To be introduced there
|
|
[46:09.60]In other cases
|
|
[46:10.60]Where you're just trying
|
|
[46:11.60]To figure out the truth
|
|
[46:12.60]And you're giving
|
|
[46:13.60]The documents because
|
|
[46:14.60]You think they will help
|
|
[46:15.60]The model figure out
|
|
[46:16.60]What the truth is
|
|
[46:17.60]I think you do want
|
|
[46:18.60]If the model has a hunch
|
|
[46:19.60]That there might be
|
|
[46:21.60]Something that's not
|
|
[46:22.60]In the papers
|
|
[46:23.60]You do want to surface that
|
|
[46:24.60]I think ideally
|
|
[46:25.60]You still don't want
|
|
[46:26.60]The model to just tell you
|
|
[46:27.60]I think probably
|
|
[46:28.60]The ideal thing
|
|
[46:29.60]Looks a bit more like
|
|
[46:30.60]Agent control
|
|
[46:31.60]Where the model can issue
|
|
[46:33.60]A query that then
|
|
[46:35.60]Is intended to surface
|
|
[46:36.60]The documents that
|
|
[46:37.60]Substantiate its hunch
|
|
[46:38.60]That's maybe
|
|
[46:39.60]A reasonable middle ground
|
|
[46:40.60]Between
|
|
[46:41.60]The model just telling you
|
|
[46:42.60]And the model being fully
|
|
[46:43.60]Limited to the papers
|
|
[46:44.60]You give it
|
|
[46:45.60]Yeah, I would say
|
|
[46:46.60]They're just kind of
|
|
[46:47.60]Different tasks right now
|
|
[46:48.60]And the tasks that
|
|
[46:49.60]Elicit is mostly focused on
|
|
[46:50.60]Is what do these papers say
|
|
[46:51.60]But there is another task
|
|
[46:52.60]Which is like
|
|
[46:53.60]Just give me the best
|
|
[46:54.60]Possible answer
|
|
[46:55.60]And that give me
|
|
[46:56.60]The best possible answer
|
|
[46:57.60]Sometimes depends
|
|
[46:58.60]On what do these papers say
|
|
[46:59.60]But it can also depend
|
|
[47:00.60]On other stuff
|
|
[47:01.60]That's not in the papers
|
|
[47:02.60]So ideally
|
|
[47:03.60]We can do both
|
|
[47:04.60]And then kind of
|
|
[47:05.60]We can ask
|
|
[47:06.60]For you
|
|
[47:07.60]More going forward
|
|
[47:08.60]We have
|
|
[47:09.60]Seen a lot of details
|
|
[47:10.60]But just to zoom
|
|
[47:11.60]Back out a little bit
|
|
[47:12.60]What are maybe
|
|
[47:13.60]The most underrated
|
|
[47:14.60]Features of Elicit
|
|
[47:16.60]And what is
|
|
[47:17.60]One thing that
|
|
[47:18.60]Maybe the users
|
|
[47:19.60]Surprised you the most
|
|
[47:20.60]By using it
|
|
[47:21.60]I think the most
|
|
[47:22.60]Powerful feature of Elicit
|
|
[47:23.60]Is the ability to
|
|
[47:24.60]Extract
|
|
[47:25.60]Add columns to this table
|
|
[47:26.60]Which effectively
|
|
[47:27.60]Extracts data
|
|
[47:28.60]From all of your
|
|
[47:29.60]Papers at once
|
|
[47:30.60]It's well used
|
|
[47:31.60]But there are
|
|
[47:32.60]Kind of many different
|
|
[47:33.60]Extensions of that
|
|
[47:34.60]We let you
|
|
[47:35.60]Give a description
|
|
[47:36.60]Of the column
|
|
[47:37.60]We let you give instructions
|
|
[47:38.60]Of a column
|
|
[47:39.60]We let you create custom
|
|
[47:40.60]Columns
|
|
[47:41.60]So we have like 30
|
|
[47:42.60]Plus predefined fields
|
|
[47:43.60]That users can extract
|
|
[47:44.60]Like what were the methods
|
|
[47:45.60]What were the main findings
|
|
[47:46.60]How many people were studied
|
|
[47:48.60]And we actually show
|
|
[47:49.60]You basically the prompts
|
|
[47:50.60]That we're using to
|
|
[47:51.60]Extract that from
|
|
[47:52.60]Our predefined fields
|
|
[47:53.60]And then you can fork this
|
|
[47:54.60]And you can say
|
|
[47:55.60]Oh, actually I don't care
|
|
[47:56.60]About the population of people
|
|
[47:57.60]I only care about
|
|
[47:58.60]The population of rats
|
|
[47:59.60]Like you can change
|
|
[48:00.60]The instructions
|
|
[48:01.60]So I think users
|
|
[48:02.60]Are still kind of discovering
|
|
[48:03.60]This predefined
|
|
[48:04.60]Easy to use default
|
|
[48:06.60]But that they can extend it
|
|
[48:07.60]To be much more
|
|
[48:08.60]Specific to them
|
|
[48:09.60]And then they can also ask
|
|
[48:10.60]Custom questions
|
|
[48:11.60]One use case of that
|
|
[48:12.60]Is you can start to
|
|
[48:13.60]Create different column types
|
|
[48:14.60]That you might not expect
|
|
[48:15.60]So rather than just
|
|
[48:16.60]Creating generative answers
|
|
[48:17.60]Like a description
|
|
[48:18.60]Of the methodology
|
|
[48:19.60]You can say
|
|
[48:20.60]Classify the methodology
|
|
[48:22.60]Into a prospective study
|
|
[48:23.60]A retrospective study
|
|
[48:24.60]Or a case study
|
|
[48:26.60]And then you can filter
|
|
[48:27.60]Based on that
|
|
[48:28.60]It's like all using
|
|
[48:29.60]The same kind of technology
|
|
[48:30.60]And the interface
|
|
[48:31.60]But it unlocks
|
|
[48:32.60]So I think that
|
|
[48:33.60]The ability to ask
|
|
[48:34.60]Custom questions
|
|
[48:35.60]Give instructions
|
|
[48:36.60]And specifically use
|
|
[48:37.60]That to create different
|
|
[48:38.60]Types of columns
|
|
[48:39.60]Like classification columns
|
|
[48:41.60]Is still pretty underrated
|
|
[48:42.60]In terms of use case
|
|
[48:44.60]I spoke to someone
|
|
[48:45.60]Who works in medical affairs
|
|
[48:47.60]At a genomic sequencing
|
|
[48:48.60]Company recently
|
|
[48:49.60]So you know
|
|
[48:50.60]The doctors kind of order
|
|
[48:52.60]These genomic tests
|
|
[48:53.60]These sequencing tests
|
|
[48:54.60]To kind of identify
|
|
[48:55.60]If a patient has
|
|
[48:56.60]A particular disease
|
|
[48:57.60]This company helps
|
|
[48:58.60]And process it
|
|
[48:59.60]And this person
|
|
[49:00.60]Basically interacts
|
|
[49:01.60]With all the doctors
|
|
[49:02.60]And if the doctors
|
|
[49:03.60]Have any questions
|
|
[49:04.60]My understanding is that
|
|
[49:05.60]Medical affairs
|
|
[49:06.60]Is kind of like customer
|
|
[49:07.60]Support or customer success
|
|
[49:08.60]In pharma
|
|
[49:09.60]So this person
|
|
[49:10.60]Talks to doctors all day long
|
|
[49:11.60]And one of the things
|
|
[49:12.60]They started using Elicit for
|
|
[49:13.60]Is like putting the results
|
|
[49:14.60]Of their tests
|
|
[49:15.60]As a query
|
|
[49:17.60]Like this test showed
|
|
[49:18.60]You know this percentage
|
|
[49:19.60]Presence of this
|
|
[49:20.60]And 40% that
|
|
[49:21.60]And whatever
|
|
[49:22.60]You know what genes are present
|
|
[49:23.60]Here or within this sample
|
|
[49:25.60]And getting kind of
|
|
[49:26.60]A list of academic papers
|
|
[49:27.60]That would support their findings
|
|
[49:29.60]And using this to help
|
|
[49:30.60]The doctors
|
|
[49:31.60]Interpret their tests
|
|
[49:32.60]So we talked about
|
|
[49:33.60]Okay cool
|
|
[49:34.60]Like if we built
|
|
[49:35.60]He's pretty interested
|
|
[49:36.60]In kind of doing a survey
|
|
[49:37.60]Of infectious disease
|
|
[49:38.60]Specialists
|
|
[49:39.60]And getting them
|
|
[49:40.60]To evaluate
|
|
[49:41.60]You know having them
|
|
[49:42.60]Write up their answers
|
|
[49:43.60]Comparing it to Elicit
|
|
[49:44.60]Answers trying to see
|
|
[49:45.60]Can Elicit start being
|
|
[49:46.60]Used to interpret
|
|
[49:47.60]The results of
|
|
[49:48.60]These diagnostic tests
|
|
[49:49.60]Because the way
|
|
[49:50.60]They ship these tests
|
|
[49:51.60]To doctors
|
|
[49:52.60]Is they report
|
|
[49:53.60]On a really wide
|
|
[49:54.60]Array of things
|
|
[49:55.60]He was saying
|
|
[49:56.60]That at a large
|
|
[49:57.60]Well resourced hospital
|
|
[49:58.60]Like a city hospital
|
|
[49:59.60]There might be
|
|
[50:00.60]A team of infectious disease
|
|
[50:01.60]Specialists who can
|
|
[50:02.60]Help interpret
|
|
[50:03.60]These results
|
|
[50:04.60]But at underresourced
|
|
[50:05.60]Hospitals or more
|
|
[50:06.60]Rural hospitals
|
|
[50:07.60]The primary care physician
|
|
[50:08.60]Can't interpret
|
|
[50:09.60]The test results
|
|
[50:10.60]So then they can't order
|
|
[50:11.60]They can't use it
|
|
[50:12.60]They can't help
|
|
[50:13.60]The patients with it
|
|
[50:14.60]So thinking about
|
|
[50:15.60]An evidence backed way
|
|
[50:16.60]Of interpreting these tests
|
|
[50:17.60]Definitely kind of
|
|
[50:18.60]An extension of the product
|
|
[50:19.60]That I hadn't considered
|
|
[50:20.60]Before
|
|
[50:21.60]But yeah the idea of
|
|
[50:22.60]Using that to bring
|
|
[50:23.60]More access to physicians
|
|
[50:24.60]In all different parts
|
|
[50:25.60]Of the country
|
|
[50:26.60]And helping them
|
|
[50:27.60]Interpret complicated
|
|
[50:28.60]We had Kanjun
|
|
[50:29.60]From Imbue
|
|
[50:30.60]On the podcast
|
|
[50:31.60]And we talked about
|
|
[50:32.60]Better allocating
|
|
[50:33.60]Scientific resources
|
|
[50:34.60]How do you think about
|
|
[50:35.60]These use cases
|
|
[50:36.60]And maybe
|
|
[50:37.60]How Elicit
|
|
[50:38.60]Can help drive
|
|
[50:39.60]More research
|
|
[50:40.60]And do you see
|
|
[50:41.60]A world in which
|
|
[50:42.60]You know maybe the models
|
|
[50:43.60]Actually do
|
|
[50:44.60]Some of the research
|
|
[50:45.60]Before suggesting us
|
|
[50:46.60]Yeah I think
|
|
[50:47.60]That's like
|
|
[50:48.60]Very close to
|
|
[50:49.60]What we care about
|
|
[50:50.60]Our product values
|
|
[50:51.60]Are systematic
|
|
[50:52.60]Transparent and unbounded
|
|
[50:53.60]And I think
|
|
[50:54.60]You make research
|
|
[50:55.60]Especially more systematic
|
|
[50:56.60]And unbounded
|
|
[50:57.60]And here's
|
|
[50:58.60]The thing
|
|
[50:59.60]That's at stake here
|
|
[51:00.60]So for example
|
|
[51:01.60]I was
|
|
[51:02.60]Recently talking
|
|
[51:03.60]To people in longevity
|
|
[51:04.60]And I think
|
|
[51:05.60]There isn't really
|
|
[51:06.60]One field of longevity
|
|
[51:07.60]There are kind of
|
|
[51:08.60]Different
|
|
[51:09.60]Scientific subdomains
|
|
[51:10.60]That are surfacing
|
|
[51:11.60]Various things
|
|
[51:12.60]That are related
|
|
[51:13.60]To longevity
|
|
[51:14.60]And I think
|
|
[51:14.60]If you could
|
|
[51:15.60]More systematically
|
|
[51:16.60]Say look
|
|
[51:17.60]Here are all the different
|
|
[51:18.60]Interventions
|
|
[51:19.60]We could do
|
|
[51:20.60]And here's
|
|
[51:21.60]The expected
|
|
[51:22.60]ROI of these experiments
|
|
[51:23.60]Here's like
|
|
[51:24.60]The evidence so far
|
|
[51:25.60]That supports
|
|
[51:26.60]So much more systematic
|
|
[51:27.60]Than
|
|
[51:28.60]Science is today
|
|
[51:29.60]I'd guess in like
|
|
[51:30.60]10 20 years we'll look back
|
|
[51:31.60]And it will be
|
|
[51:32.60]Incredible how
|
|
[51:33.60]Unsystematic science
|
|
[51:34.60]Was back in the day
|
|
[51:35.60]Our view is kind of
|
|
[51:36.60]Have models
|
|
[51:37.60]Catch up to expert humans today
|
|
[51:39.60]Start with kind of
|
|
[51:40.60]Novice humans
|
|
[51:41.60]And then increasingly
|
|
[51:42.60]Expert humans
|
|
[51:43.60]But we really want
|
|
[51:44.60]The models to earn
|
|
[51:45.60]Their right to the expertise
|
|
[51:47.60]So that's why we do
|
|
[51:48.60]Things in this very step-by-step way
|
|
[51:49.60]That's why we don't
|
|
[51:50.60]Just like throw a bunch of data
|
|
[51:51.60]And apply a bunch of compute
|
|
[51:52.60]And hope we get good results
|
|
[51:54.60]But obviously at some point
|
|
[51:55.60]It's kind of
|
|
[51:56.60]Earned its stripes
|
|
[51:57.60]It can surpass
|
|
[51:58.60]Human researchers
|
|
[51:59.60]But I think that's where
|
|
[52:00.60]Making sure
|
|
[52:01.60]That the model's
|
|
[52:02.60]Processes are really
|
|
[52:03.60]Explicit and transparent
|
|
[52:05.60]And that it's really
|
|
[52:06.60]Easy to evaluate
|
|
[52:07.60]Is important because
|
|
[52:08.60]If it does surpass
|
|
[52:09.60]Human understanding
|
|
[52:10.60]People will still need
|
|
[52:11.60]To be able to audit
|
|
[52:12.60]Its work somehow
|
|
[52:13.60]Or spot check
|
|
[52:14.60]Its work somehow
|
|
[52:15.60]To be able to reliably
|
|
[52:16.60]Trust it and use it
|
|
[52:17.60]So yeah
|
|
[52:18.60]That's kind of why
|
|
[52:19.60]The process-based approach
|
|
[52:20.60]Is really important
|
|
[52:21.60]And on the question
|
|
[52:22.60]Of will models
|
|
[52:23.60]Do their own research
|
|
[52:24.60]Features that models
|
|
[52:25.60]Currently don't have
|
|
[52:26.60]That will need
|
|
[52:27.60]To be better there
|
|
[52:28.60]Is better world models
|
|
[52:30.60]I think currently models
|
|
[52:31.60]Are just not great
|
|
[52:32.60]At representing
|
|
[52:33.60]What's going on
|
|
[52:34.60]In a particular situation
|
|
[52:35.60]Or domain in a way
|
|
[52:36.60]That allows them to
|
|
[52:37.60]Come to interesting
|
|
[52:38.60]Surprising conclusions
|
|
[52:40.60]I think they're very good
|
|
[52:41.60]At coming to conclusions
|
|
[52:42.60]That are nearby
|
|
[52:43.60]To conclusions
|
|
[52:44.60]That people have come to
|
|
[52:45.60]Not as good
|
|
[52:46.60]At kind of reasoning
|
|
[52:47.60]And making
|
|
[52:48.60]Surprising connections maybe
|
|
[52:49.60]And so having
|
|
[52:50.60]Deeper models of
|
|
[52:52.60]What are the underlying
|
|
[52:53.60]Domains
|
|
[52:54.60]How are they related
|
|
[52:55.60]Or not related
|
|
[52:56.60]I think there will be
|
|
[52:57.60]An important ingredient
|
|
[52:58.60]For models to actually
|
|
[52:59.60]Being able to make
|
|
[53:00.60]Novel contributions
|
|
[53:01.60]On the topic of
|
|
[53:02.60]Hiring more expert humans
|
|
[53:03.60]You've hired some
|
|
[53:04.60]Very expert humans
|
|
[53:05.60]My friend Maggie
|
|
[53:06.60]Appleton joined you guys
|
|
[53:07.60]I think maybe
|
|
[53:08.60]A year ago-ish
|
|
[53:09.60]In fact, I think
|
|
[53:10.60]You're doing an offsite
|
|
[53:11.60]And we're actually
|
|
[53:12.60]Organizing our big
|
|
[53:13.60]AI UX meetup around
|
|
[53:14.60]Whenever she's
|
|
[53:15.60]In town in San Francisco
|
|
[53:16.60]How big is the team
|
|
[53:17.60]How have you sort of
|
|
[53:18.60]Transitioned your company
|
|
[53:19.60]Into this sort of PBC
|
|
[53:20.60]And sort of the plan
|
|
[53:21.60]For the future
|
|
[53:22.60]About half of us
|
|
[53:23.60]Are in the Bay Area
|
|
[53:24.60]And then distributed
|
|
[53:25.60]Across the US and Europe
|
|
[53:26.60]A mix of mostly kind
|
|
[53:28.60]Of roles in engineering
|
|
[53:29.60]And product
|
|
[53:30.60]And I think that
|
|
[53:31.60]The transition to
|
|
[53:32.60]PBC was really
|
|
[53:33.60]Not that eventful
|
|
[53:34.60]Because I think
|
|
[53:35.60]We were already
|
|
[53:36.60]Even as a nonprofit
|
|
[53:37.60]We were already
|
|
[53:38.60]Shipping every week
|
|
[53:39.60]So very much
|
|
[53:40.60]Operating as a product
|
|
[53:41.60]And then I would say
|
|
[53:43.60]The kind of PBC component
|
|
[53:44.60]Was to very explicitly
|
|
[53:46.60]State that we have
|
|
[53:47.60]A mission that we care
|
|
[53:48.60]A lot about
|
|
[53:49.60]There are a lot of ways
|
|
[53:50.60]To make money
|
|
[53:51.60]That would make us
|
|
[53:52.60]A lot of money
|
|
[53:53.60]But we are going
|
|
[53:54.60]To be opinionated
|
|
[53:55.60]About how we make money
|
|
[53:56.60]We're going to take
|
|
[53:57.60]The version of making
|
|
[53:58.60]A lot of money
|
|
[53:59.60]That's in line
|
|
[54:00.60]With our mission
|
|
[54:01.60]But it's like
|
|
[54:02.60]All very convergent
|
|
[54:03.60]Elicit is not going
|
|
[54:04.60]To make any money
|
|
[54:05.60]If it's a bad product
|
|
[54:06.60]If it doesn't actually
|
|
[54:07.60]Help you discover truth
|
|
[54:08.60]And do research
|
|
[54:09.60]More rigorously
|
|
[54:10.60]So I think for us
|
|
[54:11.60]The kind of mission
|
|
[54:12.60]And the success
|
|
[54:13.60]Of the company
|
|
[54:14.60]Are very intertwined
|
|
[54:15.60]We're hoping to grow
|
|
[54:16.60]The team quite a lot
|
|
[54:17.60]This year
|
|
[54:18.60]Probably some of our
|
|
[54:19.60]Highest priority roles
|
|
[54:20.60]In marketing
|
|
[54:21.60]Go-to-market
|
|
[54:22.60]Do you want to talk
|
|
[54:23.60]About the roles?
|
|
[54:24.60]Yeah, broadly
|
|
[54:25.60]We're just looking
|
|
[54:26.60]For senior software engineers
|
|
[54:27.60]And don't need
|
|
[54:28.60]Any particular AI expertise
|
|
[54:29.60]A lot of it is just
|
|
[54:30.60]How do you
|
|
[54:31.60]Build good orchestration
|
|
[54:33.60]For complex tasks
|
|
[54:34.60]So we talked earlier
|
|
[54:35.60]About these notebooks
|
|
[54:36.60]Scaling up
|
|
[54:37.60]Task orchestration
|
|
[54:38.60]And I think a lot
|
|
[54:39.60]Of this looks more
|
|
[54:40.60]Like traditional
|
|
[54:41.60]Software engineering
|
|
[54:42.60]Than it does look
|
|
[54:43.60]Like machine learning
|
|
[54:44.60]Research and I think
|
|
[54:45.60]The people who are
|
|
[54:46.60]Like really good at
|
|
[54:47.60]Building good abstractions
|
|
[54:48.60]Building applications
|
|
[54:49.60]That will survive
|
|
[54:50.60]Even if some
|
|
[54:51.60]Of their pieces break
|
|
[54:52.60]Like making reliable
|
|
[54:53.60]Components out of
|
|
[54:54.60]Unreliable pieces
|
|
[54:55.60]I think those are the
|
|
[54:56.60]People we're looking for
|
|
[54:57.60]You know that's exactly
|
|
[54:58.60]What I used to do
|
|
[54:59.60]Have you explored
|
|
[55:00.60]The existing orchestration
|
|
[55:01.60]Frameworks, Temporal, Airflow
|
|
[55:03.60]Dagster, Prefect
|
|
[55:05.60]We've looked into
|
|
[55:06.60]Them a little bit
|
|
[55:07.60]I think we have
|
|
[55:08.60]Some specific requirements
|
|
[55:09.60]Around being able
|
|
[55:10.60]To stream work back
|
|
[55:11.60]Very quickly
|
|
[55:12.60]To our users
|
|
[55:13.60]Those could definitely
|
|
[55:14.60]Be relevant
|
|
[55:15.60]Okay, well you're hiring
|
|
[55:16.60]I'm sure we'll plug
|
|
[55:17.60]All the links
|
|
[55:18.60]And parting words
|
|
[55:19.60]Any words of wisdom
|
|
[55:20.60]Mottos you live by
|
|
[55:22.60]I think it's a really important
|
|
[55:23.60]Time for humanity
|
|
[55:24.60]So I hope everyone
|
|
[55:25.60]Listening to this podcast
|
|
[55:27.60]Can think hard about exactly
|
|
[55:29.60]How they want to
|
|
[55:30.60]Participate in this story
|
|
[55:31.60]There's so much to build
|
|
[55:33.60]And we can be really
|
|
[55:34.60]Intentional about what
|
|
[55:35.60]We align ourselves with
|
|
[55:36.60]There are a lot of applications
|
|
[55:38.60]That are going to be really good
|
|
[55:39.60]For the world
|
|
[55:39.60]And a lot of applications
|
|
[55:40.60]That are not
|
|
[55:41.60]And so yeah
|
|
[55:42.60]I hope people can
|
|
[55:43.60]Take that seriously
|
|
[55:44.60]And kind of seize the moment
|
|
[55:45.60]Yeah, I love how intentional
|
|
[55:46.60]You guys have been
|
|
[55:47.60]Thank you for sharing
|
|
[55:48.60]Thank you
|
|
[55:49.60]Thank you for coming on
|
|
[55:50.60](music)
|
|
[56:15.60]Chinese subtitles: J Chong
|