[by:whisper.cpp] [00:00.00](Music) [00:06.00]Hello everyone, welcome to the Latent Space Podcast [00:08.40]This is Alessio, partner and CTO-in-residence at Decibel Partners [00:11.80]And I'm joined by my co-host Swyx, founder of Smol AI [00:15.00]Today we're back in the studio [00:17.20]With Andreas and Jungwon, welcome [00:20.20]Thank you, great, thank you [00:22.40]I'll introduce you each separately, but I'm also hoping to learn more [00:27.40]So Andreas, it looks like you started Elicit first and Jungwon joined later [00:32.40]That's right [00:33.00]For all intents and purposes, the Elicit and also the Ought that existed before then were very different from what I started [00:39.60]So I think it's like fair to say that you co-founded it [00:42.60]Got it [00:43.00]And Jungwon, you're a co-founder and COO of Elicit now [00:46.20]Yeah, that's right [00:47.00]So there's a little bit of a history to this [00:48.80]I'm not super aware of like the sort of journey [00:51.80]I was aware of Ought and Elicit as sort of a non-profit type situation [00:55.80]And recently you turned into like a public benefit corporation [00:59.40]So yeah, maybe if you want you could take us through that journey of finding the problem [01:04.00]You know, obviously you're working together now [01:06.20]So like how do you get together to decide to leave your startup career to join him [01:11.20]Yeah, it's truly a very long journey [01:12.80]I guess truly it kind of started in Germany when I was born [01:17.20]So even as a kid I was always interested in AI [01:20.00]Like I kind of went to the library [01:21.40]There were books about how to write programs in QBasic [01:24.20]And like some of them talked about how to implement chatbots [01:27.20]And to be clear [01:28.80]He grew up in like a tiny village on the outskirts of Munich called Dinkelscherben [01:33.20]Where it's like a very, very idyllic German village [01:36.20]Yeah, important to the story [01:38.40]So basically the main thing is I've kind of always been thinking about AI my entire life [01:42.80]And been thinking about, at some point this is going to be a huge deal [01:46.00]It's going to be transformative [01:47.00]How can I work on it [01:48.20]And 
was thinking about it from when I was a teenager [01:51.60]After high school did a year where I started a startup with the intention to become rich [01:56.80]And then once I'm rich I can affect the trajectory of AI [02:00.40]Did not become rich [02:01.40]Decided to go back to college [02:03.00]And study cognitive science there [02:05.00]Which was like the closest thing I could find at the time to AI [02:08.00]In the last year of college moved to the US to do a PhD at MIT [02:12.60]Working on broadly kind of new programming languages for AI [02:15.00]Because it kind of seemed like the existing languages were not great at expressing [02:19.60]World models and learning world models doing Bayesian inference [02:22.60]Was obviously thinking about ultimately the goal is to actually build tools that help people reason more clearly [02:27.60]Ask and answer better questions and make better decisions [02:31.60]But for a long time it seemed like the technology to put reasoning in machines just wasn't there [02:35.60]Initially at the end of my postdoc at Stanford was thinking about well what to do [02:39.60]I think the standard path is you become an academic and do research [02:43.60]But it's really hard to actually build interesting tools as an academic [02:48.60]You can't really hire great engineers [02:50.60]Everything is kind of on a paper-to-paper timeline [02:53.60]And so I was like well maybe I should start a startup [02:56.60]Pursued that for a little bit [02:57.60]But it seemed like it was too early because you could have tried to do an AI startup [03:01.60]But probably would not have been this kind of AI startup we're seeing now [03:05.60]So then decided to just start a non-profit research lab [03:08.60]That's going to do research for a while until we better figure out how to do thinking in machines [03:13.60]And that was Ought [03:14.60]And then over time it became clear how to actually build actual tools for reasoning [03:19.60]Then only over time we developed a 
better way to [03:23.60]I'll let you fill in some of the details here [03:25.60]Yeah so I guess my story maybe starts around 2015 [03:29.60]I kind of wanted to be a founder for a long time [03:31.60]And I wanted to work on an idea that stood the test of time for me [03:34.60]Like an idea that stuck with me for a long time [03:37.60]And starting in 2015 [03:38.60]Actually originally I became interested in AI based tools from the perspective of mental health [03:43.60]So there were a bunch of people around me who were really struggling [03:45.60]One really close friend in particular was really struggling with mental health [03:48.60]And didn't have any support [03:50.60]And it didn't feel like there was anything before kind of like getting hospitalized [03:54.60]That could just help her [03:56.60]And so luckily she came and stayed with me for a while [03:58.60]And we were just able to talk through some things [04:00.60]But it seemed like you know lots of people might not have that resource [04:04.60]And something maybe AI enabled could be much more scalable [04:07.60]I didn't feel ready to start a company then [04:10.60]That's 2015 [04:11.60]And I also didn't feel like the technology was ready [04:13.60]So then I went into fintech [04:15.60]And like kind of learned how to do the tech thing [04:17.60]And then in 2019 [04:18.60]I felt like it was time for me to just jump in [04:21.60]And build something on my own [04:22.60]I really wanted to create [04:24.60]And at the time I looked around at tech [04:26.60]And felt like not super inspired by the options [04:28.60]I just I didn't want to have a tech career ladder [04:31.60]Or like I didn't want to like climb the career ladder [04:33.60]There were two kind of interesting technologies at the time [04:35.60]There was AI and there was crypto [04:37.60]And I was like well the AI people seemed like a little bit more nice [04:41.60]And maybe like slightly more trustworthy [04:44.60]Both super exciting [04:45.60]But threw my bet on the AI side [04:47.60]And then I got connected to Andreas [04:49.60]And actually the way he was thinking about [04:51.60]Pursuing the research agenda at Ought [04:53.60]Was really compatible with what I had envisioned [04:56.60]For an ideal AI product [04:58.60]Something that helps kind of take down [05:00.60]Really complex thinking [05:01.60]Overwhelming thoughts [05:02.60]And breaks it down into small pieces [05:04.60]And then this kind of mission [05:05.60]We need AI to help us figure out [05:07.60]What we ought to do [05:08.60]It was really inspiring, right? [05:10.60]Yeah, because I think it was clear [05:12.60]That we were building the most powerful [05:14.60]Optimizer of our time [05:16.60]But as a society [05:17.60]We hadn't figured out [05:18.60]How to direct that optimization potential [05:21.60]And if you kind of direct tremendous [05:23.60]Optimization potential at the wrong thing [05:25.60]That's really disastrous [05:26.60]So the goal of Ought was [05:28.60]Make sure that if we build [05:29.60]The most transformative technology of our lifetime [05:31.60]It can be used for something really impactful [05:34.60]And that's really good reasoning [05:35.60]Like not just generating ads [05:37.60]My background was in marketing [05:38.60]But like so [05:39.60]It's like I want to do [05:40.60]More than generate ads with this [05:42.60]And also if these AI systems [05:44.60]Get to be super intelligent enough [05:46.60]That they are doing this [05:47.60]Really complex reasoning [05:48.60]That we can trust them [05:49.60]That they are aligned with us [05:51.60]And we have ways of evaluating [05:53.60]That they are doing the right thing [05:54.60]So that's what Ought did [05:55.60]We did a lot of experiments [05:56.60]You know, like Andreas said [05:57.60]Before foundation models [05:59.60]Really like took off [06:00.60]A lot of the issues we were seeing [06:01.60]Were more in reinforcement learning [06:03.60]But we saw a future [06:04.60]Where AI would be able to 
do [06:06.60]More kind of logical reasoning [06:08.60]Not just kind of extrapolate [06:09.60]From numerical trends [06:10.60]We actually kind of [06:11.60]Set up experiments with people [06:13.60]Where kind of people stood in [06:14.60]As super intelligent systems [06:16.60]And we effectively gave them [06:17.60]Context windows [06:18.60]So they would have to [06:19.60]Like read a bunch of text [06:20.60]And one person would get less text [06:23.60]And one person would get all the text [06:24.60]And the person with less text [06:26.60]Would have to evaluate the work [06:28.60]Of the person who could read much more [06:30.60]So like in the world [06:31.60]We were basically simulating [06:32.60]Like in, you know, 2018-2019 [06:34.60]A world where an AI system [06:36.60]Could read significantly more than you [06:38.60]And you as the person [06:39.60]Who couldn't read that much [06:40.60]Had to evaluate the work [06:41.60]Of the AI system [06:42.60]So that's a lot of the work we did [06:44.60]And from that we kind of [06:45.60]Iterated on the idea [06:46.60]Of breaking complex tasks down [06:47.60]Into smaller tasks [06:48.60]Like complex tasks [06:49.60]Like open-ended reasoning [06:51.60]Logical reasoning [06:52.60]Into smaller tasks [06:53.60]So that it's easier [06:54.60]To train AI systems on them [06:55.60]And also so that it's easier [06:57.60]To evaluate the work of the AI system [06:59.60]When it's done [07:00.60]And then also kind of [07:01.60]We really pioneered this idea [07:02.60]The importance of supervising [07:03.60]The process of AI systems [07:05.60]Not just the outcomes [07:06.60]And so a big part [07:07.60]Of how Elicit is built [07:08.60]Is we're very intentional [07:10.60]About not just throwing [07:11.60]A ton of data into a model [07:13.60]And training it [07:14.60]And then saying cool [07:15.60]Here's like scientific output [07:16.60]Like that's not at all [07:17.60]What we do [07:18.60]Our approach is very much [07:19.60]Like what are the 
steps [07:20.60]That an expert human does [07:21.60]Or what is like an ideal process [07:23.60]As granularly as possible [07:25.60]Let's break that down [07:26.60]And then train AI systems [07:27.60]To perform each of those steps [07:29.60]Very robustly [07:30.60]When you train like that [07:32.60]From the start [07:33.60]After the fact [07:34.60]It's much easier to evaluate [07:35.60]It's much easier to troubleshoot [07:36.60]At each point [07:37.60]Like where did something break down [07:38.60]So yeah [07:39.60]We were working on those experiments [07:40.60]For a while [07:41.60]And then at the start of 2021 [07:43.60]Decided to build a product [07:44.60]Do you mind if I [07:45.60]Because I think you're about [07:46.60]To go into more modern [07:47.60]Ought and Elicit [07:49.60]And I just wanted to [07:50.60]Because I think a lot of people [07:51.60]Are in where you were [07:53.60]Like sort of 2018-19 [07:55.60]Where you chose a partner [07:57.60]To work with [07:58.60]And you didn't know him [07:59.60]Yeah yeah [08:00.60]You were just kind of cold introduced [08:01.60]Yep [08:02.60]A lot of people are cold introduced [08:03.60]I've been cold introduced [08:04.60]To tons of people [08:05.60]And I never work with them [08:06.60]I assume you had a lot [08:07.60]A lot of other options [08:08.60]Like how do you advise [08:09.60]People to make those choices [08:10.60]We were not totally cold introduced [08:12.60]So one of our closest friends [08:13.60]Introduced us [08:14.60]And then Andreas had written a lot [08:16.60]On the website [08:17.60]A lot of blog posts [08:18.60]A lot of publications [08:19.60]And I just read it [08:20.60]And I was like, wow [08:21.60]This sounds like my writing [08:22.60]And even other people [08:23.60]Some of my closest friends [08:24.60]I asked for advice from [08:25.60]They were like, oh [08:26.60]This sounds like your writing [08:28.60]But I think [08:29.60]I also had some kind of [08:30.60]Like things I was looking for [08:31.60]I 
wanted someone [08:32.60]With a complementary skill set [08:33.60]I want someone [08:34.60]Who was very values aligned [08:36.60]And yeah [08:37.60]That was all a good fit [08:38.60]We also did a pretty [08:40.60]Lengthy mutual evaluation process [08:42.60]Where we had a Google doc [08:43.60]Where we had all kinds of questions [08:45.60]For each other [08:46.60]And I think it ended up being [08:48.60]Around 50 pages or so [08:49.60]Of like various questions [08:51.60]Was it the YC list? [08:53.60]There's some lists going around [08:54.60]For co-founder questions [08:55.60]No, we just made our own [08:57.60]But I guess it's probably related [08:59.60]In that you ask yourself [09:00.60]What are the values you care about [09:01.60]How would you approach [09:02.60]Various decisions [09:03.60]And things like that [09:04.60]I shared like all of my past [09:05.60]Performance reviews [09:06.60]Yeah [09:07.60]Yeah [09:08.60]And he never had any [09:09.60]No [09:10.60]Yeah, sorry [09:14.60]I just had to [09:15.60]A lot of people are going through [09:16.60]That phase [09:17.60]And you kind of skipped over it [09:18.60]I was like, no, no, no [09:19.60]There's like an interesting story [09:20.60]Yeah [09:21.60]Before we jump into what it is [09:22.60]It is today [09:23.60]The history is a bit [09:24.60]Counterintuitive [09:25.60]So you started [09:26.60]From, oh, if we had [09:27.60]A super powerful model [09:29.60]How do we align it [09:30.60]How do we use it [09:31.60]But then you were actually [09:32.60]Like, well, let's just build [09:33.60]The product so that people [09:34.60]Can actually leverage it [09:35.60]And I think there are [09:36.60]A lot of folks today [09:37.60]That are now back [09:38.60]To where you were [09:39.60]Maybe five years ago [09:40.60]They're like, oh, what if [09:41.60]This happens rather than [09:42.60]Focusing on actually building [09:43.60]Something useful with it [09:45.60]What clicked for you [09:46.60]To like move into Elicit [09:47.60]And then we 
can cover [09:48.60]That story too [09:49.60]I think in many ways [09:50.60]The approach is still the same [09:51.60]Because the way we're [09:52.60]Building Elicit is not [09:54.60]Let's train a foundation model [09:55.60]To do more stuff [09:56.60]It's like [09:57.60]Let's build a scaffolding [09:58.60]Such that we can [09:59.60]Deploy powerful models [10:00.60]To good ends [10:01.60]I think it's different [10:02.60]Now in that [10:03.60]We actually have [10:04.60]Like some of the models to plug in [10:05.60]But if in 2017 [10:06.60]We had had the models [10:08.60]We could have run [10:09.60]The same experiments [10:10.60]We did run with humans [10:11.60]Back then [10:12.60]Just with models [10:13.60]And so in many ways [10:14.60]Our philosophy is always [10:15.60]Let's think ahead to the future [10:16.60]What models are going to exist [10:17.60]In one, two years [10:19.60]Or longer [10:20.60]And how can we make it [10:22.60]So that they can [10:23.60]Actually be deployed [10:24.60]In more transparent [10:25.60]Controllable ways [10:26.60]Yeah, I think [10:27.60]Motivationally we both [10:28.60]Are kind of [10:29.60]Product people at heart [10:30.60]The research was [10:31.60]Really important [10:32.60]And it didn't [10:33.60]Make sense to build [10:34.60]A product at that time [10:35.60]But at the end of the day [10:36.60]The thing that always [10:37.60]Motivated us is [10:38.60]Imagining a world [10:39.60]Where high quality [10:40.60]Reasoning is really abundant [10:41.60]And AI is a technology [10:43.60]That's going to get us there [10:44.60]And there's a way [10:45.60]To guide that technology [10:46.60]With research [10:47.60]But you can have [10:48.60]A more direct effect [10:49.60]Through product [10:50.60]Because with research [10:51.60]You publish the research [10:52.60]And someone else has to pick it up [10:53.60]Product felt [10:54.60]Like a more direct path [10:55.60]And we wanted to [10:56.60]Concretely have an impact [10:57.60]On people's lives [10:58.60]Yeah, I think 
[10:59.60]The kind of personally [11:00.60]The motivation was [11:01.60]We want to build [11:02.60]For people [11:03.60]Yep, and then [11:04.60]Just to recap as well [11:05.60]Like the models [11:06.60]You're using back then were [11:07.60]Like, I don't know [11:08.60]With the like BERT type stuff [11:10.60]Or T5 or [11:12.60]I don't know what time frame [11:13.60]We're talking about here [11:14.60]I guess to be clear [11:15.60]At the very beginning [11:16.60]We had humans do the work [11:18.60]And then I think [11:19.60]The first models [11:20.60]That kind of made sense [11:21.60]Were GPT-2 [11:22.60]And TNLG [11:23.60]And early generative models [11:25.60]We do [11:26.60]We also use [11:27.60]Like T5 based models [11:28.60]Even now [11:29.60]Started with GPT-2 [11:30.60]Yeah, cool [11:31.60]I'm just kind of curious about [11:32.60]Like how do you [11:33.60]Start so early [11:34.60]Like now it's obvious [11:35.60]Where to start [11:36.60]But back then it wasn't [11:37.60]Yeah, I used to [11:38.60]Nag Andreas a lot [11:39.60]I was like [11:40.60]Why are you [11:41.60]Talking to this? 
[11:42.60]I don't know [11:43.60]I felt like [11:44.60]GPT-2 is like [11:45.60]Clearly can't do anything [11:46.60]And I was like [11:47.60]Andreas, you're wasting your time [11:48.60]Like playing with this toy [11:49.60]But yeah, he was right [11:50.60]So what's the history [11:51.60]Of what Elicit [11:52.60]Actually does as a product [11:53.60]You recently announced that [11:55.60]After four months [11:56.60]You got to a million in revenue [11:57.60]Obviously a lot of people [11:58.60]Use it, get a lot of value [11:59.60]But it was [12:00.60]Initially kind of like [12:01.60]Structured data [12:02.60]Extraction from papers [12:03.60]Then you had [12:04.60]Kind of like concept grouping [12:05.60]And today it's maybe [12:06.60]Like a more full stack [12:07.60]Research enabler [12:09.60]Kind of like paper [12:10.60]Understanding platform [12:11.60]What's the definitive definition [12:13.60]Of what Elicit is [12:14.60]And how did you get here [12:15.60]Yeah, we say Elicit [12:16.60]Is an AI research assistant [12:17.60]I think it will continue [12:18.60]To evolve [12:19.60]You know, we're so excited [12:20.60]About building in research [12:21.60]Because there's just so much space [12:22.60]I think the current phase [12:23.60]We're in right now [12:24.60]We talk about it [12:25.60]As really trying to make Elicit [12:27.60]The best place to understand [12:28.60]What is known [12:29.60]So it's all a lot about like [12:31.60]Literature summarization [12:32.60]There's a ton of information [12:33.60]That the world already knows [12:34.60]It's really hard to navigate [12:35.60]Hard to make it relevant [12:37.60]So a lot of it is around [12:38.60]Document discovery [12:39.60]And processing and analysis [12:41.60]I really kind of want to [12:42.60]Import some of the incredible [12:44.60]Productivity improvements [12:45.60]We've seen in software engineering [12:47.60]And data science [12:48.60]Into research [12:49.60]So it's like [12:50.60]How can we make researchers 
[12:51.60]Like data scientists of text [12:53.60]That's why we're launching [12:54.60]This new set of features [12:55.60]Called notebooks [12:56.60]It's very much inspired [12:57.60]By computational notebooks [12:58.60]Like Jupyter notebooks [12:59.60]Deepnote or Colab [13:01.60]Because they're so powerful [13:02.60]And so flexible [13:03.60]And ultimately [13:04.60]When people are trying [13:05.60]To get to an answer [13:07.60]Or understand insight [13:08.60]They're kind of like [13:09.60]Manipulating evidence [13:10.60]And information [13:11.60]Today that's all packaged [13:12.60]In PDFs [13:13.60]Which are super brittle [13:14.60]But with language models [13:15.60]We can decompose [13:16.60]These PDFs [13:17.60]And then we can [13:18.60]Interlink claims [13:19.60]And evidence [13:20.60]And insights [13:21.60]And then let researchers [13:22.60]Mash them up together [13:23.60]Remix them [13:24.60]And analyze them together [13:25.60]So yeah [13:26.60]I would say quite simply [13:27.60]Overall, Elicit is [13:28.60]An AI research assistant [13:29.60]Right now we're focused [13:30.60]On text based workflows [13:32.60]But long term [13:33.60]Really want to kind of [13:34.60]Go further and further [13:35.60]Into reasoning [13:36.60]And decision making [13:37.60]And when you say [13:38.60]AI research assistant [13:39.60]This is kind of [13:40.60]Meta research [13:41.60]So researchers [13:42.60]Use Elicit [13:43.60]As a research assistant [13:44.60]It's not a generic [13:45.60]You can research [13:46.60]Or it could be [13:47.60]But what are people [13:48.60]Using it for today [13:49.60]So specifically in science [13:51.60]A lot of people use [13:52.60]Human research assistants [13:53.60]To do things [13:54.60]You tell your grad student [13:56.60]Here are a couple of papers [13:57.60]Can you look at [13:58.60]All of these [13:59.60]See which of these [14:00.60]Have kind of sufficiently [14:01.60]Large populations [14:02.60]And actually study [14:03.60]The disease that 
[14:04.60]I'm interested in [14:05.60]And then write out [14:06.60]Like what are the experiments [14:07.60]They did [14:08.60]What are the interventions [14:09.60]They did [14:10.60]What are the outcomes [14:11.60]And kind of organize [14:12.60]That for me [14:13.60]And the first phase [14:14.60]Of understanding [14:15.60]Is focused on [14:16.60]Automating that workflow [14:17.60]Because a lot of that work [14:18.60]Is pretty rote work [14:19.60]I think it's not [14:20.60]The kind of thing [14:21.60]That we need humans to do [14:22.60]Language models can do it [14:23.60]And then if [14:24.60]Language models can do it [14:25.60]Then you can obviously [14:26.60]Scale it up [14:27.60]Much more than a grad student [14:28.60]Or undergrad [14:29.60]Research assistant [14:30.60]Would be able to do [14:31.60]Yeah the use cases [14:32.60]Are pretty broad [14:33.60]So we do have [14:34.60]A very large [14:35.60]Percent of our users [14:36.60]Are just using it personally [14:37.60]Or for a mix [14:38.60]Of personal and professional [14:39.60]Things [14:40.60]People who care a lot [14:41.60]About health [14:42.60]Or biohacking [14:43.60]Or parents [14:44.60]Or disease [14:45.60]Or want to understand [14:46.60]The literature directly [14:47.60]So there is an [14:48.60]Individual consumer use [14:49.60]Case [14:50.60]We're most focused [14:51.60]On the power users [14:52.60]So that's where [14:53.60]We're really excited [14:54.60]To build [14:55.60]So Elicit was [14:56.60]Very much inspired [14:57.60]By this workflow [14:58.60]In literature [14:59.60]Called systematic reviews [15:00.60]Or meta analysis [15:01.60]Which is basically [15:02.60]The human state [15:03.60]Of the art [15:04.60]For summarizing [15:05.60]Scientific literature [15:06.60]It typically involves [15:07.60]Like five people [15:08.60]Working together [15:09.60]For over a year [15:10.60]And they kind of [15:11.60]First start by trying [15:12.60]To find the maximal [15:13.60]Set of papers possible [15:14.60]So it's like 
[15:15.60]Ten thousand papers [15:16.60]And they kind of [15:17.60]Systematically narrow [15:18.60]That down to like [15:19.60]Hundreds or fifty [15:20.60]Extract key details [15:22.60]From every single paper [15:23.60]Usually have two people [15:24.60]Doing it [15:25.60]Like a third person [15:26.60]Reviewing it [15:27.60]So it's like [15:28.60]Incredibly laborious [15:29.60]Time-consuming process [15:30.60]But you see it [15:31.60]In every single domain [15:32.60]So in science [15:33.60]In machine learning [15:34.60]In policy [15:35.60]Because it's so structured [15:36.60]And designed to be reproducible [15:37.60]It's really amenable [15:38.60]To automation [15:39.60]So it's kind of [15:40.60]The workflow that we want [15:41.60]To automate first [15:42.60]Make it accessible [15:43.60]For any question [15:44.60]And make [15:45.60]You know kind of [15:46.60]These really robust [15:47.60]Living summaries of science [15:48.60]So yeah [15:48.60]It's one of the [15:49.60]Workflows that we're [15:50.60]Starting with [15:51.60]Our previous guest [15:52.60]Mike Conover [15:53.60]He's building a new [15:54.60]Company called BrightWave [15:55.60]Which is an AI [15:56.60]Research assistant [15:57.60]For financial research [15:58.60]How do you see [15:59.60]The future of these tools [16:00.60]Like does everything [16:01.60]Converge [16:02.60]Into like a god researcher [16:03.60]Assistant [16:04.60]Or is every domain [16:05.60]Going to have its own thing [16:06.60]I think that's a good [16:07.60]And mostly open question [16:09.60]I do think there are [16:10.60]Some differences [16:11.60]Data analysis [16:12.60]And other research [16:13.60]Is more high-level [16:15.60]Cross-domain thinking [16:16.60]And we definitely [16:17.60]Want to contribute to [16:18.60]The broad [16:19.60]Generalist reasoning type [16:20.60]Space like if [16:21.60]Researchers are [16:22.60]Making discoveries often [16:23.60]It's like hey [16:24.60]This thing in biology [16:25.60]Is actually analogous to 
[16:26.60]Like these equations [16:27.60]In economics or something [16:28.60]And that's just [16:29.60]Fundamentally a thing [16:30.60]Where you need [16:31.60]To reason across domains [16:32.60]At least within research [16:33.60]I think there will be [16:34.60]Like one best platform [16:36.60]More or less [16:37.60]For this type of [16:38.60]Generalist research [16:39.60]I think there may still be [16:40.60]Tools like for genomics [16:41.60]Like particular types [16:42.60]Of modules [16:43.60]Of genes [16:44.60]And proteins [16:45.60]And whatnot [16:46.60]But for a lot of [16:47.60]The kind of high-level reasoning [16:48.60]That humans do [16:49.60]I think that is [16:50.60]A more winner-take-all [16:51.60]Type thing [16:52.60]I wanted to ask [16:53.60]A little bit deeper about [16:54.60]I guess the workflow [16:55.60]That you mentioned [16:56.60]I like that phrase [16:57.60]I see that [16:58.60]In your UI now [16:59.60]But that's [17:00.60]As it is today [17:01.60]And I think you were [17:02.60]About to tell us about [17:03.60]How it was in 2021 [17:04.60]And how it maybe progressed [17:05.60]How has this workflow [17:06.60]Evolved over time [17:07.60]So the very first [17:08.60]Version of Elicit [17:09.60]Wasn't the research assistant [17:10.60]It was a forecasting assistant [17:12.60]So we set out [17:13.60]And we were thinking about [17:14.60]What are some of the most [17:15.60]Impactful types of reasoning [17:16.60]That if we could scale up [17:17.60]AI would really transform [17:18.60]The world [17:19.60]And we actually started [17:20.60]With literature review [17:21.60]But we're like [17:22.60]So many people are going to build [17:23.60]Literature review tools [17:24.60]So let's not start there [17:25.60]So then we focused [17:26.60]On geopolitical forecasting [17:27.60]So I don't know [17:28.60]If you're familiar [17:29.60]With like Manifold or [17:30.60]Manifold Markets [17:31.60]Yeah, that kind of stuff [17:32.60]Before Manifold [17:33.60]Yeah, yeah 
[17:34.60]So not predicting relationships [17:35.60]We're predicting like [17:36.60]Is China going to invade Taiwan? [17:38.60]Yeah [17:39.60]That is a relationship [17:40.60]Yeah, that's fair [17:41.60]Yeah, it's true [17:42.60]And then we worked [17:43.60]On that for a while [17:44.60]And then after GPT-3 [17:45.60]came out [17:46.60]I think by that time [17:47.60]We realized that [17:48.60]Originally we were trying [17:49.60]To help people convert [17:50.60]Their beliefs into [17:51.60]Probability distributions [17:53.60]So take fuzzy beliefs [17:54.60]But like model them [17:55.60]More concretely [17:56.60]And then after a few months [17:57.60]Of iterating on that [17:58.60]Just realized the thing [17:59.60]That's blocking people [18:00.60]From making [18:01.60]Interesting predictions [18:02.60]About important events [18:03.60]In the world [18:04.60]Is less kind of [18:05.60]On the probabilistic side [18:06.60]And much more [18:07.60]On the research side [18:08.60]And so that kind [18:09.60]Of combined with [18:10.60]The very generalist [18:11.60]Capabilities of GPT-3 [18:12.60]Prompted us to [18:13.60]Make a more general [18:14.60]Research assistant [18:15.60]Then we spent [18:16.60]A few months iterating [18:17.60]On what even is [18:18.60]A research assistant [18:19.60]So we would embed [18:20.60]With different researchers [18:21.60]We built data labeling [18:23.60]Workflows in the beginning [18:24.60]Kind of right off the bat [18:25.60]We built ways to find [18:27.60]Experts in a field [18:29.60]And like ways to ask [18:30.60]Good research questions [18:31.60]We just kind of [18:32.60]Iterated through a lot [18:33.60]Of workflows and no one else [18:34.60]Was really building at this [18:35.60]Time and it was like [18:36.60]Let's do some prompt [18:37.60]Engineering and see [18:38.60]Like what is a task [18:39.60]That is at the [18:40.60]Intersection of what's [18:41.60]Technologically capable [18:42.60]And like important [18:43.60]For researchers [18:44.60]And 
we had like [18:45.60]A very nondescript [18:46.60]Landing page [18:47.60]It said nothing [18:48.60]But somehow people were [18:49.60]Signing up and we had [18:50.60]The sign-up form [18:51.60]That was like [18:52.60]Why are you here [18:53.60]And everyone was like [18:54.60]I need help [18:55.60]With literature review [18:56.60]And we're like [18:57.60]A literature review [18:58.60]That sounds so hard [18:59.60]I don't even know [19:00.60]What that means [19:01.60]We don't want to work on it [19:02.60]But then eventually [19:03.60]We're like [19:04.60]Everyone is saying it [19:05.60]Yeah [19:06.60]And we also kind of [19:07.60]Personally knew literature [19:08.60]Review was hard [19:09.60]And if you look at the graphs [19:10.60]For academic literature [19:11.60]Being published every [19:12.60]Single month you guys [19:13.60]Know this in machine learning [19:14.60]It's like up and to the right [19:15.60]Like superhuman amounts [19:16.60]Of papers [19:17.60]So we're like [19:18.60]All right, let's just try it [19:19.60]I was really nervous [19:20.60]But Andreas was like [19:21.60]This is kind of like [19:22.60]The right problem space [19:23.60]To jump into [19:24.60]Even if we don't [19:25.60]Know what we're doing [19:26.60]So my take was like [19:27.60]Fine [19:28.60]This feels really scary [19:29.60]But let's just launch [19:30.60]A feature every single week [19:31.60]And double our user [19:32.60]Numbers every month [19:33.60]And if we can do that [19:34.60]We will find something [19:35.60]I was worried about like [19:36.60]Getting lost [19:37.60]In the kind of academic white [19:38.60]Space [19:39.60]So the very first version [19:40.60]Was actually a weekend prototype [19:41.60]That Andreas made [19:42.60]Do you want to explain [19:43.60]How that worked [19:44.60]I mostly remember [19:45.60]That it was really bad [19:47.60]So the thing I remember [19:48.60]Is you entered a question [19:50.60]And it would give you back [19:51.60]A list of claims [19:52.60]So your 
question could be [19:53.60]I don't know [19:54.60]How does creatine affect cognition [19:56.60]And it would give you back [19:57.60]Some claims [19:58.60]That are to some extent [19:59.60]Based on papers [20:00.60]But they were often irrelevant [20:02.60]The papers were often irrelevant too [20:03.60]And so we ended up [20:04.60]Soon just printing out [20:05.60]A bunch of examples [20:06.60]Of results [20:07.60]And putting them up [20:08.60]On the wall [20:09.60]So that we would [20:10.60]Kind of feel the constant [20:11.60]Shame of having [20:12.60]Such a bad product [20:13.60]And would be incentivized [20:14.60]To make it better [20:15.60]And I think over time [20:16.60]It has gotten a lot better [20:17.60]But I think [20:18.60]The initial version [20:19.60]Was like really very bad [20:20.60]But it was basically [20:21.60]Like a natural language [20:22.60]Summary of an abstract [20:23.60]Like kind of a one-sentence [20:24.60]Summary [20:25.60]Which we still have [20:26.60]And then as we learned [20:27.60]Kind of more about this [20:28.60]Systematic review workflow [20:29.60]We started expanding [20:30.60]The capabilities so that [20:31.60]You could extract a lot [20:32.60]More with it [20:33.60]And were you using [20:34.60]Like embeddings [20:35.60]And cosine similarity [20:36.60]That kind of stuff [20:37.60]For retrieval [20:38.60]Or was it keyword based [20:39.60]Or [20:40.60]I think the very first version [20:42.60]Didn't even have [20:43.60]Its own search engine [20:44.60]I think the very first version [20:45.60]Probably used [20:46.60]The Semantic Scholar API [20:48.60]Or something similar [20:49.60]And only later when we discovered [20:51.60]That the API is not very semantic [20:53.60]Did we build our own [20:55.60]Search and that has helped a lot [20:57.60]And then we're going to go into [20:59.60]Like more recent product stuff [21:01.60]But like you know [21:02.60]I think you seem more so [21:03.60]The startup oriented [21:04.60]Business person 
and you seem sort of more ideologically interested in research, obviously, because of your PhD. What kind of market sizing were you guys thinking, right? Because you're here saying, like, we have to double every month, and I'm like, I don't know how you make that conclusion from this, right? Especially also as a nonprofit at the time.

[21:22.60] I mean, market-size-wise, I felt like in this space, where so much was changing and it was very unclear what of today would actually be true tomorrow, we just really rested a lot on very, very simple fundamental principles. Which is like: if you can understand the truth, that is very economically beneficial, like, valuable. If you know the truth, on principle, that's enough. Research is the key to many breakthroughs that are very commercially valuable.

[21:47.60] Because my version of it is, students are poor and they don't pay for anything, right?
[21:52.60] But that's obviously not true, as you guys have found out. But you had to have some market insight for me to have believed that — but you skipped that.

[21:58.60] We did encounter, talking to VCs for our seed round, a lot of VCs who were like, you know, researchers, they don't have any money, why don't you build a legal assistant? I think in some short-sighted way maybe that's true, but I think in the long run, R&D is such a big part of the economy. If you can substantially improve how quickly people find new discoveries, or avoid controlled trials that don't go anywhere, I think that's just huge amounts of money. And there are a lot of questions, obviously, about getting between here and there, but as long as the fundamental principle is there, we were okay with that. And I guess we found some investors who also were.

[22:35.60] Yeah, congrats. I'm sure we can cover the sort of flip later. I think you were about to start us on GPT-3 and how that changed things for you. It's funny, I guess every major GPT version you have some big insight.

[22:49.60] Yeah. I mean, what do you think? I think it's a little bit less true for us than for others, because we always believed that there would basically be human-level machine work. And so it is definitely true that, in practice, for your product, as new models come out your product starts working better, and you can add some features that you couldn't add before. But I don't think we really ever had the moment where we were like, oh wow, that is super unanticipated, we need to do something entirely different now from what was on the roadmap. I think GPT-3 was a big change, because it kind of said, oh, now is the time to build these tools. And then GPT-4 was maybe a little bit more of an extension of GPT-3. GPT-3 over GPT-2 was like a qualitative level shift. Then GPT-4 was like, okay, great, now we're more accurate on these things, we can answer harder questions, but the shape of the product had already taken place by that time.

[23:44.60] I kind of want to ask you about this sort of pivot that you made. But I guess that was just a way to sell what you were doing, which is you're adding extra features on grouping by concepts. The GPT-4 "pivot", quote-unquote pivot.

[23:55.60] Yeah, yeah, exactly. When we launched this workflow, now that GPT-4 was available, basically Elicit was at a place where we have very tabular interfaces. So given a table of papers, you can extract data across all the tables. But you kind of want to take the analysis a step further. Sometimes what you'd care about is not having a list of papers but a list of arguments, a list of effects, a list of interventions, a list of techniques. And so one of the things we're working on is, now that you've extracted this information
anyway, can you pivot it, or group by whatever information you extracted, to have more insightful information, still supported by the academic literature?

[24:33.60] Yeah, there was a big revelation when I saw it. Basically, I'm very impressed by how first-principles your ideas around the workflow are. And I think that's why you're not as reliant on the LLM improving — because it's actually just about improving the workflow that you recommend to people. Today we might call it, like, this is the way that Elicit does research, and this is what we think is most effective, based on talking to our users.

[25:01.60] The problem space is still huge. Like, if it's this big, we're all still operating at this tiny bit of it. So, you know, I think about this a lot in the context of moats. People are like, oh, what's your moat, what happens if GPT-5 comes out? It's like, if GPT-5 comes out, there's still all of this other space that we can go into. And so I think being really obsessed with the problem is robust — you just kind of directly incorporate model improvements and keep going.

[25:27.60] And then I first encountered you guys with Charlie — you can tell us about that project. Basically, yeah, like, how much did cost become a concern as you're working more and more with OpenAI?
[25:37.60] How do you manage that relationship?

[25:39.60] Let me talk about who Charlie is, and then you can talk about that. Charlie is a special character. So Charlie, when we found him, had just finished his freshman year at the University of Warwick. I think he had heard about us on some Discord, and then he applied, and we just saw that he had done so many incredible side projects. We were actually on a team retreat in Barcelona, visiting our head of engineering at that time, and everyone was talking about this wunderkind. They're like, this kid! And then on our take-home project he had done the best of anyone to that point. And so people were just so excited to hire him. So we hired him as an intern, and then we were like, Charlie, what if you just dropped out of school? And so then we convinced him to take a year off. And he's just incredibly productive. And I think the thing you're referring to is, when the constitutional AI paper launched, within a few days — I think four days — he had basically implemented it in production, and then we had it in the app a week or so after that. And he has since contributed to major improvements, like cutting costs down to a tenth of what they were. Really large-scale. But yeah, you can talk about the technical stuff.

[26:39.60] Yeah. On the constitutional AI project, this was for abstract summarization, where in Elicit, if you run a query
it'll return papers to you, and then it will summarize each paper with respect to the query for you, on the fly. And that's a really important part of Elicit, because Elicit does it so much — if we run a few searches, it'll have done it a few hundred times for you. And so we cared a lot about this being both fast and cheap, and also very low on hallucination. I think if Elicit hallucinates something about the abstract, that's really not good. And so what Charlie did in that project was create a constitution that expressed what the attributes of a good summary are — everything in the summary is reflected in the actual abstract, it's very concise, etc. — and then used RLHF, with a model that was trained on the constitution, to basically fine-tune a better summarizer on an open-source model. And I think that might still be in use.

[27:34.60] Yeah, yeah, definitely. I think at the time the models hadn't been trained at all to be faithful to a text, so they were just generating. So then, when you asked them a question, they tried too hard to answer the question, and didn't try hard enough to answer the question given the text, or answer what the text said about the question. So we had to basically teach the models to do that specific task.

[27:54.60] How do you monitor the ongoing performance of your models? Not to get too LLM-Ops-y, but you are one of the larger, more well-known
operations doing NLP at scale, I guess, effectively. Like, you have to monitor these things, and nobody I've talked to has a good answer.

[28:10.60] Yeah, I don't think we have a good answer yet. I think the answers are actually a little bit clearer on the basic business side, where you can import ideas from normal software engineering and normal DevOps: you need to monitor latencies and response times and uptime and whatnot. Performance is more like hallucination rate, and for things like hallucination rate, I think the really important thing is training time. So we care a lot about having our own internal benchmarks for model development that reflect production, so that we can know ahead of time how well the model is going to perform on different types of tasks — the tasks being summarization, question answering given a paper, and ranking. And for each of those, we want to know the distribution of things the model is going to see, so that we can have well-calibrated predictions on how well the model is going to do in production. And, yeah, there's some chance of distribution shift — the things users enter are actually going to be different than training, right? So it's about having very high quality, well-vetted datasets at training time.

[29:19.60] I think we also end up effectively monitoring by trying to evaluate new models as they come out. That kind of prompts us to go through our eval suite every couple of months — every time a new model comes out, we have to see how it's performing relative to production and what we currently have.

[29:35.60] Yeah. I mean, since we're on this topic: any new models that really caught your eye this year? Like, Claude came out.

[29:41.60] Yeah, I think Claude is a pretty good point on the kind of Pareto frontier. It's neither the cheapest model nor the most accurate, most high-quality model, but it's just a really good trade-off between cost and accuracy.

[29:56.60] You apparently have to ten-shot it to make it good. I tried using Haiku for summarization, but zero-shot was not great. Then they were like, you know, it's a skill issue, you have to try harder.

[30:06.60] Interesting.

[30:07.60] I think GPT-4 unlocked tables for us — processing data from tables — which was huge. GPT-4 Vision.

[30:14.60] Did you try Fuyu? I guess you can't try Fuyu, because it's noncommercial. That's the Adept model.

[30:18.60] Yeah, we haven't tried that one. But Claude is multimodal as well. I think the interesting insight that we got from talking to David Luan, who is CEO of Adept, was that multimodality has effectively two different flavors. Like, one is
recognizing images from a camera in the outside natural world, and actually the more important multimodality for knowledge work is screenshots, and, you know, PDFs and charts and graphs. So we need a new term for that kind of multimodality.

[30:45.60] But is the claim that current models are good at one or the other?

[30:49.60] Yeah, they're over-indexed, because the history of computer vision is COCO, right? So now we're like, oh, actually, screens are more important — OCR, handwriting.

[30:58.60] You mentioned a lot of closed-model-lab stuff, and then you also have this open-source model fine-tuning stuff. What is your workload now between closed and open? It's a good question — I think — is it half and half? Is that even a relevant question, or is that a nonsensical question?

[31:12.60] It depends a little bit on how you index — whether you index by compute cost or by number of queries. In terms of number of queries, it's maybe similar. In terms of cost in compute, I think the closed models make up more of the budget, since the main cases where you want to use closed models are cases where they're just smarter — where no existing open-source models are quite smart enough.

[31:36.60] Yeah. We have a lot of interesting technical questions to go into, but just to wrap the kind of UX evolution: now you have the notebooks. We talked a lot about how chatbots are not the final frontier. You
know, how did you decide to get into notebooks, which is a very iterative, kind of interactive interface? And, yeah, maybe learnings from that.

[31:55.60] Yeah, this is actually our fourth time trying to make this work. I think the first time was probably in early 2021. We've always been obsessed with this idea of task decomposition and branching — we always wanted a tool that could be kind of unbounded, where you could keep going, could do a lot of branching, where you could apply language-model operations or computations on other tasks. So in 2021 we had this thing called composite tasks, where you could use GPT-3 to brainstorm a bunch of research questions, and then take each research question and decompose those further into subquestions. This, again — that task-decomposition tree type thing — was always very exciting to us, but it was kind of overwhelming.

[32:37.60] Then at the end of '22, I think, we tried again. At that point we were thinking, okay, we've done a lot with this literature review thing, we also want to start helping with adjacent domains and different workflows. Like, we want to help more with machine learning — what does that look like? And as we were thinking about it, we were like, well, there are so many research workflows. How do we not just build three new workflows into Elicit, but make Elicit really generic to lots of workflows? What is a generic, composable system with nice abstractions that can scale to all these workflows? So we iterated on that a bunch and didn't quite narrow the problem space enough, or get to what we wanted.

[33:13.60] And then I think it was at the beginning of 2023 that we were like, wow, computational notebooks kind of enable this, where they have a lot of flexibility but, you know, robust primitives, such that you can extend the workflow — it's not limited, it's not like you ask a query, you get an answer, you're done. You can just constantly keep building on top of that, and each little step seems like really good work for the language model. And it was also just really helpful to have a bit more preexisting work to emulate. Yeah, that's kind of how we ended up at computational notebooks for Elicit.

[33:44.60] Maybe one thing that's worth making explicit is the difference between computational notebooks and chat, because on the surface they seem pretty similar. It's this iterative interaction where you add stuff — in both cases you have a back and forth, where you enter stuff and then you get some output, and then you enter stuff. But the important difference, in our minds, is that with notebooks you can define a process. So in data science, you know, here's my data analysis process that takes in a CSV, and then
does some extraction, and then generates a figure at the end. And you can prototype it using a small CSV, and then you can run it over a much larger CSV later. And similarly, the vision for notebooks in our case is to not make it this one-off chat interaction, but to allow you to say — if you start, and first you're like, okay, let me just analyze a few papers and see, do I get to the correct conclusions for those few papers — can you then later go back and say, now let me run this over 10,000 papers, now that I've debugged the process using a few papers? And that's an interaction that doesn't fit quite as well into the chat framework, because chat is more for quick back-and-forth interaction.

[34:49.60] Do you think of notebooks as kind of like structured, editable chain of thought, basically, step by step? And then, are people going to reuse notebooks as, like, templates? Maybe like in traditional notebooks with cookbooks, right — you share a cookbook, you can start from there. Is that similar in Elicit?

[35:07.60] Yeah, that's exactly right. That's our hope — that people will build templates and share them with other people. I think chain of thought is maybe still kind of one level lower on the abstraction hierarchy than we would think of notebooks. We'll probably want to think about more semantic pieces: a building block is more like a paper search, or an extraction, or a list of concepts, and then the models and the reasoning will probably often be one level down. You always want to be able to see it, but you don't always want it to be front and center.

[35:37.60] Yeah. What's the difference between a notebook and an agent? Since everybody always asks me, what's an agent — like, how do you think about where the line is?

[35:45.60] Yeah, it's an interesting question. In the notebook world, I would generally think of the human as the agent in the first iteration. So you have the notebook, and the human adds little action steps. And then the next point on this kind of progression is, okay, now you can use language models to predict which action you would take as a human. And at some point you're probably going to be very good at this — you'll be like, okay, in some cases I can, with 99.9% accuracy, predict what you do, and then you might as well just execute it. Like, why wait for the human? And eventually, as you get better at this, that will just look more and more like agents taking actions, as opposed to you doing the thing. I think templates are a specific case of this — like, okay, well, there are just particular sequences of actions that you often want to chunk and have available, just like in normal programming. And those, you can view them as action sequences of agents, or you can view
them as a more normal programming-language abstraction thing. And I think those are two valid views.

[36:40.60] How does this change as — like you said — the models get better and you need less and less actual human interfacing with the model, you just get the results? How does the UX, and the way people perceive it, change?

[36:52.60] Yeah, I think this kind of interaction paradigm for evaluation is not really something the internet has encountered yet, because up to now the internet has all been about getting data and work from people. So, increasingly, I really want evaluation — both from an interface perspective and from a technical, operations perspective — to be a superpower for Elicit, because I think over time models will do more and more of the work, and people will have to do more and more of the evaluation. So, in terms of the interface, some of the things we have today: for every language-model generation there's some citation back, and we try to highlight the ground truth in the paper for whatever Elicit said, and make it super easy so you can click on it and quickly see it in context and validate whether the text actually supports the answer that Elicit gave. So I think we'd probably want to scale up things like that — the ability to spot-check the model's work super quickly — scale up interfaces like that.
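The spot-checking idea described here — link each generated claim back to the best supporting span in the source, so a human can validate it quickly — can be sketched with a toy grounding check. This is an illustration, not Elicit's actual implementation; the word-overlap heuristic, the threshold, and all function names are my own assumptions:

```python
# Toy sketch of citation spot-checking: for each generated claim, find the
# source sentence that best supports it, and flag weak matches for review.
# The Jaccard-overlap heuristic and the 0.2 threshold are illustrative assumptions.

def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def ground_claim(claim: str, abstract: str, threshold: float = 0.2):
    """Return (best supporting sentence, support score, needs_review flag)."""
    sentences = [s.strip() for s in abstract.split(".") if s.strip()]
    best = max(sentences, key=lambda s: word_overlap(claim, s))
    score = word_overlap(claim, best)
    return best, score, score < threshold

abstract = ("Creatine supplementation improved working memory in adults. "
            "No effect was observed on attention.")
claim = "Creatine improved working memory in adults"
sentence, score, needs_review = ground_claim(claim, abstract)
```

In a real system the overlap heuristic would be replaced by an entailment or retrieval model, but the interface contract is the same: every claim carries a pointer to its best evidence plus a flag that prioritizes human attention.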
[37:42.60] And who would spot-check it — the user?

[37:45.60] Yeah, to start, it would be the user. One of the other things we do is also flag the model's uncertainty. So we have models report out how confident they are — how confident are you that this was the sample size of this study? If the model's not sure, we throw a flag, and so the user knows to prioritize checking that. So again, we can scale that up: when the model's like, well, I searched this on Google, I'm not sure if that was the right thing, it has an uncertainty flag, and the user can go and be like, okay, that was actually the right thing to do, or not.

[38:10.60] I've tried to do uncertainty ratings from models — I don't know if you have this live — but I just didn't find them reliable, because they just hallucinated their own uncertainty. I would love to do it based on logprobs or something more native within the model, better than generated. But it sounds like it scales properly for you.

[38:29.60] Yeah, we found it to be pretty well calibrated. It varies by model. I think in some cases we also used two different models — one for the confidence estimates and one for the question answering. So one model would say, here's my chain of thought, here's my answer — let's say the first model is Llama, and the second model is GPT-3.5 — and then the second model just looks over the result and is like, okay, how confident are you in this?
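The two-model setup described here — one model answers with its chain of thought, a second, different model scores confidence in that answer, and low-confidence results get flagged for the user — can be sketched with stubbed models standing in for the actual LLM calls. The stubs, the threshold, and all names below are illustrative assumptions, not Elicit's implementation:

```python
# Sketch of answer-then-verify with two different models. answer_model and
# verify_model are stand-ins for real LLM calls (e.g. a Llama generator and
# a GPT-3.5 verifier); both stubs and the 0.7 threshold are assumptions.

def answer_model(question: str, text: str) -> dict:
    """Stand-in generator: returns an answer plus its chain of thought."""
    return {"chain_of_thought": f"The text addresses: {question}",
            "answer": "n=42"}

def verify_model(question: str, text: str, result: dict) -> float:
    """Stand-in verifier: scores confidence that the answer is supported."""
    return 0.9 if result["answer"].lstrip("n=").isdigit() else 0.3

def extract_with_flag(question: str, text: str, min_conf: float = 0.7) -> dict:
    result = answer_model(question, text)
    conf = verify_model(question, text, result)
    result["confidence"] = conf
    result["flagged"] = conf < min_conf  # surfaced to the user for checking
    return result

out = extract_with_flag("What was the sample size?", "We recruited 42 adults.")
```

The design point is the separation: because the verifier is a different model than the generator, it is less likely to inherit the generator's blind spots when rating confidence.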
[38:53.60] And I think sometimes using a different model can be better than using the same model.

[38:58.60] Yeah. You know, on the topic of models evaluating models — obviously you can do that all day long — what's your budget like? Because your queries fan out a lot, and then you have models evaluating models. One person typing in a question can lead to a thousand calls.

[39:13.60] It depends on the project. If the project is basically a systematic review that human research assistants would otherwise do, then the project can get quite large — for those projects, I don't know, let's say a hundred thousand dollars. So in those cases you're happier to spend compute than in the kind of shallow-search case, where someone just enters a question because, I don't know, maybe it's like, I heard about creatine, what's it about? You probably don't want to spend a lot of compute on that. This sort of being able to invest more or less compute into getting more or less accurate answers is one of the core things we care about, and that I think is currently undervalued in the AI space. You can't choose which model you want. And you can sometimes — I don't know — tip it and it'll try harder, or you can try various things to get it to work harder, but you don't have great ways of converting willingness to spend into better answers.
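The idea of converting willingness to spend into better answers could look something like a budget-aware router: pick the most thorough processing tier the user's budget covers for the papers in scope. This is a toy sketch — the tier names, per-paper costs, and function names are all assumptions for illustration, not Elicit's actual pricing or code:

```python
# Toy sketch of routing a query to more or less compute based on budget.
# Tiers, per-paper credit costs, and names are illustrative assumptions.

TIERS = [
    # (name, per-paper cost in credits, description)
    ("abstract_summary", 1, "open-source model, zero-shot"),
    ("high_accuracy",    5, "frontier model + table parsing"),
    ("double_checked",  12, "frontier model + separate verifier pass"),
]

def plan_run(n_papers: int, budget_credits: int) -> tuple[str, int]:
    """Pick the most thorough tier the budget allows for n_papers."""
    affordable = [(name, cost) for name, cost, _ in TIERS
                  if cost * n_papers <= budget_credits]
    if not affordable:
        raise ValueError("budget too small for even the cheapest tier")
    name, cost = max(affordable, key=lambda t: t[1])  # priciest affordable tier
    return name, cost * n_papers

# A shallow question over 100 papers with 600 credits gets the middle tier;
# a funded systematic review would simply raise budget_credits.
tier, total = plan_run(n_papers=100, budget_credits=600)
```

The point of the sketch is the knob itself: the same pipeline runs at every tier, and spending more buys a smarter model and extra verification rather than a different product.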
[40:04.60] And we really want to build a product that has this sort of unbounded flavor, where if you care about it a lot, you should be able to get really high-quality answers, really double-checked in every way.

[40:14.60] And you have credit-based pricing, so unlike most products it's not a fixed monthly fee.

[40:19.60] Right, exactly. Some of the higher costs are tiered. So most casual users will just get the abstract summary, which is from kind of an open-source model. Then you can add more columns, which have more extractions and these uncertainty features, and then you can also add the same columns in high-accuracy mode, which also parses the tables. So we kind of stack the complexity and the cost.

[40:41.60] You know the fun thing you can do with a credit system — which is data for data — basically, you can give people more credits if they give data back to you. I don't know: you don't have money, but you have time — how do you exchange that?

[40:53.60] It's a fair trade. I think it's interesting; we haven't quite operationalized it. And then, you know, there's been some kind of adverse selection. Like, for example, it would be really valuable to get feedback on our model, so maybe if you were willing to give more robust feedback on our results, we could give you credits or something like that. But then there's this question of, will people take it seriously?

[41:08.60] And you want the good people.

[41:10.60] Exactly. Can
you tell who are the good people [41:12.60]Not right now [41:13.60]But yeah maybe [41:14.60]At the point where we can [41:15.60]We can offer it [41:16.60]We can offer it up to them [41:17.60]The perplexity of questions asked [41:18.60]If it's higher perplexity [41:19.60]These are smarter people [41:20.60]Yeah maybe [41:21.60]And if you make a lot of typos [41:22.60]In your queries [41:23.60]You're not going to get off [41:24.60]How does that change [41:25.60]Negative social credit [41:28.60]It's very topical right now [41:29.60]To think about [41:30.60]The threat of long context windows [41:32.60]All these models [41:34.60]We're talking about these days [41:35.60]All like a million tokens plus [41:36.60]Is that relevant for you [41:38.60]Can you make use of that [41:39.60]Is that just prohibitively expensive [41:41.60]Because you're just paying [41:42.60]For all those tokens [41:43.60]Or you're just doing right [41:44.60]It's definitely relevant [41:45.60]And when we think about search [41:46.60]As many people do [41:47.60]We think about kind of [41:48.60]A staged pipeline [41:49.60]Of retrieval [41:50.60]Where first you use [41:51.60]Semitic search database [41:53.60]With embeddings [41:54.60]Get like the [41:55.60]In our case maybe 400 [41:56.60]Or so most relevant papers [41:57.60]And then [41:58.60]You still need to rank those [41:59.60]And I think at that point [42:01.60]It becomes pretty interesting [42:02.60]To use larger models [42:04.60]So specifically in the past [42:06.60]I think a lot of ranking [42:07.60]Was kind of per item ranking [42:09.60]Where you would score [42:10.60]Each individual item [42:11.60]Maybe using increasingly [42:12.60]Expensive scoring methods [42:13.60]And then rank based on the scores [42:15.60]But I think list wise [42:16.60]We ranking where [42:17.60]You have a model [42:18.60]That can see [42:19.60]All the elements [42:20.60]Is a lot more powerful [42:21.60]Because often you can [42:22.60]Only really tell [42:23.60]How good a 
thing is [42:24.60]In comparison to other things [42:26.60]And what things should come first [42:28.60]It really depends on [42:29.60]Like well what other things [42:30.60]Are available [42:31.60]Maybe you even care about [42:32.60]Diversity and your results [42:33.60]You don't want to show [42:34.60]Ten very similar papers [42:35.60]As the first 10 results [42:36.60]So I think along context models [42:38.60]Are quite interesting there [42:40.60]And especially for our case [42:41.60]Where we care more about [42:43.60]Power users who are perhaps [42:45.60]A little bit more [42:46.60]Welling to wait a little bit longer [42:47.60]To get higher quality results [42:48.60]Relative to people who just [42:50.60]Quickly check out things [42:51.60]Because why not [42:52.60]I think being able to spend [42:53.60]More on longer context [42:54.60]Is quite valuable [42:55.60]Yeah I think one thing [42:56.60]The longer context models [42:57.60]Changed for us [42:58.60]Is maybe a focus from [43:00.60]Breaking down tasks [43:01.60]To breaking down the evaluation [43:03.60]So before you know [43:05.60]If we wanted to answer [43:06.60]A question from the full text [43:08.60]Of a paper [43:09.60]We had to figure out [43:10.60]How to chunk it and like [43:11.60]Find the relevant chunk [43:12.60]And then answer [43:13.60]Based on that chunk [43:14.60]Then you know [43:15.60]Which chunk the model [43:16.60]Used to answer the question [43:17.60]So if you want to help [43:18.60]The user to check it [43:19.60]Yeah you can be like [43:20.60]Well this was the chunk [43:21.60]That the model got [43:22.60]And now if you put the whole [43:23.60]Text in the paper [43:24.60]You have to kind of [43:25.60]Find the chunk [43:26.60]Like more retroactively [43:27.60]Basically and so you need [43:28.60]Kind of like a different [43:29.60]Set of abilities [43:30.60]And obviously like [43:31.60]A different technology [43:32.60]To figure out [43:33.60]You still want to point [43:34.60]The user to the 
supporting quotes in the text, but then the interaction is a little different.
[43:38.60] You scan through and find some ROUGE score...
[43:41.60] Yeah. I think there's an interesting space of almost-research problems here, because you would ideally make causal claims, like: if this hadn't been in the text, the model wouldn't have said this thing. And maybe you can do expensive approximations to that, where, I don't know, you just throw out a chunk of the paper, re-answer, and see what happens. But hopefully there are better ways of doing that, where you just get that kind of counterfactual information for free from the model.
[44:06.60] Do you think at all about the cost of maintaining RAG versus just putting more tokens in the window? In software development, a lot of the time people buy developer productivity things so that they don't have to worry about it. Context windows are kind of the same, right? You have to maintain chunking and RAG retrieval and re-ranking and all of this, versus: I just shove everything into the context, and it costs a little more, but at least I don't have to do all of that. Is that something you've thought about?
[44:32.60] I think we still hit up against context limits enough that it's not really a question of whether we still want to keep this RAG around; we do still need it for the scale of the work we're doing.
[44:41.60] I think there are different kinds of maintainability. In one sense, I think you're right that the throw-everything-into-the-context-window thing is easier to maintain, because you can just swap out a model. In another sense, if things go wrong, it's harder to debug. If you know, here's the process that we go through to get from 200 million papers to an answer, and there are little steps, and you understand, okay, this is the step that finds the relevant paragraph or whatever, then maybe you'll know which step breaks. If it's just that a new model version came out and now it suddenly doesn't find your needle in a haystack anymore, then what can you do? You're kind of at a loss.
[45:20.60] Yeah, let's talk a bit about needle in a haystack, and maybe the opposite of it, which is hard grounding; I don't know if that's the best way to think about it. But I was using one of these chat-with-your-documents features, and I put in the AMD MI300 specs and the Blackwell chips from NVIDIA, and I was asking questions, and the response was like, oh, it doesn't say in the specs. But if you ask GPT-4 without the docs, it would tell you no, because NVLink is an NVIDIA technology.
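The expensive approximation Andreas mentions, re-answering with a chunk removed to see whether the answer changes, could be sketched roughly like this; `counterfactual_chunks` and the toy `answer_fn` are hypothetical stand-ins, not Elicit's implementation:

```python
# Sketch of leave-one-chunk-out counterfactual attribution:
# re-answer the question with each chunk removed, and credit the
# chunks whose removal changes the answer. `answer_fn` is a
# placeholder for an actual model call.

from typing import Callable, List

def counterfactual_chunks(
    question: str,
    chunks: List[str],
    answer_fn: Callable[[str, List[str]], str],
) -> List[int]:
    """Return indices of chunks whose removal changes the answer."""
    baseline = answer_fn(question, chunks)
    influential = []
    for i in range(len(chunks)):
        ablated = chunks[:i] + chunks[i + 1:]
        if answer_fn(question, ablated) != baseline:
            influential.append(i)
    return influential

# Toy stand-in model: answers "yes" iff any chunk mentions the keyword.
def toy_answer(question: str, chunks: List[str]) -> str:
    return "yes" if any("creatine" in c for c in chunks) else "no"

chunks = ["intro text", "creatine improves strength", "methods text"]
print(counterfactual_chunks("does creatine help?", chunks, toy_answer))
# -> [1]: only removing the creatine chunk flips the answer
```

Each candidate chunk costs one extra model call, which is why this is the "expensive" version; the hope expressed above is to get the same counterfactual signal from the model for free.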
[45:50.60] Yeah, it just says it in the thing. How do you think about that, the context sometimes suppressing the knowledge that the model has?
[45:57.60] It really depends on the task, because I think sometimes that is exactly what you want. Imagine you're a researcher, you're writing the background section of your paper, and you're trying to describe what these other papers say; you really don't want extra information to be introduced there. In other cases, where you're just trying to figure out the truth, and you're giving it the documents because you think they will help the model figure out what the truth is, then if the model has a hunch that there might be something that's not in the papers, you do want to surface that. I think ideally you still don't want the model to just tell you; probably the ideal thing looks a bit more like agent control, where the model can issue a query that is intended to surface the documents that substantiate its hunch. That's maybe a reasonable middle ground between just telling you and being fully limited to the papers you give it.
[46:45.60] Yeah, I would say they're just kind of different tasks right now. The task Elicit is mostly focused on is: what do these papers say? But there is another task, which is: just give me the best possible answer. And that "give me the best possible answer"
sometimes depends on what these papers say, but it can also depend on other stuff that's not in the papers. So ideally we can do both, and then we can offer more of that going forward.
[47:08.60] We've gone into a lot of details, but just to zoom back out a little bit: what are maybe the most underrated features of Elicit, and what is one thing where users surprised you the most by how they use it?
[47:21.60] I think the most powerful feature of Elicit is the ability to add columns to this table, which effectively extracts data from all of your papers at once. It's well used, but there are many different extensions of that. We let you give a description of the column, we let you give instructions for a column, we let you create custom columns. We have 30-plus predefined fields that users can extract, like what were the methods, what were the main findings, how many people were studied. And we actually show you basically the prompts that we're using to extract that for our predefined fields, and then you can fork them. You can say, oh, actually I don't care about the population of people, I only care about the population of rats; you can change the instructions. So I think users are still discovering that this predefined, easy-to-use default can be extended to be much more specific to them.
[48:09.60] And then they can also ask custom questions. One use case of that is that you can start to create different column types that you might not expect. Rather than just creating generative answers, like a description of the methodology, you can say: classify the methodology into a prospective study, a retrospective study, or a case study, and then you can filter based on that. It's all using the same kind of technology and the same interface, but it unlocks a lot. So I think the ability to ask custom questions, give instructions, and specifically use that to create different types of columns, like classification columns, is still pretty underrated.
[48:44.60] In terms of use cases, I spoke to someone who works in medical affairs at a genomic sequencing company recently. Doctors order these genomic tests, these sequencing tests, to identify whether a patient has a particular disease, and this company helps process them. This person basically interacts with all the doctors if the doctors have any questions; my understanding is that medical affairs is kind of like customer support or customer success in pharma. So this person talks to doctors all day long, and one of the things they started using Elicit for is putting the results of their tests in as a query, like: this test showed this percentage presence of this, and 40% of that, and whatever, you know, what genes are present within this sample, and getting a list of academic papers that would support their findings, and using this to help the doctors interpret their tests. So we talked about, okay, cool, what if we built... He's pretty interested in doing a survey of infectious disease specialists and having them write up their answers, comparing those to Elicit's answers, and trying to see: can Elicit start being used to interpret the results of these diagnostic tests? Because the way they ship these tests to doctors is they report on a really wide array of things. He was saying that at a large, well-resourced hospital, like a city hospital, there might be a team of infectious disease specialists who can help interpret these results, but at under-resourced hospitals, or more rural hospitals, the primary care physician can't interpret the test results, so then they can't order them, they can't use them, they can't help the patients with them. So thinking about an evidence-backed way of interpreting these tests is definitely an extension of the product that I hadn't considered before. But yeah, the idea of using that to bring more access to physicians in all different parts of the country and helping them interpret complicated results...
[50:28.60] We had Kanjun from Imbue on the
podcast, and we talked about better allocating scientific resources. How do you think about these use cases, and maybe how Elicit can help drive more research? And do you see a world in which maybe the models actually do some of the research before suggesting it to us?
[50:46.60] Yeah, I think that's very close to what we care about. Our product values are systematic, transparent, and unbounded, and I think you make research especially more systematic and unbounded. And here's the thing that's at stake here. For example, I was recently talking to people in longevity, and I think there isn't really one field of longevity; there are different scientific subdomains that are surfacing various things that are related to longevity. And I think if you could more systematically say: look, here are all the different interventions we could do, here's the expected ROI of these experiments, here's the evidence so far that supports them, that would be so much more systematic than science is today. I'd guess in 10 or 20 years we'll look back, and it will be incredible how unsystematic science was back in the day.
[51:35.60] Our view is kind of: have models catch up to expert humans today. Start with novice humans, and then increasingly expert humans, but we really want the models to earn their right to the expertise. That's why we do things in this very step-by-step way; that's why we don't just throw a bunch of data and apply a bunch of compute and hope we get good results. But obviously, at some point, once it's earned its stripes, it can surpass human researchers. And I think that's where making sure that the model's processes are really explicit and transparent, and that it's really easy to evaluate, is important, because if it does surpass human understanding, people will still need to be able to audit its work somehow, or spot-check its work somehow, to be able to reliably trust it and use it. So yeah, that's why the process-based approach is really important.
[52:21.60] And on the question of whether models will do their own research: one feature that models currently don't have, and that will need to get better there, is world models. I think currently models are just not great at representing what's going on in a particular situation or domain in a way that allows them to come to interesting, surprising conclusions. I think they're very good at coming to conclusions that are nearby to conclusions that people have come to, not as good at reasoning and making surprising connections, maybe. And so having deeper models of what the underlying domains are, and how they are related or not related, I think will be an important ingredient for models to actually be able to make novel contributions.
[53:01.60] On the topic of hiring more expert humans: you've hired some very expert humans. My friend Maggie Appleton joined you guys, I think maybe a year ago-ish. In fact, I think you were doing an offsite, and we were actually organizing our big AI UX meetup around whenever she's in town in San Francisco. How big is the team? How have you transitioned your company into this PBC, and what's the plan for the future?
[53:22.60] About half of us are in the Bay Area, and then we're distributed across the US and Europe, a mix of mostly roles in engineering and product. And I think the transition to a PBC was really not that eventful, because even as a nonprofit we were already shipping every week, so we were very much operating as a product. The PBC component was to very explicitly state that we have a mission that we care a lot about. There are a lot of ways to make money that would make us a lot of money, but we are going to be opinionated about how we make money; we're going to take the version of making a lot of money that's in line with our mission. But it's all very convergent: Elicit is not going to make any money if it's a bad product, if it doesn't actually help you discover truth and do
research more rigorously. So I think for us the mission and the success of the company are very intertwined.
[54:15.60] We're hoping to grow the team quite a lot this year; probably some of our highest priority roles are in marketing and go-to-market.
[54:22.60] Do you want to talk about those roles?
[54:24.60] Yeah, broadly we're just looking for senior software engineers, who don't need any particular AI expertise. A lot of it is just: how do you build good orchestration for complex tasks? We talked earlier about these notebooks, scaling up task orchestration, and I think a lot of this looks more like traditional software engineering than it looks like machine learning research. I think the people who are really good at building good abstractions, building applications that survive even if some of their pieces break, making reliable components out of unreliable pieces, those are the people we're looking for.
[54:57.60] You know, that's exactly what I used to do. Have you explored the existing orchestration frameworks: Temporal, Airflow, Dagster, Prefect?
[55:05.60] We've looked into them a little bit. I think we have some specific requirements around being able to stream work back very quickly to our users. Those could definitely be relevant.
[55:15.60] Okay, well, you're hiring; I'm sure we'll plug all the links. Any parting words? Any words of wisdom, mottos you live by?
[55:22.60] I think it's a really important time for humanity, so I hope everyone listening to this podcast can think hard about exactly how they want to participate in this story. There's so much to build, and we can be really intentional about what we align ourselves with. There are a lot of applications that are going to be really good for the world, and a lot that are not. So yeah, I hope people can take that seriously and kind of seize the moment.
[55:45.60] Yeah, I love how intentional you guys have been. Thank you for sharing.
[55:48.60] Thank you.
[55:49.60] Thank you for coming on.
[55:50.60] (music)