WEBVTT

00:00:08.000 --> 00:00:14.000
giving the Zoom talk, um, at what's, like, 2 in the morning, 2.30 in the morning for you.

00:00:14.000 --> 00:00:19.000
Um… but yeah, we're making the best of a…

00:00:19.000 --> 00:00:27.000
tough situation, I think. Um, yeah, so I… I'm happy to introduce Yuan-Sen. I, um, met him in…

00:00:27.000 --> 00:00:31.000
Berlin, I think, or maybe we met… no, we met once before at MIT, maybe.

00:00:31.000 --> 00:00:35.000
Um, at a workshop there. Anyway, um…

00:00:35.000 --> 00:00:37.000
Yeah, we ran into each other in…

00:00:37.000 --> 00:00:43.000
Berlin over the summer, um, and Yuan-Sen is doing very interesting things on…

00:00:43.000 --> 00:00:49.000
Uh, using AI agents for, um, research in astro.

00:00:49.000 --> 00:00:54.000
Uh, and since this is one of our interests here, both agents and also astroparticle physics,

00:00:54.000 --> 00:00:58.000
Um, I thought it'd be great to invite him to give this talk.

00:00:58.000 --> 00:01:00.000
And, um…

00:01:00.000 --> 00:01:06.000
And let's see… yeah, so I… oh, and Yuan-Sen also has the distinction of having worked on…

00:01:06.000 --> 00:01:21.000
Uh, a topic that we also are very active in here, very close to our heart, which is this, uh, you know, using the kinematics of stars to probe the potential and mass density, um, using data, for instance, from the Gaia Space Telescope.

00:01:21.000 --> 00:01:26.000
So that was, I guess, your PhD research with Gregory? Or…

00:01:26.000 --> 00:01:27.000
Postdoc.

00:01:27.000 --> 00:01:31.000
No, so that was… yeah, that was, uh, a postdoc, yes.

00:01:31.000 --> 00:01:34.000
Yeah.

00:01:34.000 --> 00:01:35.000
Yeah.

00:01:35.000 --> 00:01:37.000
Postdoc research, okay, postdoc research with, um, Gregory, yeah. So, so yeah, so we have a lot in common, a lot to talk about, I think.

00:01:37.000 --> 00:01:42.000
And, um, and yeah, so I think without further ado, please take it away.

00:01:42.000 --> 00:01:47.000
Okay, uh, let me share screen…

00:01:47.000 --> 00:01:49.000
Sorry, I'm still, like, recovering from…

00:01:49.000 --> 00:01:54.000
Oh, cool. Um, okay, can people see my screen?

00:01:54.000 --> 00:01:55.000
Yeah. Yeah.

00:01:55.000 --> 00:02:04.000
Okay, great, uh, thank you, David. Uh, and as David mentioned, I am really grateful to you, uh, you know,

00:02:04.000 --> 00:02:08.000
for indulging me with a remote talk.

00:02:08.000 --> 00:02:14.000
Um, yep, so I would very much like to be there in person, but I couldn't make it.

00:02:14.000 --> 00:02:21.000
Um, and I should also mention that, you know, I will be more than happy to have, uh, a personal Zoom meeting

00:02:21.000 --> 00:02:27.000
Uh, afterwards, um, maybe not right after, because it's…

00:02:27.000 --> 00:02:28.000
When it's finished, like, it will be 3:30

00:02:28.000 --> 00:02:35.000
AM, but, uh, any time this week or next week, if you want to talk more, I will be happy to chat.

00:02:35.000 --> 00:02:39.000
Okay, so, uh, as David mentioned, you know, um,

00:02:39.000 --> 00:02:46.000
Today, I want to focus on AI agents. It's something that I thought was a hobby two years ago, but it has

00:02:46.000 --> 00:02:58.000
gained more and more prominence in the field, and also, you know, my thinking about AI agents has changed over time. I find them more and more fascinating, so I spend more and more time

00:02:58.000 --> 00:03:01.000
on them. But let me start by…

00:03:01.000 --> 00:03:08.000
talking about why I think AI agents are particularly interesting for physics and astrophysics.

00:03:08.000 --> 00:03:15.000
Let me start with the basics: we all know that, you know, there is a lot of interest in AI for science, if you look at the NSF.

00:03:15.000 --> 00:03:22.000
Uh, you know, of all the $200 million for the AI Research Institutes, half of it is

00:03:22.000 --> 00:03:31.000
for STEM, right? Including the two new AI centers for astronomy, one at UT Austin and one at Northwestern.

00:03:31.000 --> 00:03:39.000
Um, of course, then we also have the DOE's Genesis program. There is a lot of interest, with a lot of money

00:03:39.000 --> 00:03:44.000
flying around. Then the question that I often got asked, even

00:03:44.000 --> 00:03:51.000
a decade ago, when I started to do more on deep learning: people always asked me

00:03:51.000 --> 00:04:02.000
whether this is a hype, a myth, or the real deal. You know, when I was at the IAS, I was the odd one out who got fascinated by deep learning, when…

00:04:02.000 --> 00:04:05.000
most physicists were still quite skeptical.

00:04:05.000 --> 00:04:13.000
Um, I would think that the answer is all three of them, and I will explain why that is the case.

00:04:13.000 --> 00:04:21.000
Um, I think proponents of AI for science will always point to this one example that we love, which is AlphaFold.

00:04:21.000 --> 00:04:26.000
Uh, which is not a terrible example, because certainly, you know, if you look at

00:04:26.000 --> 00:04:34.000
how we do, uh, chemistry, it has been completely transformed because of AI, right?

00:04:34.000 --> 00:04:42.000
But, uh, the follow-up question that I often get asked is, uh, what would be the equivalent of AlphaFold

00:04:42.000 --> 00:04:49.000
for astronomy? Um, to answer that question, I think there will not be an equivalent.

00:04:49.000 --> 00:04:57.000
But to explain why there is no equivalent in astronomy,

00:04:57.000 --> 00:05:05.000
I want to explain a little bit what we have been doing in astronomy, since I know that not all people here are astronomers.

00:05:05.000 --> 00:05:10.000
Uh, for people who are interested in what has, like, happened in the last decade, I will point to

00:05:10.000 --> 00:05:24.000
my review. I spent almost half a year writing that piece. Uh, but I will not go through that piece, because this is a summary of what I think has happened in the last decade.

00:05:24.000 --> 00:05:31.000
It really comes down to, like, what has happened in the last decade, I think both in physics, but also in astronomy.

00:05:31.000 --> 00:05:36.000
Uh, people have been using, like, AI to extend, uh, statistical methods, right?

00:05:36.000 --> 00:05:52.000
In cosmology, for example, we always want to come up with a better way to do inference without relying on human heuristics. So we want to do Bayesian inference.

00:05:52.000 --> 00:05:58.000
But we don't have very good models. So how do you really…

00:05:58.000 --> 00:06:06.000
connect the simulation with the observation? So a lot of the AI really comes down to this idea, which has now become quite popular

00:06:06.000 --> 00:06:10.000
in physics and astrophysics, called simulation-based

00:06:10.000 --> 00:06:22.000
inference. And this is something that I worked on a lot in my earlier work, before chatbots. I think these are still important, but these are the things that I would call

00:06:22.000 --> 00:06:31.000
individual-task AI, right? So in the time frame from 2013 to maybe 2022,

00:06:31.000 --> 00:06:38.000
a lot of the AI that we have been dealing with is in the classical deep learning realm. We are training

00:06:38.000 --> 00:06:43.000
regression models, probabilistic models, classification models

00:06:43.000 --> 00:06:52.000
to do individual tasks better. And some of that, of course, is important, as David mentioned; you know, just understanding how to

00:06:52.000 --> 00:06:57.000
model the kinematics of stars without having an explicit

00:06:57.000 --> 00:07:01.000
model is still useful.
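
NOTE
For readers new to simulation-based inference (SBI): its simplest form is rejection sampling, often called rejection ABC. Draw parameters from the prior, run the simulator, and keep the draws whose simulated summary lands close to the observed one. This is a minimal sketch under assumed choices (a toy Gaussian simulator, the sample mean as summary statistic, a flat prior, a fixed tolerance); it is not the speaker's code, and the neural SBI methods referred to in the talk replace the hard accept/reject step with learned density estimators.
```python
import random
import statistics
def simulate(mu, n=50, rng=random):
    # Toy simulator (assumption): n noisy measurements around an unknown mean mu
    return [rng.gauss(mu, 1.0) for _ in range(n)]
def rejection_abc(observed, n_draws=20000, tol=0.05, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary is within tol of the data's."""
    rng = random.Random(seed)
    obs_summary = statistics.fmean(observed)  # summary statistic: sample mean
    accepted = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # flat prior on mu (assumption)
        sim_summary = statistics.fmean(simulate(mu, len(observed), rng))
        if abs(sim_summary - obs_summary) < tol:
            accepted.append(mu)
    return accepted  # approximate posterior samples for mu
observed = simulate(1.3, rng=random.Random(42))
posterior = rejection_abc(observed)
print(len(posterior), statistics.fmean(posterior))
```
The accepted draws approximate the posterior of mu; shrinking the tolerance tightens the approximation at the cost of more simulations.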

00:07:01.000 --> 00:07:05.000
But I choose not to talk about those topics, because I also find that those topics are

00:07:05.000 --> 00:07:09.000
unlikely to have a very profound impact

00:07:09.000 --> 00:07:12.000
in astronomy, not the zero-to-one

00:07:12.000 --> 00:07:15.000
type of, uh, of impact.

00:07:15.000 --> 00:07:17.000
And why that is the case, uh…

00:07:17.000 --> 00:07:19.000
It really comes down to the fact that in

00:07:19.000 --> 00:07:24.000
fundamental physics, both in particle physics and in astronomy,

00:07:24.000 --> 00:07:30.000
I would argue that the complexity that we are dealing with is too low for AI.

00:07:30.000 --> 00:07:35.000
Right, so if you're thinking about, like, chemistry or biology or material science,

00:07:35.000 --> 00:07:43.000
of course, the complexity is high, and this is why AI can do wonderful jobs, like AlphaFold.

00:07:43.000 --> 00:07:46.000
But, like, in astronomy, the…

00:07:46.000 --> 00:07:55.000
the analogy that I love to give is that my niece has more non-Gaussian information compared to our entire

00:07:55.000 --> 00:07:59.000
universe. A lot of the things that we deal with are

00:07:59.000 --> 00:08:03.000
only weakly non-Gaussian, or weakly non-linear.

00:08:03.000 --> 00:08:12.000
Um, and this is why: because the complexity is low, individual tasks are unlikely to make a big impact. So if you think about what an individual task is,

00:08:12.000 --> 00:08:21.000
quite often, you have some data, you have some theory, you try to analyze the data, and decide if the theory is true or false.

00:08:21.000 --> 00:08:29.000
Like, in the case of, like, chemistry or biology, they are bottlenecked by some of these tasks, right? Thinking about, like, protein folding,

00:08:29.000 --> 00:08:33.000
There are simply no good human heuristics

00:08:33.000 --> 00:08:39.000
that can do the job. So this is why, when AlphaFold happened, it solved one task,

00:08:39.000 --> 00:08:44.000
But that task is so important that it led to a zero-to-one type of

00:08:44.000 --> 00:08:47.000
transformation in the field.

00:08:47.000 --> 00:08:56.000
But because the complexity in astronomy is much lower, if you think about many of the things that we deal with in astronomy or particle physics,

00:08:56.000 --> 00:09:00.000
We already have a… not a very bad

00:09:00.000 --> 00:09:06.000
way of dealing with them, right? We have good human

00:09:06.000 --> 00:09:15.000
heuristics that know how to deal with them. Think about large-scale structure: we know that we need to go slightly beyond the power spectrum, but we have ways to do that.

00:09:15.000 --> 00:09:17.000
So,

00:09:17.000 --> 00:09:23.000
So even though I think those studies of SBI and all this

00:09:23.000 --> 00:09:30.000
can be important, they are always at the level of improving the inference by 5 to 20%.

00:09:30.000 --> 00:09:33.000
And that itself might be important, right? If you want to detect

00:09:33.000 --> 00:09:38.000
or constrain dark matter, improving by 10% is a lot.

00:09:38.000 --> 00:09:42.000
But still, you know, I chose to talk about topics that I think are…

00:09:42.000 --> 00:09:55.000
more avant-garde, less established, but which I think can have a bigger impact in astronomy: thinking about using AI

00:09:55.000 --> 00:09:58.000
agents to expedite discovery.

00:09:58.000 --> 00:10:03.000
So, um, when I gave this talk two years ago, I needed to

00:10:03.000 --> 00:10:07.000
explain what an agent is, what a large language model is.

00:10:07.000 --> 00:10:15.000
But I think I no longer need to do that, because it's becoming quite a common thing that people hopefully know, right? So, the idea of using

00:10:15.000 --> 00:10:22.000
agents, using large language models, to solve a problem means that we try to use

00:10:22.000 --> 00:10:28.000
the agent as a pseudo-human. So instead of dealing with individual tasks with deep learning,

00:10:28.000 --> 00:10:33.000
We are using the agent to make plans, and we are using

00:10:33.000 --> 00:10:43.000
some of the, uh, the reasoning ability of the agent to solve more holistic tasks.

00:10:43.000 --> 00:10:53.000
Um, and it also shouldn't surprise people that people like myself, and also, I know, people at Rutgers, are thinking about this, especially because…

00:10:53.000 --> 00:10:59.000
since last year, there seems to have been a phase shift in terms of the ability

00:10:59.000 --> 00:11:01.000
of AI to do real things.

00:11:01.000 --> 00:11:06.000
Right? Since last year, uh, the AI has started to,

00:11:06.000 --> 00:11:14.000
win gold medals in all sorts of Olympiads, including math, physics, and informatics.

00:11:14.000 --> 00:11:21.000
Uh, at OSU, we looked into the astronomy Olympiad, because I coached the team in Malaysia, so…

00:11:21.000 --> 00:11:24.000
And we also have a student who turned out to be

00:11:24.000 --> 00:11:28.000
the team, uh, leader in the US.

00:11:28.000 --> 00:11:30.000
So, we are, uh, uh…

00:11:30.000 --> 00:11:35.000
are able to work with the astronomy Olympiad, and also have the grading.

00:11:35.000 --> 00:11:46.000
that is aligned with how the students are graded. Long story short, even in the astronomy Olympiad, of course, it won the gold medal, but more importantly,

00:11:46.000 --> 00:11:52.000
if you look at the scores, even with last year's models,

00:11:52.000 --> 00:11:55.000
no student would have beaten GPT-5.

00:11:55.000 --> 00:12:00.000
So GPT-5 would have won not only the gold medal; it would have ranked first

00:12:00.000 --> 00:12:03.000
in the entire competition.

00:12:03.000 --> 00:12:09.000
Uh, and this type of, like, high school competition, of course, is not cutting-edge research.

00:12:09.000 --> 00:12:14.000
But it's also not, like, simple, right? So this is one of the theory questions about,

00:12:14.000 --> 00:12:18.000
a neutron star binary.

00:12:18.000 --> 00:12:28.000
So, the bigger question is: we know that AI can do some math. The question is whether, in an open-world setting, especially

00:12:28.000 --> 00:12:32.000
in tasks that the AI is not trained to do,

00:12:32.000 --> 00:12:34.000
can AI also reach

00:12:34.000 --> 00:12:37.000
the level of a human

00:12:37.000 --> 00:12:41.000
researcher?

00:12:41.000 --> 00:12:47.000
So this is something that I got interested in two years ago. So, the first attempt that we made…

00:12:47.000 --> 00:13:04.000
Uh, in 2024 is looking at the James Webb data, right? So we know that the James Webb data is exciting, and that a lot of people are analyzing the data, but we also know that from just collecting the James Webb data,

00:13:04.000 --> 00:13:10.000
to publishing the paper, many human interventions are still needed.

00:13:10.000 --> 00:13:21.000
So, for example, if you take the James Webb data, it's what we call the SED of the galaxy. SED stands for spectral energy distribution. You can think about this as a

00:13:21.000 --> 00:13:32.000
coarse-resolution spectrum. And the game that astronomers play is to try to find a physical model that can go through the data points.

00:13:32.000 --> 00:13:40.000
Right? Of course, the challenge here is that there are many, many physical models that we can assume, so it's not that you fit

00:13:40.000 --> 00:13:51.000
one physical model and just do a parameter optimization. There are many building blocks of physical processes

00:13:51.000 --> 00:13:53.000
that you can add into the models.

00:13:53.000 --> 00:14:00.000
So, of course, if you have a James Webb data, and you have your favorite student, you will give your data to your students.

00:14:00.000 --> 00:14:08.000
And ask your student to first fit a baseline model, right? So make some physical assumptions and try to fit the model.

00:14:08.000 --> 00:14:14.000
And quite often, of course, your student will come back and say, hey, you know, the model seems okay, but there are some discrepancies.

00:14:14.000 --> 00:14:24.000
Right? So, like, as a good supervisor, what you would do next is not to ask your student to, you know, exhaustively try all the models.

00:14:24.000 --> 00:14:34.000
What you would do is say, okay, we can look at some of the trends and decide which physical assumptions we need to change. So in this case, for example, if you see

00:14:34.000 --> 00:14:38.000
that there is a wavelength-dependent discrepancy, maybe the

00:14:38.000 --> 00:14:43.000
the solution here is to change the dust.

00:14:43.000 --> 00:14:48.000
properties, because we know that the grain size of the dust can…

00:14:48.000 --> 00:14:51.000
create a wavelength-dependent discrepancy.

00:14:51.000 --> 00:14:57.000
Or you can focus on the ultraviolet and notice that the current physical models

00:14:57.000 --> 00:15:04.000
are under-predicting the ultraviolet flux, and therefore you should change the parameterization of the star formation

00:15:04.000 --> 00:15:11.000
history. So this is, of course, a very simple example. I like to start with this simple example.
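
NOTE
The supervisor's reasoning above (read the residual pattern, then change the physical assumption it points to) can be caricatured as a rule-based diagnosis step. This is a toy sketch, not Mephisto's code: the 2-sigma threshold, the 0.3 micron ultraviolet cutoff, the linear-trend test, and the proposal strings are all hypothetical choices.
```python
def residual_slope(wave, resid):
    # Least-squares slope of residual vs wavelength (0.0 if it cannot be defined)
    n = len(wave)
    if n < 2:
        return 0.0
    mw, mr = sum(wave) / n, sum(resid) / n
    var_w = sum((w - mw) ** 2 for w in wave)
    if var_w == 0:
        return 0.0
    return sum((w - mw) * (r - mr) for w, r in zip(wave, resid)) / var_w
def diagnose(wave_um, obs, model, err, threshold=2.0, uv_cut_um=0.3):
    """Map residual patterns to a proposed change of physical assumption (hypothetical rules)."""
    resid = [(o - m) / e for o, m, e in zip(obs, model, err)]  # error-normalized residuals
    uv = [r for w, r in zip(wave_um, resid) if w < uv_cut_um]
    red = [(w, r) for w, r in zip(wave_um, resid) if w >= uv_cut_um]
    proposals = []
    if uv and sum(uv) / len(uv) > threshold:
        # Observed UV flux above the model: the model under-predicts the UV
        proposals.append("change star-formation-history parameterization")
    if red:
        ws, rs = zip(*red)
        # Total residual drift across the redder bands, in units of sigma
        drift = residual_slope(ws, rs) * (max(ws) - min(ws))
        if abs(drift) > threshold:
            proposals.append("change dust properties (e.g. grain size)")
    return proposals
wave = [0.15, 0.2, 0.5, 1.0, 2.0]
model = [1.0] * 5
err = [0.1] * 5
print(diagnose(wave, [1.4, 1.4, 1.0, 1.0, 1.0], model, err))
print(diagnose(wave, [1.0, 1.0, 0.8, 1.0, 1.2], model, err))
```
A real agent reads residuals far more flexibly than fixed thresholds; the point is only that the next move is chosen from the pattern rather than by exhaustively trying every model.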

00:15:11.000 --> 00:15:17.000
Because I want to highlight the fact that, like, a lot of what we call, like, research,

00:15:17.000 --> 00:15:25.000
in the grander scheme, you can think about this as optimization, because everything is just about building physical models.

00:15:25.000 --> 00:15:32.000
But not all the physical models we build are simple optimization problems.

00:15:32.000 --> 00:15:40.000
Uh, especially in astronomy, for two reasons. One is because the data that we have is not that clean, right? So…

00:15:40.000 --> 00:15:44.000
Sometimes, a model that gives you a better chi-square

00:15:44.000 --> 00:15:50.000
can be less physically plausible, right? So there is some layer of…

00:15:50.000 --> 00:15:59.000
plausibility that you need to fold in. But more importantly, the action space, if you're thinking about building a physical model,

00:15:59.000 --> 00:16:08.000
is often just too vast; it's more than just setting, you know, 10 parameters and optimizing over them.

00:16:08.000 --> 00:16:11.000
And humans are particularly

00:16:11.000 --> 00:16:15.000
capable of dealing with that. If you think about what makes humans

00:16:15.000 --> 00:16:21.000
better than, you know, just a gradient descent

00:16:21.000 --> 00:16:28.000
algorithm, it's that humans are particularly good at navigating a high-dimensional

00:16:28.000 --> 00:16:33.000
action space and taking shortcuts. So we use our intuition or experience

00:16:33.000 --> 00:16:38.000
To decide what would be the next step to find the solution.

00:16:38.000 --> 00:16:45.000
Right. So the question is about this type of intuition or experience: we know that humans learn from experience;

00:16:45.000 --> 00:16:49.000
can a language model also learn from its own

00:16:49.000 --> 00:16:52.000
experience?

00:16:52.000 --> 00:17:02.000
So, this is why, two years ago, we started to work on this thing called Mephisto. Uh, you might ask why we call it Mephisto: because according to German folklore,

00:17:02.000 --> 00:17:11.000
Mephisto is a demon that trades souls for knowledge. This is exactly what we do. We trade our souls to our AI overlords for knowledge.

00:17:11.000 --> 00:17:21.000
Right? So, Mephisto is… at this point, you know, most people have built some agents, so the idea of an agent is just that you

00:17:21.000 --> 00:17:25.000
break down your question into small pieces.

00:17:25.000 --> 00:17:28.000
and let the chatbot do

00:17:28.000 --> 00:17:35.000
each of the pieces, right? So in the case of the James Webb things, you need to have, like, a few,

00:17:35.000 --> 00:17:39.000
agents. So you have the first, like, agent,

00:17:39.000 --> 00:17:45.000
That is the proposing agent, meaning that given the current fit, given the current physical model,

00:17:45.000 --> 00:17:53.000
what are some of the assumptions that I can change? This agent is connected to all the papers and the database that we have curated

00:17:53.000 --> 00:17:57.000
To decide what would be the next step to take.

00:17:57.000 --> 00:18:00.000
The other agent turned out to be actually the most work

00:18:00.000 --> 00:18:04.000
that we did two years ago: the one that executes the action.

00:18:04.000 --> 00:18:10.000
Right? So the challenge two years ago (I think this gap is getting smaller, but two years ago)

00:18:10.000 --> 00:18:16.000
was that most codebases in astronomy were not interfaced with

00:18:16.000 --> 00:18:20.000
the agent, meaning that, you know, your ChatGPT might not know how to run

00:18:20.000 --> 00:18:28.000
your specialized code. So a lot of what we did back then was to refactor all this code into what we

00:18:28.000 --> 00:18:34.000
now call MCP, the Model Context Protocol,

00:18:34.000 --> 00:18:46.000
to really make sure that the tool is what we call agent-ready. Building the pipeline is easy, but making sure that your domain-specific tools

00:18:46.000 --> 00:18:52.000
can be used by the agent takes some work.

00:18:52.000 --> 00:19:01.000
But more importantly, once you have taken an action, you have an agent that can reflect on

00:19:01.000 --> 00:19:10.000
why this works or why this does not work, why something is better. Does the action lead to a better outcome or not?

00:19:10.000 --> 00:19:19.000
And then you have another agent that will distill the knowledge. Interestingly, in modern lingo, if you think about Claude Code,

00:19:19.000 --> 00:19:33.000
this is what we call agent skills, right? So, agent skills have become quite a common term. If you play with OpenClaw, that would be what you call the memory, right? So all these things

00:19:33.000 --> 00:19:36.000
have become more and more popular in the last 6 months.

00:19:36.000 --> 00:19:45.000
But what we built back then is quite a close proxy of what we now call agent skills, and of the memory in OpenClaw.

00:19:45.000 --> 00:19:49.000
So, if people are not, like, familiar with

00:19:49.000 --> 00:19:54.000
language models and the lingo, you could just think about this as a walker in the action space. So,

00:19:54.000 --> 00:20:03.000
you are doing some tree search, just like a human, but you are letting the agent learn from the tree search and also update the tree search

00:20:03.000 --> 00:20:07.000
over time.
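
NOTE
The propose, execute, reflect, and distill cycle described above can be sketched as a loop in which the only thing that changes between rounds is a memory, not any model weights. This is a hypothetical, self-contained toy: the language model is replaced by plain Python functions, each "fit" is reduced to a residual-pattern label with one correct fix, and every name here (ACTIONS, TRUE_FIX, the pattern labels) is illustrative rather than Mephisto's actual interface.
```python
import random
# Hypothetical action space and hidden "physics" (assumptions for illustration)
ACTIONS = ["change_dust", "change_sfh", "add_agn", "add_nebular"]
TRUE_FIX = {"red_trend": "change_dust", "uv_excess": "change_sfh", "ir_excess": "add_agn"}
def execute(pattern, action):
    # Toy "refit": chi-square drops only if the action addresses the residual pattern
    return 1.0 if TRUE_FIX[pattern] == action else 10.0
def propose(pattern, memory, rng):
    # Reuse a distilled rule if one exists, otherwise explore the action space
    return memory.get(pattern) or rng.choice(ACTIONS)
def run(tasks, seed=0):
    rng = random.Random(seed)
    memory = {}  # stands in for the natural-language memory / skills; no weights change
    attempts = []
    for pattern in tasks:
        chi2, n = 10.0, 0
        while chi2 > 1.0:
            action = propose(pattern, memory, rng)  # propose
            new_chi2 = execute(pattern, action)  # execute
            n += 1
            if new_chi2 < chi2:  # reflect: did this action improve the fit?
                memory[pattern] = action  # distill: record what worked
                chi2 = new_chi2
        attempts.append(n)
    return attempts, memory
attempts, memory = run(["red_trend", "uv_excess", "ir_excess"] * 5)
print(attempts)
```
Once each pattern has been seen, later tasks are solved on the first attempt, which is the toy analogue of the agent needing fewer iterations as its memory grows.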

00:20:07.000 --> 00:20:12.000
Uh, in practice, this also works. Back then, we compared with GPT-4o.

00:20:12.000 --> 00:20:17.000
So this is looking at some of the James Webb data and the chi-square of the fit

00:20:17.000 --> 00:20:32.000
on average, looking at the chi-square as a function of the iteration steps. So you can think about Mephisto as being like AlphaZero. Essentially, you are doing reinforcement learning through the memory,

00:20:32.000 --> 00:20:36.000
but without changing the weights of the neural network.

00:20:36.000 --> 00:20:43.000
So what happened is that as Mephisto played more and more games (in this case, the game is…

00:20:43.000 --> 00:20:46.000
to fit the spectrum, change the physical

00:20:46.000 --> 00:20:59.000
models to be able to fit the spectrum), Mephisto got better and better over time, because it collected a lot of experience and memory that help it do a better job next time.

00:20:59.000 --> 00:21:05.000
So what type of knowledge goes into the memory, or, in modern lingo, the agent skills,

00:21:05.000 --> 00:21:20.000
that the AI has learned? So this is one of them. So, you know, if the fit is overestimated in the UV and optical, what you should change is the properties of the dust. This is also what an experienced astronomer would learn over time.

00:21:20.000 --> 00:21:22.000
Right, so these are sort of rules of thumb

00:21:22.000 --> 00:21:26.000
that, uh, that you will learn, like, over time.

00:21:26.000 --> 00:21:36.000
And this type of memory, short-term and long-term, helps the agent do a better job. So how do we know that? For example, we also

00:21:36.000 --> 00:21:44.000
compared on some datasets. So, in astronomy, we have this deep dataset called COSMOS2020,

00:21:44.000 --> 00:21:54.000
and the classical way of analyzing these galaxy spectra is by building a grid of models and finding the best fit.

00:21:54.000 --> 00:22:04.000
Right. So, what we compare here is: we give Mephisto only 1% of the budget, meaning that we have built a grid, and let's say the grid is

00:22:04.000 --> 00:22:10.000
100… I think in this case, it's 100 million grid points.

00:22:10.000 --> 00:22:18.000
And then we tell Mephisto beforehand that it can only try a million models, meaning 1%.

00:22:18.000 --> 00:22:23.000
Do your best, based on your memory and your knowledge,

00:22:23.000 --> 00:22:30.000
with a random walk in the hypothesis space, and find the solution, but using just 1% of the computation.

00:22:30.000 --> 00:22:34.000
And the y-axis here is showing you the comparison of the chi-square,

00:22:34.000 --> 00:22:41.000
Uh, you can see that, in many cases, Mephisto is performing

00:22:41.000 --> 00:22:48.000
equally well, if not better, despite using 1% of the computation, meaning that it has learned

00:22:48.000 --> 00:22:53.000
how to update the models based on the knowledge that it has gained.
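
NOTE
The 1%-budget comparison can be illustrated with a toy: an exhaustive grid search evaluates every grid point, while a guided search is capped at 1% of those evaluations. Here the guided search is a simple coarse-to-fine greedy pattern search, standing in for the agent's knowledge-driven moves; the objective, the 100 x 100 grid, and the search strategy are all assumptions for illustration, not the COSMOS2020 setup.
```python
N = 100  # toy grid: N x N = 10,000 models (assumption; the talk mentions ~100 million)
def chi2(i, j):
    # Toy smooth objective with its minimum at grid point (63, 27) (assumption)
    return ((i - 63) / 10) ** 2 + ((j - 27) / 10) ** 2
def grid_search():
    # Exhaustive baseline: evaluate every grid point
    best, evals = (float("inf"), None), 0
    for i in range(N):
        for j in range(N):
            evals += 1
            best = min(best, (chi2(i, j), (i, j)))
    return best, evals
def guided_search(budget, start=(N // 2, N // 2)):
    # Coarse-to-fine pattern search, a stand-in for knowledge-guided moves
    cache = {}
    def f(p):
        # Each distinct model evaluation counts against the budget
        if p not in cache:
            if len(cache) >= budget:
                return None
            cache[p] = chi2(*p)
        return cache[p]
    cur, step = start, 32
    cur_val = f(cur)
    while step >= 1 and len(cache) < budget:
        moves = [(cur[0] + di, cur[1] + dj) for di, dj in ((step, 0), (-step, 0), (0, step), (0, -step))]
        scored = [(f(p), p) for p in moves if 0 <= p[0] < N and 0 <= p[1] < N]
        scored = [(v, p) for v, p in scored if v is not None]
        best = min(scored, default=None)
        if best is not None and best[0] < cur_val:
            cur_val, cur = best  # move to the better neighbor
        else:
            step //= 2  # no improvement at this scale: refine the step
    return (cur_val, cur), len(cache)
grid_best, grid_evals = grid_search()
guided_best, guided_evals = guided_search(budget=grid_evals // 100)
print(grid_best, grid_evals, guided_best, guided_evals)
```
On this smooth toy objective the guided search reaches the same optimum well within the 1% budget; real spectral fits are far less well behaved, which is where learned rules of thumb are meant to help.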

00:22:53.000 --> 00:23:02.000
Right? So, uh, this is, again, old work. I like to start with this: this is from 2 years ago, but this code has been made public, the agent with all the agent skills

00:23:02.000 --> 00:23:08.000
has been made public, and people have started to use it. Uh, so this is work using

00:23:08.000 --> 00:23:12.000
Mephisto to analyze an ultra-diffuse galaxy.

00:23:12.000 --> 00:23:20.000
This is another work that is using Mephisto to analyze quasar spectra

00:23:20.000 --> 00:23:22.000
to find a solution for these

00:23:22.000 --> 00:23:25.000
sources.

00:23:25.000 --> 00:23:37.000
So, I'd also highlight that, if this is not something that you have done before: recently I chaired the NASA AI and ML Science and Technology Interest Group,

00:23:37.000 --> 00:23:54.000
and one of the key things is to build AI literacy by curating a set of lectures. I myself gave the first few lectures exactly on this topic: how to use modern large language models to build

00:23:54.000 --> 00:23:57.000
an agentic system,

00:23:57.000 --> 00:24:00.000
Um, you know, building, like, MCP for using tools and so forth.

00:24:00.000 --> 00:24:09.000
So, if you're interested, I will, like, encourage you to go through the lecture. Nowadays, like, the barrier is quite low, uh, because things are made much easier, so once you

00:24:09.000 --> 00:24:15.000
follow the first few lectures here, you should be able to do much of what I just described.

00:24:15.000 --> 00:24:16.000
Can I… can I ask a question, Yuan-Sen?

00:24:16.000 --> 00:24:22.000
But… yep.

00:24:22.000 --> 00:24:23.000
Yep.

00:24:23.000 --> 00:24:25.000
Um, so the, uh, going back to the, uh, the knowledge that it learns, is that… first of all, that's knowledge that, um,

00:24:25.000 --> 00:24:27.000
it's, like, interpretable, it's in natural language?

00:24:27.000 --> 00:24:35.000
Yeah, it's in natural language, so it's very similar to agent skills, because the language model will also output

00:24:35.000 --> 00:24:43.000
language. So what is important is that, and I didn't mention this, the fourth agent here will distill the knowledge,

00:24:43.000 --> 00:24:49.000
but we also have a loop that verifies the knowledge, meaning that it can come up with all sorts of proposals,

00:24:49.000 --> 00:24:54.000
But then we will take, uh, another, like, validation set.

00:24:54.000 --> 00:25:07.000
to see if such knowledge actually works most of the time. So only if the knowledge is useful do we append it to our memory.
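
NOTE
The verification step described here, accepting a distilled rule only if it helps on held-out cases most of the time, might look like the following. The rule representation (a condition plus an action), the "most of the time" threshold of 0.5, the chi-square stand-in, and the toy validation tasks are assumptions, not the speaker's implementation.
```python
def validate_rule(condition, action, validation_set, evaluate, baseline_action, min_win_rate=0.5):
    """Keep a candidate rule only if, on held-out tasks where it applies, it usually beats the baseline."""
    applicable = [t for t in validation_set if condition(t)]
    if not applicable:
        return False
    # A "win" means the rule's action gives a lower chi-square than the baseline action
    wins = sum(evaluate(t, action) < evaluate(t, baseline_action) for t in applicable)
    return wins / len(applicable) > min_win_rate
def evaluate(task, action):
    # Toy chi-square (assumption): low only if the action is the fix that actually works
    return 1.0 if action == task["fix"] else 10.0
def is_red(task):
    # Condition of the candidate rule: a wavelength-dependent residual trend
    return task["pattern"] == "red_trend"
validation = ([{"pattern": "red_trend", "fix": "change_dust"}] * 4
              + [{"pattern": "red_trend", "fix": "add_agn"}]
              + [{"pattern": "uv_excess", "fix": "change_sfh"}] * 3)
memory = []
if validate_rule(is_red, "change_dust", validation, evaluate, "no_change"):
    memory.append("If residuals show a wavelength-dependent trend, change the dust properties.")
print(memory)
```
The candidate rule wins on 4 of the 5 held-out cases it applies to, so it is appended; a rule that proposed the wrong fix would be rejected and never enter the memory.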

00:25:07.000 --> 00:25:08.000
Yeah.

00:25:08.000 --> 00:25:12.000
I see. And then, I guess I… it went by a little fast, so this loop that produces the knowledge…

00:25:12.000 --> 00:25:13.000
Yeah.

00:25:13.000 --> 00:25:14.000
or the memory, the skills,

00:25:14.000 --> 00:25:19.000
Yeah.

00:25:19.000 --> 00:25:20.000
Yeah, it's in a loop.

00:25:20.000 --> 00:25:26.000
Is that sort of, like, should we think of that like a training loop? And then now you've sort of got the agentic system that's trained, and you can…

00:25:26.000 --> 00:25:27.000
True.

00:25:27.000 --> 00:25:32.000
Yeah, I think it's more like a memory loop, because the weights have not been updated, so it's really just appending a longer and longer context, and also trimming

00:25:32.000 --> 00:25:34.000
the context, like, over time.

00:25:34.000 --> 00:25:38.000
Uh, the closest analogy, if people have been playing

00:25:38.000 --> 00:25:42.000
with OpenClaw: it's very similar to OpenClaw.

00:25:42.000 --> 00:25:46.000
So you have a memory base, and you keep

00:25:46.000 --> 00:25:55.000
Just like a human, you keep writing down what works and what does not work, but you also keep trimming

00:25:55.000 --> 00:25:58.000
from your memory what actually turned out not to work.

00:25:58.000 --> 00:25:59.000
To compactify. Yep.

00:25:59.000 --> 00:26:06.000
So I guess… so when you release Mephisto, it's with some knowledge that's been already accumulated, but then these papers you…

00:26:06.000 --> 00:26:09.000
Yeah.

00:26:09.000 --> 00:26:10.000
Yeah.

00:26:10.000 --> 00:26:11.000
You mentioned that are starting to use it. Do they, as they use it, do they generate…

00:26:11.000 --> 00:26:14.000
more skills for themselves, or…?

00:26:14.000 --> 00:26:21.000
Yeah, I don't think they generate more skill, so I think because, like, for a single, like, sources, it's hard to come up with new skills.

00:26:21.000 --> 00:26:27.000
So I think a lot of them are just using the skills that we already

00:26:27.000 --> 00:26:28.000
provided.

00:26:28.000 --> 00:26:34.000
I see. And then you, you provided those… you generated those skills by having it do what again?

00:26:34.000 --> 00:26:35.000
By training?

00:26:35.000 --> 00:26:36.000
We trained on COSMOS2020 and JADES.

00:26:36.000 --> 00:26:37.000
Oh, COSMOS2020, okay, I see.

00:26:37.000 --> 00:26:38.000
Yeah. Yeah.

00:26:38.000 --> 00:26:43.000
I see. So you had some, basically some…

00:26:43.000 --> 00:26:44.000
Yeah.

00:26:44.000 --> 00:26:46.000
test bed, task, set of tasks for it to basically train itself.

00:26:46.000 --> 00:26:47.000
Yeah.

00:26:47.000 --> 00:26:50.000
to… and then with that, you release it, and you hope it's useful for other people.

00:26:50.000 --> 00:26:51.000
Yep, exactly.

00:26:51.000 --> 00:26:53.000
Okay, okay. Got it. Thanks.

00:26:53.000 --> 00:27:02.000
Right. So, of course, you know, these types of things… this was back in 2024 and 2025,

00:27:02.000 --> 00:27:13.000
and in AI time, that's a Hubble time. Things were still quite limited, so the types of things that the agent could do were limited, but we started to explore some of

00:27:13.000 --> 00:27:20.000
the things that astronomers need to do all the time, but that also require some human intervention. So I will give you two more examples.

00:27:20.000 --> 00:27:28.000
So, one is equivalent width measurements. For people who work on stellar spectra or stellar astrophysics, you know that this is

00:27:28.000 --> 00:27:35.000
the bread-and-butter skill that every grad student and postdoc needs to learn. Basically, we collect some spectra,

00:27:35.000 --> 00:27:39.000
And then you want to calculate the area under the curve.

00:27:39.000 --> 00:27:50.000
And this sounds like something seemingly easy, but people have decided that the human in the loop is important, because there are many edge cases.

00:27:50.000 --> 00:27:56.000
You cannot just write… there's basically no single robust

00:27:56.000 --> 00:28:02.000
algorithm that fits all. This is why most grad students and postdocs

00:28:02.000 --> 00:28:07.000
in that field are trained to do this type of task. So why is the human in the loop important for this type of thing?

00:28:07.000 --> 00:28:16.000
For example, you can write an algorithm, but sometimes it will miss some of the blended lines. A blended line means that

00:28:16.000 --> 00:28:24.000
for one line, you have two features, right? So the human needs to decide if this is really another line, or just noise.

00:28:24.000 --> 00:28:28.000
The human eye is still the best at this.

00:28:28.000 --> 00:28:34.000
And more practically, because we need to normalize the spectrum and find the continuum,

00:28:34.000 --> 00:28:41.000
that has also historically required a lot of human intervention.
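
To make "the area under the curve" concrete: the equivalent width is conventionally defined as EW = ∫ (1 − F(λ)/F_c(λ)) dλ, where F_c is the continuum. The sketch below is a hypothetical, stripped-down version, not the agent or the IRAF rewrite described in the talk. It assumes a straight-line continuum fitted to pixels outside a hand-chosen line window and integrates with the trapezoid rule; the helper names, the window, and the synthetic spectrum are all illustrative assumptions. The judgment calls the speaker describes (window placement, blends, continuum shape) are exactly what this sketch fixes by hand.

```python
import math


def fit_linear_continuum(wave, flux, line_lo, line_hi):
    """Least-squares straight line through the pixels outside [line_lo, line_hi]."""
    pts = [(w, f) for w, f in zip(wave, flux) if w < line_lo or w > line_hi]
    n = len(pts)
    mean_w = sum(w for w, _ in pts) / n
    mean_f = sum(f for _, f in pts) / n
    sxx = sum((w - mean_w) ** 2 for w, _ in pts)
    sxy = sum((w - mean_w) * (f - mean_f) for w, f in pts)
    slope = sxy / sxx
    intercept = mean_f - slope * mean_w
    return lambda w: intercept + slope * w


def equivalent_width(wave, flux, line_lo, line_hi):
    """EW = integral of (1 - F/F_c) d(lambda) over the line window, trapezoid rule."""
    continuum = fit_linear_continuum(wave, flux, line_lo, line_hi)
    window = [(w, 1.0 - f / continuum(w))
              for w, f in zip(wave, flux) if line_lo <= w <= line_hi]
    return sum(0.5 * (d0 + d1) * (w1 - w0)
               for (w0, d0), (w1, d1) in zip(window, window[1:]))


# Synthetic spectrum (made-up numbers): a sloped continuum times a Gaussian
# absorption line of depth 0.5 and width sigma = 1 Angstrom.
wave = [6500.0 + 0.1 * i for i in range(1001)]
flux = [(1.0 + 0.001 * (w - 6500.0))
        * (1.0 - 0.5 * math.exp(-0.5 * ((w - 6550.0) / 1.0) ** 2))
        for w in wave]

ew = equivalent_width(wave, flux, line_lo=6540.0, line_hi=6560.0)
expected = 0.5 * 1.0 * math.sqrt(2.0 * math.pi)  # analytic EW of this Gaussian
print(f"measured EW = {ew:.4f} A, analytic = {expected:.4f} A")
```

On this clean, unblended test line the numerical EW matches the analytic value depth × σ × √(2π); real spectra with noise and blends are where the fixed window and linear continuum break down.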

00:28:41.000 --> 00:28:51.000
This is another case where the archaic code that we have in astronomy is not agent-ready. We use what we call IRAF.

00:28:51.000 --> 00:28:58.000
It's a very old code and interface that all astronomers love and hate.

00:28:58.000 --> 00:29:04.000
But in order to make it agent-ready, we essentially rewrote

00:29:04.000 --> 00:29:10.000
IRAF into a version that the AI can work with.

00:29:10.000 --> 00:29:15.000
With that, we also showed that the AI can do these tasks all by itself, meaning that it decides

00:29:15.000 --> 00:29:20.000
how to adjust the continuum, and when to

00:29:20.000 --> 00:29:28.000
add another mode to the fit. This, of course, has important implications. I know that quite well, because

00:29:28.000 --> 00:29:34.000
not too long ago, when we worked on a project like this with a few hundred spectra, I had to

00:29:34.000 --> 00:29:40.000
hire a very experienced postdoc. At that time, I think

00:29:40.000 --> 00:29:50.000
he made more money than me, because he was a very experienced postdoc. Even so, he needed to work for 6 months,

00:29:50.000 --> 00:30:03.000
painstakingly doing this with human intervention. But with this code, our equivalent-width agent, we redid the entire thing and found the results to be

00:30:03.000 --> 00:30:11.000
consistent with the human measurements, for only about $100 and a few hours.

00:30:11.000 --> 00:30:19.000
Right, so this is what we are talking about. Even though these are not groundbreakingly difficult tasks,

00:30:19.000 --> 00:30:24.000
the automation is real, and a lot of the things that we did in the past

00:30:24.000 --> 00:30:29.000
that consumed a lot of time can already be automated.

00:30:29.000 --> 00:30:38.000
The other example that I will show is citizen science. In astronomy, we love to

00:30:38.000 --> 00:30:44.000
ask the public to help us do some classification, because we still believe that

00:30:44.000 --> 00:30:51.000
no algorithm can catch all of the outliers, and that human intuition and human

00:30:51.000 --> 00:31:00.000
visual ability are still the best. At OSU, we run the ASAS-SN survey, which is a decade-long

00:31:00.000 --> 00:31:07.000
time-domain survey. We even have Citizen ASAS-SN, which asks the public to classify light curves.

00:31:07.000 --> 00:31:11.000
What we have also shown is that, with proper tool use,

00:31:11.000 --> 00:31:13.000
we can redo Citizen ASAS-SN

00:31:13.000 --> 00:31:18.000
with only a few hours and a few hundred dollars, and we were

00:31:18.000 --> 00:31:26.000
able to find some outliers in the light curves that were missed by all the people in Citizen ASAS-SN.

00:31:26.000 --> 00:31:29.000
So this type of automation is also

00:31:29.000 --> 00:31:31.000
happening.

00:31:31.000 --> 00:31:34.000
So, I know that you might be thinking,

00:31:34.000 --> 00:31:39.000
maybe this is the future, right? No grad students, no postdocs.

00:31:39.000 --> 00:31:43.000
Not so fast, I think.

00:31:43.000 --> 00:31:48.000
In the second part, I wanted to give you a more optimistic view of what can be done.

00:31:48.000 --> 00:31:54.000
But I think it's also good to be clear-eyed about the limitations. So this is the plot twist.

00:31:54.000 --> 00:31:57.000
Um, it suffices to say that, you know,

00:31:57.000 --> 00:32:02.000
AI is changing really quickly, so each time I give this talk, I need to

00:32:02.000 --> 00:32:19.000
temper this third part of the talk, because AI keeps exceeding my imagination. But even now, I think it's still fair to say that AI still struggles with many tasks that are easy for humans.

00:32:19.000 --> 00:32:33.000
I'll give you two examples. One is understanding scientific charts. This is something that even a freshman student can do quite well if you give them an arXiv paper. At least they should be able to read the plots

00:32:33.000 --> 00:32:41.000
quite accurately. This is a somewhat old benchmark by Princeton, but back then, in 2024,

00:32:41.000 --> 00:32:48.000
the AI was not doing that well. The gap is closing now, but I don't think it has reached the human

00:32:48.000 --> 00:32:50.000
level yet.

00:32:50.000 --> 00:32:57.000
There's also an example that I think is more striking: the ARC-AGI-2 benchmark.

00:32:57.000 --> 00:33:03.000
This is a task that is very easy for humans, but turned out to be quite difficult for AI.

00:33:03.000 --> 00:33:12.000
The setup goes as follows. It's the kind of game that I loved to play as a kid,

00:33:12.000 --> 00:33:16.000
and probably you did too. The game goes: if top left

00:33:16.000 --> 00:33:20.000
maps to top right, what would the bottom right be?

00:33:20.000 --> 00:33:25.000
If you just look at it for maybe 2 seconds, you realize that

00:33:25.000 --> 00:33:35.000
in order to go from top left to top right, what we are doing is color-coding the different blocks by the number of holes in each block. So if a block has one hole, I will

00:33:35.000 --> 00:33:42.000
color-code it green. If it has no holes, I will color-code it yellow, and so forth.

00:33:42.000 --> 00:33:46.000
And therefore, for the bottom panel, uh, you should do the same.
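
The rule a human spots in two seconds is easy to write down once it is known, which is part of the point: the hard part for the model is inferring it. As a hypothetical illustration (not an ARC solver, and not code from the talk), the sketch below counts holes in a 0/1 grid by flood-filling the background reachable from the border and then counting the enclosed background regions that remain. The `count_holes` and `recolor` names, 4-connectivity, and the color table (taken from the talk's example) are assumptions.

```python
from collections import deque

# Hole count -> color, following the example in the talk.
COLOR_BY_HOLES = {0: "yellow", 1: "green"}


def count_holes(grid):
    """Count enclosed background regions (holes) in a grid where 1 = shape, 0 = background."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r0, c0):
        # Breadth-first fill over 4-connected background cells.
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not seen[nr][nc] and grid[nr][nc] == 0):
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    # Background connected to the border is "outside", not a hole.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and grid[r][c] == 0 and not seen[r][c]:
                flood(r, c)

    # Each remaining unvisited background region is one hole.
    holes = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes


def recolor(grid):
    return COLOR_BY_HOLES.get(count_holes(grid), "unknown")


ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
solid = [[1, 1],
         [1, 1]]
two_holes = [[1, 1, 1, 1, 1],
             [1, 0, 1, 0, 1],
             [1, 1, 1, 1, 1]]
print(recolor(ring), recolor(solid), count_holes(two_holes))
```

With 4-connectivity, a background cell touching the outside only diagonally still counts as enclosed; an ARC task could intend either convention, which is one more thing a solver would have to infer.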

00:33:46.000 --> 00:33:53.000
So humans have no problem with this. It's a task that you do not need a PhD to do very well.

00:33:53.000 --> 00:33:59.000
But 6 months later, when Sam Altman declared that AGI is coming,

00:33:59.000 --> 00:34:05.000
GPT-5 was only getting 10% correct.

00:34:05.000 --> 00:34:17.000
To understand why this is the case, the best way is what we call Moravec's paradox. Even back in the 1980s, people noticed that there are things that are easy for humans but

00:34:17.000 --> 00:34:21.000
turn out to be hard for computers, and vice

00:34:21.000 --> 00:34:24.000
versa. And to understand why

00:34:24.000 --> 00:34:30.000
Moravec's paradox happens, the way I love to explain it to people is that AI is essentially

00:34:30.000 --> 00:34:37.000
trying to reverse-engineer evolution. So the things that evolved

00:34:37.000 --> 00:34:46.000
most recently in mammals will be the easiest to be

00:34:46.000 --> 00:34:54.000
imitated. This is why a lot of the things that we process in the frontal lobes, like calculation, like deriving equations,

00:34:54.000 --> 00:34:57.000
those things are actually

00:34:57.000 --> 00:35:03.000
quite mechanical, and easier to imitate.

00:35:03.000 --> 00:35:07.000
But a lot of tasks that we tend not to

00:35:07.000 --> 00:35:11.000
value turned out to be quite hard for AI. These are the things that

00:35:11.000 --> 00:35:13.000
even the earliest

00:35:13.000 --> 00:35:19.000
mammals have; it's just something that we don't pay attention to.

00:35:19.000 --> 00:35:27.000
Spatial reasoning is one; common sense and visual reasoning are others. Those emerged much

00:35:27.000 --> 00:35:34.000
earlier in evolution, but turned out to be more difficult for AI.

00:35:34.000 --> 00:35:48.000
And this is also reflected in all of the benchmarks, including the one that we mentioned earlier. In the astronomy Olympiad that we worked on, yes, the AI could have won first place,

00:35:48.000 --> 00:35:51.000
but if you look at the questions where it loses points,

00:35:51.000 --> 00:35:59.000
quite often it comes down to things that are very easy for students, like understanding the plots.

00:35:59.000 --> 00:36:07.000
That is still quite difficult for AI. So this is why I want to highlight this, just to mention that

00:36:07.000 --> 00:36:09.000
AI systems are getting quite strong.

00:36:09.000 --> 00:36:16.000
But there are still ecosystems that we need to build for AI, and…

00:36:16.000 --> 00:36:17.000
For astronomy or particle physics,

00:36:17.000 --> 00:36:23.000
big companies simply do not have much of an appetite to make sure that all of your tools

00:36:23.000 --> 00:36:29.000
are ready to be used by agents. So there is lots of groundwork remaining to be done.

00:36:29.000 --> 00:36:32.000
So I will show you, like, some of the things that…

00:36:32.000 --> 00:36:41.000
my group is doing, just as a quick flash. One is improving the understanding of charts. This is something that we work on

00:36:41.000 --> 00:36:48.000
with people at ANU, my previous employer; I still have some students there working on

00:36:48.000 --> 00:36:56.000
what we call non-natural image understanding, because AI is getting quite good at natural

00:36:56.000 --> 00:37:01.000
images, but non-natural images, like charts, are still quite difficult

00:37:01.000 --> 00:37:03.000
for AI. Of course, this gap is

00:37:03.000 --> 00:37:10.000
changing rapidly. If people have played with Nano Banana, you know that Google seems to

00:37:10.000 --> 00:37:15.000
know what they are doing, and they are doing quite well in that domain.

00:37:15.000 --> 00:37:18.000
The other thing that we are doing…

00:37:18.000 --> 00:37:22.000
Yep.

00:37:22.000 --> 00:37:23.000
Yeah.

00:37:23.000 --> 00:37:24.000
Sorry, is it possible to ask a question about your previous slides?

00:37:24.000 --> 00:37:28.000
Well, when you showed the, um, the statistics of what the, um,

00:37:28.000 --> 00:37:29.000
Yeah.

00:37:29.000 --> 00:37:31.000
the AI is getting wrong.

00:37:31.000 --> 00:37:32.000
Yeah.

00:37:32.000 --> 00:37:33.000
Right. Is that a language

00:37:33.000 --> 00:37:35.000
parsing problem, like, it's not…

00:37:35.000 --> 00:37:46.000
parsing the language of the question correctly, or is it that it cannot synthesize?

00:37:46.000 --> 00:37:47.000
I think… yeah.

00:37:47.000 --> 00:37:48.000
Or find the answer through its patterns? Like, should it be able to…

00:37:48.000 --> 00:37:55.000
solve it? Do we know the reason why it's not getting it?

00:37:55.000 --> 00:38:00.000
Yeah, it's the latter, because we are using a multimodal

00:38:00.000 --> 00:38:04.000
model, so we are not describing the plot, we are giving the plot

00:38:04.000 --> 00:38:14.000
to the AI. You might also see on Twitter that sometimes GPT-5 cannot solve high-school trigonometry problems, because

00:38:14.000 --> 00:38:18.000
it does not understand what "find x" means. So if you draw a triangle,

00:38:18.000 --> 00:38:32.000
it's still not bad, but these are very simple tasks. It's not that it lacks the ability to understand charts; it's just not performing at

00:38:32.000 --> 00:38:39.000
the level a human can. I think this is still real. If you

00:38:39.000 --> 00:38:45.000
even use Nano Banana to edit your plot, you know that it quite often makes mistakes.

00:38:45.000 --> 00:38:50.000
It's still a task that is quite difficult.

00:38:50.000 --> 00:38:51.000
Does that answer your question?

00:38:51.000 --> 00:38:54.000
Thank you.

00:38:54.000 --> 00:39:03.000
Okay, okay. Another thing that I think is important to note is that, with the databases that we have, you would not want the AI to just

00:39:03.000 --> 00:39:19.000
give you the answer. You want a recommender system based on the knowledge that you have. This is something that my group has been working on as well: we use the literature in astronomy to curate a knowledge graph, which captures the connections between all the concepts.

00:39:19.000 --> 00:39:25.000
More recently, what we have been doing is taking the entire corpus of astronomy,

00:39:25.000 --> 00:39:32.000
distilling all the objects and concepts, and trying to build a recommender system. You can think about this like Google Search.

00:39:32.000 --> 00:39:39.000
Even with AI, you still need something like Google Search to do robust

00:39:39.000 --> 00:39:49.000
recommendation. And for a very niche domain, like astronomy or particle physics, it's particularly important that we do that ourselves, because no one else will do it

00:39:49.000 --> 00:39:56.000
for us. So a lot of the work that we are doing is building the groundwork for the knowledge graph, and also improving

00:39:56.000 --> 00:40:01.000
some of the ability of the AI system.
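
One simple way to picture a concept knowledge graph used as a recommender: link two concepts whenever they appear in the same paper, weight the link by how many papers share them, and recommend the strongest neighbors of a query. The sketch below is a hypothetical toy, not the group's actual pipeline; the function names, the co-occurrence weighting, and the concept lists are made-up assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_graph(papers):
    """Weighted co-occurrence graph: edge weight = number of papers mentioning both concepts."""
    graph = defaultdict(lambda: defaultdict(int))
    for concepts in papers:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def recommend(graph, query, k=3):
    """Rank concepts linked to the query concepts by total edge weight."""
    scores = defaultdict(int)
    for concept in query:
        for neighbour, weight in graph.get(concept, {}).items():
            if neighbour not in query:
                scores[neighbour] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


# Made-up concept lists standing in for concepts distilled from papers.
papers = [
    ["equivalent width", "stellar spectra", "continuum normalization"],
    ["equivalent width", "chemical abundances", "stellar spectra"],
    ["light curves", "variable stars", "ASAS-SN"],
    ["stellar spectra", "chemical abundances"],
]
graph = build_concept_graph(papers)
print(recommend(graph, {"equivalent width"}))
```

Raw co-occurrence counts favor very common concepts; a production system would likely normalize weights or use learned embeddings, but the graph-then-rank structure is the same idea.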

00:40:01.000 --> 00:40:04.000
Right. So…

00:40:04.000 --> 00:40:14.000
The other thing that I will mention, though I'll start by saying that I no longer think it's a very interesting route of research, is training specialized models.

00:40:14.000 --> 00:40:23.000
Two years ago, we started to work with the national labs, asking how well we can fine-tune models

00:40:23.000 --> 00:40:27.000
to perform quite well in a certain task. In this case, we were looking at

00:40:27.000 --> 00:40:32.000
astronomy Q&A. We curated some Q&A benchmarks, and we asked,

00:40:32.000 --> 00:40:42.000
can we just fine-tune models on the whole astronomy corpus and improve the performance?

00:40:42.000 --> 00:40:53.000
To do that, we needed to build a benchmark, and we have been tracking the performance of all the AI models. This is the score, the accuracy,

00:40:53.000 --> 00:41:00.000
as a function of cost. By the way, human accuracy is 66%. So if I were to take the test,

00:41:00.000 --> 00:41:05.000
I would fail, because I'm not that good at recalling some of the facts.

00:41:05.000 --> 00:41:17.000
But if you fine-tune your models, you can do quite well. What we have learned is that, yes, if you have a single task like astronomy Q&A, you can fine-tune the models with about 10,000

00:41:17.000 --> 00:41:19.000
GPU

00:41:19.000 --> 00:41:24.000
hours, and then, for this single task,

00:41:24.000 --> 00:41:29.000
a 70-billion-parameter model can deliver performance like GPT-5.

00:41:29.000 --> 00:41:38.000
But only for this task. The reason I think this is not a very fruitful path is that if you want to use AI as an

00:41:38.000 --> 00:41:45.000
agent, you want it to have more holistic abilities. It's just like how

00:41:45.000 --> 00:41:47.000
someone who does

00:41:47.000 --> 00:41:50.000
well in Who Wants to Be a

00:41:50.000 --> 00:41:57.000
Millionaire might not be a good researcher. But yes, if you fine-tune a human to just train on Who Wants to Be a

00:41:57.000 --> 00:42:01.000
Millionaire or Jeopardy, they can do quite well.

00:42:01.000 --> 00:42:05.000
So, sorry, can I just ask about this?

00:42:05.000 --> 00:42:06.000
Yep.

00:42:06.000 --> 00:42:08.000
So, the cost you're counting is the cost to train the entire GPT-5?

00:42:08.000 --> 00:42:13.000
No, the cost here for the other models is the inference cost.

00:42:13.000 --> 00:42:14.000
Oh, inference costs, okay.

00:42:14.000 --> 00:42:20.000
So for the open-source models, we are asking: if I were to pay a company to host

00:42:20.000 --> 00:42:21.000
Yeah.

00:42:21.000 --> 00:42:24.000
my Hugging Face models, how much would the inference cost?

00:42:24.000 --> 00:42:25.000
the training cost of projects.

00:42:25.000 --> 00:42:27.000
inference, I see. So you can make it a lot cheaper for the inference. Okay, got it.

00:42:27.000 --> 00:42:28.000
Yeah, yeah.

00:42:28.000 --> 00:42:30.000
But then for the training,

00:42:30.000 --> 00:42:37.000
Yeah, the training is horrendous. For the 70-billion-parameter model, we trained for about

00:42:37.000 --> 00:42:40.000
3 × 10^5 GPU hours.

00:42:40.000 --> 00:42:43.000
Yeah, I think it's less than a billion.

00:42:43.000 --> 00:42:50.000
But quarter of a million.

00:42:50.000 --> 00:42:51.000
Yeah. Yeah.

00:42:51.000 --> 00:42:52.000
So, like, what's the… I guess, what's the cost of that versus… I mean, and then the GPT-5, they already… well, I don't know, yeah, okay. Okay, I understand what the comparison is, alright.

00:42:52.000 --> 00:42:56.000
Yeah, so it's an academic, like, exercise, which is why I say, uh,

00:42:56.000 --> 00:43:01.000
Identify… so this is something that we started, but we… we wrapped up, but I…

00:43:01.000 --> 00:43:06.000
Uh, I think the conclusion is interesting. For a small academic group, you can

00:43:06.000 --> 00:43:09.000
fine-tune a 70-billion-parameter model.

00:43:09.000 --> 00:43:12.000
But I just don't think you want to do it.

00:43:12.000 --> 00:43:14.000
Uh, the gain is small.

00:43:14.000 --> 00:43:21.000
Alright. Nonetheless, because we curate our own benchmark, we are able to track,

00:43:21.000 --> 00:43:28.000
since the beginning, the cost efficiency, which you can think of as the slope here.

00:43:28.000 --> 00:43:31.000
So it's the score divided by the cost.

00:43:31.000 --> 00:43:40.000
So if you ignore AstroSage and just look at all the open-source and proprietary models,

00:43:40.000 --> 00:43:45.000
the cost efficiency in terms of answering astronomy questions

00:43:45.000 --> 00:43:50.000
has improved by 6 orders of magnitude since 2023.
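
The cost-efficiency metric described here is just the score divided by the cost, and "6 orders of magnitude" means the ratio of two such efficiencies is about 10^6, a millionfold. A minimal sketch of the arithmetic follows; the scores and dollar figures are hypothetical placeholders, not the benchmark's measured values.

```python
import math


def cost_efficiency(score: float, cost: float) -> float:
    """Benchmark score per dollar: the slope of score versus cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return score / cost


def orders_of_magnitude_gain(old_efficiency: float, new_efficiency: float) -> float:
    """Number of factors of ten between two cost-efficiency values."""
    return math.log10(new_efficiency / old_efficiency)


# Hypothetical numbers for illustration only (not the talk's measurements):
# an older model scoring 70% for $100 of inference on the benchmark,
# and a newer one scoring 70% for $0.0001.
old = cost_efficiency(0.70, 100.0)
new = cost_efficiency(0.70, 0.0001)
print(f"gain: {orders_of_magnitude_gain(old, new):.1f} orders of magnitude")
```

Working in log10 keeps the comparison readable when costs span many decades, which is why such plots usually put cost on a logarithmic axis.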

00:43:50.000 --> 00:43:56.000
I'm not sure people have noticed this. So even if AI stopped improving in terms of ability,

00:43:56.000 --> 00:44:04.000
the cost efficiency alone is crazy. I have not seen anything in my life that has improved

00:44:04.000 --> 00:44:06.000
by a millionfold in cost

00:44:06.000 --> 00:44:10.000
efficiency. So even if agents can only do

00:44:10.000 --> 00:44:15.000
simple tasks, like the few things that I just mentioned, such as measuring the equivalent width,

00:44:15.000 --> 00:44:25.000
those will cost next to nothing in a few years, just because the language models are improving and the costs are dropping.

00:44:25.000 --> 00:44:28.000
Right. One

00:44:28.000 --> 00:44:37.000
side note: not only is the cost dropping, new abilities are unlocking on a daily basis. The ARC-AGI benchmark that I mentioned,

00:44:37.000 --> 00:44:48.000
which GPT-5 did not do well on in September: late last month, GPT-5.4 and Gemini 3.1 actually reached 85%.

00:44:48.000 --> 00:44:53.000
So it suffices to say that even for things that we think are very human,

00:44:53.000 --> 00:44:58.000
AI is closing the gap very drastically.

00:44:58.000 --> 00:45:02.000
Right. So this is why

00:45:02.000 --> 00:45:11.000
I think this is something the community has to grapple with. There are things that you can be worried about, but there are also things that we,

00:45:11.000 --> 00:45:14.000
as a community, have to think about.

00:45:14.000 --> 00:45:22.000
The obvious one is: how do we train students? How do we even encourage

00:45:22.000 --> 00:45:27.000
supervisors to take on students? As we all know,

00:45:27.000 --> 00:45:30.000
a lot of the things that we do are much more efficient with just Claude Code.

00:45:30.000 --> 00:45:36.000
But that does not mean apprenticeship is not important. It's just that these two

00:45:36.000 --> 00:45:45.000
objectives now start to bifurcate. If you just want to optimize your scientific output as a senior researcher,

00:45:45.000 --> 00:45:50.000
that is a very dangerous path, because you might come to

00:45:50.000 --> 00:45:56.000
the conclusion that training a student is not worth it. I see a question.

00:45:56.000 --> 00:46:00.000
Yep, uh, yep. So, so, so, this is why we, uh…

00:46:00.000 --> 00:46:07.000
are hosting this workshop. I thought it would be in person at Ohio State, but we have changed it to remote,

00:46:07.000 --> 00:46:09.000
to discuss this.

00:46:09.000 --> 00:46:22.000
What is the future of scholarship? I think that is something we need to think about, which leads to my last topic for the last few minutes. It's something that I got interested in for no particular reason, and it's kind of

00:46:22.000 --> 00:46:31.000
a bit off-topic for a physics and astronomy talk: the epistemic implications of AI.

00:46:31.000 --> 00:46:35.000
Because we are at the point where AI is automating so much of science,

00:46:35.000 --> 00:46:42.000
even as scientists, we need to start thinking about the epistemic implications for science.

00:46:42.000 --> 00:46:52.000
This caught my attention, which is why late last year we held a conference at OSU. We brought together philosophers of science,

00:46:52.000 --> 00:46:56.000
linguists, and scientists to talk about the implications.

00:46:56.000 --> 00:47:01.000
The findings culminated in this piece,

00:47:01.000 --> 00:47:09.000
which actually just came out yesterday. I put an arXiv version up much earlier, but the Nature Astronomy version just

00:47:09.000 --> 00:47:18.000
came out. I worked with a philosopher of science to think about this question. I will summarize what we found, but

00:47:18.000 --> 00:47:23.000
this is so important that I have started to direct a new

00:47:23.000 --> 00:47:28.000
initiative at OSU that focuses on agents and the epistemic

00:47:28.000 --> 00:47:33.000
implications of agents. So, what have we found? I think

00:47:33.000 --> 00:47:40.000
I don't think I am as hawkish as Sam Altman or Dario

00:47:40.000 --> 00:47:42.000
Amodei, because

00:47:42.000 --> 00:47:48.000
I still feel that humans cannot be eliminated from science.

00:47:48.000 --> 00:47:52.000
I'll put it this way: by the epistemic

00:47:52.000 --> 00:47:58.000
definition, science as we define it would cease to exist

00:47:58.000 --> 00:48:02.000
if humans were not participating.

00:48:02.000 --> 00:48:05.000
So why is that the case? I think the key here is that

00:48:05.000 --> 00:48:10.000
AI has no interest in understanding the universe. We want to understand

00:48:10.000 --> 00:48:14.000
the universe. This is why we construct stories and context

00:48:14.000 --> 00:48:25.000
to understand the universe. I think what some engineers miss is that doing science, physics, or astronomy is not just about making predictions.

00:48:25.000 --> 00:48:29.000
Making predictions is one of the things that we do,

00:48:29.000 --> 00:48:32.000
but ultimately, what we try to do is to

00:48:32.000 --> 00:48:37.000
find some characteristics from which we can construct a narrative,

00:48:37.000 --> 00:48:40.000
and try to make the world intelligible to us,

00:48:40.000 --> 00:48:48.000
not intelligible to the AI. I encourage you to read that piece if you are interested, because I spent a lot of time thinking about

00:48:48.000 --> 00:48:53.000
that. I don't think I will do justice to the piece, but I want to give you some understanding of why

00:48:53.000 --> 00:49:01.000
I think humans have to participate. It's because science is about narrative. We are telling a story to ourselves.

00:49:01.000 --> 00:49:04.000
And because we are telling a story to ourselves,

00:49:04.000 --> 00:49:07.000
the narrative becomes

00:49:07.000 --> 00:49:12.000
very important. No words can contain all the information that we

00:49:12.000 --> 00:49:15.000
want to convey. So words are

00:49:15.000 --> 00:49:21.000
not a sufficient carrier of all the context that

00:49:21.000 --> 00:49:31.000
we understand. To give you an example, we talked with linguists, and the linguists always point to this example:

00:49:31.000 --> 00:49:36.000
the six-word stories of Hemingway. Hemingway was a

00:49:36.000 --> 00:49:44.000
great master of the six-word story. One of the famous ones goes as follows: "For sale:

00:49:44.000 --> 00:49:47.000
baby shoes, never worn."

00:49:47.000 --> 00:49:51.000
This is a six-word story, but I think for most of the audience,

00:49:51.000 --> 00:49:59.000
the emotional weight is more than six words. That is my point: a lot of what we understand in science is the same.

00:49:59.000 --> 00:50:07.000
A lot of the scientific understanding that we have as a community circulates

00:50:07.000 --> 00:50:14.000
as memes in the community, but this is something that is not encoded

00:50:14.000 --> 00:50:17.000
in the text.

00:50:17.000 --> 00:50:21.000
And all of science really comes down to making the world intelligible to us.

00:50:21.000 --> 00:50:27.000
AI can help us make the world more intelligible, but ultimately, we want to tell the story such that

00:50:27.000 --> 00:50:31.000
other humans can also retell it. This is why

00:50:31.000 --> 00:50:33.000
I am not that

00:50:33.000 --> 00:50:40.000
pessimistic. I don't think it's like, oh, we flip a switch, and AI will do all of the science,

00:50:40.000 --> 00:50:43.000
and humans will be

00:50:43.000 --> 00:50:51.000
irrelevant. That's exactly because I think some people in industry, or engineers, simply miss the question of

00:50:51.000 --> 00:50:59.000
what the philosophy of science is in the first place. I found it quite illuminating to talk to the philosophers to understand where we stand.

00:50:59.000 --> 00:51:08.000
But at the same time, I'm not turning a blind eye either. I find this a particularly hard talk to give,

00:51:08.000 --> 00:51:16.000
because every few months, what I say becomes obsolete. I think half of this talk is probably already obsolete.

00:51:16.000 --> 00:51:36.000
Things are moving really fast, and we are living in a very interesting time. So I will stop here with this quote that I wrote in the piece, and take any questions that you might have.

00:51:36.000 --> 00:51:43.000
Alright, we have plenty of time for questions, so please, uh, don't be shy.

00:51:43.000 --> 00:51:50.000
Yeah.

00:51:50.000 --> 00:51:54.000
Are people talking in the seminar room? Because we can't quite hear…

00:51:54.000 --> 00:51:55.000
Okay, okay.

00:51:55.000 --> 00:51:58.000
No, we're just trying to see the chat. We're working on it.

00:51:58.000 --> 00:51:59.000
Come on.

00:51:59.000 --> 00:52:09.000
Well, while you're waiting, I have a bunch of questions, for sure. So, yeah, you mentioned that things are moving very fast, and that, you know, it's tough to give the talk.

00:52:09.000 --> 00:52:10.000
Yeah.

00:52:10.000 --> 00:52:14.000
Because things become obsolete quickly. So, going back to the beginning of your talk and the agentic systems you've developed,

00:52:14.000 --> 00:52:15.000
Yeah.

00:52:15.000 --> 00:52:17.000
Um, compared to, say, Claude Code, which a lot of us are using these days,

00:52:17.000 --> 00:52:20.000
Yeah.

00:52:20.000 --> 00:52:21.000
Yeah.

00:52:21.000 --> 00:52:24.000
Uh, with great success, I would say. Like, how would you say… what would you say is the…

00:52:24.000 --> 00:52:28.000
space, uh, where we could hope to improve on

00:52:28.000 --> 00:52:29.000
the frontier labs' products, versus just…

00:52:29.000 --> 00:52:30.000
Very cool.

00:52:30.000 --> 00:52:34.000
Yum.

00:52:34.000 --> 00:52:35.000
Yeah.

00:52:35.000 --> 00:52:36.000
Use them. So the kind of research into developing these agentic systems that you did,

00:52:36.000 --> 00:52:37.000
Yep.

00:52:37.000 --> 00:52:42.000
like… like, does it have any value?

00:52:42.000 --> 00:52:43.000
Anymore.

00:52:43.000 --> 00:52:44.000
Yeah. Yeah, yeah, yeah, this is the thing that I keep…

00:52:44.000 --> 00:52:45.000
Okay.

00:52:45.000 --> 00:52:46.000
debating, because ultimately,

00:52:46.000 --> 00:52:49.000
what these systems do not…

00:52:49.000 --> 00:52:55.000
naturally do is connect to some of the code that we wrote.

00:52:55.000 --> 00:53:00.000
Just because… but it's also not that hard if you just say, you know, these are the all…

00:53:00.000 --> 00:53:04.000
all the codebase and turn that into some MCP.

00:53:04.000 --> 00:53:06.000
like, call, click, and do that. Just that…

00:53:06.000 --> 00:53:10.000
Clot has my band back. So this is the part of…

00:53:10.000 --> 00:53:12.000
project that I keep debating, because…

00:53:12.000 --> 00:53:15.000
I have a student, I was, like, thinking to…

00:53:15.000 --> 00:53:16.000
you know, turn most of the well-used codebase into some agentic framework.

00:53:16.000 --> 00:53:20.000
I did a pretty good.

00:53:20.000 --> 00:53:29.000
But I feel like, you know, yes, you can do that, but also, what is the value? Because whenever people need to build a system, like, they might just ask Claude Code to…

00:53:29.000 --> 00:53:32.000
to refactor that. So…

00:53:32.000 --> 00:53:36.000
Uh, unclear. I think that for the, sort of, like, baking…

00:53:36.000 --> 00:53:39.000
agent LED2.

00:53:39.000 --> 00:53:44.000
Unless the code is, uh, very, uh, uh…

00:53:44.000 --> 00:53:47.000
huge, right? Huge piece of code.

00:53:47.000 --> 00:53:51.000
then yes, I think you want to do that yourself. But for simple code,

00:53:51.000 --> 00:53:54.000
then it will not be worth it.

00:53:54.000 --> 00:53:55.000
Mm-hmm, mm-hmm.

00:53:55.000 --> 00:53:58.000
Uh, because, like, if you think about some of these

00:53:58.000 --> 00:54:08.000
Fortran codes, I don't think it is one prompt for Claude Code to make that into something that, like, others can use. So you need some people to do that.

00:54:08.000 --> 00:54:17.000
All right, um, but I do find it more interesting to think about some of the… some of the knowledge graphs and recommender systems. I think, like, that is something that…

00:54:17.000 --> 00:54:24.000
the big companies are unlikely to do. I think that has much bigger implications for the accuracy

00:54:24.000 --> 00:54:25.000
of things, uh, that we need.

00:54:25.000 --> 00:54:30.000
Yes, thank you.

00:54:30.000 --> 00:54:32.000
I see a question on Zoom. Yannis, do you want to ask it yourself? Or, uh, go ahead.

00:54:32.000 --> 00:54:38.000
Yes, uh, first of all, thank you for a great presentation on a very thought-provoking and cutting-edge subject.

00:54:38.000 --> 00:54:46.000
Now, my question is, why do humans think of AI as non-human,

00:54:46.000 --> 00:54:48.000
After all, it's been programmed by humans,

00:54:48.000 --> 00:54:53.000
It's trained on human-produced content.

00:54:53.000 --> 00:55:02.000
And it's used by humans for human purposes. So why do you think we think of AI as something…

00:55:02.000 --> 00:55:03.000
Oh, uh-oh.

00:55:03.000 --> 00:55:06.000
outside of the human realm.

00:55:06.000 --> 00:55:12.000
Oh, no, I don't think they are non-human, but I don't think they are human, so I think, like, the…

00:55:12.000 --> 00:55:18.000
The epistemic implication? So at first, we were just thinking that, you know, like,

00:55:18.000 --> 00:55:26.000
maybe an angle to understand the impact of AI is, like, simulation, right? Like, 20 years ago, like, simulation happens,

00:55:26.000 --> 00:55:31.000
And no one believed that simulation was science until people had to accept that simulation is part of

00:55:31.000 --> 00:55:36.000
the knowledge that we have. But what is really different from simulation is that

00:55:36.000 --> 00:55:45.000
the AI talks back. So this is, like, uh, I talked to biologists, and I think a very fun remark is that, uh, AI is the first

00:55:45.000 --> 00:55:48.000
species, non-human, that can talk to a human.

00:55:48.000 --> 00:55:55.000
Which is true. You think about, you know, all the species on Earth, none of them can communicate with us.

00:55:55.000 --> 00:56:00.000
AI can, but it's… it's not human in the classical sense.

00:56:00.000 --> 00:56:06.000
When you have such a system that is expanding your perception,

00:56:06.000 --> 00:56:09.000
How do you really redefine

00:56:09.000 --> 00:56:15.000
what knowledge is? It becomes more involved than just treating this as…

00:56:15.000 --> 00:56:19.000
simulation, right? So I think, like, this is the thing that we are…

00:56:19.000 --> 00:56:21.000
like we are grappling with.

00:56:21.000 --> 00:56:28.000
Of course, I think the impact is also likely to be, uh, larger than simulation, but I think it's…

00:56:28.000 --> 00:56:33.000
kind of a definition, but you are right that we should treat them as

00:56:33.000 --> 00:56:35.000
We shouldn't treat them as non…

00:56:35.000 --> 00:56:38.000
non-object. I think that is a…

00:56:38.000 --> 00:56:41.000
Because I think the character… the characteristic is that

00:56:41.000 --> 00:56:48.000
it works with us. It does not just give us answers. It's a system that communicates with us.

00:56:48.000 --> 00:56:53.000
And how do you include such a thing in…

00:56:53.000 --> 00:56:54.000
the endeavor of…

00:56:54.000 --> 00:56:57.000
a scientific inquiry.

00:56:57.000 --> 00:57:01.000
is unprecedented, right? We never had,

00:57:01.000 --> 00:57:05.000
like, kangaroos come in, and we say, oh, now the kangaroo,

00:57:05.000 --> 00:57:08.000
like, brain is part of the scientific endeavor.

00:57:08.000 --> 00:57:11.000
Yeah, so I think this is what we are…

00:57:11.000 --> 00:57:24.000
like, grappling with.

00:57:24.000 --> 00:57:28.000
Any other questions from the room?

00:57:28.000 --> 00:57:35.000
I think you kind of touched on the question that I've been thinking about in terms of the agentic stuff in our discussions here.

00:57:35.000 --> 00:57:41.000
We were kind of feeling that a lot of the agent systems we'd been seeing in

00:57:41.000 --> 00:58:03.000
you know, particle physics discussions have been kind of a way to get around the context

00:58:03.000 --> 00:58:04.000
Right.

00:58:04.000 --> 00:58:08.000
limits. Um… Is that… do you think that's a useful way to think about sort of what people are doing with agent systems? I don't mean that in a dismissive way, right? Like, the context window can be a real issue.

00:58:08.000 --> 00:58:15.000
Yeah, I'm not sure there's, like, a conclusion to that, right? So there's always, like, the retrieval system, and also just, like, long context that

00:58:15.000 --> 00:58:19.000
should be able to retain everything.

00:58:19.000 --> 00:58:21.000
Um,

00:58:21.000 --> 00:58:25.000
you know, just as a physicist by training, I always love to…

00:58:25.000 --> 00:58:30.000
hope that we can find some algorithm, meaning a retrieval system that is effective,

00:58:30.000 --> 00:58:34.000
not just relying on infinitely long, like, context.

00:58:34.000 --> 00:58:37.000
It's just that there does not seem to be…

00:58:37.000 --> 00:58:39.000
uh… one that

00:58:39.000 --> 00:58:41.000
is elegant, at least to my taste.

00:58:41.000 --> 00:58:48.000
But it could also be true that if the context is long enough and you can retain, like, everything, so…

00:58:48.000 --> 00:58:55.000
Who cares? Yeah, I'm not sure I have a good answer to that.

00:58:55.000 --> 00:58:59.000
Thanks.

00:58:59.000 --> 00:59:00.000
Saurabh has a question on Zoom. Um, I can read it out loud, or Saurabh, do you want to…

00:59:00.000 --> 00:59:04.000
Janice, I guess.

00:59:04.000 --> 00:59:05.000
Um, yeah, you speak…

00:59:05.000 --> 00:59:07.000
Ask it.

00:59:07.000 --> 00:59:11.000
But better than me, so I will let you read out the question, and then I will answer.

00:59:11.000 --> 00:59:15.000
Oh.

00:59:15.000 --> 00:59:25.000
Anos… well, yeah, so I guess Saurabh is asking… okay, he can't talk or something, so he's, uh, asking, yeah, what do we… how do we advise, uh, junior and senior…

00:59:25.000 --> 00:59:31.000
researchers, uh, what skills should they be learning? What would be your advice?

00:59:31.000 --> 00:59:32.000
Yeah.

00:59:32.000 --> 00:59:37.000
I always tell my student the same thing. I think, like, um, I think there are two things. I think, like,

00:59:37.000 --> 00:59:39.000
It's not only the junior…

00:59:39.000 --> 00:59:46.000
junior students we need to tell, because I don't feel like they are the problem. I think, like, the professors are the problem, because

00:59:46.000 --> 00:59:54.000
the professors decide not to hire a grad student because of Claude Code, right? I think that is a bigger problem, but…

00:59:54.000 --> 00:59:56.000
For students,

00:59:56.000 --> 00:59:59.000
I don't think that anything has changed. I…

00:59:59.000 --> 01:00:05.000
The one thing that I always emphasized, even before AI, is good taste.

01:00:05.000 --> 01:00:13.000
Right? Your… this is, like, the narrative order thing, right? I think, like, what makes a good student a good student really comes from

01:00:13.000 --> 01:00:18.000
the acquired taste of what makes a good prob… problem.

01:00:18.000 --> 01:00:23.000
Uh, I find that the ability to find good, good problems

01:00:23.000 --> 01:00:26.000
has become even more important now. It… it…

01:00:26.000 --> 01:00:29.000
In the past, it was always important, just somehow, I think the…

01:00:29.000 --> 01:00:32.000
the whole industry of science has

01:00:32.000 --> 01:00:34.000
made students

01:00:34.000 --> 01:00:38.000
put more emphasis on problem solving,

01:00:38.000 --> 01:00:40.000
but not problem finding.

01:00:40.000 --> 01:00:46.000
I find that the junior student, I think, no. I think this is the danger, because…

01:00:46.000 --> 01:00:48.000
If… and how, like, how do you learn, like,

01:00:48.000 --> 01:00:56.000
problem finding? Quite often, you learn that from your supervisor. The supervisor has a lot more experience, and this is how the craftsmanship

01:00:56.000 --> 01:00:57.000
gets passed on.

01:00:57.000 --> 01:01:04.000
But if your supervisor is just using Claude Code to do everything, then you don't have the opportunity to learn

01:01:04.000 --> 01:01:08.000
that part, uh, that is very, uh…

01:01:08.000 --> 01:01:12.000
valuable. So, you know, I always say, you know, just…

01:01:12.000 --> 01:01:17.000
Even more important, you should go to the clock game, or you should go to, like, coffee, you should go to, like, talk to people.

01:01:17.000 --> 01:01:24.000
But if a student can use Claude Code, then maybe they can problem find much faster. You know, they could…

01:01:24.000 --> 01:01:25.000
I'm not that optimistic, because…

01:01:25.000 --> 01:01:31.000
probe a lot of different problems and see which ones lead somewhere. Much, much more than they could have in the past.

01:01:31.000 --> 01:01:36.000
I'm not sure what other people experience, but at least my experience is that the…

01:01:36.000 --> 01:01:39.000
ability to use Claude Code is…

01:01:39.000 --> 01:01:42.000
almost super-linearly…

01:01:42.000 --> 01:01:49.000
dependent on your… your maturity in the field. So if you know what question to ask,

01:01:49.000 --> 01:01:52.000
and what a good answer is, you get to the answer quite quickly.

01:01:52.000 --> 01:02:00.000
As a student, you can use Claude Code, but I don't think it fundamentally… I think, like, there might even be a loss for a whole generation, and…

01:02:00.000 --> 01:02:09.000
Yeah. That's interesting. Yeah, we're just starting to play around with it here, I think, but I don't know, I have a suspicion that may be true.

01:02:09.000 --> 01:02:10.000
Yeah.

01:02:10.000 --> 01:02:11.000
Yeah. So, yeah, yeah.

01:02:11.000 --> 01:02:19.000
They've done studies, though, right? The effectiveness of these tools is… they've done big studies, right? Like, it's proportional to your

01:02:19.000 --> 01:02:20.000
education level.

01:02:20.000 --> 01:02:27.000
Yeah, I think it's like… but I think, like, that study also has been debunked. I think it's, like, N squared, so if your ability is N, then with AI, you are N squared.

01:02:27.000 --> 01:02:29.000
So if you are one, then you stay 1.

01:02:29.000 --> 01:02:32.000
If you are 10, then you… I think that's the problem.

01:02:32.000 --> 01:02:37.000
Uh, and Yannis has a question?

01:02:37.000 --> 01:02:38.000
Yeah, go ahead, Yannis, yeah.

01:02:38.000 --> 01:02:46.000
Yeah. Yes, actually, my question touches on both the last question on, like, you know, problem finding versus problem solving, but also

01:02:46.000 --> 01:02:49.000
more on the…

01:02:49.000 --> 01:02:51.000
context parsing.

01:02:51.000 --> 01:02:57.000
Now, in the natural sciences, I think the natural sciences are better suited

01:02:57.000 --> 01:03:02.000
for the uses of AI that we think of, because we have

01:03:02.000 --> 01:03:09.000
controlled experiments, we have a very well-defined set of assumptions, right?

01:03:09.000 --> 01:03:14.000
So, uh…

01:03:14.000 --> 01:03:17.000
So I guess then it's easy to code and have AI help us

01:03:17.000 --> 01:03:23.000
with the computational complexity, or even the linguistic complexity, which is

01:03:23.000 --> 01:03:30.000
what goes into the problem finding: oh, I haven't thought of that problem, or I haven't thought in that direction, and AI

01:03:30.000 --> 01:03:33.000
can do the computationally intensive part

01:03:33.000 --> 01:03:39.000
of doing the parsing that we don't have the time or the focus, or the direction.

01:03:39.000 --> 01:03:41.000
to think at.

01:03:41.000 --> 01:03:47.000
So, that's understandable, but, uh, could we also, as physicists,

01:03:47.000 --> 01:03:51.000
um… be… raise more

01:03:51.000 --> 01:04:01.000
loudly, the problem of interpretability,

01:04:01.000 --> 01:04:02.000
Yeah.

01:04:02.000 --> 01:04:03.000
which AI suffers from in the non-natural,

01:04:03.000 --> 01:04:04.000
quote-unquote, sciences.

01:04:04.000 --> 01:04:14.000
And…

01:04:14.000 --> 01:04:15.000
Yeah.

01:04:15.000 --> 01:04:18.000
So, can you speak a little bit about that, which has to do with context parsing, or assumptions, which are lurking

01:04:18.000 --> 01:04:19.000
Yeah. Yeah.

01:04:19.000 --> 01:04:20.000
very profusely in the non-natural setting versus the natural sciences, and could, could,

01:04:20.000 --> 01:04:28.000
AI be using the natural sciences

01:04:28.000 --> 01:04:29.000
Yeah.

01:04:29.000 --> 01:04:40.000
to kind of, like, reverse engineer. You do an experiment, right? You do experiments, you think you have all your ducks in order,

01:04:40.000 --> 01:04:41.000
Yeah.

01:04:41.000 --> 01:04:42.000
And then you produce some results which you think are correct, but maybe you've missed an assumption. Could you use AI to go backwards and tell you, uh, ah, your assumption is not correct, that's not…

01:04:42.000 --> 01:04:48.000
the right experiment. Along these directions, if you can elaborate.

01:04:48.000 --> 01:04:52.000
Yeah, yeah, I think, you know, yeah, I think…

01:04:52.000 --> 01:04:59.000
You're right. If you think about the whole thing that we are doing in science, the biggest impact is really in the middle part, right? So…

01:04:59.000 --> 01:05:06.000
I think AI is weaker at both the beginning and the end. What I mean by the beginning is the assumptions.

01:05:06.000 --> 01:05:09.000
Right, so which assumptions make sense? Why do you want to ask

01:05:09.000 --> 01:05:11.000
such a question?

01:05:11.000 --> 01:05:14.000
AI can do something, but…

01:05:14.000 --> 01:05:19.000
I don't think it… I think, like, asking it to auto-correct for the assumptions

01:05:19.000 --> 01:05:25.000
is too much for us, but the fact that you can do the middle part extremely quickly,

01:05:25.000 --> 01:05:28.000
Meaning that the human can try, like, many things, like, very quickly.

01:05:28.000 --> 01:05:38.000
Right, so I think, like, it's not changing the beginning, but it's helping, because, like, a lot of the projects where I would not even think about doing the calculation, because, you know, why would I want to…

01:05:38.000 --> 01:05:47.000
apply for an NSF grant for a student, and then wait for the grant to come in, and then do the calculation? But now we can do that right away. So we can change the assumptions.

01:05:47.000 --> 01:05:57.000
But also, the end part is weak, right? I think the end part is the storytelling. I think less so for, like, particle physics, but astronomy is a lot about the narrative, right? We…

01:05:57.000 --> 01:06:02.000
describe galaxy evolution based on these five things that we think are important.

01:06:02.000 --> 01:06:10.000
Not because these five things are the most predictive ones; it's just that we believe, you know, in the whole context of that knowledge,

01:06:10.000 --> 01:06:12.000
these are the characteristics of the system.

01:06:12.000 --> 01:06:16.000
So, what I see now is that the middle part is

01:06:16.000 --> 01:06:20.000
kind of, like, being obliterated by AI.

01:06:20.000 --> 01:06:22.000
all the calculations, everything.

01:06:22.000 --> 01:06:27.000
But what is really lacking, that no one talks about, is really the top end

01:06:27.000 --> 01:06:31.000
and the bottom end, right? And this is particularly important because

01:06:31.000 --> 01:06:33.000
I think as a community, we need to…

01:06:33.000 --> 01:06:35.000
be, you know,

01:06:35.000 --> 01:06:39.000
not just giving up your data to the company, kind of.

01:06:39.000 --> 01:06:43.000
I think… I think that's the danger, because

01:06:43.000 --> 01:06:49.000
Um, I think the danger of this AI-for-science hype is that it is changing

01:06:49.000 --> 01:06:55.000
the entire evaluation of the value of a scientist. So they…

01:06:55.000 --> 01:07:00.000
turn something that we thought to be very important, meaning that, you know, we find a very good question,

01:07:00.000 --> 01:07:05.000
We… I always find, though, that it's very difficult to find a good question, and very hard to come up with a good

01:07:05.000 --> 01:07:09.000
narrative. And this is still something that we should really hold…

01:07:09.000 --> 01:07:12.000
hold tight. But…

01:07:12.000 --> 01:07:19.000
the AI-for-science papers will tell you that they can solve things much faster, and therefore, science is over.

01:07:19.000 --> 01:07:26.000
I don't think that is true, but they are changing how people see science, and that is the danger, because, you know, they have

01:07:26.000 --> 01:07:29.000
the power to change,

01:07:29.000 --> 01:07:31.000
the narrative at the…

01:07:31.000 --> 01:07:39.000
upper level, right? If the people who decide on funding decide that it's more important to solve 100

01:07:39.000 --> 01:07:44.000
unimportant questions quickly than, like, finding someone to…

01:07:44.000 --> 01:07:46.000
find some interesting, like, question,

01:07:46.000 --> 01:07:48.000
then that would be the problem.

01:07:48.000 --> 01:07:56.000
But I also find that maybe AI at some point can also do that. I still haven't seen that. I also do not know how to…

01:07:56.000 --> 01:07:58.000
algorithmically structure that.

01:07:58.000 --> 01:08:02.000
So, not sure I answered your question, but I feel like

01:08:02.000 --> 01:08:07.000
Yeah, I don't think AI can auto-correct for the assumptions. To some extent they can, but…

01:08:07.000 --> 01:08:11.000
uh, the beginning part, though,

01:08:11.000 --> 01:08:18.000
currently still mostly comes from humans.

01:08:18.000 --> 01:08:20.000
Can I ask you a question, continuing in that kind of flow of thought?

01:08:20.000 --> 01:08:23.000
Yeah. Yeah.

01:08:23.000 --> 01:08:46.000
So… I think you have thought about this more than anybody else here, or me. So, for example, in your article that you just said to go read. So.

01:08:46.000 --> 01:08:49.000
Yes, yes, yes.

01:08:49.000 --> 01:08:50.000
Yeah.

01:08:50.000 --> 01:08:55.000
Now that we are getting to this era where, for example, your article is in Nature, and I have this joke, like, oh, if it's in Nature, it's most likely wrong, because you go for hype instead of… so yeah, so again, I think there has been a study that if you go, like,

01:08:55.000 --> 01:08:59.000
Yeah.

01:08:59.000 --> 01:09:09.000
Yeah.

01:09:09.000 --> 01:09:13.000
Yeah. Yeah.

01:09:13.000 --> 01:09:14.000
Yeah.

01:09:14.000 --> 01:09:18.000
and read the Nature papers, most of them are just wrong. It's just because the incentives to publish are more in the hype than in being right, right? And so, yeah. So we are starting an age where doing things is becoming cheaper and cheaper, meaning that if you want to analyze some data, understand some code, or even write your own code, the marginal price of this is going to zero, pretty much, right now. You can do it.

01:09:18.000 --> 01:09:25.000
Yeah.

01:09:25.000 --> 01:09:27.000
Yeah.

01:09:27.000 --> 01:09:28.000
Yeah. Yeah.

01:09:28.000 --> 01:09:31.000
So this is going to start to rise: a lot of, I guess, scientists are going to start to publish solo, and then you're going to start to have a lot of noise, right? Then the noise is gonna be, like, a lot of

01:09:31.000 --> 01:09:32.000
Yeah.

01:09:32.000 --> 01:09:36.000
papers coming out, and then it's gonna be hard to disentangle the noise from the signal.

01:09:36.000 --> 01:09:42.000
Yeah.

01:09:42.000 --> 01:09:49.000
Yeah.

01:09:49.000 --> 01:09:52.000
Yeah.

01:09:52.000 --> 01:09:53.000
Yeah.

01:09:53.000 --> 01:09:54.000
But in a sense, I think it's good, because now also these benchmarks that became the goal don't have value anymore. So yeah, so, like, I don't know, like, how are we gonna, like, yeah, value, like, what is good science or not, or is it gonna be the taste of each individual scientist? I don't know.

01:09:54.000 --> 01:10:03.000
So I think the institutional, like, level is really the difficult part. I'm not that worried about AI

01:10:03.000 --> 01:10:09.000
doing the calculation for me, but you are right that I am more worried about

01:10:09.000 --> 01:10:13.000
the… it will change, like, the nature of…

01:10:13.000 --> 01:10:20.000
of what we do. And I don't think most of the institutions are ready to respond to that.

01:10:20.000 --> 01:10:22.000
Right? So…

01:10:22.000 --> 01:10:26.000
I'm not sure even, like, you know, how do you even… as you say,

01:10:26.000 --> 01:10:28.000
if I decide to write

01:10:28.000 --> 01:10:32.000
an undergraduate-level, like, paper,

01:10:32.000 --> 01:10:35.000
I can probably write one

01:10:35.000 --> 01:10:39.000
in two days, right? So, each year I can write, like,

01:10:39.000 --> 01:10:43.000
180 first-author papers

01:10:43.000 --> 01:10:52.000
that are feeling like isochron to something, right? And that is not what we want, but if you really want to just say the metric, then… I think, you know, that is…

01:10:52.000 --> 01:10:58.000
where I think a conversation like this is important, like, I think the community needs to know

01:10:58.000 --> 01:11:02.000
what is valuable and what is not valuable.

01:11:02.000 --> 01:11:09.000
Now, how to solve that, I do not know, just because we will likely be swarmed by neighbors.

01:11:09.000 --> 01:11:17.000
the… I should say, the silver lining, I think, is that, you know, when that happens, and it will happen, people will have to come up with something, because

01:11:17.000 --> 01:11:20.000
what else can you do? You cannot read all the papers, and all the papers are, like, basically…

01:11:20.000 --> 01:11:26.000
That will finally get rid of these stupid metrics, like citation counts and h-indices. Finally,

01:11:26.000 --> 01:11:27.000
Yeah, yeah, it's kind of like you…

01:11:27.000 --> 01:11:28.000
So, those metrics will die.

01:11:28.000 --> 01:11:29.000
Yes, yes, exactly, that's what I meant.

01:11:29.000 --> 01:11:36.000
I think that's the best outcome, but I'm not sure, you know, I think to get to the outcome, you need to do some work. It's not like you toss…

01:11:36.000 --> 01:11:45.000
the whole thing, and then say, oh, you know, good thing will happen. I think, you know, at least, like, it's easy to say, wow, no, of course, we have to do something else, because what else?

01:11:45.000 --> 01:11:46.000
Mm-hmm.

01:11:46.000 --> 01:11:52.000
But I don't think that will naturally happen. I think it's easier in the scientific community, but

01:11:52.000 --> 01:11:54.000
If you go outside the community, you know, there's…

01:11:54.000 --> 01:11:57.000
It's just… people don't know what we are doing.

01:11:57.000 --> 01:12:01.000
And if you look at, yeah,

01:12:01.000 --> 01:12:04.000
is difficult, yeah.

01:12:04.000 --> 01:12:12.000
Uh, so I find… so this is why I find I spend more time on the… the epistemic level, because I feel like

01:12:12.000 --> 01:12:20.000
I don't always have good ideas, or the ideas can wait, but I think the institutional, like, implications

01:12:20.000 --> 01:12:26.000
cannot wait. This is happening, and we need to know what to do.

01:12:26.000 --> 01:12:30.000
Thanks.

01:12:30.000 --> 01:12:39.000
Okay, yeah, well, thank you very much, Jensen, for the very stimulating talk, and I think we can all give you another round of applause.

01:12:39.000 --> 01:12:40.000
So, um…

01:12:40.000 --> 01:12:42.000
Thank you.

01:12:42.000 --> 01:12:47.000
Uh, yeah, we could… we can probably see about setting up some kind of virtual visit.

01:12:47.000 --> 01:12:48.000
Yeah, yeah, I, yeah, uh…

01:12:48.000 --> 01:12:51.000
Um… so yeah, let's be in touch about that.

01:12:51.000 --> 01:12:52.000
Yeah, for sure.

01:12:52.000 --> 01:12:54.000
Yeah