
Before making AI do research, perhaps we should first let it __reproduce__ research. For example, give it a paper describing some deep learning technique and have it produce an implementation of that paper. Until it can do that, I have no hope that it can produce novel ideas.
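
To make "implement the paper" concrete: layer normalization (Ba et al., 2016) boils down to a few lines of NumPy. The sketch below is my own illustration of the kind of artifact you'd want the model to produce from the paper alone, not output from any model:

    import numpy as np

    def layer_norm(x, gamma, beta, eps=1e-5):
        # Normalize each example over its feature dimension, then rescale and shift.
        mean = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return gamma * (x - mean) / np.sqrt(var + eps) + beta

    x = np.random.randn(4, 8)                   # batch of 4 examples, hidden size 8
    out = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))
    print(out.mean(axis=-1), out.std(axis=-1))  # ~0 and ~1 for each example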


Reproducibility is the baseline. Until models can consistently read, understand, and implement existing research correctly, "AI scientist" talk is mostly just branding.


OpenAI created a benchmark for this: https://openai.com/index/paperbench/


Still has data contamination though.


Still, LLMs can't beat it, so it's good enough for a start.


You would have to have a very complete audit trail for the LLM and ensure the paper shows up nowhere in the dataset.

We have rare but not unheard-of issues with academic fraud among humans. LLMs fake data and lie at the drop of a hat.


> You would have to have a very complete audit trail for the LLM and ensure the paper shows up nowhere in the dataset.

We can do both known and novel reproductions. As with both LLM training and human learning, it's valuable to take it in two broad steps:

1) Internalize fully-worked examples, then learn to reproduce them from memory;

2) Train on solving problems for which you know the results but have to work out the intermediate steps yourself (you can look at the final result before working through the task).

And eventually:

3) Train on solving problems you don't know the answer to, and have your solution evaluated by a teacher/judge (who knows the actual answers).

Even parroting existing papers is very valuable, especially early on, when the model is learning what papers and research look like.
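
A toy sketch of those stages in Python, purely to make the shape of the curriculum concrete (the "model" and "judge" here are hypothetical stand-ins, not any real training stack):

    class ToyModel:
        """Stand-in for the model: a lookup table plays the role of learned weights."""
        def __init__(self):
            self.memory = {}

        def learn(self, problem, solution):
            self.memory[problem] = solution

        def attempt(self, problem):
            return self.memory.get(problem, "no idea")

    HELD_OUT = {"5 * 5": "25"}  # answers only the judge knows

    def judge(problem, attempt):
        # Stage-3 teacher/judge: scores attempts against answers the model never saw.
        return 1.0 if attempt == HELD_OUT.get(problem) else 0.0

    model = ToyModel()

    # Stage 1: internalize fully worked examples.
    for problem, solution in [("2 + 2", "4"), ("3 * 3", "9")]:
        model.learn(problem, solution)

    # Stage 2: the final answer is known, so checking the model's work is a comparison.
    for problem, answer in [("2 + 2", "4"), ("3 * 3", "9")]:
        assert model.attempt(problem) == answer

    # Stage 3: the model doesn't know the answer; the external judge scores it.
    print(judge("5 * 5", model.attempt("5 * 5")))  # 0.0 until the model can generalize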


…because there are no consequences for AI. Humans understand shame, pain, and punishment. Until AI models develop this conditional reasoning as part of their process, to me, they’re grossly overestimated in capability and reliability.


I thought you were going to say "give AI the first part of a paper (prompt) and let it finish it (completion)" as validation that AI can produce science on par with the original research results. Until it can do that, I have no hope that it can produce novel ideas.


I once had a university assignment where they provided the figures from a paper and we had to write the paper around just the given figures.

A bit like how you might write a paper yourself - starting with the data.

As it turned out, I thought the figures looked like data that might be from a paper referenced in a different lecturer's set of lectures (just in its conclusion; he hadn't shown the figures), so I went down to the library (this was in the days of non-digitized content, when you had to physically walk the stacks) and looked it up. I found the original paper and then a follow-up paper by the same authors...

I like to think I was just doing my background research properly.

I told a friend about the paper and before you know it the whole class knew - and I had to admit to the lecturer that I'd found the original paper when he wondered why the whole class had done so well.

Obviously this would be trivial today with an electronic search.


I guess it would also need the experimental data. It would, I guess, also need some ability to do little experiments and write off those ideas as not worth following up on…


> For example, give it a paper of some deep learning technique and make it produce an implementation of that paper.

Or maybe give it a paper full of statistics about some experimental observations, and have it reproduce the raw data?


Like, have the AI do the experiment? That could be interesting. Although I guess it would be limited to experiments that could be done on a computer.


Seconded, as not only is this an interesting idea, it might also help solve the issue of checking for reproducibility. Yet even then human evaluators would need to go over the AI-reproduced research with a fine-toothed comb.

Practically speaking, I think there are roles for current LLMs in research. One is in the peer review process. LLMs can assist in evaluating the data-processing code used by scientists. Another is for brainstorming and the first pass at lit reviews.


Reproducibility was never a serious issue in the AI research community. I think one of the main reasons for the explosive progress in AI was the openness of the community: people could easily reproduce other people's research. If you look at the top-tier conferences, you see that they share everything: paper, LaTeX, code, data, lecture videos, etc.

After ChatGPT, big corporations stopped sharing their main research, but it still happens in academia.


I think what I would rather like to see is the reproduction of results from well-known experiments that the AI didn't see, not the reproduction of AI papers. For example, assuming a human can build it, would an AI, knowing nothing except what was known at the time, be able to design the Millikan oil drop experiment? Would it be able to design a Taylor-Couette setup for exploring turbulence? Would it be able to design a linear particle accelerator or a triaxial compression experiment?

An interesting approach would be to restrict the training data to what was known before a seminal paper was produced. Take Lorenz's atmospheric circulation paper, train an AI only on data from before that paper was published, and ask: does the AI produce the same equations and the same description of chaos that Lorenz arrived at?
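
For reference, the target there would be the Lorenz system itself. A minimal hand-written sketch of "the same equations," using the classic parameter values (not something any model produced):

    def lorenz_step(x, y, z, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        # The 1963 system with its standard parameters, advanced by one Euler step.
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        return x + dx * dt, y + dy * dt, z + dz * dt

    # Two trajectories that start almost identically end up completely different:
    # the sensitive dependence on initial conditions Lorenz described.
    a = (1.0, 1.0, 1.0)
    b = (1.0, 1.0, 1.0 + 1e-9)
    for _ in range(6000):
        a = lorenz_step(*a)
        b = lorenz_step(*b)
    print(a[0], b[0])  # no longer close after ~30 time units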


You probably have to fight quite a few battles because many people know their papers aren't reproducible. More politics than science really.

It would be the biggest boon to science since sci-hub though.

And since a large set of studies won't be reproducible, you need human supervision as well, at least at first.


Side note: I wonder why it's not the norm for more papers to come with a reference implementation. It wouldn't have to be efficient, or even easily runnable; it could just be a link to a repository with a few Python scripts.


Sometimes papers do. Like everything with academia, there's no consistency and it varies mostly by field. It's especially common in CS and less common in other fields.

The main reason people don't do it is because incentives are everything, and university/government management set bad incentives. The article points this out too. They judge academics entirely by some function of paper citations, so academics are incentivized to do the least possible work to maximize that metric. There's no positive incentive to publish more than necessary, and doing so can be risky because people might find flaws in your work by checking it. So a lot of researchers hide their raw data or code for as long as possible. They know this is wrong and will typically claim they'll publish it but there's a lot of foot dragging, and whatever gets released might not be what they used to make the paper.

In the commercial world the incentives are obviously different, but the outcomes are the same. Sometimes companies want the ideas to be used because they complement the core business; other times the ideas need to be protected to be turned into a core business. People like to think academic and industrial research are very different, but everyone is optimizing for some metric, whether they like it or not.


> Before it can do that, I have no hope that it can produce novel ideas.

Producing novel ideas is the most famous trait of current LLMs, the thing people are spending all their time trying to prevent.


> Producing novel ideas is the most famous trait of current LLMs

Could you please explain what you mean or give a simple example?


I think they were speaking to hallucinations. A hallucination is a novel idea, often one that even sounds pretty plausible to a casual observer, but it isn't useful (arguably worse than useless, given it can trick people) and is connected to reality only in a superficial way (which is why it fools the casual observer while the expert recognizes it as a hallucination).



