Paper2video: Automatic video generation from scientific papers

hirenj · 2025-10-12T09:07:12 1760260032

This is great - now I can get the authentic conference experience of a disengaged speaker reading out the slides in a monotone, without all the hassle of international travel and scheduling.

In all seriousness, there could be more utility in this if it helped explain the figures. I jumped ahead to one of the figures in the example video, and no real attention was given to it. In my experience, this is really where presentations live and die, in the clear presentation of datapoints, adding sufficient detail that you bring people along.

IanCal · 2025-10-12T16:06:18 1760285178

If it doesn’t cram text at a tiny point size and introduce a slide with “you can’t see this but” then it’s likely better than the majority of scientific presentations I’ve seen.

netsharc · 2025-10-12T10:11:52 1760263912

There's a porn site (is it even porn if it's just nudity?) whose niche is women reading the news while taking off their clothes.

For papers, it doesn't have to go that far, but I imagine a polished AI girl (or guy) reading the summary would be more engaging.

Hah, "SteveGPT, present your PowerPoints like Steve Jobs did!"

a99c43f2d565504 · 2025-10-12T11:22:25 1760268145

Besides just porn or nudity, maybe we could also add violence into the arsenal of engagement. For example, maybe the viewer could use a virtual sword or shotgun on some key concepts in the presentation to initiate a tangent going on a deep dive on the concept, and then come back to the presentation once done with the rabbit hole.

anarticle · 2025-10-12T14:14:03 1760278443

Feels like the theme of Videodrome coming back: https://www.youtube.com/watch?v=RxXkIGVwgB4

Add sex and violence to make your boring paper reading sessions more exciting!

mtillman · 2025-10-12T15:04:07 1760281447

I was just thinking about this movie on Friday while at a concert. Lorna Shore, awesome show. Anyways, the person in front of me was watching an overweight person (purpose of the niche I suspect which is why I mention it) do their daily chore routine (laundry, cleaning, etc) on tiktok. After the video was finished, my fellow concert attendee quickly went to Amazon and purchased the iron in the video. No links clicked, just serious chore fomo leading to a purchase. All while standing 3 feet from a circle pit/wall of death/etc while Lorna Shore was playing 20 ft from their face.

rft · 2025-10-12T11:37:38 1760269058

A VR interactive thesis defense/sword fighting crossover game sounds just weird enough to work. Maybe base it on the fight mechanics of Until You Fall [1], we could call it "Until You Graduate" (I will see myself out for that one) or "Thesis Offense" [2].

[1] https://store.steampowered.com/app/858260/Until_You_Fall/

[2] https://xkcd.com/1403/

sebastiennight · 2025-10-12T15:24:38 1760282678

Upon first reading, I thought for a second you were suggesting a "Polish" AI presenter...

fsh · 2025-10-12T08:01:23 1760256083

The samples from the authors' GitHub are just some text vomited onto slides, and the AI voice reading them point by point. Exactly the opposite of a good presentation.

mattjenner · 2025-10-12T08:45:49 1760258749

This will likely develop faster than your typical researcher's presentation skills. It could also increase access more generally. Science communication is a skill, and an interested reader's ability to get to a conference (or watch the recordings) is limited. If this expands access to science, I'm for it.

(and I generally think AI-produced content is slop).

davidsainez · 2025-10-12T09:14:48 1760260488

IMO this seems like exactly the kind of use case where AI fails consistently: engaging storytelling and finding the simplest solution to a problem. For example, LLMs are really good at generating walls of code that will run, but they don't really have good taste in architecting a solution. When I use them for coding, I spend time thinking of a good high-level approach and then use LLMs to fill in the more boilerplate-style code.

ninesnines · 2025-10-12T07:05:01 1760252701

Ah I guess if you’re very bad at presentations, then this could be beneficial. However, scientific presentations are meant to be communicating science and making things stick to your audience (no matter if it’s scientists or children you’re presenting to). This does not fix that problem at all. For anyone thinking of using this: please watch: https://m.youtube.com/watch?v=Unzc731iCUY and maybe a talk from Jane Goodall on how to engagingly show your science. I would hate to see a lot of conference presentations be made with this generator.

Another thing that improved my personal presentation skills was noting down why I liked a presentation or why I didn’t - what specific things a person did to make it engaging. Just paying attention to that improved my presentation skills enormously

rhl314 · 2025-10-12T14:02:37 1760277757

Shameless plug: I have been working on a tool that lets you create whiteboard explainers.

It also works with research papers.

Here is an explainer of the famous Attention is all you need paper https://www.youtube.com/watch?v=7x_jIK3kqfA

(You can try it here https://magnetron.ai)

alfonsodev · 2025-10-12T16:42:45 1760287365

Wow! You are almost there. If you made a version that was only drawings, or drawings first and titles later, it would be awesome. Right now the titles take too long to write and fill in, and meanwhile the pace is lost against the narration; then it makes a cool drawing super fast. So it feels like with a bit of tweaking of the pace you'll be able to get an outstanding result.

Congratulations on this cool idea and results.

Where can I follow the progress or get notified ?

rhl314 · 2025-10-12T21:04:25 1760303065

Thanks for the feedback. Working on making the video and narration sync better.

> Where can I follow the progress or get notified ?

I send out product updates once a week or so. Will keep you posted.

sebastiennight · 2025-10-12T07:19:31 1760253571

Very interesting project, and I found two things particularly smart and well executed in the demo:

1. Using a "painter commenter" feedback loop to make sure the slides are correctly laid out with no overflowing or overlapping elements.

2. Having the audio/subtitles not read word-for-word the detailed contents that are added to the slides, but instead rewording that content to flow more naturally and be closer to how a human presenter would cover the slide.
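To make the first point concrete, here is a toy sketch of what a painter/commenter loop can look like. It is not the project's code: `TextBox`, `commenter`, `painter`, and `refine` are hypothetical names, and the "layout" is reduced to a crude character count instead of a rendered slide. The idea is only that one step critiques the layout and another step revises it until the critique comes back clean.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical stand-in for one text element on a slide."""
    text: str
    width_chars: int    # characters that fit on one line at scale 1.0
    max_lines: int      # lines that fit in the box at scale 1.0
    scale: float = 1.0  # font scale; smaller means more text fits


def commenter(box: TextBox):
    """Critique step: return "overflow" if the text does not fit, else None."""
    chars_per_line = max(1, int(box.width_chars / box.scale))
    lines_available = int(box.max_lines / box.scale)
    lines_needed = -(-len(box.text) // chars_per_line)  # ceiling division
    return "overflow" if lines_needed > lines_available else None


def painter(box: TextBox, feedback: str) -> TextBox:
    """Revision step: shrink the font by 10% when told about overflow."""
    if feedback == "overflow":
        box.scale = round(box.scale * 0.9, 3)
    return box


def refine(box: TextBox, max_rounds: int = 10) -> TextBox:
    """Alternate critique and revision until clean or out of rounds."""
    for _ in range(max_rounds):
        feedback = commenter(box)
        if feedback is None:
            break
        painter(box, feedback)
    return box


box = refine(TextBox(text="x" * 200, width_chars=40, max_lines=4))
```

In a real pipeline the commenter would presumably look at the rendered slide (for example with a vision model) rather than count characters, but the loop structure would be the same.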

A couple of things could possibly be improved in the prompts for the reasoning features, e.g. in `answer_question_from_image.yaml`:

  1. Study the poster image along with the "questions" provided.
  2. For each question:
     • Decide if the poster clearly supports one of the four options (A, B, C, or D). If so, pick that answer.
     • Otherwise, if the poster does not have adequate information, use "NA" for the answer.
  3. Provide a brief reference indicating where in the poster you found the answer. If no reference is available (i.e., your answer is "NA"), use "NA" for the reference too.
  4. Format your output strictly as a JSON object with this pattern:
     {
       "Question 1": {
         "answer": "X",
         "reference": "some reference or 'NA'"
       },
       "Question 2": {
         "answer": "X",
         "reference": "some reference or 'NA'"
       },
       ...
     }

I'd assume you would likely get better results by asking for the reference first, and then the answer, otherwise you probably have quite a number of answers where the model just "knows" the answer and takes from its own training rather than from the image, which would bias the benchmark.
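A rough sketch of the reordering suggested above, assuming the same JSON shape with only the key order swapped. The pattern string and the `parse_reference_first` helper are illustrative, not taken from the repository. The helper also forces the answer to "NA" whenever the reference is "NA", which mirrors step 3 of the prompt.

```python
import json

# Hypothetical reference-first output pattern (not the project's actual prompt).
REFERENCE_FIRST_PATTERN = """{
  "Question 1": {
    "reference": "where in the poster the evidence is, or 'NA'",
    "answer": "X"
  }
}"""


def parse_reference_first(raw: str) -> dict:
    """Parse a model reply; an answer without a reference becomes "NA"."""
    parsed = json.loads(raw)
    cleaned = {}
    for question, entry in parsed.items():
        reference = str(entry.get("reference", "NA")).strip() or "NA"
        answer = str(entry.get("answer", "NA")).strip() or "NA"
        if reference == "NA":
            # No evidence located in the image, so do not trust the answer.
            answer = "NA"
        cleaned[question] = {"reference": reference, "answer": answer}
    return cleaned


# Made-up reply, used only to exercise the helper.
reply = (
    '{"Question 1": {"reference": "Results table, top right", "answer": "B"},'
    ' "Question 2": {"reference": "NA", "answer": "C"}}'
)
result = parse_reference_first(reply)
```

Since the model generates tokens in order, putting "reference" first means the answer is produced after the model has already committed to where the evidence is. Whether that actually improves the benchmark would need to be measured.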

ks2048 · 2025-10-12T16:25:54 1760286354

While the TTS sounds very good, it is interesting how some subtle prosody issues make it sound very unnatural.

example: Geoff Hinton saying "Forward-forward Algorithm" with a long pause after the first "forward".

(first few seconds in the first demo on https://showlab.github.io/Paper2Video/)

ks2048 · 2025-10-12T02:41:29 1760236889

Project page (links to both github and arxiv): https://showlab.github.io/Paper2Video/

tummler · 2025-10-12T15:32:20 1760283140

At last, they've come for Two Minute Papers.

anothernewdude · 2025-10-12T06:36:46 1760251006

This is the opposite of what I want. I'd rather turn videos into articles.

Lerc · 2025-10-12T07:35:53 1760254553

People are different. I would prefer paper to video, but this implementation is not yet sufficient for what I would use. But as Doctorcarolorangyfaheer says, maybe a few more papers down the line.

tobwen · 2025-10-12T09:49:00 1760262540

Hrhr, I'd love to have automatic CODE generation from Scientific Papers :D

anarticle · 2025-10-12T14:17:30 1760278650

You're in luck! Paper2Agent + Paper2Code do just that: https://arxiv.org/abs/2504.17192 https://arxiv.org/abs/2509.06917

progbits · 2025-10-12T10:00:04 1760263204

Damn, they automated Károly Zsolnai-Fehér