
Consistency over multiple minutes and it runs in real time at 720p? I did not expect world models to be this good yet.

> Genie 3’s consistency is an emergent capability

So this just happened from scaling the model, rather than being a consequence of deliberate architecture changes?

Edit: here is some commentary on limitations from someone who tried it: https://x.com/tejasdkulkarni/status/1952737669894574264

> - Physics is still hard and there are obvious failure cases when I tried the classical intuitive physics experiments from psychology (tower of blocks).

> - Social and multi-agent interactions are tricky to handle. 1vs1 combat games do not work

> - Long instruction following and simple combinatorial game logic fails (e.g. collect some points / keys etc, go to the door, unlock and so on)

> - Action space is limited

> - It is far from being a real game engine and has a long way to go, but this is a clear glimpse into the future.

Even with these limitations, this is still bonkers. It suggests to me that world models may have a bigger part to play in robotics and real world AI than I realized. Future robots may learn in their dreams...





Gaming is certainly a use case, but I think this is primarily coming as synthetic data generation for Google's robots training in warehouses:

https://www.theguardian.com/technology/2025/aug/05/google-st...

Gemini Robotics launch 4 months ago:

https://news.ycombinator.com/item?id=43344082


This also seems pretty valuable to create CGI cut scenes...

I similarly am surprised at how fast they are progressing. I wrote this piece a few months ago about how I think steering world model output is the next realm of AAA gaming:

https://kylekukshtel.com/diffusion-aaa-gamedev-doom-minecraf...

But even when I wrote that I thought things were still a few years out. I facetiously said that Rockstar would be nerd-sniped on GTA6 by a world model, which sounded crazy a few months ago. But seeing the progress already made since GameNGen and knowing GTA6 is still a year away... maybe it will actually happen.


> Rockstar would be nerd-sniped on GTA6 by a world model

I'm having trouble parsing your meaning here.

GTA isn't really a "drive on the street simulator", is it? There is deliberate creative and artistic vision that makes the series so enjoyable to play even decades after release, despite the graphics quality becoming more dated every year by AAA standards.

Are you saying someone would "vibe model" a GTAish clone with modern graphics that would overtake the actual GTA6 in popularity? That seems extremely unlikely to me.


I don't _really_ mean it, obviously, but I think a key component of what makes something like GTA compelling is that fully modeled world you move around in. These things take what amounts to hundreds if not thousands of man-years to create "traditionally", and the fact that someone can now prompt to life a city or any other environment with similar (or better) fidelity is a massive change in how we think about creative content production.

GTA6 will not actually be nerd-sniped, but it's easy to see how a lot of what makes the game defensible is being rapidly commoditized.


GTA VI's story mode won't be surpassed by a world model, but the fucking around and blowing things up part conceivably could, and that's how people are spending their time in GTA. I don't see a world model providing the framing needed to contextualize the mayhem, thereby making it fun, anytime soon myself, but down the line? Maybe.

I would envision someone creative working together with Gen AI being able to do something awesome, rather than just saying "make me a GTA clone".

But someone creative having a vision in their head and then just guiding AI to flesh out the assets, details, etc.


They will then learn the bitter lesson that convincing the GenAI to create something that brings your vision to life is impossible. It's a real talent to even be able to define for yourself what your vision is, and then to have artists achieve it visually in any medium is a process of back and forth between people with their own interpretations evolving the idea into something even better and cohesive.

GenAI will never get there because it can't, by design. It can riff on what was, and it can please the prompter, but it cannot challenge anyone creatively. No current LLM's can, either. I'll eat my hat if this is wrong in ten years, but it won't be.

It will generate refined slop ad nauseam, and that will train people's brains into spotting said slop faster using less energy. And then it'll be shunned.


You can just specify to a high degree and it will follow your instructions. You’re coping bro.

bro, how you could get the very precise and predictable editing bro that you have in a regular game engine bro. also bro, empty pretty world with nothing to do bro is lame bro

Probably depends on how you engage with GTA. “Drive on the street simulator” along with arrays of weapons and explosions is the majority of my hours in GTA.

I despise the creative and artistic vision of GTA online, but I’m clearly in a minority there gauging by how much money they’ve made off it.


I took the "creative and artistic vision" line to refer to the story mode.

I assumed the opposite because I haven't heard about GTA's story in ages, but could be a sampling bias. It's hand-wavy, but last I recall most of the microtransactions didn't show up in single player (like if you bought a car, you couldn't use it in single player) so the people spending money on it are doing it for online, not the story.

I didn't think the story was earth-shattering; it was fine, but no Baldur's Gate.

Edit: In retrospect, the characters were fairly iconic. I still distinctly remember Trevor.


The future of games was MMORPGs and RPG-ization in general as other genres adopted progression systems. But the former two are simply too expensive and risky even today for AAA to develop. Which brings us to another point: the problem with Western AAA is more about high levels of risk aversion, which is what's really feeding the lack of imagination. And that's more to do with the economics of opportunity cost relative to the S&P 500.

Anyways, crafting pretty-looking worlds is one thing, but you still need to fill them with something worth doing, and that's something we haven't really figured out. That's one of the reasons why the sandbox MMORPG was developed as opposed to "themeparks". The underlying systems, the backend, are the real meat here. At most, world models right now let you replace 3D artists and animators, but I would not say that is the real bottleneck in relation to one's own limitations.


> Which brings us to another point: the problem with Western AAA is more about high levels of risk aversion, which is what's really feeding the lack of imagination.

Maybe I’m misinterpreting what you’re saying here, but 2021 to the present has seen a glut of some of the best titles ever made, by pretty much any measure.


You may be interested in reading something I wrote a while ago that's pretty dead on to what you're talking about there:

https://kylekukshtel.com/game-design-mimetics


Great read and insights! I learned a few things and found myself nodding along most of the article.

I'm trying to wrap my head around this since we're still seeing text spit out slowly (I mean slowly as in thousands of tokens a second).

I'm starting to think some of the names behind LLMs/GenAI are cover names for aliens and any actual humans involved have signed an NDA that comes with millions of dollars and a death warrant if disobeyed.


Lizard people?

> So this just happened from scaling the model

Unbelievable. How is this not a miracle? So we're just stumbling onto breakthroughs?


Is it actually unbelievable?

It's basically what every major AI lab head has been saying from the start. It's the peanut gallery that keeps saying they are lying to get funding.


Even as a layman and AI skeptic, to me this entirely matches my expectations, and something like this seemed like it was basically inevitable as of the first demos of video rendering responding to user input (a year ago? maybe?).

Not to detract from what has been done here in any way, but it all seems entirely consistent with the types of progress we have seen.

It's also no surprise to me that it's from Google, who I suspect is better situated than any of its AI competitors, even if it is sometimes slow to show progress publicly.


https://worldmodels.github.io/

I think this was the first mention of world models I've seen circa 2018.

This is based on VAEs though.


Google seems to have had the keys to changing the world years ago and decided not to.

Hard to fault them as the process towards ASI now appears to be runaway and uncontrollable.


>It's basically what every major AI lab head is saying from the start.

I suppose it depends what you count as "the start". The idea of AI as a real research project has been around since at least the 1950s. And I'm not a programmer or computer scientist, but I'm a philosophy nerd and I know debates about what computers can or can't do started around then. One side of the debate was that it awaited new conceptual and architectural breakthroughs.

I also think you can look at, say, TED Talks on the topic, with guys like Jeff Hawkins presenting the problem as one of searching for conceptual breakthroughs, and I think similar ideas of such a search have been at the center of Douglas Hofstadter's career.

I think in all those cases, they would have treated "more is different" like an absence of nuance, because there was supposed to be a puzzle to solve (and in a sense there is, and there has been, in terms of vector space and back propagation and so on, but it wasn't necessarily clear that physics could "pop out" emergently from such a foundation).


When they say "the start", I think they mean the start of the current LLM era (circa 2017). The main story of this time has been a rejection of the idea that major conceptual breakthroughs and complex architectures are needed to achieve intelligence. Instead, it's better to focus on simple, general-purpose methods that can scale to massive amounts of data and compute (i.e. the Bitter Lesson [1]).

[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html


Oof ... to call other people's decades of research into directed machine learning "a colossal waste of researcher's time" is indeed a rather toxic point of view unsurprisingly causing a bitter reaction in scientists/researchers.

Even if his broader point might be valid (about the most fruitful directions in ML), calling something a "bitter lesson" while insulting a whole field of science is ... something.

Also as someone involved in early RL, he should know better.


The start of deep neural networks, i.e. AlexNet.

It's akin to us sending a rocket to space and immediately discovering a wormhole. Sure, there's a lot of science about what's out there, but to discover all this in our first few trips to orbit ...

Joscha Bach postulates that what we call consciousness must be something rather simple, an emergent property present in all sufficiently complex biological organisms.

We don't inherit any software, so cognitive function must bootstrap itself from its underlying structure alone.

https://media.ccc.de/v/38c3-self-models-of-loving-grace


> We don't inherit any software

I wonder, though. Many animal species just "know" how to perform certain complex actions without being taught the way humans have to be taught. Building a nest, for example.

If you say that this is emergent from the "underlying structure alone", doesn't this mean that it would still be "inherited" software (though in this case, maybe we think of it like punch cards)?


We inherit ~2GB of digital data as DNA. Quite how that turns into nest-building how-tos is not yet known, but it must happen somehow.

I’ve seen different figures for the information content of DNA but they’re all mostly misleading. What we actually inherit is much more. We are the result of an unpacking algorithm starting from a single cell over time, so our information content should at the very least include the entirety of the cell (which is probably impossible to calculate). Additionally, in a more general sense, arbitrarily complex behavior can be derived from very simple mathematics, e.g. cellular automata. With sufficiently complex dynamics (which for us are given by the laws of physics), even very small information changes lead to vastly different “emergent behavior”, whatever that means. One could improperly say that part of the information is included in the laws of physics themselves.

A biological example that I like: the neural structures for vision develop almost fully formed from the very beginning. The state of our network at initialization is effectively already functional. I’m not sure to what extent this is true for humans, but it is certainly true for simpler organisms like flies. The way cells achieve this is through some extremely simple growth rules as the structure is being formed for the first time. Different kinds of cells behave almost independently of each other, and it just so happens that the final structure is a perfectly functional eye. I’ve seen animations of this during a conference talk and it was one of the most fascinating things I’ve ever seen. It truly shows how the complexity of a biological organism is billions of times greater than that of any human technology. And at the same time, it’s a beautiful illustration of the lack of intelligent design. It’s like watching a Lego set assemble itself just by shaking the pieces.
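
A tiny illustration of the cellular-automaton point (a sketch in Python; the choice of rule 30 is just an example, not anything biological): an 8-bit rule is a minuscule amount of "inherited" information, yet the pattern it unpacks into over time is enormously complex.

    # Elementary cellular automaton: the entire "genome" is one 8-bit rule number.
    def step(cells, rule=30):
        n = len(cells)
        return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
                for i in range(n)]

    row = [0] * 40 + [1] + [0] * 40   # start from a single live cell
    for _ in range(30):               # complex structure emerges from ~1 byte of spec
        print("".join("#" if c else " " for c in row))
        row = step(row)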


Problems like this will turn out to have simple solutions. Once we get past the idea of "inherited instinct" (obvious nonsense and easily proved to be so) the solution will be easier to see.

An example that might be useful: dragonflies lay their eggs in water. Since a dragonfly has like a 4-bit CPU you might be amazed at how it manages to get all the processing required to identify a body of water from a distance into its tiny mind, and also marvel at what sort of JPEG+++ encoding must be used to convey what water looks like from generation to generation.

But they don't do that at all: instead they have eyes that are sensitive to polarized light. The surface of water polarizes reflected light. So do things like polished gravestones. So dragonflies will lay their eggs on gravestones too.

One I like to ponder is beavers building dams. Do they have an encoded algorithm that knows they need to dam the river to have a place to live, by gnawing on trees, carrying them to the right place on the river bed, etc.? Nope, certainly they don't have that. Perhaps they have teeth that grow so long that they hurt, motivating the animal to gnaw on something solid to wear them down. The only solid thing they have available is a tree.


A similar phenomenon was demonstrated with deep neural networks nearly a decade ago. You optimize the architecture using randomized weights instead of optimizing the weights. You can still optimize the weights in a separate additional step to improve performance.

That's interesting indeed - or take spiders building webs. So there must be some 'microcode' that does get inherited like physical features.

But then you have things like language or societal customs that are purely 'software'.


I’ve always said that animals have short term and long term memory via the hippocampus, and then there’s supragenerational memory stored in DNA - behaviors that are learned over many generations and passed down via genetics.

The emergent property theory seems logical, but I'm also partial to the quantum-tunneling-miasma theory which basically posits that there could be something fairly complex going on, and we just lack the ability to observe/measure it in our current physics. (Although I have difficulty coherently separating this theory from faith-based beliefs)

> We don't inherit any software, so cognitive function must bootstrap itself from its underlying structure alone.

Hardware and software, as metaphors applied to biology, I think are better understood as a continuum than a binary, and if we don't inherit any software (is that true?), we at least inherit assembly code.


> we don't inherit any software (is that true?), we at least inherit assembly code

To stay with the metaphor, DNA could rather be understood as firmware that runs on the cell. What I mean by software is the 'mind' that runs on a collection of cells. Things like language, thoughts and ideas.

There is also a second level of software that runs not on a single mind alone, but on a collection of minds, to form cliques or societies. But this is not encoded in genes, but in memes.


I think we have some notion of a proto-grammar or ability to linguistically conceptualize, probably at the level of some primordial conceptual units that are more fundamental than language, thoughts and ideas in the concrete forms we generally understand them to have.

I think it's like Chomsky said, that we don't learn this infrastructure for understanding language any more than a bird "learns" their feathers. But I might be losing track of what you're suggesting is software in the metaphor. I think I'm broadly on board with your characterization of DNA, the mind and memes generally though.


At the most fundamental level, is it even linguistic? Would Tarzan speak at all?

Children (who aren't alone) will invent languages to communicate between each other, see Nicaraguan Sign Language.

Don't know who this Bach dude is, but I've been postulating the same thing since the early 1980s. Only to my friends in the pub, but still..

> We don't inherit any software

How do you claim to know this?


Lemme start by saying this is objectively amazing. But I just really wouldn't call it a breakthrough.

We had one breakthrough a couple of years ago with GPT-3, where we found that neural networks / transformers + scale do wonders. Everything else has been a smooth, continuous improvement. Compare today's announcement to the Genie 2 [1] release less than a year ago.

The speed is insane, but not surprising if you put it in the context of how fast AI is advancing. Again, nothing _new_. Just absurdly fast continuous progress.

[1] - https://deepmind.google/discover/blog/genie-2-a-large-scale-...


Wasn't the model winning gold at the IMO the result of a breakthrough? I doubt a stochastic parrot can solve math at IMO level...

Why wouldn't it? I have yet to hear one convincing argument for how our brain isn't working as a function of probable next best actions. When you look at how amoebas work, then at animals that are somewhere between them and us in intelligence, and then at us, you see a progression very similar to that of current LLMs, from almost no model of the world to a pretty solid one.

As far as we know, it was "just" scale on depth (model capability) and breadth (multiple agents working at the same time).

There are a lot of "interesting" emergent behaviors that happen just as a result of scaling.

Kind of like how a single neuron doesn't do much, but connect 100 billion of them and well...


It's becoming really, really hard to refute the Simulation Theory.

Bitter lesson strikes again!

_Especially_ given the goal of a world model using a rasters-only frame-by-frame approach. Holy shit.

It is truly remarkable. Even if you expected this to happen eventually, like I did, it seems like your timing assumptions end up being 2-10x longer than the time it actually takes.

It makes me think that Stargate might actually lead to AGI/ASI


> Future robots may learn in their dreams...

So prescient. I definitely think this will be a thing in the near future, on a ~12-18 month time horizon.


I may be wrong, but this seems to make no sense.

A neural net can produce information outside of its original data set, but it is all directly derived from that initial set. There are fundamental information constraints here. You cannot use a neural net to generate, from its existing data set, wholly new and original full-quality training data for itself.

You can use a neural net to generate data, and you can train a net on that data, but you'll end up with something which is no good.


Humans are dependent on their input data (through lifetime learning and, perhaps, information encoded in the brain from evolution), and yet they can produce out of distribution information. How?

There is an uncountably large number of models that perfectly replicate the data they're trained on; some generalize out of distribution much better. Something like dreaming might be a form of regularization: experimenting with simpler structures that perform equally well on training data but generalize better (e.g. by discovering simple algorithms that reproduce the data equally well as pure memorization but require simpler neural circuits than the memorizing circuits).

Once you have those better generalizing circuits, you can generate data that not only matches the input data in quality but potentially exceeds it, if the priors built into the learning algorithm match the real world.


Humans produce out-of-distribution data all the time, yet if you had a teacher making up facts and teaching them to your kids, you would probably complain.

Humans also sometimes hallucinate and produce non sequiturs.

Maybe you do, but people don't "hallucinate". Lying or being mistaken is a very different thing.

Computers aren't humans.

We have truly reached peak hackernews here.


I might be misunderstanding your comment, so sorry if so. Robots have sensors and RL is a thing: they can collect real-world data and then process and consolidate real-world experiences during downtime (or in real time), run simulations to prepare for scenarios, and update models based on the day's collected data. The way I saw it that I thought was impressive was that the robot understood the scene but didn't know how the scene would respond to its actions, so it generates videos of the possible scenarios, then picks the best ones and models its actuation based on its "imagination".

This is definitely one of the potential issues that might affect embodied agents/robots/bodies trained on the "world model". As we are training a model for the real world based on a model that simulates the real world, the glitches in the world-simulator model will be incorporated into the training. There will be edge cases due to this layered "overtraining", where a robot/agent/body will expect Y to happen but X will happen, causing unpredictable behaviour. I assume that a generic world agent will be able to autocorrect, but this could also lead to dangerous issues.

I.e. if the simulation has enough videos of firefighters breaking glass where it shatters instantly, and in the world sim it always breaks, a firefighter robot might get into a problem when confronted with unbreakable glass, as it expects it to break as always, leading to a loop of trying to shatter the glass instead of performing another action.


The benefit of these AI-generated simulation models as a training mechanism is that it helps add robustness without requiring a large training set. The recombinations can generate wider areas of the space to explore and learn with but using a smaller basis space.

To pick an almost trivial example, let's say OCR digit recognition. You'll train on the original data-set, but also on information-preserving skews and other transforms of that data set to add robustness (stretched numbers, rotated numbers, etc.). The core operation here is taking a smallset in some space (original training data) and producing some bigset in that same space (generated training data).

For simple things like digit recognition, we can imagine a lot of transforms as simple algorithms, but one can consider more complex problems and realize that an ML model would be able to do a good job of learning how to generate bigset candidates from the smallset.
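
For concreteness, a minimal sketch of that smallset-to-bigset step for digits, assuming torchvision is available; the transform parameters are illustrative, not tuned:

    from torchvision import transforms

    # label-preserving skews: small rotations, shifts, and mild stretches of each digit
    augment = transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1))

    def make_bigset(smallset, copies_per_image=10):
        # each original (image, label) pair yields several distorted variants
        return [(augment(img), label)
                for img, label in smallset
                for _ in range(copies_per_image)]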


We are miles away from the fundamental constraint. We know that our current training methodologies are scandalously data inefficient compared to human/animal brains. Augmenting observations with dreams has long been theorized to be (part of) the answer.

> current training methodologies are scandalously data inefficient compared to human/animal brains

Are you sure? I've been ingesting boatloads of high definition multi-sensory real-time data for quite a few decades now, and I hardly remember any of it. Perhaps the average quality/diversity of LLM training data has been higher, but they sure remember a hell of a lot more of it than I ever could.


It is possible - for example, getting a blob of physics data, fitting a curve then projecting the curve to theorise what would happen in new unseen situations. The information constraints don't limit the ability to generate new data in a specific domain from a small sample; indeed it might be possible to fully comprehend the domain if there is an underlying process it can infer. It is impossible to come up with wildly unrelated domains though.

Approximately speaking, you have a world model and an agent model. You continue to train the world model using data collected by the robot day-to-day. The robot "dreams" by running the agent model against the world model instead of moving around in the real world. Dreaming for thousands of (simulated) hours is much more efficient than actually running the physical hardware for thousands of wall clock hours.
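
A toy numpy sketch of that split, with the world model as a fitted dynamics function and the "dreaming" as policy search against it instead of against the hardware. The 1-D dynamics and the proportional policy are placeholders for illustration, not anyone's actual method:

    import numpy as np

    rng = np.random.default_rng(0)

    def real_step(s, a):                       # the physical robot / real world (slow, costly)
        return 0.9 * s + a + rng.normal(0, 0.01)

    # 1. Collect a modest batch of real transitions during the day.
    logged = []
    for _ in range(200):
        s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        logged.append((s, a, real_step(s, a)))

    # 2. Fit the world model to the log (here just a linear least-squares fit).
    X = np.array([[s, a, 1.0] for s, a, _ in logged])
    y = np.array([s2 for _, _, s2 in logged])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    dream_step = lambda s, a: w[0] * s + w[1] * a + w[2]

    # 3. "Dream": evaluate and improve the policy inside the learned model, where
    #    thousands of simulated hours cost nothing in wall-clock robot time.
    def dream_cost(gain, horizon=20):
        s, cost = rng.uniform(-1, 1), 0.0
        for _ in range(horizon):
            s = dream_step(s, -gain * s)       # proportional policy: push the state toward 0
            cost += s ** 2
        return cost

    best_gain = min(np.linspace(0.0, 2.0, 41),
                    key=lambda g: np.mean([dream_cost(g) for _ in range(50)]))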

I actually think you can.

The LLM has plenty of experts and approaches etc.

Give it tool access, let it formulate its own experiments, etc.

The only question here is whether it becomes a / the singularity because of this, gets stuck in some local minimum, or achieves random perfection and random local-minimum locations.


Humans can learn from visualising situations and thinking through different scenarios. I don't see why AI / robots can't do similar. In fact I think quite a lot of training for things like Tesla self driving is done in simulation.

It's feasible you could have a personal neural net that fine-tunes itself overnight to make less inference mistakes in the future.

AlphaGo would seem to be a conceptually simple counter example.

Any idea how humans do it? Where do they get novel information from?

What is a robot dream when there is clearly no consciousness?

What's with this insane desire for anthropomorphism? What do you even MEAN learn in its dreams? Fine-tuning overnight? Just say that!


> What's with this insane desire for anthropomorphism?

Devil's advocate: Making the assumption that consciousness is uniquely human, and that humans are "special" is just as ludicrous.

Whether a computational medium is carbon-based or silicon-based seems irrelevant. Call it "carbon-chauvinism".


"Consciousness" is an overloaded thought killer that swerves all conversation into obfuscated semantic arguments. One person will be talking about 'internality' and self-image (in the testable, mechanical sense that you could argue Chain of Thought models already have in a petty way) and the other will be grappling with the concept of qualia and the ineffable nature of human experience.

That's not even a devil's advocate, many other animals clearly have consciousness, at least if we're not solipsistic. There have been many very dangerous precedents in medicine where people have been declared "brain dead" only to awake and remember.

Since consciousness is closely linked to being a moral patient, it is all the more important to err on the side of caution when denying qualia to other beings.


AI has traditionally been driven by "metaphor-driven development" where people assume the brain has system X, program something they give the same name, and then assume because they've given it that name it must work because it works in the brain.

This is generally a bad idea, but a few of the results like "neural networks" did work out… eventually.

"World model" is another example of a metaphor like this. They've assumed that humans have world models (most likely not true), and that if they program something and call it a "world model" it will work the same way (definitely not true) and will be beneficial (possibly true).

(The above critique comes from Phil Agre and David Chapman.)


Yes, and an object in OOP isn't really a physical object. And a string isn't really a thin bit of rope.

No-one cares. It's just terminology.


I'm invested in a startup that is doing something unrelated to robotics, but they're spending a lot of time in Shenzhen. I keep a very close eye on robotics and was talking to their CTO about what he is seeing in China: versions of this are already being implemented.

[flagged]


I have no doubts that is the case. Just look at this new Unitree robot that was unveiled a mere 6 hours ago: https://youtu.be/ve9USu7zpLU?feature=shared

And these are consumer options, affordable to you and me, not only to some military. If those are the commonly available options... there may be way more advanced stuff that we haven't seen.


This stuff is old tech, and has nothing to do with transformers. The Boston Dynamics-style robot dogs are always shown in marketing demos like the one you linked, in secretly very controlled environments. Let me know when I can order one that will bring the laundry downstairs for my wife.

I asked for real examples from someone who claimed to have first hand experience, not more marketing bullshit



lol I thought this was going to be the receipts but no, I guess asking for evidence is against the rules!

y'all are in a religion


I genuinely have no idea where you’re coming from on this. You’re assuming way too much about everyone you disagree with

Calling someone a liar without evidence is kinda rude.

This is so standard it's part of the tutorials now

https://developer.nvidia.com/isaac/gr00t


Gr00t works, but it is not a standard yet. I can't share what I have heard, but the success rate is still far behind other methods.

"Do Androids Dream of Electric Sheep?"

The guy who tried it was invited by Google to try it.

He seems too enthusiastic to me, such that I feel Google asked him in particular because they trusted him to write very positively.


I doubt there was a condition on writing positively. Other people who tested have said this won't replace engines. https://togelius.blogspot.com/2025/08/genie-3-and-future-of-...

You don’t ask people to speak how you want; you simply only invite people who already have a history of speaking how you want. This phenomenon is explained in detail in Noam Chomsky’s work on mass media (e.g. the NY Times doesn’t tell its editors what to do exactly, but only hires editors who already want to say what the NY Times wants, or who have a certain world view). The same can be applied to social media reviews. Invite the person who gives glowing reviews all the time.

Do you know where Noam makes that argument? I've been trying to figure out where I picked it up years ago. I'd like to revisit it to deepen my understanding. It's a pretty universal insight.

Look for discussion with British journalist Andrew Marr during a BBC interview in 1996.

The lead-in to the quote starts at https://youtu.be/GjENnyQupow?t=662

"I don't say you're self-censoring - I'm sure you believe everything you're saying; but what I'm saying is, if you believed something different, you wouldn't be sitting where you're sitting." -- Noam Chomsky to Andrew Marr


It's a shame the interviewer didn't quite grasp that point and dig a little deeper into it. Listening to it again I'm reminded of "The master's tools will never dismantle the master's house".

Thank you for finding that link for me :)


I think it was in "Manufacturing Consent" by Edward S. Herman and Noam Chomsky.

https://en.wikipedia.org/wiki/Manufacturing_Consent#:~:text=...

https://www.goodreads.com/book/show/12617.Manufacturing_Cons...

Though this is often associated with his and Herman's "Propaganda Model," Chomsky has also commented that the same dynamic appears in scholarly literature, despite the overt propaganda forces of ownership and advertisement being absent:

https://en.wikipedia.org/wiki/Propaganda_model#:~:text=Choms...



> What I don't think this technology will do is replace game engines. I just don't see how you could get the very precise and predictable editing you have in a regular game engine from anything like the current model. The real advantage of game engines is how they allow teams of game developers to work together, making small and localized changes to a game project.

I've been thinking about this a while and it's obvious to me:

Put Minecraft (or something similar) under the hood. You just need data structures to encode the world: ones that enable mutation, location, and persistence.

If the model is given additional parameters such as a "world mesh", then it can easily persist where things are, what color or texture they should be, etc.

That data structure or server can be running independently on CPU-bound processes. Genie or whatever "world model" you have is just your renderer.
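
A rough sketch of that split in Python; `render_frame` is a hypothetical stand-in for the world-model call, not a real API. The authoritative state lives in plain data structures on the CPU, and the model only turns a serialized view of it into pixels:

    from dataclasses import dataclass, field

    @dataclass
    class Block:
        x: int
        y: int
        z: int
        material: str = "stone"

    @dataclass
    class WorldState:
        blocks: dict = field(default_factory=dict)   # (x, y, z) -> Block

        def set_block(self, x, y, z, material):
            # mutation and persistence happen here, on the CPU, not inside the model
            self.blocks[(x, y, z)] = Block(x, y, z, material)

        def to_prompt(self, camera_pos):
            # serialize the world mesh + camera pose for whichever renderer is plugged in
            return {"camera": camera_pos,
                    "blocks": [vars(b) for b in self.blocks.values()]}

    def render_frame(world_prompt):
        # hypothetical: hand the structured state to a Genie-style model, get pixels back
        ...

    world = WorldState()
    world.set_block(0, 0, 0, "grass")
    frame = render_frame(world.to_prompt(camera_pos=(0.0, 1.7, -3.0)))

Hot-swapping renderers then just means pointing `render_frame` at a different provider while everyone keeps sharing the same WorldState.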

It probably won't happen like this due to monopolistic forces, but a nice future might be a future where you could hot swap renderers between providers yet still be playing the same game as your friends - just with different looks and feels. Experiencing the world differently all at the same time. (It'll probably be winner take all, sadly, or several independent vertical silos.)

If I were Tim Sweeney at Epic Games, I'd immediately drop all work on Unreal Engine and start looking into this tech, because this is going to shore them up on both the gaming and film fronts.


As a renderer, given a POV, lighting conditions, and a world mesh, it might be a very, very good system. Sort of a tight MCP connection to the world state.

I think in this context, it could be amazing for game creation.

I’d imagine you would provide item descriptions to vibe-code objects and behavior scripts, set up some initial world state (maps) populated with objects made of objects (hierarchically vibe-modeled), make a few renderings to give an inspirational world feel and textures, and vibe-tune the world until you had the look and feel you want. Then, once the textures and models and world were finalised, it would be used as the rendering context.

I think this is a place that there is enough feedback loops and supervision that with decent tools along these lines, you could 100x the efficiency of game development.

It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.


> you could 100x the efficiency of game development.

> It would blow up the game industry, but also spawn a million independent one or two person studios producing some really imaginative niche experiences that could be much, much more expansive (like a AAA title) than the typical indie-studio product.

All video games become Minecraft / Roblox / VRChat. You don't need AAA studios. People can make and share their own games with friends.

Scary realization: YouTube becomes YouGame and Google wins the Internet forever.


You’ve just described what Roblox is already doing.

I haven’t checked on Roblox recently, but afaik it doesn’t really allow complete creative freedom or the ability to have a picture and say “make the world look like this, and make the character textures match the vibe” and have it happen. Don’t they still have a unified world experience or can you really customize things that deeply now?

Can you make a basically indistinguishable copy of other games in Roblox? If so, that’s pretty cool, even without AI integration.


Roblox can't beat Google in AI. Roblox has network effects with users, but on an old school tech platform where users can't magic things into existence.

I've seen Roblox's creative tools, even their GenAI tools, but they're bolted on. It's the steam powered horse problem.


It wouldn't be surprising if a structured version of this, with state cached per room for example, could be used in a game.

And you're basically seeing GPT-3 and saying it will never be used in any serious application... the rate of improvement in their model is insane.


Don't put the world state into the model. Use the model as a renderer of whatever objects the "engine" throws at it.

Use the CPU and RAM for world state, then pass it off to the model to render.

Regardless of how this is done, Unreal Engine with all of its bells and whistles is toast. That C++ pile of engineering won't outdo something this flexible.


How many watts and how much capital does it take to run this model? How many watts and how much capital does it take to run unity or unreal? I suspect there's a huge discrepancy here, among other things.

Making it exactly what you want, making edits, or following some exact state is the whole freaking problem here.

But can we use it to create movies one scene at a time?

I don't know. I wasn't there and I'm excited.

I think this puts Epic Games, Nintendo, and the whole lot into a very tough spot if this tech takes off.

I don't see how Unreal Engine, with its voluminous and labyrinthine tomes of impenetrable legacy C++ code, survives this. Unreal Engine is a mess, gamers are unhappy about it, and it's a PITA to develop with. I certainly hate working with it.

The Innovator's Dilemma is fast approaching the entire gaming industry, and they don't even see it coming, it's happening so fast.

Exciting that building games could become as easy as having the idea itself. I'm imagining something like VRChat or Roblox or Fortnite, but where new things are simply spoken into existence.

It's absolutely terrifying that Google has this much power.


How so? It's not really by itself being creative yet, no? It sure seems like a game changer but who knows if one can even use this at scale?

I played around with Diamond WM on my 3090 machine. I also ran fast SDXL-turbo and LCM models with ControlNets paired with a 3D game prototype I threw together. The results were very compelling, and I was just one person hacking things together.

This is 100% going to happen on-device. It's just a matter of time.


I am convinced as well this will eventually be how we render games and simulations.

Maybe just as a kind of DLSS on steroids, where the engine only renders very simple objects and a world model translates these into the actual graphics.


I imagine Unreal Engine will start incorporating such stuff?

Also he is ex-Google DeepMind. Like the worst kind of pick you can make when there are dozens of eligible journalists out there.

Wow. How do we know if we’re not in Genie 4 right now.

> this is a clear glimpse into the future.

Not for video games it isn’t.


Unless and until state can be stored outside of the model.

I for one would love a video game where you're playing in a psychedelic, dream-like fugue.


It is plausible to run a full simulation the old fashioned way and realtime render it with a diffusion model.

It is not currently, or near term, realistic to make a video game where a meaningful portion of the simulation is part of the model.

There will probably be a few interactive model-first experiences. But they’ll be popular as short novelties not meaningful or long experiences.

A simple question to consider is how you would adjust a set of simple tunables in a model-first simulator: for example, giving the player more health, making enemies deal 2x damage, increasing move speed, etc. You cannot.


Unless you can solve the fundamental problem that it doesn't follow the state exactly the way you want.

It's kinda crazy though that a single game session would be burning enough natural gas to power 3 cities. Unless that's not true.

Yeah this is going to be excellent for robotics because it’s good enough to clear the reality gap (visually - physics would be another story).

Curious how multiplayer would possibly work, not only logistically, but technically and from a gameplay POV.

Consider the hardware DOOM runs on. 720p would only be a true test of capability if every bit of possible detail was used.

But that was always going to be the case?

Reality is not composed of words, syntax, and semantics. One human modality is.

Other human modalities are sensory only, no language.

So vision learning, and energy models that capture the energy needed to achieve a visual, audio, or physical robotics behavior, are the only real goal.

Software is for those who read the manual with their new NES game. Where are the words inside us?

Statistical physics of energy to make a machine draw the glyphs of language, not opinionated clustering of language, is what will close the keyboard-and-mouse input loop. We're replicating human work habits. Those are real physical behaviors, not just descriptions in words.



