00:00:00So last week, Google unveiled Genie 3, their flagship infinite world model, where you get
00:00:05to simulate an environment and interact with it like in a real video game.
00:00:10And suddenly all the video game stocks absolutely plummeted out of fear that this might be the
00:00:16beginning of the end of the video game industry.
00:00:20And then something even more interesting happened.
00:00:22A Chinese tech company called Robiant released their own open source Genie competitor, which
00:00:28appears to have even better graphics than its Google counterpart.
00:00:32And now all of a sudden the floodgates are open for the race to determine which company
00:00:37will be the first one to replace traditional video games with this new kind of gaming tech.
00:00:43But while everyone is hyping up this new infinite world model craze, I'm here to tell you this
00:00:49might just be a hyped up promise with no actual substance.
00:00:54What makes me so sure of it?
00:00:55Well, that's what we're going to talk about in today's video.
00:01:02So as soon as Genie 3 came out, I rushed to the site to try it for myself.
00:01:07But as soon as I clicked the explore button, I was presented with a disappointing 404 button.
00:01:14And that's because I live in Canada.
00:01:16And for the time being, Google has only allowed users in the United States to try out
00:01:20this state of the art technological wonder.
00:01:23So obviously I turned on my VPN and tried again from a US location.
00:01:27And this time I was met with another disappointing rejection, stating that I need to be an Ultra
00:01:33plan member to access this revolutionary piece of software.
00:01:37And if you're wondering how much the Ultra plan costs, well, let's just say it's a bit more
00:01:41than I would be comfortable paying just to try out this overhyped AI tool.
00:01:46But this raises the question: why is it so hard to get your hands on Genie 3 in the first place?
00:01:51And the answer to this question will be very important to our story, but I'll get back to
00:01:56that later in this video.
00:01:57So while I had neither the luck nor the disposable funds to try out Genie 3, luckily,
00:02:04over on the other side of the globe, a Chinese company called Robiant, which appears to be
00:02:09a subsidiary of Ant Group, which in turn is an affiliate of Alibaba Group, the same
00:02:15company behind Qwen, came out with their own infinite world model
00:02:20called Lingbot World, which surprisingly is open source.
00:02:25So that means we can actually test it out and see what it's capable of.
00:02:29And looking at their examples, it looked absolutely stunning.
00:02:32But once I started inspecting the project page, I was met with another huge disappointment.
00:02:38Although their project page is filled with example videos where you can freely walk around
00:02:43the space with your arrow keys, in reality, the model version with full character
00:02:48controls is still under development.
00:02:51They are planning to release Lingbot Fast, which would be a full Genie 3 equivalent, but
00:02:56we don't know when that is coming yet.
00:02:57For the time being, we get access to their 14 billion parameter base model, which offers,
00:03:03quote, "high-fidelity, controllable, and logically consistent simulations."
00:03:08But basically, the only thing this model is capable of doing as of now is generating a video.
00:03:14Yep, just the video.
00:03:16So I was kinda confused, where does the control factor come in?
00:03:20Well, they do have the option to provide your own camera intrinsics and pose values, so you
00:03:25can, in a sense, control the camera movement, which I guess offers an alternative to navigation
00:03:31using the arrow keys, but you would have to pre-record that trajectory.
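To make that concrete, here's a minimal sketch of what "pre-recording" a camera path could look like: a shared intrinsics matrix plus one 6-DoF pose per frame. The function name, resolution, and file layout are my own illustration; the actual input format Lingbot World expects may differ.

```python
import numpy as np

def make_trajectory(num_frames=20, fov_deg=60.0, width=640, height=360):
    """Build a hypothetical pre-recorded camera path: one intrinsics
    matrix K plus a 4x4 camera-to-world pose for every frame."""
    f = 0.5 * width / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    K = np.array([[f, 0, width / 2],
                  [0, f, height / 2],
                  [0, 0, 1]])
    poses = []
    for i in range(num_frames):
        yaw = np.radians(i * 2.0)            # pan 2 degrees per frame
        R = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                      [0, 1, 0],
                      [-np.sin(yaw), 0, np.cos(yaw)]])
        pose = np.eye(4)
        pose[:3, :3] = R
        pose[:3, 3] = [0.0, 0.0, 0.1 * i]    # dolly forward slowly
        poses.append(pose)
    return K, np.stack(poses)

K, poses = make_trajectory()
print(K.shape, poses.shape)  # (3, 3) (20, 4, 4)
```

The point is just that the "control" is baked in ahead of time as numbers, instead of coming live from your arrow keys.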
00:03:35How is it different from any other video generator out there that also offers the ability to control
00:03:40camera movements?
00:03:41Well, here's the key distinction.
00:03:44In a regular AI video generator, the AI model tries to always predict the next frame as the
00:03:50reference video progresses, and we've seen in many internet meme videos how terribly wrong
00:03:55this gets if the video just keeps on going, and that is because the model doesn't retain
00:04:00information about what's going on outside of the frame.
00:04:04So if a camera pans away from the object and then pans back, the object might not be there
00:04:09anymore because the whole scene is generated on the fly.
00:04:13This is where the 14 billion parameter geometric brain of the Lingbot World model comes into
00:04:18play.
00:04:19Unlike a standard video generator that simply guesses the next set of pixels, Lingbot World
00:04:24uses camera intrinsics and 6-degrees-of-freedom (6-DoF) poses to match every pixel to a specific
00:04:31point in 3D space.
00:04:33It creates what researchers call "object permanence" because it understands the mathematical relationship
00:04:39between the camera's lens and the environment.
00:04:42So basically it remembers that a specific object exists at specific coordinates.
00:04:47And this structural integrity is why this model is so massive and computationally hungry.
00:04:52How hungry?
00:04:53Oh boy, let me tell you.
00:04:55I tried deploying the Lingbot World model on an instance with a single RTX 5090 GPU, and
00:05:02I tried running the basic sample demo they provided, and it just crashed immediately.
00:05:07It was kind of naive of me to think that a single 5090 would be able to handle that load.
00:05:13Then I tried running it with dual 5090s, and nope, it still crashed.
00:05:18Then I tried it with four 5090s, and once again, it still crashed.
00:05:23Then I spun up a container with eight RTX 5090s and tried running the basic demo example, and
00:05:31it still crashed.
00:05:32See, the reason is that when running this infinite world model for a prolonged period of time,
00:05:38the amount of scene state the model has to keep in memory grows bigger and bigger,
00:05:44up to a point where you will just get an out-of-memory error because you have simply run out
00:05:49of GPU memory.
00:05:50But I did manage to successfully run the sample demo on an 8 GPU setup by lowering the sample
00:05:55size from the default 70 to just 20.
00:05:59And honestly, the difference between 70 and 20 samples was not that noticeable.
00:06:03But this just shows how insanely computationally expensive running this infinite world model
00:06:09becomes.
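You can see the shape of the problem with a toy sketch: a model that keeps every frame's state grows without bound, while a bounded window (roughly what lowering the sample size from 70 to 20 does) caps memory at the cost of long-range consistency. This is a general mitigation pattern, not necessarily how Lingbot World is implemented internally.

```python
from collections import deque

# Stand-ins for per-frame latent state (in reality, large GPU tensors).
naive_cache = []                     # keeps everything -> grows forever
window_cache = deque(maxlen=20)      # bounded sliding window

for step in range(70):               # 70 = the demo's default sample size
    latent = f"frame-{step}-latent"
    naive_cache.append(latent)       # memory use grows linearly with runtime
    window_cache.append(latent)      # oldest frame is evicted automatically

print(len(naive_cache), len(window_cache))  # 70 vs 20
```

The trade-off is exactly what you'd expect: the windowed version can forget what happened 50 frames ago, which is why shrinking the sample size is a workaround rather than a fix.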
00:06:10And getting back to Genie 3, this is exactly why they allow access to it for Ultra members
00:06:16only: because they need to somehow recoup the GPU costs of running this thing.
00:06:21And this is also why you only get a certain amount of seconds for one demo because at some
00:06:27point the memory just balloons to a point that the whole system just comes crashing down.
00:06:32And to give you an idea of how insanely expensive it would be to run such a model on consumer
00:06:37grade hardware, a single RTX 5090 costs up to $5,000.
00:06:43Now take eight of those, which is the minimum required to run this thing.
00:06:48Man, even saying that out loud sounds ridiculous.
00:06:51But anyway, eight of those will cost you up to $40,000, not to mention all the other parts
00:06:57and RAM, which is also exploding in price right now.
00:07:01And when you take all of that into account, the hardware cost, plus the 60-second cap
00:07:06that Genie puts on each run, plus the ballooning memory issue, are exactly the
00:07:12reasons why this whole infinite world model thing is just hype and is not remotely
00:07:18achievable on consumer hardware with the current architecture that we have right now.
00:07:24And even the authors of both of these tools are admitting these problems.
00:07:28"The high inference cost currently necessitates enterprise-grade GPUs, making the technology
00:07:34inaccessible on consumer hardware."
00:07:37"The simulation lacks long-term stability.
00:07:39This often leads to environmental drifting, where the scene gradually loses structural
00:07:44integrity over extended durations."
00:07:46Exactly.
00:07:48And at least the Lingbot team is being open about it.
00:07:51Let's see what Google has to say about it.
00:07:53"The model can support a few minutes of continuous interaction rather than extended hours."
00:07:59I mean, they're not openly admitting it, but at this point we all know why that is.
00:08:04So that's why I'm telling you folks, traditional video games are not disappearing anytime soon.
00:08:09This just seems like a pipe dream at this point and maybe, just maybe, in the future, if they
00:08:15figure out how to solve these computational problems, we might start thinking about this.
00:08:20But right now, bruh, come on.
00:08:23I'm also super curious to try out Lingbot Fast when it finally arrives.
00:08:27But until then, I don't think this technology is going mainstream anytime soon.
00:08:32But if you're curious about trying out Lingbot World for yourself, here's my advice.
00:08:37Don't do what I did.
00:08:38Don't stack up eight RTX 5090s together, because such a configuration on a platform like RunPod
00:08:45will drain $7 for every hour of its runtime.
00:08:48Instead, spin up a single H200 container, which only costs $3.50 per hour, set the
00:08:55"--nproc_per_node" flag to 1, and maybe lower the sample count to 50 or even 20, and you'll be
00:09:01good to go.
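For a quick sanity check on why that advice matters, here's the cost math using the hourly rates quoted above (rates will obviously drift over time):

```python
# Hourly RunPod rates as quoted in this video.
eight_5090_rate = 7.00   # $/hour for an 8x RTX 5090 configuration
h200_rate = 3.50         # $/hour for a single H200

hours = 10               # a modest experimentation session
print(f"8x RTX 5090: ${eight_5090_rate * hours:.2f}")  # $70.00
print(f"1x H200:     ${h200_rate * hours:.2f}")        # $35.00
```

Half the price for a card that actually has enough memory to finish the demo is a pretty clear win.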
00:09:02You could also use the 4-bit quantized version of this model, created by the user Caelan Humphries,
00:09:08which significantly reduces GPU memory consumption while maintaining comparable visual quality
00:09:13for inference.
00:09:15So you technically could try to run that on a single RTX 5090.
00:09:19And if you do so, let me know how it goes.
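A back-of-the-envelope estimate shows why 4-bit quantization makes the single-GPU attempt plausible at all. This counts only the model weights; activations and the growing scene cache come on top, so treat it as a floor, not a full memory budget.

```python
params = 14e9  # the 14 billion parameter base model

def weight_gib(bits_per_param):
    """Approximate weight memory in GiB at a given precision."""
    return params * bits_per_param / 8 / 2**30

print(f"fp16:  {weight_gib(16):.1f} GiB")   # ~26.1 GiB -- tight on a 32 GB card
print(f"4-bit: {weight_gib(4):.1f} GiB")    # ~6.5 GiB  -- plenty of headroom
```

So at fp16 the weights alone nearly fill a consumer card before the first frame is generated, while the 4-bit version leaves room for the scene state to grow for a while.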
00:09:21So as for myself, I ran the basic demo on an H200 container and yeah, basically got the
00:09:28same result as their demo page.
00:09:30And then I generated an AI image of this Viking fighting against Loki and fed this image to
00:09:36the same command.
00:09:37And this is the result I got.
00:09:39I guess you can see how the model maintains the integrity of the environment and the castle
00:09:44throughout the video, but it still generates some weird artifacts.
00:09:48So honestly, I don't know what to think of it.
00:09:52I'm pretty sure I could generate a better gameplay video using a standard ComfyUI pipeline, which,
00:09:59by the way, if you're interested in learning how to make your own video generator like Sora
00:10:04without the heavy compute cost, check out this video I did a while ago on that topic.
00:10:09So there you have it, folks, that is my take on Genie 3 and all the hype and the future
00:10:15of video games.
00:10:16I really appreciate the team behind Lingbot open sourcing their models so we can get a
00:10:20better insight as to how a Genie-like model works.
00:10:25But those are just my two cents on the topic.
00:10:27More importantly, what do you think about these infinite world models?
00:10:30I'm curious to know what you think, so drop your thoughts in the comments section down
00:10:35below.
00:10:36And folks, if you found this video useful, let me know by smashing that like button underneath
00:10:40the video.
00:10:41And also don't forget to subscribe to our channel for more videos like this one.
00:10:45This has been Andris from Better Stack, and I will see you in the next video.
00:11:00(upbeat music)