Two recent events tell us there’s plenty going on behind the scenes in terms of visionOS development.
Apple didn’t cast much light on visionOS at WWDC this year, and the platform hasn’t received much attention since. But don’t mistake that quiet for a lack of progress: two recent events indicate there’s a lot going on behind the scenes.
The first is the release of Submerged, the first scripted film shot in Apple Immersive Video, written and directed by Academy Award-winning filmmaker Edward Berger (All Quiet on the Western Front).
The second is new research from Apple’s machine learning teams that shows how to generate accurate depth maps from images captured by a single-lens camera, using conventional hardware.
An immersive movie about immersion
Submerged is a claustrophobic, adrenaline-fuelled, 17-minute story set on a sinking ship — in this case a war-damaged submarine — capturing the crew as they fight to stay alive. The movie is made for Vision Pro devices, and reviewers already claim it delivers a sense of immediacy and intimacy they’ve never experienced before.
All of this is interesting, but how can this kind of experience be delivered in an even more powerful way? How can Apple’s technologies support an even more immersive user experience?
That, I think, is what Apple is working on, judging by the second event to emerge in the last few days: the introduction of an AI model Apple calls Depth Pro.
AI provides depth
What it does is powerful: the model estimates a depth map from a single 2D image, assigning a distance to every pixel. The underlying technology seems similar to what you’d need to build an autonomous vehicle, since such vehicles must accurately determine the depth of nearby objects from camera images in real time.
Apple’s researchers appear to have developed the tech with an eye on running it accurately on an iPhone. They claim the Depth Pro model can produce an accurate, high-resolution depth map from an image captured by a single-lens camera in around 0.3 seconds on a standard GPU.
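For developers curious to try this, here’s a minimal sketch of running Depth Pro on a single photo. It follows the Python example published in Apple’s open-source ml-depth-pro repository, but treat the package and function names as assumptions and check the repository before relying on them.

```python
# Minimal sketch: estimate a per-pixel depth map from a single photo.
# Assumes Apple's open-source Depth Pro package (github.com/apple/ml-depth-pro)
# is installed; names follow its published example but may change.
import depth_pro

# Load the pretrained model and its matching preprocessing transform.
model, transform = depth_pro.create_model_and_transforms()
model.eval()

# Load an RGB image; f_px is the focal length in pixels, if the file records it.
image, _, f_px = depth_pro.load_rgb("frame.jpg")
image = transform(image)

# Run inference: the result includes metric depth (in metres) for every pixel.
prediction = model.infer(image, f_px=f_px)
depth = prediction["depth"]                     # 2D array of distances
focal_length_px = prediction["focallength_px"]  # estimated focal length
print(depth.shape, float(depth.min()), float(depth.max()))
```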
The team says the tech could have big implications for robotics, real-time mapping, and improved camera or video effects. You can read the research paper on these features here, or the post about Depth Pro on Apple’s machine learning website.
Information is power
What matters here is that Apple now has technology that can automatically figure out depth from 2D images. A movie, of course, is just a sequence of 2D images, which means the company could derive spatial positioning for everything you see on screen, frame by frame.
You can already see a hint of this: visionOS can turn existing photos into spatial photos, adding depth to create a stereoscopic effect. It also makes sense to use the same technique to generate 3D environments from 2D images.
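That stereoscopic trick is easier to picture with a little code. This is not Apple’s pipeline, just an illustrative sketch of depth-image-based rendering: each pixel is shifted horizontally by a disparity inversely proportional to its depth, and where shifted pixels collide, the nearer one wins.

```python
import numpy as np

def synthesize_right_view(image, depth, baseline_px=12.0, eps=1e-6):
    """Toy depth-image-based rendering: build a second eye's view by shifting
    each pixel horizontally by a disparity inversely proportional to its depth.
    image: (H, W, 3) uint8 array; depth: (H, W) array of distances in metres.
    Illustrative only, not Apple's method."""
    h, w, _ = image.shape
    right = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)           # keep the nearest pixel on conflicts
    disparity = baseline_px / (depth + eps)  # nearer objects shift more

    for y in range(h):
        for x in range(w):
            nx = int(round(x - disparity[y, x]))
            if 0 <= nx < w and depth[y, x] < zbuf[y, nx]:
                zbuf[y, nx] = depth[y, x]
                right[y, nx] = image[y, x]
    return right  # disocclusion holes stay black and would need inpainting
```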
What next? In August 2023, Apple researchers published a paper on FineRecon, which showed how AI can reconstruct 3D scenes from posed images with greater accuracy and fidelity. That research pairs well with earlier work on a project to deliver enhanced 3D indoor scene understanding.
Movies you can walk through
Combine all these ingredients and, in theory, Apple could build technology that both understands images and adds to them. After all, if you know that object A sits in one position and object B in another, you can more convincingly deliver the illusion of walking between, or even behind, those objects to a Vision Pro user.
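As a toy illustration of why per-pixel depth enables that illusion: once every pixel has a distance, deciding whether an inserted object (or a shifted viewpoint) appears in front of or behind the existing scenery reduces to a per-pixel depth comparison, the same z-buffer test games have used for decades. The function below is hypothetical and purely illustrative.

```python
import numpy as np

def composite_with_occlusion(scene_rgb, scene_depth, object_rgb, object_depth):
    """Toy z-buffer composite: draw an inserted object into a photo only where
    it is nearer than the scene's own depth, so the scenery correctly hides it.
    scene_rgb/object_rgb: (H, W, 3) arrays; depths: (H, W) arrays in metres."""
    in_front = object_depth < scene_depth   # pixels where the object wins
    out = scene_rgb.copy()
    out[in_front] = object_rgb[in_front]
    return out
```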
Generative AI (genAI) tools already exist that can create video or image “fakes,” but to what extent could a computer exploit its knowledge of depth to generate 3D experiences in which you can literally walk behind the objects you see? And how might those technologies be applied to the experience of watching Apple’s Submerged?
Even as it is, the experience of being in a sinking submarine is immersive in both senses of the word — but being able to find your own viewpoint within that action in high fidelity would realize every video gamer’s dreams. It would certainly sell a few movies.
Arranging the scenery
It’s important not to get too far ahead of ourselves. Building the technologies to achieve these things will be far more challenging than pontificating on the possibilities in prose. But there are other potential visionOS implications for accurate depth data derived from 2D images. I’m particularly thinking of uses in emergency response, medicine, remote drone control, even space exploration, all from a single-lens camera, which keeps the hardware lightweight and highly portable.
In other words, along with new frontiers for creative expression, there are viable business opportunities about to be unlocked by Apple’s home-grown reality distortion machine. Will we see some of them emerge with visionOS 3.0 at next year’s WWDC? Is that when we’ll really see how Apple Intelligence can work miracles with Spatial Reality?
Please follow me on LinkedIn, Mastodon, or join me in the AppleHolic’s bar & grill group on MeWe.