ZoeDepth: AI-powered depth calculation for 3D matte paintings and set extensions

In art and design, understanding the depth of an environment is crucial. Depth is the perception of how far or near objects are from the viewer, and it gives a scene a sense of dimensionality and realism. While depth comes naturally in 3D rendering, it is not always easy to capture or create when working with 2D matte paintings or background plates. This is where ZoeDepth comes into play. ZoeDepth is a machine-learning model that estimates depth from a single image, enabling artists to turn 2D backdrops into realistic 3D meshes.

This article will explore how ZoeDepth can be used in 3D production to expedite scene building, save time and money when using a render farm, and iterate faster during early visualization and blocking.

What is matte painting?

Matte painting is a visual art technique used in the film industry to create the illusion of landscapes, sets, or distant locations that are not present on set. Traditionally, artists painted these scenes on large sheets of glass, which were then combined with the live-action footage.

However, the digital age has transformed this art form. Nowadays, matte painters use digital tools to combine and modify various elements to create their scenes. They might use real-world images, create 3D models, apply textures, and adjust lighting.

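3D technology has greatly enhanced the capabilities of matte painting. It allows artists to animate 2.5D shots with 3D cameras using a technique known as camera mapping (or camera projection), in which the painting is projected from a camera onto simple geometry so that a moving 3D camera can travel through it. It also enables the creation of intricate set extensions such as cities or landscapes.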
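Under the hood, camera mapping boils down to projecting each point of the geometry through the projection camera to find which pixel of the painting it receives. The sketch below assumes an ideal pinhole camera with OpenCV-style axes (x right, y down, z forward) and focal lengths given in pixels; `project_to_uv` is a hypothetical helper, not part of any particular package.

```python
def project_to_uv(point, fx, fy, cx, cy, width, height):
    """Project a camera-space point into the projection camera's image.

    Axes follow the OpenCV convention (x right, y down, z forward); fx, fy, cx, cy
    are pinhole intrinsics in pixels. Returns normalized (u, v) texture
    coordinates, or None for points behind the camera, which receive no texture.
    """
    x, y, z = point
    if z <= 0:
        return None
    # Pinhole projection to pixel coordinates.
    px = fx * x / z + cx
    py = fy * y / z + cy
    # Normalize to 0..1 and flip v, because image rows grow downward
    # while texture v conventionally grows upward.
    return px / width, 1.0 - py / height


# A vertex on the optical axis receives the center of a 1920x1080 painting.
center_uv = project_to_uv((0.0, 0.0, 10.0), 1000, 1000, 960, 540, 1920, 1080)
```

Because the result depends only on the ratios x/z and y/z, a point twice as far away and twice as far off-axis lands on the same pixel. That is why a projection looks correct from the projection camera's viewpoint and only holds up for modest camera moves away from it.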

For independent 3D animated shorts, it isn’t always feasible to model and texture elements that are far from the camera. Relying solely on an image backdrop, however, limits camera movement and sacrifices much of the cinematographic immersion a moving camera can offer. The reason is parallax: the apparent change in an object's position or direction when it is viewed from different positions.

It’s like looking out the window of a moving car: the trees close by seem to zoom past, while the mountains in the distance appear to move slowly. Parallax is integral to our visual experience. Unfortunately, even faster workarounds for achieving parallax, such as a 2.5D approach, require considerable time spent creating depth maps and projecting images onto geometry, at least before tools like ZoeDepth.
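The car-window effect follows from pinhole geometry: when the camera slides sideways by a distance b, a point at depth Z shifts on screen by f·b/Z pixels, where f is the focal length in pixels. The sketch below assumes an ideal pinhole camera and purely sideways motion; the function name and the distances are illustrative only.

```python
def parallax_shift_px(focal_px, camera_shift, depth):
    """Horizontal on-screen shift, in pixels, of a static point at `depth` when a
    pinhole camera with focal length `focal_px` (in pixels) slides sideways by
    `camera_shift`. `camera_shift` and `depth` must use the same unit."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * camera_shift / depth


# Camera slides 0.5 m sideways with a 1000 px focal length.
tree = parallax_shift_px(1000, 0.5, 5.0)         # nearby tree at 5 m
mountain = parallax_shift_px(1000, 0.5, 5000.0)  # distant mountain at 5 km
# If both are painted onto one flat card 50 m away, they share a single depth,
# so they shift by exactly the same amount and the parallax between them is lost.
card = parallax_shift_px(1000, 0.5, 50.0)
print(tree, mountain, card)
```

The flat-card case is the core limitation of a plain image backdrop: every painted element inherits the card's depth, so near and far details move together.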

Understanding ZoeDepth

What sets ZoeDepth apart from other tools is how it estimates depth: it combines relative and metric depth. Relative depth describes how far objects are compared with one another in an image, while metric depth is the absolute distance of objects from the camera (for example, in meters). By combining the two, ZoeDepth can account for both the perspective and the scale of the scene, which lets it distinguish the foreground, midground, and background elements of an image. That separation is what allows a 3D mesh built from its output to represent the original environment accurately.

Left: a static matte painting plate Right: The Depth Map generated by ZoeDepth
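To make the relative/metric distinction concrete: a relative depth map preserves the ordering of distances but not their units, so it matches metric depth only up to an unknown scale and shift. Given a few pixels whose true distances are known, that scale and shift can be recovered with an ordinary least-squares fit. This is a generic illustration, not ZoeDepth's internal method (ZoeDepth is trained to output metric depth directly); `fit_scale_shift`, `to_metric`, and the sample values are hypothetical. Many relative-depth models output inverse depth (disparity), in which case such a fit is usually done in that space instead.

```python
def fit_scale_shift(relative, metric):
    """Least-squares scale s and shift t so that s * relative + t ~= metric.

    `relative` and `metric` are equal-length sequences sampled at the same pixels.
    """
    n = len(relative)
    if n != len(metric) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_r = sum(relative) / n
    mean_m = sum(metric) / n
    # Closed-form simple linear regression: s = cov(r, m) / var(r).
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(relative, metric))
    var = sum((r - mean_r) ** 2 for r in relative)
    if var == 0:
        raise ValueError("relative samples must not all be equal")
    s = cov / var
    t = mean_m - s * mean_r
    return s, t


def to_metric(relative_map, s, t):
    """Apply the fitted scale and shift to every pixel of a relative depth map."""
    return [[s * r + t for r in row] for row in relative_map]


# Three pixels whose distances are known (e.g. measured on set), in meters.
s, t = fit_scale_shift([1.0, 2.0, 3.0], [5.0, 7.0, 9.0])
metric_map = to_metric([[1.0, 1.5], [2.5, 3.0]], s, t)
```

Here the fit recovers a scale of 2 and a shift of 3 meters, turning unitless relative values into distances; a model that predicts metric depth directly removes the need for such reference measurements.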

Creating 3D meshes

When creating a 3D mesh from an environment image, understanding the spatial relationship between the different elements of the image is crucial. The key ingredient is a depth map: an image that stores the estimated depth of each pixel in the original image. ZoeDepth produces this map in a single prediction. Its network is first pre-trained to estimate relative depth on a large mix of datasets and then fine-tuned with a lightweight metric head that converts the network's output into absolute distances, so the final depth map captures both the perspective and the scale of the scene. The depth map can then be converted into a 3D mesh, for example by placing a vertex at each pixel's estimated distance along its camera ray, giving geometry that maintains the integrity of the original image and a realistic, immersive result.

The ZoeDepth test environment on Hugging Face. Right: a 3D mesh generated from the image, with the midground and background elements separated and already textured.
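The depth-map-to-mesh step can be sketched as back-projection: each pixel becomes a vertex placed at its estimated distance along its camera ray, and neighboring pixels are joined into triangles textured with the original image. The sketch assumes an ideal pinhole camera whose intrinsics (`fx`, `fy`, `cx`, `cy`) the artist chooses, since a painting has no recorded focal length, and it is not the code behind the Hugging Face space. The optional `max_jump` threshold, a hypothetical parameter, skips triangles that span a large depth jump, which is one way the midground and background can end up as separate pieces. In practice the depth map would come from ZoeDepth itself (the project's README shows loading a model through `torch.hub` and calling `infer_pil` on an image); a tiny hand-written grid stands in here.

```python
def depth_to_mesh(depth, fx, fy, cx, cy, max_jump=None):
    """Back-project a depth map into a textured triangle mesh.

    depth: list of rows, one metric depth value per pixel.
    fx, fy, cx, cy: pinhole intrinsics in pixels (assumed, not recovered).
    max_jump: if set, skip any pixel quad whose depths differ by more than this.
    Returns (vertices, uvs, faces); faces index into vertices.
    """
    h, w = len(depth), len(depth[0])
    vertices, uvs = [], []
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            # Move along the pixel's camera ray until it reaches depth z.
            vertices.append(((x - cx) * z / fx, (y - cy) * z / fy, z))
            # The source image is the texture: each vertex samples its own pixel center.
            uvs.append(((x + 0.5) / w, 1.0 - (y + 0.5) / h))
    faces = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x  # top-left vertex of this 2x2 pixel quad
            quad = (i, i + 1, i + w, i + w + 1)
            if max_jump is not None:
                zs = [vertices[k][2] for k in quad]
                if max(zs) - min(zs) > max_jump:
                    continue  # tear the mesh at a depth discontinuity
            # Two triangles per quad, both with the same winding order.
            faces.append((i, i + w, i + 1))
            faces.append((i + 1, i + w, i + w + 1))
    return vertices, uvs, faces


# A 2x3 depth map: a near object (1 m) in front of a far backdrop (10 m).
demo = [[1.0, 1.0, 10.0],
        [1.0, 1.0, 10.0]]
verts, uvs, faces = depth_to_mesh(demo, fx=2.0, fy=2.0, cx=1.0, cy=0.5, max_jump=2.0)
```

With `max_jump` set, the quad that bridges the 1 m object and the 10 m backdrop is dropped, so the two depth layers become separate pieces of geometry instead of being joined by long stretched triangles.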

Benefits for 3D artists

For 3D artists, ZoeDepth is a game-changer: it produces backdrop meshes with convincing parallax out of the box, without the need for extensive manual work. The depth maps it generates can also drive post-processing effects in photography and filmmaking, such as selective focus, bokeh, or background removal, which can enhance the aesthetic and mood of a scene and make it more captivating.

A test render of the generated mesh imported into a Blender scene. Parallax is evident in the midground ruins and the misty landmass behind them. Because the mesh generated with ZoeDepth already had the midground and background separated and spaced apart, depth of field and a mist pass could be used immediately.
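Selective focus and background removal both reduce to simple per-pixel rules once depth is known. The sketch below uses the standard thin-lens circle-of-confusion formula, c = A · |d − s| / d · f / (s − f), with aperture diameter A = f / N, to decide how strongly each pixel should be blurred, plus a plain depth cutoff for background removal. The function names, the 50 mm f/2 lens, the 20 m focus distance, and the 30 m cutoff are illustrative assumptions; render engines such as Blender compute depth of field with their own camera settings.

```python
def circle_of_confusion(depth, focus_dist, focal_length, f_number):
    """Thin-lens blur-circle diameter for a point at `depth` when the lens is
    focused at `focus_dist`. All distances share one unit (e.g. meters)."""
    if depth <= 0 or focus_dist <= focal_length:
        raise ValueError("need depth > 0 and focus_dist > focal_length")
    aperture = focal_length / f_number  # aperture diameter A = f / N
    return aperture * abs(depth - focus_dist) / depth * focal_length / (focus_dist - focal_length)


def background_mask(depth_map, cutoff):
    """True where a pixel lies farther than `cutoff`, i.e. where it should be removed."""
    return [[d > cutoff for d in row] for row in depth_map]


# 50 mm lens at f/2, focused on midground ruins 20 m away (illustrative values).
blur = [circle_of_confusion(d, 20.0, 0.05, 2.0) for d in (5.0, 20.0, 200.0)]
mask = background_mask([[5.0, 20.0, 200.0]], cutoff=30.0)
```

Pixels at the focus distance get no blur, while nearer and farther pixels get progressively larger blur circles; the same depth values feed the mask, so one depth map serves both effects.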

By leveraging the implicit 3D information encoded in images, ZoeDepth provides several key advantages over traditional digital 3D techniques that save artists significant time and production costs.

Simplified Texturing

As the generated 3D meshes are meant to be viewed from a distance, complex PBR texturing is unnecessary. Scenes can utilize a single image projected onto an emission shader. This radically reduces texture memory usage and render times compared to fully texturing complex 3D geometry.

Reduced Geometry

Heavily subdivided geometry and displacement maps are no longer needed to fake depth in a 2.5D image, because ZoeDepth estimates depth, and therefore shape, directly from the source image. This lightens scene data and speeds up viewport interaction.
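How light the generated geometry is can also be controlled. A mesh built with one vertex per pixel has a vertex count that grows with image resolution, so sampling only every n-th pixel is a straightforward way to trade detail for scene size. The helper below only counts vertices and triangles for a regular grid under that assumption; `grid_mesh_counts` and its `stride` parameter are hypothetical, not ZoeDepth settings.

```python
def grid_mesh_counts(width, height, stride=1):
    """Vertex and triangle counts for a regular grid mesh that samples every
    `stride`-th pixel of a width x height depth map (two triangles per cell)."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    cols = (width - 1) // stride + 1
    rows = (height - 1) // stride + 1
    return cols * rows, 2 * (cols - 1) * (rows - 1)


for stride in (1, 4, 8):
    verts, tris = grid_mesh_counts(1920, 1080, stride)
    print(f"stride {stride}: {verts:,} vertices, {tris:,} triangles")
```

For a 1920 × 1080 plate, a stride of 4 cuts the mesh from 2,073,600 vertices to 129,600, which is usually ample for an element that stays in the distance.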

Streamlined Workflow

Artists can avoid the lengthy process of modeling distant architectural details or landscapes by hand. Images created with matte painting, photogrammetry, or AI generation need only be analyzed once by ZoeDepth to populate a scene. This frees up hours traditionally spent recreating assets digitally.

Iterative Testing

Early scene blocking, camera blocking, lighting studies and test renders are significantly quicker using the lightweight 3D meshes from ZoeDepth versus traditional modeling. This shortens the design loop and allows more concepts to be evaluated before render farm time is reserved.

Render Cost Savings

Thanks to the streamlined workflow and simplified textures, complex scenes that previously required prohibitive render times can reach approval stages and be sent to a render farm for a fraction of the cost. Budgets stretch further as more work is done outside the farm.

In summary, by extracting depth from a single source image, ZoeDepth offers substantial time savings and cost reductions compared with traditional 3D workflows through simplified modeling, texturing, and scene population.


ZoeDepth is revolutionizing the field of 3D art and design with its effective depth calculation. By understanding the different elements of an image and translating them into a 3D mesh, it provides artists with a powerful tool to create realistic and engaging environments and set extensions. With ZoeDepth, 3D environment and set design is more accessible than ever.

You can try ZoeDepth in its Hugging Face space: https://huggingface.co/spaces/shariqfarooq/ZoeDepth
