AI in the Big Easy: NVIDIA Research Lets Content Creators Improvise With 3D Objects

Inverse rendering pipeline NVIDIA 3D MoMa will be presented this week at the Conference on Computer Vision and Pattern Recognition in New Orleans.
by Isha Salian

Jazz is all about improvisation — and NVIDIA is paying tribute to the genre with AI research that could one day enable graphics creators to improvise with 3D objects created in the time it takes to hold a jam session.

The method, NVIDIA 3D MoMa, could empower architects, designers, concept artists and game developers to quickly import an object into a graphics engine to start working with it, modifying scale, changing the material or experimenting with different lighting effects.

NVIDIA Research showcased this technology in a video celebrating jazz and its birthplace, New Orleans, where the paper behind 3D MoMa will be presented this week at the Conference on Computer Vision and Pattern Recognition.

Extracting 3D Objects From 2D Images

Inverse rendering, a technique to reconstruct a series of still photos into a 3D model of an object or scene, “has long been a holy grail unifying computer vision and computer graphics,” said David Luebke, vice president of graphics research at NVIDIA.

“By formulating every piece of the inverse rendering problem as a GPU-accelerated differentiable component, the NVIDIA 3D MoMa rendering pipeline uses the machinery of modern AI and the raw computational horsepower of NVIDIA GPUs to quickly produce 3D objects that creators can import, edit and extend without limitation in existing tools,” he said.

To be most useful for an artist or engineer, a 3D object should be in a form that can be dropped into widely used tools such as game engines, 3D modelers and film renderers. That form is a triangle mesh with textured materials, the common language used by such 3D tools.

trumpet mesh
Triangle meshes are the underlying frames used to define shapes in 3D graphics and modeling.

Game studios and other creators would traditionally create 3D objects like these with complex photogrammetry techniques that require significant time and manual effort. Recent work in neural radiance fields can rapidly generate a 3D representation of an object or scene, but not in a triangle mesh format that can be easily edited.

NVIDIA 3D MoMa generates triangle mesh models within an hour on a single NVIDIA Tensor Core GPU. The pipeline’s output is directly compatible with the 3D graphics engines and modeling tools that creators already use.

The pipeline’s reconstruction includes three features: a 3D mesh model, materials and lighting. The mesh is like a papier-mâché model of a 3D shape built from triangles. With it, developers can modify an object to fit their creative vision. Materials are 2D textures overlaid on the 3D meshes like a skin. And NVIDIA 3D MoMa’s estimate of how the scene is lit allows creators to later modify the lighting on the objects.

Tuning Instruments for Virtual Jazz Band

To showcase the capabilities of NVIDIA 3D MoMa, NVIDIA’s research and creative teams started by collecting around 100 images each of five jazz band instruments — a trumpet, trombone, saxophone, drum set and clarinet — from different angles.

NVIDIA 3D MoMa reconstructed these 2D images into 3D representations of each instrument, represented as meshes. The NVIDIA team then took the instruments out of their original scenes and imported them into the NVIDIA Omniverse 3D simulation platform to edit.

editing the 3D trumpet in NVIDIA Omniverse

In any traditional graphics engine, creators can easily swap out the material of a shape generated by NVIDIA 3D MoMa, as if dressing the mesh in different outfits. The team did this with the trumpet model, for example, instantly converting its original plastic to gold, marble, wood or cork.

Creators can then place the newly edited objects into any virtual scene. The NVIDIA team dropped the instruments into a Cornell box, a classic graphics test for rendering quality. They demonstrated that the virtual instruments react to light just as they would in the physical world, with the shiny brass instruments reflecting brightly, and the matte drum skins absorbing light.

These new objects, generated through inverse rendering, can be used as building blocks for a complex animated scene — showcased in the video’s finale as a virtual jazz band.

The paper behind NVIDIA 3D MoMa will be presented in a session at CVPR on June 22 at 1:30 p.m. Central time. It’s one of 38 papers with NVIDIA authors at the conference. Learn more about NVIDIA Research at CVPR.