SemCity: Semantic Scene Generation with Triplane Diffusion

Jumin Lee, Sebin Lee, Changho Jo, Woobin Im, Juhyeong Seon, Sung-Eui Yoon
*Equal Contribution

SemCity generates (a) novel real-outdoor scenes and supports practical downstream tasks: (b) SSC refinement, (c) scene outpainting, and (d) scene inpainting.

Abstract

We present SemCity, a 3D diffusion model for semantic scene generation in real-world outdoor environments. Most 3D diffusion models focus on generating a single object, synthetic indoor scenes, or synthetic outdoor scenes; the generation of real-world outdoor scenes is rarely addressed. In this paper, we focus on generating real outdoor scenes by learning a diffusion model on a real-world outdoor dataset. In contrast to synthetic data, real outdoor datasets often contain more empty space due to sensor limitations, which makes learning real-outdoor distributions challenging. To address this issue, we exploit a triplane representation as a proxy form of the scene distribution to be learned by our diffusion model. Furthermore, we propose a triplane manipulation that integrates seamlessly with our triplane diffusion model. This manipulation broadens the model's applicability to a variety of downstream tasks related to outdoor scene generation, such as scene inpainting, scene outpainting, and semantic scene completion (SSC) refinement. In our experiments, we demonstrate that our triplane diffusion model produces meaningful generation results compared with existing work on a real outdoor dataset, SemanticKITTI. We also show that our triplane manipulation enables seamlessly adding, removing, or modifying objects within a scene, and that it allows scenes to be expanded toward a city-level scale. Finally, we evaluate our method on SSC refinement, where our diffusion model enhances the predictions of SSC networks by learning the data distribution.
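
To make the triplane idea concrete, the sketch below shows how a factorized triplane can be queried for per-point semantic logits: a 3D point is projected onto three axis-aligned feature planes, the bilinearly sampled features are aggregated, and a small MLP decodes them into class logits. The feature dimension, plane resolution, and decoder here are illustrative assumptions, not SemCity's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneDecoder(nn.Module):
    # Minimal triplane query: three axis-aligned feature planes plus a
    # small MLP mapping aggregated features to semantic class logits.
    # Sizes are illustrative, not the paper's exact configuration.
    def __init__(self, feat_dim=32, res=128, num_classes=20):
        super().__init__()
        self.planes = nn.ParameterDict({
            "xy": nn.Parameter(torch.randn(1, feat_dim, res, res) * 0.01),
            "xz": nn.Parameter(torch.randn(1, feat_dim, res, res) * 0.01),
            "yz": nn.Parameter(torch.randn(1, feat_dim, res, res) * 0.01),
        })
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, pts):
        # pts: (N, 3) coordinates normalized to [-1, 1].
        feats = 0
        for name, dims in [("xy", [0, 1]), ("xz", [0, 2]), ("yz", [1, 2])]:
            # Project points onto the plane and bilinearly sample features.
            uv = pts[:, dims].view(1, -1, 1, 2)            # (1, N, 1, 2)
            f = F.grid_sample(self.planes[name], uv,
                              mode="bilinear", align_corners=True)
            feats = feats + f[0, :, :, 0].t()              # (N, feat_dim)
        return self.mlp(feats)                             # (N, num_classes)

decoder = TriplaneDecoder()
logits = decoder(torch.rand(4, 3) * 2 - 1)  # logits for 4 query points

The factorized 2D planes are far more compact than a dense 3D voxel grid, which is what makes them an attractive proxy for the sparse, empty-space-heavy scans described above.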

Results Gallery

Semantic Scene Generation

Our scene diffusion method can unconditionally generate novel real-outdoor semantic scenes. The results demonstrate diverse and realistic generated scenes with various road layouts, including L-, T-, and Y-shaped junctions, straight roads, and crossroads.
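
For reference, unconditional generation with a diffusion model reduces to ancestral sampling from pure noise; the sketch below shows a generic DDPM sampling loop over a triplane-shaped tensor. The denoiser network, linear beta schedule, and tensor shape are placeholders, not the paper's released implementation.

import torch

@torch.no_grad()
def sample_triplane(denoiser, shape=(1, 96, 128, 128), T=1000, device="cpu"):
    # Generic DDPM ancestral sampling; denoiser(x, t) is assumed to
    # predict the added noise (epsilon parameterization).
    betas = torch.linspace(1e-4, 0.02, T, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)  # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, torch.full((shape[0],), t, device=device))
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        x = mean + (torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else 0)
    return x  # sampled triplane, decoded into a semantic scene afterwards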


Scene Inpainting

Our method inpaints given scenes by adding, removing, or changing objects; the overall scene structure can also be modified. The red boxes mark the inpainting regions.
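
A common way to realize this kind of editing with a diffusion model is mask-guided resampling: at every reverse step, the region to regenerate follows the model, while the rest is overwritten with a re-noised copy of the original triplane. The RePaint-style step below is a sketch under that assumption, not necessarily SemCity's exact triplane manipulation; it follows the schedule conventions of the sampling sketch above.

import torch

@torch.no_grad()
def inpaint_step(x_t, x0_known, mask, t, denoiser, betas, alpha_bars):
    # mask is 1 where content should be regenerated, 0 where the
    # original triplane content must be kept.
    alphas = 1.0 - betas
    eps = denoiser(x_t, torch.full((x_t.shape[0],), t, device=x_t.device))
    mean = (x_t - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
           / torch.sqrt(alphas[t])
    x_prev = mean + (torch.sqrt(betas[t]) * torch.randn_like(x_t) if t > 0 else 0)

    # Re-noise the known content to step t-1 and paste it outside the mask.
    if t > 0:
        noised = torch.sqrt(alpha_bars[t - 1]) * x0_known + \
                 torch.sqrt(1 - alpha_bars[t - 1]) * torch.randn_like(x0_known)
    else:
        noised = x0_known
    return mask * x_prev + (1 - mask) * noised

Running this step for t = T-1 down to 0 yields a scene whose masked region is regenerated while the surrounding context is preserved.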


Scene Outpainting

Given a scene, our method produces various outpainted scenes. The red boxes indicate the original scene used for outpainting.



Further, our method can extend a scene toward a larger, city-level scale.
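
One way to grow a scene indefinitely is a sliding window: the overlap with the already generated scene is treated as known content and held fixed, while the rest of the window is synthesized by masked resampling. The sketch below reuses inpaint_step from the inpainting section; the window layout and overlap width are illustrative assumptions rather than the paper's exact procedure.

import torch

@torch.no_grad()
def outpaint_right(x0_scene, denoiser, betas, alpha_bars, overlap=64, T=1000):
    # Extend a (B, C, H, W) triplane one window to the right. The right
    # overlap columns of the current scene seed the new window.
    B, C, H, W = x0_scene.shape
    x0_known = torch.zeros_like(x0_scene)
    x0_known[..., :overlap] = x0_scene[..., W - overlap:]

    mask = torch.ones_like(x0_scene)   # 1 = generate new content
    mask[..., :overlap] = 0            # 0 = keep the overlap strip

    x = torch.randn_like(x0_scene)
    for t in reversed(range(T)):
        x = inpaint_step(x, x0_known, mask, t, denoiser, betas, alpha_bars)

    # Stitch the newly generated, non-overlapping part onto the scene.
    return torch.cat([x0_scene, x[..., overlap:]], dim=-1)

Repeating this in all directions tiles windows into an arbitrarily large, city-scale map.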


SSC (Semantic Scene Completion) Refinement

Our method refines the predicted scenes of SSC networks that take an RGB image or LiDAR point clouds as input.
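
A natural way to use a diffusion prior for refinement is SDEdit-style partial resampling: encode the SSC prediction into a triplane, diffuse it to an intermediate timestep, then denoise it back so the result moves toward the learned scene distribution. This is a sketch under that assumption, not necessarily the paper's exact refinement procedure; t_start controls the trade-off between fidelity to the prediction and the strength of the correction.

import torch

@torch.no_grad()
def refine_ssc(x0_pred, denoiser, betas, alpha_bars, t_start=250):
    # x0_pred: SSC network output encoded as a triplane.
    alphas = 1.0 - betas
    # Forward-diffuse the prediction to an intermediate step t_start.
    x = torch.sqrt(alpha_bars[t_start]) * x0_pred + \
        torch.sqrt(1 - alpha_bars[t_start]) * torch.randn_like(x0_pred)
    # Reverse diffusion from t_start back to 0 pulls the sample toward
    # the learned real-outdoor scene distribution.
    for t in reversed(range(t_start + 1)):
        eps = denoiser(x, torch.full((x.shape[0],), t, device=x.device))
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        x = mean + (torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else 0)
    return x  # refined triplane, decoded back into semantic voxels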


Citation

@inproceedings{lee2024semcity,
    title={SemCity: Semantic Scene Generation with Triplane Diffusion},
    author={Lee, Jumin and Lee, Sebin and Jo, Changho and Im, Woobin and Seon, Juhyeong and Yoon, Sung-Eui},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2024}
}