Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning

Xie, Desai; Li, Jiahao; Tan, Hao; Sun, Xin; Shu, Zhixin; Zhou, Yi; Bi, Sai; Pirk, Sören; Kaufman, Arie E.

arXiv:2312.13980v1 (cs)

[Submitted on 21 Dec 2023 (this version), latest version 9 Apr 2024 (v2)]

Abstract: Recent advancements in the text-to-3D task leverage finetuned text-to-image diffusion models to generate multi-view images, followed by NeRF reconstruction. Yet, existing supervised finetuned (SFT) diffusion models still suffer from multi-view inconsistency and the resulting NeRF artifacts. Although training longer with SFT improves consistency, it also causes distribution shift, which reduces diversity and realistic details. We argue that the SFT of multi-view diffusion models resembles the instruction finetuning stage of the LLM alignment pipeline and can benefit from RL finetuning (RLFT) methods. Essentially, RLFT methods optimize models beyond their SFT data distribution by using their own outputs, effectively mitigating distribution shift. To this end, we introduce Carve3D, an RLFT method coupled with the Multi-view Reconstruction Consistency (MRC) metric, to improve the consistency of multi-view diffusion models. To compute MRC on a set of multi-view images, we compare them with their corresponding renderings of the reconstructed NeRF at the same viewpoints. We validate the robustness of MRC with extensive experiments conducted under controlled inconsistency levels. We enhance the base RLFT algorithm to stabilize the training process, reduce distribution shift, and identify scaling laws. Through qualitative and quantitative experiments, along with a user study, we demonstrate Carve3D's improved multi-view consistency, the resulting superior NeRF reconstruction quality, and minimal distribution shift compared to longer SFT. Project webpage: this https URL.
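The MRC idea described in the abstract (compare each generated view with a rendering of the reconstruction at the same viewpoint) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `mse` and `mrc_sketch` are hypothetical, views are represented as flattened pixel lists, and plain mean squared error stands in for whatever image distance the paper actually uses, which the abstract does not specify. The reconstruction and re-rendering steps are assumed to have happened already.

```python
from typing import Callable, Sequence

# One image, flattened to pixel intensities in [0, 1] (illustrative representation).
View = Sequence[float]


def mse(a: View, b: View) -> float:
    """Mean squared error between two flattened images (stand-in distance)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("views must be non-empty and have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mrc_sketch(
    generated: Sequence[View],
    rerendered: Sequence[View],
    distance: Callable[[View, View], float] = mse,
) -> float:
    """Average distance between generated views and renderings of the
    reconstruction at the same viewpoints. Lower means more consistent.

    `rerendered[i]` is assumed to be the reconstruction rendered from the
    camera pose of `generated[i]`.
    """
    if len(generated) != len(rerendered) or len(generated) == 0:
        raise ValueError("need the same non-zero number of generated and rerendered views")
    total = sum(distance(g, r) for g, r in zip(generated, rerendered))
    return total / len(generated)
```

If the generated views are perfectly consistent, the reconstruction can reproduce them and the score is zero; inconsistencies that the reconstruction cannot explain show up as a larger score. The `distance` argument is left pluggable so that a perceptual metric could be swapped in.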

Comments:	Project webpage: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2312.13980 [cs.CV]
	(or arXiv:2312.13980v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.13980

Submission history

From: Desai Xie
[v1] Thu, 21 Dec 2023 16:10:33 UTC (9,455 KB)
[v2] Tue, 9 Apr 2024 04:41:53 UTC (9,933 KB)
