Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that uses them to drive 3D edits. While effective for appearance-based modifications, such 2D-centric pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object’s overall identity.
To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model then edits this abstraction to specify primitive-level changes, which are subsequently used to guide a 3D generative model. This enables fine-grained, localized modifications while preserving unchanged regions of the original shape.
Through extensive experiments, we show that Prox-E consistently balances identity preservation, shape quality, and instruction fidelity more effectively than existing approaches, including 2D-based 3D editors and training-based methods.
Bio:
Etai Sella is a fourth-year PhD student at Tel Aviv University, supervised by Hadar Averbuch-Elor and Or Patashnik. His research focuses on making generative AI more controllable and editable, with an emphasis on 3D editing. He is currently an intern at Snap Research.