Text-to-image (T2I) diffusion and flow models achieve state-of-the-art results in image synthesis. Many works leverage these models for real-image editing, where a predominant approach is to invert the image into its corresponding Gaussian-like noise map. However, inversion by itself is often insufficient for structure-preserving edits. In the first work covered in this talk, ‘An Edit Friendly DDPM Noise Space’ [1], we present alternative latent noise maps for denoising diffusion probabilistic models (DDPMs) that do not follow a standard normal distribution. These noise maps allow perfect reconstruction of any real image and lead to structure-preserving edits, as we demonstrate in our experiments.
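To make the idea concrete, the following is a minimal NumPy sketch of the principle behind such edit-friendly noise maps: draw an independent noise sample per timestep, then solve the sampling recursion for the per-step noise z_t, so that re-running the sampler with these z_t reconstructs the input exactly. The `mu` function here is a toy stand-in for the DDPM posterior mean (which in practice involves the trained denoiser network), and the schedules are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10
alphas = np.linspace(0.99, 0.9, T)        # toy noise schedule (assumption)
alpha_bars = np.cumprod(alphas)

def mu(x_t, t):
    # Toy stand-in for the DDPM posterior mean; a real sampler would
    # use the denoiser network's noise prediction here.
    return x_t / np.sqrt(alphas[t])

x0 = rng.standard_normal(4)               # "image" to invert (toy vector)

# 1) Sample x_1..x_T with *independent* noise at every timestep,
#    rather than one consistent diffusion trajectory.
xs = [x0]
for t in range(T):
    eps = rng.standard_normal(4)
    xs.append(np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1 - alpha_bars[t]) * eps)

# 2) Solve the sampling recursion x_t = mu(x_{t+1}) + sigma_t * z_t
#    for the noise maps z_t (simplified variance schedule, assumption).
sigmas = np.sqrt(1 - alphas)
zs = [(xs[t] - mu(xs[t + 1], t)) / sigmas[t] for t in range(T)]

# 3) Re-running the sampler with these fixed z_t reconstructs x0 exactly.
x = xs[T]
for t in reversed(range(T)):
    x = mu(x, t) + sigmas[t] * zs[t]
assert np.allclose(x, x0)
```

The key point this illustrates is that the extracted z_t are determined by the (independently sampled) trajectory rather than drawn from a standard normal, which is what makes perfect reconstruction, and hence structure-preserving editing, possible.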
In our second work, we tackle text-based video editing using T2I diffusion models. Here, the main challenge lies in maintaining the temporal consistency of the original video during the edit. Many methods rely on explicit correspondence mechanisms, which struggle with strong nonrigid motion. In contrast, our method, termed ‘Slicedit’ [2], introduces a fundamentally different approach, based on the observation that spatiotemporal slices of natural videos exhibit characteristics similar to those of natural images. Thus, the same T2I diffusion model that is normally used only as a prior on video frames can also serve as a strong prior for temporal consistency when applied to spatiotemporal slices. As we show, Slicedit generates videos that retain the structure and motion of the original video while adhering to the target text, without relying on explicit correspondence matching.

Finally, in our most recent work, we will discuss ‘FlowEdit’ [3], a novel text-based image editing method that leverages the increasingly popular flow models without relying on inversion. Our method constructs an ODE that directly maps between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX.
[1] An Edit Friendly DDPM Noise Space: Inversion and Manipulations, CVPR 2024. https://arxiv.org/abs/2304.06140
[2] Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices, ICML 2024. https://arxiv.org/abs/2405.12211
[3] FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models, under review. https://arxiv.org/abs/2412.08629
Bio: Vladimir Kulikov is a PhD student at the Technion, under the supervision of Prof. Tomer Michaeli. His research focuses on deep generative models, with an emphasis on computer vision.