T2V-Turbo-v2: Enhancing Video Generation Model Post-Training Through Data, Reward, and Conditional Guidance Design

Jiachen Li¹, Qian Long², Jian Zheng³, Xiaofeng Gao³, Robinson Piramuthu³, Wenhu Chen⁴, William Yang Wang¹
¹UC Santa Barbara, ²UC Los Angeles, ³Amazon AGI, ⁴University of Waterloo

Abstract

In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model. Our proposed method, T2V-Turbo-v2, integrates various supervision signals, including high-quality training data, reward model feedback, and conditional guidance, into the consistency distillation process. Through comprehensive ablation studies, we highlight the crucial importance of tailoring datasets to specific learning objectives and the effectiveness of learning from diverse reward models for enhancing both visual quality and text-video alignment. Additionally, we highlight the vast design space of conditional guidance strategies, which centers on designing an effective energy function to augment the teacher ODE solver. We demonstrate the potential of this approach by extracting motion guidance from the training datasets and incorporating it into the ODE solver, which improves the motion quality of the generated videos as measured by the motion-related metrics from VBench and T2V-CompBench. Empirically, our T2V-Turbo-v2 establishes a new state-of-the-art result on VBench, with a Total score of 85.13, surpassing proprietary systems such as Gen-3 and Kling.

Overview of Training Pipeline

Training pipeline of our T2V-Turbo-v2. When augmenting the teacher PF-ODE solver with CFG and motion guidance, we extract the motion prior \(A(\boldsymbol{z}_{t_{n+k}}^\text{ref})\) from the training videos and distill it into the student CM \(\boldsymbol{f}_{\theta}\) along with the CFG.
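To make the idea of augmenting a teacher solver with CFG and an energy-based motion term more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation used in T2V-Turbo-v2: the motion feature `motion_feature` (plain frame differences) is a toy stand-in for \(A(\cdot)\), the quadratic energy, the function names, the scale parameters, and the CFG convention (conditional prediction plus a scaled conditional-minus-unconditional difference) are all hypothetical choices made for the example.

```python
import numpy as np

def motion_feature(z):
    """Toy stand-in for the motion prior A(z): differences between
    consecutive frames along axis 0. (Assumption, not the paper's A.)"""
    return z[1:] - z[:-1]

def motion_energy(z, z_ref):
    """Hypothetical energy: 0.5 * ||A(z) - A(z_ref)||^2."""
    r = motion_feature(z) - motion_feature(z_ref)
    return 0.5 * np.sum(r ** 2)

def motion_energy_grad(z, z_ref):
    """Analytic gradient of motion_energy with respect to z."""
    r = motion_feature(z) - motion_feature(z_ref)  # shape (F-1, ...)
    grad = np.zeros_like(z)
    grad[1:] += r    # each residual r[i] depends on +z[i+1]
    grad[:-1] -= r   # ... and on -z[i]
    return grad

def guided_eps(eps_cond, eps_uncond, z, z_ref, cfg_scale, motion_scale):
    """Noise prediction augmented with CFG and the energy gradient.
    Adding the energy gradient to the predicted noise pushes the
    denoised sample toward lower energy (assumed sign convention)."""
    cfg = eps_cond + cfg_scale * (eps_cond - eps_uncond)
    return cfg + motion_scale * motion_energy_grad(z, z_ref)

def ddim_step(z_t, eps, alpha_t, sigma_t, alpha_s, sigma_s):
    """One deterministic DDIM step from noise level t to level s."""
    x0 = (z_t - sigma_t * eps) / alpha_t
    return alpha_s * x0 + sigma_s * eps

# Usage with random placeholders for latents and model outputs.
rng = np.random.default_rng(0)
z_t = rng.normal(size=(4, 3))      # 4 frames, 3 latent dimensions
z_ref = rng.normal(size=(4, 3))    # noised latent of a training video
eps_c = rng.normal(size=(4, 3))    # conditional prediction (placeholder)
eps_u = rng.normal(size=(4, 3))    # unconditional prediction (placeholder)
eps_hat = guided_eps(eps_c, eps_u, z_t, z_ref, cfg_scale=7.5, motion_scale=0.1)
z_s = ddim_step(z_t, eps_hat, alpha_t=0.6, sigma_t=0.8, alpha_s=0.8, sigma_s=0.6)
```

In a distillation setup of the kind the caption describes, the student would be trained to match the output of such an augmented solver step, so that the guidance does not need to be computed at inference time.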

State-of-the-Art Performance on VBench

Ablation Studies on the Design of Training Datasets

Ablation Studies on the Design of Reward Models