Video Generation / Video Understanding 【Paper Roundup】SVD, Sora, Latte, VideoCrafter1/2, DiT...
- 【arXiv 2024】MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
- 【CVPR 2024】VBench: Comprehensive Benchmark Suite for Video Generative Models
- 【arXiv 2024】T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
- 【arXiv 2024】Latte: Latent Diffusion Transformer for Video Generation
- 【arXiv 2024】VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
- 【ACL 2024】Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
- 【CVPR 2024】InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
- 【ECCV 2024】InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Datasets
Metrics
【arXiv 2024】MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Authors: Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan
Abstract: Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we prop