视频生成【文章汇总】SVD, Sora, Latte, VideoCrafter12, DiT...

视频生成【文章汇总】SVD, Sora, Latte, VideoCrafter12, DiT...

    • 数据集
    • 指标
  • 【arXiv 2024】MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
  • 【CVPR 2024】VBench : Comprehensive Benchmark Suite for Video Generative Models
  • 【arxiv 2024】T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
  • 【arxiv 2024】Latte: Latent Diffusion Transformer for Video Generation
  • 【arxiv 2024】xxx
  • 【arxiv 2024】xxx
  • 【arxiv 2024】xxx
  • 【arxiv 2024】xxx

数据集

指标

【arXiv 2024】MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

Authors: Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan

Abstract Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention. However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous ones in video duration, caption detail, motion strength, and visual quality. We curate MiraData from diverse, manually selected sources and meticulously process the data to obtain semantically consistent clips. GPT-4V is employed to annotate structured captions, providing detailed descriptions from four different perspectives along with a summarized dense caption. To better assess temporal consistency and motion intensity in video generation, we introduce MiraBench, which enhances existing benchmarks by adding 3D consistency and tracking-based motion strength metrics. MiraBench includes 150 evaluation prompts and 17 metrics covering temporal consistency, motion strength, 3D consistency, visual quality, text-video alignment, and distribution similarity. To demonstrate the utility and effectiveness of MiraData, we conduct experiments using our DiT-based video generation model, MiraDiT. The experimental results on MiraBench demonstrate the superiority of MiraData, especially in motion strength.

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读,待续】
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述

【CVPR 2024】VBench : Comprehensive Benchmark Suite for Video Generative Models

Authors: Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

Abstract Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has three appealing properties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity inconsistency, motion smoothness, temporal flickering, and spatial relationship, etc). The evaluation metrics with fine-grained levels reveal individual models' strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmarks' alignment with human perception, for each evaluation dimension respectively. 3) Valuable Insights: We look into current models' ability across various evaluation dimensions, and various content types. We also investigate the gaps between video and image generation models. We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读】在这里插入图片描述
在这里插入图片描述

【arxiv 2024】T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Authors: Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu

Abstract Text-to-video (T2V) generation models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this important ability for evaluation. In this work, we conduct the first systematic study on compositional text-to-video generation. We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation. T2V-CompBench encompasses diverse aspects of compositionality, including consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy. We further carefully design evaluation metrics of MLLM-based metrics, detection-based metrics, and tracking-based metrics, which can better reflect the compositional text-to-video generation quality of seven proposed categories with 700 text prompts. The effectiveness of the proposed metrics is verified by correlation with human evaluations. We also benchmark various text-to-video generative models and conduct in-depth analysis across different models and different compositional categories. We find that compositional text-to-video generation is highly challenging for current models, and we hope that our attempt will shed light on future research in this direction.

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读】
在这里插入图片描述
在这里插入图片描述

【arxiv 2024】Latte: Latent Diffusion Transformer for Video Generation

Authors: Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao

Abstract We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to text-to-video generation (T2V) task, where Latte achieves comparable results compared to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读,待续】

【arxiv 2024】xxx

Authors:

Abstract

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读,待续】

【arxiv 2024】xxx

Authors:

Abstract

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读,待续】

【arxiv 2024】xxx

Authors:

Abstract

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读,待续】

【arxiv 2024】xxx

Authors:

Abstract

【Paper】 > 【Github_Code】 > 【Project】 > 【中文解读,待续】

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/385086.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

微信小程序支付流程

前端需要做的事情: 生成平台订单:前端调用接口,向后端传递购买的商品信息、收货人信息,(后端生成平台订单,返回订单编号)获取预付单信息:将订单编号发送给后端后,&#x…

我在Vscode学Java泛型(泛型设计、擦除、通配符)

Java泛型 一、泛型 Generics的意义1.1 在没有泛型的时候,集合如何存储数据1.2 引入泛型的好处1.3 注意事项1.3.1 泛型不支持基本数据类型1.3.2 当泛型指定类型,传递数据时可传入该类及其子类类型1.3.3 如果不写泛型,类型默认是Object 二、泛型…

.net 连接达梦数据库开发环境部署

.net 开发环境部署 1. 环境准备 测试工具 Visual Studio2022 数据库版本 dm8 2. 搭建过程 1 )创建新项目 2 )选择创建空项目 3 )配置新项目 4 )右键 DM1 新建一个项 5 )加 载 驱 动 , 新 建 …

SQL labs-SQL注入(五,使用sqlmap进行cookie注入)

本文仅作为学习参考使用,本文作者对任何使用本文进行渗透攻击破坏不负任何责任。 引言: Cookie 是一些数据, 存储于你电脑上的文本文件中。当 web 服务器向浏览器发送 web 页面时,在连接关闭后,服务端不会记录用户的信息。Cookie…

加速决策过程:企业级爬虫平台的实时数据分析

摘要 在当今数据驱动的商业环境中,企业如何才能在海量信息中迅速做出精准决策?本文将探讨企业级爬虫平台如何通过实时数据分析加速决策过程,实现数据到决策的无缝衔接。我们聚焦于技术如何赋能企业,提升数据处理效率,…

RK3568 Linux 平台开发系列讲解(内核入门篇):从内核的角度看外设芯片的驱动

在嵌入式 Linux 开发中,外设芯片的驱动是实现操作系统与硬件之间交互的关键环节。对于 RK3568 这样的处理器平台,理解如何从内核的角度构建和管理外设芯片的驱动程序至关重要。 1. 外设驱动的基础概念 外设驱动(Device Driver)是操作系统与硬件设备之间的桥梁。它负责控…

mysql面试(六)

前言 本章节详细讲解了一下mysql执行计划相关的属性释义,以及不同sql所出现的不同效果 执行计划 一条查询语句经过mysql查询优化器的各种基于成本和各种规则优化之后,会生成一个所谓的 执行计划,这个执行计划展示了这条查询语句具体查询方…

【计算机网络】TCP协议详解

欢迎来到 破晓的历程的 博客 ⛺️不负时光,不负己✈️ 文章目录 1、引言2、udp和tcp协议的异同3、tcp服务器3.1、接口认识3.2、服务器设计 4、tcp客户端4.1、客户端设计4.2、说明 5、再研Tcp服务端5.1、多进程版5.2、多线程版 5、守护进程化5.1、什么是守护进程5.2…

基于 HTML+ECharts 实现智慧安防数据可视化大屏(含源码)

构建智慧安防数据可视化大屏:基于 HTML 和 ECharts 的实现 随着科技的不断进步,智慧安防系统已经成为保障公共安全的重要工具。通过数据可视化,安防管理人员可以实时监控关键区域的安全状况、人员流动以及设备状态,从而提高应急响…

【element ui】input输入控件绑定粘贴事件,从 Excel 复制的数据粘贴到输入框(el-input)时自动转换为逗号分隔的数据

目录 1、需求2、实现思路:3、控件绑定粘贴事件事件修饰符说明: 4、代码实现🚀写在最后 1、需求 在 Vue 2 和 Element UI 中,要实现从 Excel 复制空格分隔的数据,并在粘贴到输入框(el-input)时自动转换为逗号分隔的数据…

代码随想录训练第三十天|01背包理论基础、01背包、LeetCode416.分割等和子集

文章目录 01背包理论基础01背包二维dp数组01背包一维dp数组(滚动数组) 416.分割等和子集思路 01背包理论基础 背包问题的理论基础重中之重是01背包,一定要理解透! leetcode上没有纯01背包的问题,都是01背包应用方面的题目,也就是…

领略诗词之妙,发觉生活之美。

文章目录 引言落霞与孤鹜齐飞,秋水共长天一色。野渡无人舟自横。吹灭读书灯,一身都是月。我醉欲眠卿且去,明朝有意抱琴来。赌书消得泼茶香,当时只道是寻常。月上柳梢头,人约黄昏后。最是人间留不住,朱颜辞镜花辞树。山中何事?松花酿酒,春水煎茶。似此星辰非昨夜,为谁风…

Github 2024-07-26开源项目日报 Top10

根据Github Trendings的统计,今日(2024-07-26统计)共有10个项目上榜。根据开发语言中项目的数量,汇总情况如下: 开发语言项目数量Java项目2TypeScript项目2C++项目2HTML项目1Python项目1C#项目1Lua项目1JavaScript项目1Vue项目1C项目1免费编程学习平台:freeCodeCamp.org 创…

vuepress搭建个人文档

vuepress搭建个人文档 文章目录 vuepress搭建个人文档前言一、VuePress了解二、vuepress-reco主题个人博客搭建三、vuepress博客部署四、vuepress后续补充 总结 vuepress搭建个人文档 所属目录&#xff1a;项目研究创建时间&#xff1a;2024/7/23作者&#xff1a;星云<Xing…

c++语言学习注意事项

当学习C语言时&#xff0c;有几个重要的注意事项可以帮助初学者更有效地掌握这门强大的编程语言&#xff1a; 1. 理解基本概念和语法 C 是一门复杂且功能强大的编程语言&#xff0c;因此理解其基本概念和语法至关重要。初学者应该重点掌握以下几个方面&#xff1a; 基本语法和…

WordPress原创插件:自定义文章标题颜色

插件设置截图 文章编辑时&#xff0c;右边会出现一个标题颜色设置&#xff0c;可以设置为任何颜色 更新记录&#xff1a;从输入颜色css代码&#xff0c;改为颜色选择器&#xff0c;更方便&#xff01; 插件免费下载 https://download.csdn.net/download/huayula/89585192…

网络基础之(11)优秀学习资料

网络基础之(11)优秀学习资料 Author&#xff1a;Once Day Date: 2024年7月27日 漫漫长路&#xff0c;有人对你笑过嘛… 全系列文档可参考专栏&#xff1a;通信网络技术_Once-Day的博客-CSDN博客。 参考文档&#xff1a; 网络工程初学者的学习方法及成长之路&#xff08;红…

02、爬虫数据解析-Re解析

数据解析的目的是不拿到页面的全部内容&#xff0c;只拿到部分我们想要的内容内容。 Re解析就是正则解析&#xff0c;效率高准确性高。学习本节内容前需要学会基础的正则表达式。 一、正则匹配规则 1、常用元字符 . 匹配除换行符以外的字符 \w 匹配字母或数字或下划…

我当初装anaconda的时候浏览器默认下载路径在d盘,现在想用ipython的时候用不了了咋办?...

点击上方“Python爬虫与数据挖掘”&#xff0c;进行关注 回复“书籍”即可获赠Python从入门到进阶共10本电子书 今 日 鸡 汤 红豆生南国&#xff0c;春来发几枝。 大家好&#xff0c;我是Python进阶者。 一、前言 前几天在Python白银交流群【041】问了一个Python环境的问题&am…