[Paper Reading] [Close Reading] 3D Gaussian Splatting for Real-Time Radiance Field Rendering

Table of Contents

    • 1. What:
    • 2. Why:
    • 3. How:
      • 3.1 Real-time rendering
      • 3.2 Adaptive Control of Gaussians
      • 3.3 Differentiable 3D Gaussian splatting
    • 4. Self-thoughts

1. What:

What does this paper set out to do? (From the abstract and conclusion, try to summarize it in one sentence.)

To achieve both efficiency and quality, the paper starts from the sparse points produced during camera calibration and represents the scene with 3D Gaussians, which preserve the desirable properties of continuous volumetric radiance fields while avoiding unnecessary computation in empty space. It then optimizes the Gaussians, including their anisotropic covariance, to represent the scene accurately. Lastly, it introduces a fast, visibility-aware rendering algorithm, achieving state-of-the-art quality with real-time rendering.

2. Why:

Under what conditions or needs was this research proposed (Intro)? What core problems or deficiencies does it address, what have others done, and what are the innovations? (From the introduction and related work.)

This may cover Background, Question, Others, and Innovation:

Three strands of related work explain this question.

  1. Traditional reconstruction pipelines built on SfM and MVS re-project and blend the input images into the novel-view camera, using the reconstructed geometry to guide this re-projection (from 2D to 3D).

    Drawback: they cannot completely recover from unreconstructed regions, or from “over-reconstruction”, where MVS generates nonexistent geometry.

  2. Neural Rendering and Radiance Fields

    Neural rendering is a broad category of techniques that leverage deep learning for image synthesis, while radiance fields are a specific technique within neural rendering that represents the light and color of a scene in 3D space.

  • Earlier deep-learning methods mostly relied on MVS-based geometry, which is also their major drawback.

  • NeRF follows the volumetric-representation line and introduced positional encoding and importance sampling.

  • Faster training methods focus on spatial data structures that store (neural) features which are then interpolated during volumetric ray-marching, on different encodings, and on MLP capacity.

  • Notable recent works include InstantNGP and Plenoxels, both of which rely on Spherical Harmonics.

    Think of Spherical Harmonics as a set of basis functions for fitting a function defined on the sphere, here the view-dependent color.

    Introduction to Spherical Harmonics - Zhihu (zhihu.com)

  3. Point-Based Rendering and Radiance Fields
  • Methods from human performance capture inspired the choice of 3D Gaussians as the scene representation.
  • Point-based and spherical rendering have been achieved before.

3. How:

Following the gradient flow in the paper’s pipeline, we try to connect Sections 4, 5, and 6 of the paper.

First, start from the loss function, which combines an $\mathcal{L}_{1}$ loss with a D-SSIM term:

$$\mathcal{L}=(1-\lambda)\mathcal{L}_{1}+\lambda\mathcal{L}_{\mathrm{D-SSIM}}.\tag{1}$$
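As a rough illustration of Eq. (1), here is a minimal NumPy sketch. Several things in it are assumptions rather than the paper's implementation: `global_ssim` computes SSIM over the whole image as a single window (real implementations slide a Gaussian window), the D-SSIM term is taken to be `1 - SSIM`, and the default `lam=0.2` and the function names are only illustrative.

```python
import numpy as np

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image (a simplification)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den

def splat_loss(rendered, target, lam=0.2):
    """Eq. (1): (1 - lambda) * L1 + lambda * L_D-SSIM."""
    l1 = np.abs(rendered - target).mean()
    d_ssim = 1.0 - global_ssim(rendered, target)  # assumption: D-SSIM = 1 - SSIM
    return (1.0 - lam) * l1 + lam * d_ssim

rng = np.random.default_rng(0)
target = rng.random((8, 8))
print(splat_loss(target, target))               # identical images give (almost) zero loss
print(splat_loss(np.zeros((8, 8)), target))     # a black render is penalized
```

Both terms vanish when the rendered image equals the target, so the loss only pushes on pixels where the two disagree.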

The loss relates the ground-truth image to the rendered image, so to carry out the optimization we need to look into the rendering process. From the related-work section, we know that point-based $\alpha$-blending and NeRF-style volumetric rendering share essentially the same image formation model, namely

$$C=\sum_{i=1}^{N}T_{i}(1-\exp(-\sigma_{i}\delta_{i}))c_{i}\quad\mathrm{with}\quad T_{i}=\exp\left(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\right).\tag{2}$$

This paper uses a typical neural point-based approach that has the same form as (2) and can be written as:

$$C=\sum_{i\in N}c_{i}\alpha_{i}\prod_{j=1}^{i-1}(1-\alpha_{j}).\tag{3}$$

Setting $\alpha_{i}=1-\exp(-\sigma_{i}\delta_{i})$ in (2) gives exactly this form.
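The equivalence of (2) and (3) can be checked numerically. The following is a minimal NumPy sketch for a single ray; the function names and the sample values are made up for illustration.

```python
import numpy as np

def composite_volumetric(sigma, delta, colors):
    """Eq. (2): NeRF-style quadrature along one ray."""
    tau = sigma * delta
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): exclusive cumulative sum
    T = np.exp(-np.concatenate(([0.0], np.cumsum(tau)[:-1])))
    weights = T * (1.0 - np.exp(-tau))
    return weights @ colors

def composite_alpha(alpha, colors):
    """Eq. (3): front-to-back alpha blending of depth-sorted points."""
    # prod_{j<i} (1 - alpha_j): exclusive cumulative product
    T = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))
    return (alpha * T) @ colors

sigma = np.array([0.5, 2.0, 1.0])    # densities of three samples (made up)
delta = np.array([0.1, 0.2, 0.3])    # interval lengths (made up)
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)

alpha = 1.0 - np.exp(-sigma * delta)
assert np.allclose(composite_volumetric(sigma, delta, colors),
                   composite_alpha(alpha, colors))
```

The assertion holds because $\exp(-\sum_{j<i}\sigma_{j}\delta_{j})=\prod_{j<i}(1-\alpha_{j})$ once $\alpha_{j}$ is defined as above.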

From this formulation, we can see that the scene representation must carry color $c$ and opacity $\alpha$. These are attached to each Gaussian, and Spherical Harmonics are used to represent color, just as in Plenoxels. The other attributes are the position and the covariance matrix. So we now have the four attributes that represent the scene: position $p$, opacity $\alpha$, covariance $\Sigma$, and the SH coefficients representing the color $c$ of each Gaussian.
Knowing the basic elements, let’s now work backward, starting with rendering, which was addressed in the author’s previous paper.

3.1 Real-time rendering

This part is independent of how gradients are propagated but is critical for real-time performance; it was published in an earlier paper by the author.

In an earlier game, someone had already tried modeling the world with ellipsoids and rendering them, which is the same idea as the rendering process of Gaussian splatting. The latter, however, relies on many techniques for making good use of GPU threads.

  • First, split the screen into 16×16 tiles, then cull the 3D Gaussians against the view frustum and each tile, keeping only Gaussians whose 99% confidence interval intersects the view frustum.
  • Then instantiate each Gaussian once per tile it overlaps, and assign each instance a key that combines view-space depth and tile ID.
  • Then sort the instances by these keys using a single fast GPU radix sort.
  • Finally, launch one thread block per tile; for a given pixel, accumulate color and $\alpha$ values by traversing the list front-to-back until $\alpha$ saturates (approaches one).
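The key-and-sort steps above can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the paper's CUDA rasterizer: each Gaussian is reduced to a hypothetical screen-space center and pixel radius, the key layout (tile ID in the high 32 bits, float depth bits in the low 32 bits) is an assumed packing, and `np.argsort` stands in for the GPU radix sort.

```python
import numpy as np

TILE = 16  # tile size in pixels

def build_tile_lists(means2d, radii, depths, width, height):
    """Return {tile_id: [gaussian indices sorted front-to-back]}."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    keys, ids = [], []
    for g, ((x, y), r, z) in enumerate(zip(means2d, radii, depths)):
        x0 = max(int((x - r) // TILE), 0)
        x1 = min(int((x + r) // TILE), tiles_x - 1)
        y0 = max(int((y - r) // TILE), 0)
        y1 = min(int((y + r) // TILE), tiles_y - 1)
        # Positive float32 depths keep their order when read as uint32 bits
        depth_bits = int(np.float32(z).view(np.uint32))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_id = ty * tiles_x + tx
                keys.append((tile_id << 32) | depth_bits)  # one instance per tile
                ids.append(g)
    keys = np.array(keys, dtype=np.uint64)
    ids = np.array(ids, dtype=np.int64)
    order = np.argsort(keys, kind="stable")  # single sort over all instances
    keys, ids = keys[order], ids[order]
    tile_of = (keys >> np.uint64(32)).astype(np.int64)
    return {int(t): ids[tile_of == t].tolist() for t in np.unique(tile_of)}

means2d = np.array([[8.0, 8.0], [16.0, 8.0]])
radii = np.array([2.0, 4.0])
depths = np.array([5.0, 1.0])
print(build_tile_lists(means2d, radii, depths, width=32, height=16))
# {0: [1, 0], 1: [1]}
```

Because the tile ID occupies the high bits, one global sort groups instances by tile and orders each group by depth at the same time, which is why a single sort is enough.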

3.2 Adaptive Control of Gaussians

While fitting Gaussians to the scene, we adapt both their number and their volume to strengthen the representation. There are two densification operations: clone and split.

Both are triggered by the view-space positional gradients, because under-reconstructed and over-reconstructed regions both show large view-space positional gradients. A small Gaussian in an under-reconstructed region is cloned, while a large Gaussian in an over-reconstructed region is split into smaller ones.
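A minimal NumPy sketch of this clone/split decision is given below. It is a simplification under several assumptions: the threshold values `tau_pos` and `size_thresh` and the shrink factor `phi` are illustrative defaults, rotation is ignored when sampling the positions of split Gaussians, and the clone is left at the same position instead of being moved along the gradient.

```python
import numpy as np

def densify(positions, scales, grad_norm,
            tau_pos=0.0002, size_thresh=0.01, phi=1.6, rng=None):
    """Clone small / split large Gaussians whose positional gradient is large."""
    rng = np.random.default_rng(0) if rng is None else rng
    needs = grad_norm > tau_pos                  # large view-space gradient
    big = scales.max(axis=1) > size_thresh       # "large" Gaussian (assumed test)
    clone, split = needs & ~big, needs & big
    keep = ~split                                # split Gaussians are replaced

    # Clone: keep the original and add a copy of the same size
    new_pos = [positions[keep], positions[clone]]
    new_scl = [scales[keep], scales[clone]]

    # Split: replace one large Gaussian by two smaller ones sampled from it
    for _ in range(2):
        offset = rng.normal(size=positions[split].shape) * scales[split]
        new_pos.append(positions[split] + offset)
        new_scl.append(scales[split] / phi)
    return np.concatenate(new_pos), np.concatenate(new_scl)

positions = np.zeros((3, 3))
scales = np.array([[0.001] * 3, [0.001] * 3, [0.1] * 3])
grad_norm = np.array([0.0, 0.001, 0.001])   # converged, small+needy, large+needy
new_positions, new_scales = densify(positions, scales, grad_norm)
print(new_positions.shape)  # (5, 3)
```

Cloning increases both the number of Gaussians and the total volume, while splitting increases the number but keeps the covered volume roughly the same.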

3.3 Differentiable 3D Gaussian splatting

We now know how rendering and the control of Gaussians work. Finally, we discuss how to propagate the gradients back to the parameters we can optimize. This is mainly about how the Gaussian function is handled.

The basic simplified formulation of a 3D Gaussian can be written as:

$$G(x)=e^{-\frac{1}{2}(x)^{T}\Sigma^{-1}(x)}.\tag{4}$$

We use $\alpha$-blending to combine the Gaussians into the rendered image, so that we can compute the loss and run the optimization. So now we need to know how to optimize the Gaussians and compute their gradients.

When rasterizing, the three-dimensional scene must be transformed into two-dimensional space. The author wants each 3D Gaussian to remain a Gaussian under this transformation (otherwise the rasterized result would have nothing to do with a Gaussian and all the effort would be in vain). So we need a way to transform the covariance matrix into camera coordinates that preserves this affine relation:

$$\Sigma'=JW\Sigma W^{T}J^{T},\tag{5}$$

where $J$ is the Jacobian of the affine approximation of the projective transformation and $W$ is the viewing transformation.
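Eq. (5) can be sketched in a few lines of NumPy. The sketch assumes a simple pinhole projection $(x, y, z)\mapsto(f_x x/z,\ f_y y/z)$, so `J` is written directly as the 2×3 Jacobian of that map and the result is already the 2×2 screen-space covariance; `fx`, `fy`, and the function name are hypothetical.

```python
import numpy as np

def project_covariance(cov3d, W, t, fx, fy):
    """Eq. (5): Sigma' = J W Sigma W^T J^T.

    cov3d: 3x3 world-space covariance
    W:     3x3 rotation part of the world-to-camera transform
    t:     Gaussian center in camera coordinates (x, y, z), z > 0
    """
    x, y, z = t
    # Jacobian of (fx * x / z, fy * y / z) evaluated at the Gaussian center
    J = np.array([[fx / z, 0.0, -fx * x / z ** 2],
                  [0.0, fy / z, -fy * y / z ** 2]])
    return J @ W @ cov3d @ W.T @ J.T

cov3d = 0.04 * np.eye(3)          # isotropic Gaussian, standard deviation 0.2
cov2d = project_covariance(cov3d, np.eye(3), t=(0.0, 0.0, 2.0), fx=100.0, fy=100.0)
print(cov2d)                      # approximately 100 * identity: a 10-pixel std blob
```

Since the map is linearized around each Gaussian's center, the projected footprint stays an exact 2D Gaussian, which is what keeps the splat cheap to evaluate per pixel.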

Another problem is that a covariance matrix must be positive semi-definite, which updating its entries directly by gradient descent does not guarantee. So we use a scaling matrix $S$ and a rotation matrix $R$ to ensure it:

$$\Sigma=RSS^{T}R^{T}.\tag{6}$$

We can then store a 3D vector $s$ for scaling and a quaternion $q$ for rotation, and the gradients are back-propagated to them. This completes the optimization process.
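The construction in Eq. (6) can be sketched as follows. The quaternion ordering `(w, x, y, z)` and the function names are assumptions made for the example; the quaternion is normalized first so that any 4-vector yields a valid rotation.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a quaternion given as (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)   # normalize to a unit quaternion
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def build_covariance(q, s):
    """Eq. (6): Sigma = R S S^T R^T, positive semi-definite by construction."""
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    M = R @ np.diag(np.asarray(s, dtype=float))
    return M @ M.T

cov = build_covariance(q=[0.3, -0.5, 0.1, 0.8], s=[0.5, 1.0, 2.0])
print(np.linalg.eigvalsh(cov))   # approximately [0.25, 1.0, 4.0], the squared scales
```

Whatever values the optimizer gives to `q` and `s`, the result has the form $MM^{T}$ and therefore cannot have negative eigenvalues.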

4. Self-thoughts

  1. Summary of different representations
  • Explicit representation: Mesh, Point Cloud
  • Implicit representation
    • Volumetric representation: NeRF

      The density value returned at a sample point reflects whether there is geometric occupancy there.

    • Surface representation: SDF (Signed Distance Function)

      Outputs the distance to the nearest surface in the space from this point, where a positive value indicates outside the surface, and a negative value indicates inside the surface.
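As a small illustration of the sign convention just described, here is a sketch of the SDF of a sphere; the function name and the sample points are made up for the example.

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: positive outside, negative inside."""
    diff = np.asarray(p, dtype=float) - np.asarray(center, dtype=float)
    return np.linalg.norm(diff, axis=-1) - radius

points = np.array([[2.0, 0.0, 0.0],   # outside
                   [1.0, 0.0, 0.0],   # on the surface
                   [0.0, 0.0, 0.0]])  # inside (at the center)
print(sphere_sdf(points, center=[0.0, 0.0, 0.0], radius=1.0))  # [ 1.  0. -1.]
```

The surface itself is the zero level set, in contrast to a density field, where occupancy has to be read from how large the returned value is.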

