Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

Zheng Zhang, Wenbo Hu†, Yixing Lao, Tong He, Hengshuang Zhao†
arXiv:2403.15530

Abstract

3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, its efficacy relies heavily on the quality of the initial point cloud, leading to blurring and needle-like artifacts in regions with insufficient initial points. This issue is mainly due to the point cloud growth condition, which only considers the average gradient magnitude of points over the views in which they are observable. As a result, it fails to grow large Gaussians that are observable from many viewpoints but, in many of those views, cover only boundary pixels. To address this, we introduce Pixel-GS, a novel approach that takes into account the number of pixels covered by the Gaussian in each view when computing the growth condition. We use the covered pixel counts as weights to dynamically average the gradients from different views, thereby promoting the growth of large Gaussians. Consequently, points in areas with insufficient initial points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy that scales the gradient field according to the distance to the camera, suppressing the growth of floaters near the camera. Extensive qualitative and quantitative experiments confirm that our method achieves state-of-the-art rendering quality while maintaining real-time speed, outperforming competing methods on challenging datasets such as Mip-NeRF 360 and Tanks & Temples. Code and demo are available at: https://pixelgs.github.io

Keywords: View Synthesis, Point-based Radiance Field, Real-time Rendering, 3D Gaussian Splatting, Adaptive Density Control

† Corresponding author.

1 Introduction

Novel View Synthesis (NVS) is a fundamental problem in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) [21] has drawn increasing attention for its explicit point-based representation of 3D scenes and real-time rendering performance.

[Figure 1 panels: (a) Ground Truth; (b) 3DGS∗ (original threshold); (c) 3DGS∗ (lower threshold); (d) Pixel-GS (Ours).]

To convert (b) to (d), change the densification condition from $\frac{\sum \|\mathbf{g}\|}{\sum 1} > \tau_{pos}$ to $\frac{\sum \text{pixel} \cdot \|\mathbf{g}\|}{\sum \text{pixel}} > \tau_{pos}$.

Figure 1: Our Pixel-GS effectively grows points in areas with insufficient initial points (a), leading to a more accurate and detailed reconstruction (d). In contrast, 3D Gaussian Splatting (3DGS) suffers from blurring and needle-like artifacts in these areas, even when the splitting and cloning threshold is lowered to encourage more point growth (c). The rendering quality (LPIPS ↓) and memory consumption are reported for each result. 3DGS∗ is our retrained 3DGS model, which performs better than the original.

3DGS represents the scene as a set of points with geometry attributes (Gaussian scales) and appearance attributes (opacities and colors). These attributes can be learned effectively through differentiable rendering, but optimizing the density of the point cloud is challenging. 3DGS carefully initializes the point cloud with the sparse points produced by Structure from Motion (SfM) and uses an adaptive density control mechanism to split or clone points during optimization. However, this mechanism relies heavily on the quality of the initial point cloud and cannot effectively grow points where the initial point cloud is sparse, resulting in blurry or needle-like artifacts in the synthesized images. In practice, the initial SfM point cloud unavoidably lacks points in areas with repetitive textures and few observations. As shown in the first and second columns of Figure 1, the blurry regions in the RGB images align closely with the areas where few points are initialized, and 3DGS fails to generate enough points there.

In essence, this issue is mainly attributable to the condition that decides when to split or clone a point. 3DGS makes this decision by checking whether the average gradient magnitude of a point's Normalized Device Coordinates (NDC) exceeds a threshold. The gradient magnitude is averaged with equal weight across viewpoints, and the threshold is fixed. Large Gaussians are usually visible from many viewpoints, and the size of their projected area varies significantly across views, so the number of pixels involved in the gradient calculation also varies significantly. Because of the form of the Gaussian distribution, the few pixels near the center of the projected Gaussian contribute much more to the gradient than pixels far from the center. In many of the views that observe a large Gaussian, the region around its projected center falls outside the screen space; these views lower the average gradient and make the Gaussian difficult to split or clone. Merely lowering the threshold cannot solve this issue, since doing so mainly encourages point growth in areas that already have sufficient points, as shown in the third column of Figure 1, while blurry artifacts remain in the areas with insufficient points.

In this paper, we propose to compute the mean gradient magnitude of points from the perspective of pixels. When computing the average gradient magnitude of a Gaussian, we take into account the number of pixels it covers in each view by replacing the plain average across views with an average weighted by the number of covered pixels. The motivation is to amplify the gradient contribution of large Gaussians while leaving the splitting and cloning conditions for small Gaussians essentially unchanged, so that points can be grown effectively in areas covered by large Gaussians. For small Gaussians, the weighted average has only a slight effect on the final gradient, since the number of covered pixels varies little across viewpoints. Consequently, the final number of points in areas with sufficient initial points does not change significantly, avoiding unnecessary memory consumption and processing time, while points in areas with insufficient initial points can be grown effectively to reconstruct fine-grained details. As shown in the last column of Figure 1, our method effectively grows points in areas with insufficient initial points and renders high-fidelity images, whereas directly lowering the threshold of 3DGS to reach a similar number of final points fails to produce blur-free results. In addition, we observe that “floaters”, points that are poorly aligned with the scene geometry and do not contribute to the final rendering, tend to appear near the camera. To address this, we propose to scale the gradient field in NDC space according to the depth of the points, thereby suppressing the growth of floaters near the camera.

To evaluate the effectiveness of our method, we conducted extensive experiments on the challenging Mip-NeRF 360 [3] and Tanks & Temples [22] datasets. The results show that our method consistently outperforms the original 3DGS, both quantitatively (a 17.8% improvement in LPIPS) and qualitatively. We also show that our method is more robust to sparse initial point clouds by manually discarding a proportion (up to 99%) of the initial SfM points. In summary, we make the following contributions:

  1. We analyze the cause of the blurry artifacts in 3DGS and propose to optimize the number of points from the perspective of pixels, enabling effective point growth in areas with insufficient initial points.

  2. We present a simple yet effective gradient scaling strategy to suppress “floater” artifacts near the camera.

  3. Our method achieves state-of-the-art performance on the challenging Mip-NeRF 360 and Tanks & Temples datasets and is more robust to the quality of the initial points.

2 Related Work

Novel view synthesis. Novel view synthesis is the task of generating images from viewpoints different from the input viewpoints. NeRF [35] recently achieved impressive novel view synthesis results by using neural networks to approximate the radiance field and rendering it with volumetric rendering techniques [10, 27, 32, 33]. These approaches fit the scene's radiance field with implicit functions (such as MLPs [35, 2, 3], feature grid-based representations [6, 13, 29, 37, 46], or feature point-based representations [21, 50]) and render it with a rendering equation. Because volume rendering requires passing each sampled point along a ray through an MLP to obtain its density and color, these methods suffer from low rendering speed. Subsequent methods [15, 41, 42, 56, 58] convert a pre-trained NeRF into a sparse representation, achieving real-time rendering. Although several advanced scene representations [6, 7, 13, 29, 25, 37, 46, 2, 3, 4, 16] have been proposed to improve one or more aspects of NeRF, such as training cost, rendering quality, and rendering speed, 3D Gaussian Splatting (3DGS) [21] draws increasing attention due to its explicit representation, high-fidelity results, and real-time rendering speed. Subsequent works have improved 3DGS in terms of anti-aliasing [59, 51], memory usage [12, 39, 38, 26, 36, 30], modeling of high-frequency signals from reflective surfaces by replacing spherical harmonics [54], and modeling of dynamic scenes [31, 55, 11, 49, 53, 20, 24, 17]. However, 3DGS still tends to exhibit blurring and needle-like artifacts in areas where the initial points are sparse. This is because 3DGS initializes the scale of each Gaussian based on the distance to neighboring Gaussians, which makes it difficult for its point cloud growth mechanism to generate enough points to model these areas accurately.

Point-based radiance field. Point-based representations (such as point clouds) commonly represent scenes with fixed-size, unstructured points that are rendered by GPU rasterization [5, 43, 45]. Although this is a simple and convenient way to handle topological changes, it often produces holes or outliers, leading to rendering artifacts. To mitigate such discontinuities, researchers have proposed point-based differentiable rendering, in which points model local regions [14, 18, 28, 57, 50, 21, 48]. Among these approaches, [1, 23] employ neural networks to represent point features and use 2D CNNs for rendering. Point-NeRF [50] models 3D scenes with neural 3D points and introduces strategies for pruning and growing points to repair the holes and outliers common in point-based radiance fields. 3DGS [21] renders by rasterization, which significantly speeds up rendering. It starts from a sparse SfM point cloud and models each point's region of influence and color with a 3D Gaussian distribution and spherical harmonics, respectively. To enhance the representational capability of this point-based spatial function, 3DGS introduces a density control mechanism based on the gradient of each point's NDC (Normalized Device Coordinates) coordinates and on its opacity, which manages the growth and elimination of points. Recent work [8] on 3DGS improves point cloud growth by incorporating depth and normals to better fit low-texture areas. In contrast, Pixel-GS requires no additional priors or information sources, e.g., depths or normals, and directly grows points in areas with insufficient initial points, reducing blurring and needle-like artifacts.

Floater artifacts. Most radiance field representations suffer from floater artifacts, which predominantly appear near the camera and become more severe with sparse input views. Some papers [44, 9] address floaters by introducing depth priors. NeRFshop [19] proposes an editing method to remove floaters. Mip-NeRF 360 [3] introduces a distortion loss that encodes the prior that the density distribution along each ray is unimodal, effectively reducing floaters near the camera. NeRF in the Dark [34] proposes a variance loss on the weights to reduce floaters. FreeNeRF [52] adds a loss that penalizes the density of points close to the camera. Most of these methods suppress floaters by injecting priors through losses or editing, whereas “Floaters No More” [40] explores the fundamental cause of floaters. It points out that, for two regions of the same volume and shape, the number of pixels involved in the computation is inversely proportional to the square of each region's distance from the camera. Under the same learning rate, regions close to the camera complete their optimization quickly and then block the optimization of the regions behind them, making floaters near the camera more likely. Inspired by this analysis, our method handles floaters with a simple yet effective strategy: scaling the gradient field by the distance to the camera.

3 Method

We first review the point cloud growth condition of “Adaptive density control” in 3DGS. Then, we propose a method for calculating the average gradient magnitude in the point cloud growth condition from a pixel perspective, significantly enhancing the reconstruction capability in areas with insufficient initial points. Finally, we show that by scaling the spatial gradient field that controls point growth, floaters near the input cameras can be significantly suppressed.

3.1 Preliminaries

In 3D Gaussian Splatting, Gaussian $i$ under viewpoint $j$ generates a 2D covariance matrix $\Sigma^{i,j}_{2D} = \begin{pmatrix} a^{i,j} & b^{i,j} \\ b^{i,j} & c^{i,j} \end{pmatrix}$, and the radius $r^{i,j}$ of its influence range can be determined by:

$$r^{i,j} = 3 \times \sqrt{\frac{a^{i,j} + c^{i,j}}{2} + \sqrt{\left(\frac{a^{i,j} + c^{i,j}}{2}\right)^2 - \left(a^{i,j} c^{i,j} - (b^{i,j})^2\right)}}, \tag{1}$$

which covers 99% of the probability in the Gaussian distribution. For Gaussian $i$ under viewpoint $j$, the coordinates in the camera coordinate system are $(x^{i,j}_{cam}, y^{i,j}_{cam}, z^{i,j}_{cam})$, and in the pixel coordinate system they are $(x^{i,j}_{pix}, y^{i,j}_{pix}, z^{i,j}_{pix})$. With an image width of $W$ pixels and a height of $H$ pixels, Gaussian $i$ participates in the calculation for viewpoint $j$ when it simultaneously satisfies the following six conditions:

$$\begin{cases} r^{i,j} > 0, \\ z^{i,j}_{cam} > 0.2, \\ -r^{i,j} - 0.5 < x^{i,j}_{pix} < r^{i,j} + W - 0.5, \\ -r^{i,j} - 0.5 < y^{i,j}_{pix} < r^{i,j} + H - 0.5. \end{cases} \tag{2}$$
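Eqs. 1 and 2 can be sketched in Python as follows. This is a minimal illustration, assuming the covariance entries $a^{i,j}, b^{i,j}, c^{i,j}$ and the projected center are already available; the function names are hypothetical, and clamping the inner discriminant at zero is our own numerical safeguard rather than part of Eq. 1. The expression under the outer square root of Eq. 1 is the larger eigenvalue of $\Sigma^{i,j}_{2D}$, so the radius is three standard deviations along the major axis.

```python
import math


def influence_radius(a: float, b: float, c: float) -> float:
    """Eq. 1: 3-sigma radius of a projected Gaussian with covariance [[a, b], [b, c]]."""
    mid = 0.5 * (a + c)
    det = a * c - b * b
    # Larger eigenvalue of the symmetric 2x2 covariance (discriminant clamped at 0).
    lambda_max = mid + math.sqrt(max(mid * mid - det, 0.0))
    return 3.0 * math.sqrt(lambda_max)


def participates_in_view(r: float, z_cam: float, x_pix: float, y_pix: float,
                         width: int, height: int) -> bool:
    """Eq. 2: the six conditions for a Gaussian to take part in rendering a view."""
    return (
        r > 0.0
        and z_cam > 0.2                              # in front of the camera, beyond z = 0.2
        and -r - 0.5 < x_pix < r + width - 0.5       # footprint overlaps the image columns
        and -r - 0.5 < y_pix < r + height - 0.5      # footprint overlaps the image rows
    )


# Variances 4 and 1 along the axes -> major-axis std 2 -> radius 6.
r = influence_radius(4.0, 0.0, 1.0)
print(r)
# Centre 10 px left of the image: a 6 px footprint does not reach the image ...
print(participates_in_view(r, 1.0, -10.0, 50.0, 100, 100))
# ... but a 20 px footprint with the same centre does.
print(participates_in_view(20.0, 1.0, -10.0, 50.0, 100, 100))
```

Note that the center itself may lie outside the image while the Gaussian still participates, which is exactly the situation the following sections analyze for large Gaussians.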

In 3D Gaussian Splatting, whether a point is split or cloned is determined by the average magnitude of the gradient of its NDC coordinates over the viewpoints in which the Gaussian participates in the calculation. Specifically, for Gaussian $i$ under viewpoint $j$, the NDC coordinate is $(\mu^{i,j}_{ndc,x}, \mu^{i,j}_{ndc,y}, \mu^{i,j}_{ndc,z})$, and the loss under viewpoint $j$ is $L_j$. “Adaptive Density Control” is performed every 100 iterations; over each such interval, Gaussian $i$ participates in the calculation for $M^i$ viewpoints. The threshold $\tau_{pos}$ is set to 0.0002 in 3D Gaussian Splatting. When Gaussian $i$ satisfies

$$\frac{\sum_{j=1}^{M^i} \sqrt{\left(\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,x}}\right)^2 + \left(\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,y}}\right)^2}}{M^i} > \tau_{pos}, \tag{3}$$

it is transformed into two Gaussians.
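A minimal sketch of the Eq. 3 test, assuming the per-view NDC gradients $(\partial L_j/\partial\mu^{i,j}_{ndc,x}, \partial L_j/\partial\mu^{i,j}_{ndc,y})$ have already been collected over one 100-iteration interval. The function name and the numbers in the example are hypothetical; the example shows how one strong view is diluted by many weak ones.

```python
import math
from typing import Sequence, Tuple

TAU_POS = 0.0002  # threshold quoted for 3DGS


def densify_3dgs(view_grads: Sequence[Tuple[float, float]], tau: float = TAU_POS) -> bool:
    """Eq. 3: split/clone if the unweighted mean NDC-gradient norm exceeds tau."""
    if not view_grads:
        return False
    norms = [math.hypot(gx, gy) for gx, gy in view_grads]
    return sum(norms) / len(norms) > tau


# Hypothetical large Gaussian: one view sees its centre (norm 1e-3),
# nine views only see its edge (norm 5e-5). Mean = 1.45e-4 < 2e-4.
large = [(1e-3, 0.0)] + [(5e-5, 0.0)] * 9
print(densify_3dgs(large))  # False
```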

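[Figure 2: pipeline diagram. The 3DGS condition $\frac{\sum \|\mathbf{g}_j\|}{\sum 1} > \tau_{pos}$ (left) is replaced by $\frac{\sum p_j \cdot f(\|\mathbf{g}_j\|)}{\sum p_j} > \tau_{pos}$ (right), where the gradients $\mathbf{g}_j$ are rescaled according to depth with the hyperparameter $\gamma_{depth}$.]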

Figure 2: Pipeline of Pixel-GS. $p_j$ denotes the number of pixels participating in the calculation for the Gaussian from viewpoint $j$, and $\mathbf{g}_j$ denotes the gradient of the Gaussian's NDC coordinates. We replace the condition for deciding whether a Gaussian should be split or cloned, shown on the left, with the one on the right.

3.2 Pixel-aware Gradient

Although the current criteria used to decide whether a point should split or clone are sufficient for appropriately distributing Gaussians in most areas, artifacts tend to occur in regions where the initial points are sparse. In 3DGS, the lengths of the three axes of the ellipsoid corresponding to Gaussian $i$ are initialized using the value calculated by:

$$s^i = \sqrt{\frac{(d^i_1)^2 + (d^i_2)^2 + (d^i_3)^2}{3}}, \tag{4}$$

where $d^i_1$, $d^i_2$, and $d^i_3$ are the distances from Gaussian $i$ to its three nearest points. We observe that poorly modeled areas often have very sparse initial SfM point clouds, so the Gaussians in these areas are initialized as ellipsoids with large axis lengths. As a result, they participate in the computation for too many viewpoints. These Gaussians exhibit large gradients only in viewpoints where their projected center lies within or near the image, which are also the viewpoints in which they cover a large area of the image after projection. Because they participate in the computation for many viewpoints but receive significant gradients in only a few of them, their average NDC-coordinate gradient magnitude during each 100-iteration “Adaptive Density Control” step (Eq. 3) is small. Consequently, these points rarely split or clone, which leads to poor modeling in these areas.
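Eq. 4 can be sketched as below. This brute-force version is O(n²) and only illustrates the formula; it is not the nearest-neighbour routine used by the reference 3DGS code, and `initial_scales` is a hypothetical name. Spreading the same points 10× further apart (a sparser cloud) makes every initial Gaussian 10× larger, which is how sparse SfM regions end up populated by large Gaussians.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def initial_scales(points: Sequence[Point]) -> List[float]:
    """Eq. 4: RMS distance from each point to its three nearest neighbours."""
    if len(points) < 4:
        raise ValueError("need at least four points to find three neighbours")
    scales = []
    for i, p in enumerate(points):
        sq_dists = sorted(
            sum((pa - qa) ** 2 for pa, qa in zip(p, q))
            for k, q in enumerate(points)
            if k != i
        )
        d1_sq, d2_sq, d3_sq = sq_dists[:3]
        scales.append(math.sqrt((d1_sq + d2_sq + d3_sq) / 3.0))
    return scales


dense = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
sparse = [tuple(10.0 * v for v in p) for p in dense]
print(initial_scales(dense)[0])   # sqrt(3), about 1.732
print(initial_scales(sparse)[0])  # about 17.32
```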

Below, we analyze through equations why the Gaussians in the sparse areas mentioned above obtain large NDC-coordinate gradients only from viewpoints that cover them well, whereas viewpoints that only see their edge regions yield small NDC-coordinate gradients. The contribution of the pixels under viewpoint $j$ to the NDC-coordinate gradient of Gaussian $i$ can be computed as:

$$\begin{pmatrix} \frac{\partial L_j}{\partial \mu^{i,j}_{ndc,x}} \\[4pt] \frac{\partial L_j}{\partial \mu^{i,j}_{ndc,y}} \end{pmatrix} = \sum_{pix=1}^{m^{i,j}} \sum_{l=1}^{3} \left( \frac{\partial L_j}{\partial c^{pix}_l} \times \frac{\partial c^{pix}_l}{\partial \alpha^{i,j}_{pix}} \times \begin{pmatrix} \frac{\partial \alpha^{i,j}_{pix}}{\partial \mu^{i,j}_{ndc,x}} \\[4pt] \frac{\partial \alpha^{i,j}_{pix}}{\partial \mu^{i,j}_{ndc,y}} \end{pmatrix} \right), \tag{5}$$

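where both $\frac{\partial \alpha^{i,j}_{pix}}{\partial \mu^{i,j}_{ndc,x}}$ and $\frac{\partial \alpha^{i,j}_{pix}}{\partial \mu^{i,j}_{ndc,y}}$ contain the factor $\alpha^{i,j}_{pix}$, which can be calculated as:

$$\alpha^{i,j}_{pix} = o^i \times \exp\left(-\frac{1}{2} \begin{pmatrix} x_{pix} - x^{i,j}_{pix} \\ y_{pix} - y^{i,j}_{pix} \end{pmatrix}^{T} \left(\Sigma^{i,j}_{2D}\right)^{-1} \begin{pmatrix} x_{pix} - x^{i,j}_{pix} \\ y_{pix} - y^{i,j}_{pix} \end{pmatrix}\right), \tag{6}$$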

where $c^{pix}_l$ is the color of the $l$-th channel of the current pixel, $(x_{pix}, y_{pix})$ is the pixel center, $o^i$ is the opacity of Gaussian $i$, and $m^{i,j}$ is the number of pixels involved in the calculation for Gaussian $i$ under viewpoint $j$. As a function of the distance between the center of the projected Gaussian and the pixel center, $\alpha^{i,j}_{pix}$ decays exponentially as the distance increases.

As a result, the few pixels close to the center of the projected Gaussian make the primary contribution to its NDC-coordinate gradient. For a large Gaussian, many viewpoints see only its edge region: these viewpoints are included in the calculation, but their NDC-coordinate gradients are very small. Conversely, we observe that when many pixels are involved in the calculation for such a Gaussian in a given viewpoint, its NDC-coordinate gradient in that viewpoint tends to be large. This is easy to understand: when many pixels are involved after projection, the projected center tends to lie within the image plane, and, as shown above, the few pixels near the center are the main contributors to the NDC-coordinate gradient.
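To make the decay concrete, the sketch below evaluates Eq. 6 for a hypothetical isotropic Gaussian with a standard deviation of 10 pixels ($a = c = 100$, $b = 0$, $o^i = 1$). Since $\alpha^{i,j}_{pix}$ multiplies every per-pixel term of Eq. 5, pixels a few standard deviations from the projected center contribute very little, so a view that only sees this outer ring yields a small gradient. The function name is ours.

```python
import math


def alpha(opacity: float, dx: float, dy: float, a: float, b: float, c: float) -> float:
    """Eq. 6: opacity times the 2D Gaussian falloff at pixel offset (dx, dy)."""
    det = a * c - b * b
    # (dx, dy) Sigma^{-1} (dx, dy)^T, using inv([[a, b], [b, c]]) = [[c, -b], [-b, a]] / det.
    maha_sq = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return opacity * math.exp(-0.5 * maha_sq)


for dist in (0.0, 10.0, 20.0, 29.0):
    print(dist, round(alpha(1.0, dist, 0.0, 100.0, 0.0, 100.0), 4))
# 0 px -> 1.0, 1 sigma -> 0.6065, 2 sigma -> 0.1353, 2.9 sigma -> 0.0149
```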

To solve this problem, we weight the NDC-coordinate gradient magnitude of each Gaussian at every viewpoint by the number of pixels involved in the computation for that Gaussian from that viewpoint. The advantage of this approach is that, for large Gaussians, the number of involved pixels varies significantly across viewpoints, and, according to the derivation above, these Gaussians receive large gradients only in viewpoints that involve many pixels. Averaging the gradient magnitudes with weights given by the number of participating pixels therefore promotes the splitting or cloning of these Gaussians more reasonably. For small Gaussians, by contrast, the number of involved pixels varies little across viewpoints, so the weighted average differs little from the original condition and does not lead to excessive additional memory consumption. The modified condition for deciding whether a Gaussian is split or cloned is:

$$\frac{\sum_{j=1}^{M^i} m^{i,j} \times \sqrt{\left(\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,x}}\right)^2 + \left(\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,y}}\right)^2}}{\sum_{j=1}^{M^i} m^{i,j}} > \tau_{pos}, \tag{7}$$

where $M^i$ is the number of viewpoints in which Gaussian $i$ participates in the computation during the corresponding 100 iterations of “Adaptive Density Control”, $m^{i,j}$ is the number of pixels in which Gaussian $i$ participates at viewpoint $j$, and $\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,x}}$ and $\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,y}}$ are the gradients of Gaussian $i$ along the $x$ and $y$ directions of NDC space at viewpoint $j$. A Gaussian participates in the computation for a pixel under the following conditions:

$$\begin{cases} \sqrt{(x_{pix} - x^{i,j}_{pix})^2 + (y_{pix} - y^{i,j}_{pix})^2} < r^{i,j}, \\ \prod_{k=1}^{i} \left(1 - \alpha^{k,j}_{pix}\right) \geqslant 10^{-4}, \\ \alpha^{i,j}_{pix} \geqslant \frac{1}{255}, \end{cases} \tag{8}$$

where the product runs over the Gaussians that cover the pixel, sorted front to back, up to and including Gaussian $i$. The conditions under which a Gaussian participates in the computation for a viewpoint are given by Eq. 2.
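The pixel-aware condition can be sketched as follows. Per-view pixel counts $m^{i,j}$ are assumed to be given, for example counted with the per-pixel test of Eq. 8 (written here as `pixel_participates`); how the transmittance is accumulated reflects our reading of Eq. 8. All names and numbers are illustrative. The example uses a hypothetical large Gaussian that is seen centrally in one view and only at its edge in nine others: the plain mean of Eq. 3 would be $1.45 \times 10^{-4}$, below $\tau_{pos}$, while the pixel-weighted mean clears it.

```python
import math
from typing import Sequence, Tuple

TAU_POS = 0.0002


def pixel_participates(dist: float, r: float, transmittance: float, alpha: float) -> bool:
    """Eq. 8 for one pixel.

    `transmittance` is the product of (1 - alpha) over the Gaussians covering the
    pixel, front to back, up to and including this one.
    """
    return dist < r and transmittance >= 1e-4 and alpha >= 1.0 / 255.0


def densify_pixel_gs(views: Sequence[Tuple[int, float, float]], tau: float = TAU_POS) -> bool:
    """Eq. 7: mean NDC-gradient norm weighted by per-view pixel counts m^{i,j}.

    Each entry is (m, grad_x, grad_y) for one view in which the Gaussian participates.
    """
    total_pixels = sum(m for m, _, _ in views)
    if total_pixels == 0:
        return False
    weighted = sum(m * math.hypot(gx, gy) for m, gx, gy in views)
    return weighted / total_pixels > tau


# Large Gaussian: 4000 px and norm 1e-3 in one view, 50 px and norm 5e-5 in nine.
large = [(4000, 1e-3, 0.0)] + [(50, 5e-5, 0.0)] * 9
print(densify_pixel_gs(large))  # True: (4 + 0.0225) / 4450 is about 9.0e-4
# Small Gaussian with a constant footprint: identical to the plain mean.
small = [(12, 1e-4, 0.0)] * 5
print(densify_pixel_gs(small))  # False: 1e-4 < 2e-4
```

For a Gaussian whose footprint is the same in every view, the weights cancel and Eq. 7 reduces to Eq. 3, which is why small Gaussians are essentially unaffected.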

3.3 Scaled Gradient Field

Using the “Pixel-aware Gradient” to decide whether a point should split or clone (Eq. 7) addresses artifacts in areas with insufficient viewpoints and repetitive textures, but we found that this growth condition also exacerbates floaters near the camera. This is mainly because floaters near the camera occupy a large screen area and have large NDC-coordinate gradients, so their number keeps increasing during point cloud growth. To address this issue, we scale the gradient field of the NDC coordinates.

Specifically, we use a radius to determine the scale of the scene, computed as:

$$\text{radius} = 1.1 \cdot \max_{j} \left\{ \left\| \mathbf{C}_j - \frac{1}{N} \sum_{k=1}^{N} \mathbf{C}_k \right\|_2 \right\}. \tag{9}$$

The training set contains $N$ viewpoints, and $\mathbf{C}_j$ denotes the position of the $j$-th viewpoint's camera in the world coordinate system. We scale the gradient of the NDC coordinates of each Gaussian $i$ under the $j$-th viewpoint by a factor $f(i,j)$, calculated as:

$$f(i,j) = \text{clip}\left(\left(\frac{z^{i,j}_{cam}}{\gamma_{depth} \times \text{radius}}\right)^2, 0, 1\right), \tag{10}$$

where $z^{i,j}_{cam}$ is the z-coordinate of Gaussian $i$ in the camera coordinate system of the $j$-th viewpoint, i.e., the depth of the Gaussian from that viewpoint, and $\gamma_{depth}$ is a manually set hyperparameter.
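Eqs. 9 and 10 can be sketched as below, assuming camera centers are given as world-space 3-tuples; the function names are hypothetical, and the default $\gamma_{depth} = 0.37$ is the value reported in the implementation details. A Gaussian at depth $\gamma_{depth} \cdot \text{radius}$ or beyond keeps its full gradient, while one at half that depth keeps a quarter of it.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def scene_radius(cam_centers: Sequence[Vec3]) -> float:
    """Eq. 9: 1.1 times the largest distance of a camera from the mean camera centre."""
    n = len(cam_centers)
    mean = tuple(sum(c[k] for c in cam_centers) / n for k in range(3))
    return 1.1 * max(math.dist(c, mean) for c in cam_centers)


def depth_scale(z_cam: float, radius: float, gamma_depth: float = 0.37) -> float:
    """Eq. 10: squared normalised depth, clipped to [0, 1]."""
    s = (z_cam / (gamma_depth * radius)) ** 2
    return min(max(s, 0.0), 1.0)


# Four hypothetical cameras on a unit circle around the origin -> radius 1.1.
cams = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
radius = scene_radius(cams)
print(depth_scale(0.5 * 0.37 * radius, radius))  # half the cut-off depth -> 0.25
print(depth_scale(5.0, radius))                  # beyond the cut-off -> 1.0
```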

The use of a squared term as the scaling coefficient in Eq. 10 is primarily inspired by “Floaters No More” [40]. That paper notes that floaters in NeRF [35] arise mainly because regions close to the camera occupy more pixels after projection and therefore receive more gradient during optimization. These regions are optimized first and then occlude the correct spatial positions behind them, preventing those positions from being optimized. Since the number of occupied pixels is inversely proportional to the square of the distance to the camera, we scale the gradients by the squared distance.
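A short worked form of this argument, under the simplifying assumptions of a pinhole camera with focal length $\phi$ (in pixels) and a small fronto-parallel patch of world-space area $A$ at depth $z$ (the symbols $\phi$ and $A$ are ours, not the paper's). The patch covers approximately

$$m(z) \approx \frac{\phi^2 A}{z^2}, \qquad \frac{m(2z)}{m(z)} \approx \frac{1}{4}$$

pixels. Within the unclipped range $z < \gamma_{depth} \cdot \text{radius}$, the factor of Eq. 10 grows as $z^2$, so the product of the pixel count and the scaling factor is approximately independent of depth:

$$m(z) \cdot f \approx \frac{\phi^2 A}{z^2} \cdot \frac{z^2}{(\gamma_{depth} \cdot \text{radius})^2} = \frac{\phi^2 A}{(\gamma_{depth} \cdot \text{radius})^2}.$$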

In summary, a major issue with pixel-based optimization is the imbalance of the spatial gradient field, which leads to inconsistent optimization speeds across regions. Adaptively scaling the gradient field in different spatial regions effectively addresses this problem. The final condition that determines whether a Gaussian is split or cloned is therefore:

$$\frac{\sum_{j=1}^{M^i} m^{i,j} \times f(i,j) \times \sqrt{\left(\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,x}}\right)^2 + \left(\frac{\partial L_j}{\partial \mu^{i,j}_{ndc,y}}\right)^2}}{\sum_{j=1}^{M^i} m^{i,j}} > \tau_{pos}. \tag{11}$$
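Putting Eqs. 7 and 10 together, Eq. 11 can be sketched as a single function. Each view is assumed to provide $(m^{i,j}, z^{i,j}_{cam}, \partial L_j/\partial\mu^{i,j}_{ndc,x}, \partial L_j/\partial\mu^{i,j}_{ndc,y})$; the function name and example values are hypothetical. Note that $f(i,j)$ scales only the numerator, so views in which the Gaussian is very close to the camera still count fully in the denominator but contribute little gradient.

```python
import math
from typing import Sequence, Tuple

TAU_POS = 0.0002


def densify_final(views: Sequence[Tuple[int, float, float, float]], radius: float,
                  gamma_depth: float = 0.37, tau: float = TAU_POS) -> bool:
    """Eq. 11: pixel-weighted mean of depth-scaled NDC-gradient norms.

    Each entry is (m, z_cam, grad_x, grad_y) for one view in which the Gaussian participates.
    """
    total_pixels = sum(m for m, _, _, _ in views)
    if total_pixels == 0:
        return False
    numerator = 0.0
    for m, z_cam, gx, gy in views:
        f = min(max((z_cam / (gamma_depth * radius)) ** 2, 0.0), 1.0)  # Eq. 10
        numerator += m * f * math.hypot(gx, gy)
    return numerator / total_pixels > tau


# Same gradient (norm 1e-3, 4000 px) for a Gaussian far from vs. very close to the camera.
print(densify_final([(4000, 10.0, 1e-3, 0.0)], radius=1.0))   # True  (f = 1)
print(densify_final([(4000, 0.037, 1e-3, 0.0)], radius=1.0))  # False (f is about 0.01)
```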

4 Experiments

4.1 Experimental Setup

Datasets and benchmarks. We evaluated our method on a total of 30 real-world scenes, comprising all scenes from Mip-NeRF 360 (9 scenes) [3] and Tanks & Temples (21 scenes) [22], two of the most widely used datasets in 3D reconstruction. They contain both bounded indoor scenes and unbounded outdoor scenes, enabling a comprehensive evaluation of our method.

Evaluation metrics. We assess reconstruction quality with PSNR↑, SSIM↑ [47], and LPIPS↓ [60]. PSNR reflects pixel-wise errors but does not correspond well to human visual perception, as it treats all errors as noise without distinguishing structural from non-structural distortions. SSIM accounts for changes in luminance, contrast, and structure, and thus mirrors human perception of image quality more closely. LPIPS extracts features with a pre-trained deep neural network and measures high-level semantic differences between images, providing a similarity measure closer to human perceptual judgment than PSNR and SSIM.
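For reference, PSNR follows the standard definition $10 \log_{10}(\text{MAX}^2 / \text{MSE})$. A minimal sketch for images flattened into equal-length sequences of values in $[0, 1]$ is below; it is the textbook formula, not necessarily the exact evaluation script behind the reported numbers. SSIM and LPIPS are omitted because they need windowed statistics and a pre-trained network, respectively.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flattened images."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0.0 else 10.0 * math.log10(max_val ** 2 / mse)


# Every value off by 0.1 -> MSE = 0.01 -> 20 dB.
print(round(psnr([0.5, 0.2, 0.9], [0.6, 0.3, 1.0]), 6))
```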

Implementation details. Our method only requires minor modifications to the original code of 3DGS, so it is compatible with almost all subsequent works on 3DGS. We use the default parameters of 3DGS to ensure consistency with the original implementation, including maintaining the same threshold ��​�​� for splitting and cloning points as in the original 3DGS. For all scenes, we set a constant �depth value in Eq. 10 as 0.37 which is obtained through experimentations. All experiments were conducted on one RTX 3090 GPU with 24GB memory.

4.2 Main Results

We select several representative methods for comparison: the NeRF-style methods Plenoxels [13], INGP [37], and Mip-NeRF 360 [3], and the 3DGS method [21]. We use the official implementation of every compared method and the same training/test split as Mip-NeRF 360, holding out one of every eight photos for testing.
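The "one of every eight photos" split can be sketched as follows. This assumes, as in the commonly used Mip-NeRF 360 and 3DGS evaluation scripts, that images are sorted by name and those whose index is divisible by 8 are held out; the function name and file names are hypothetical.

```python
def split_every_eighth(image_names, hold=8):
    """Hold out every `hold`-th image (in sorted order) for testing; train on the rest."""
    ordered = sorted(image_names)
    test = [name for i, name in enumerate(ordered) if i % hold == 0]
    train = [name for i, name in enumerate(ordered) if i % hold != 0]
    return train, test

names = [f"IMG_{i:04d}.JPG" for i in range(20)]
train, test = split_every_eighth(names)
print(test)
```

With 20 images, indices 0, 8, and 16 are held out for testing and the remaining 17 are used for training. Sorting first makes the split deterministic regardless of the order in which images are loaded.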

Quantitative results. Tables 1 and 2 report PSNR, SSIM, and LPIPS on the Mip-NeRF 360 and Tanks & Temples datasets, respectively. For more detail, we also report results on three challenging scenes from each dataset. We retrained 3DGS (denoted 3DGS∗), since doing so yields better performance than the original 3DGS results (denoted 3DGS). Our method consistently outperforms all other methods, especially on LPIPS, while maintaining real-time rendering speed (discussed later). Compared with 3DGS, our method also shows significant improvements on the three challenging scenes of both datasets and performs better over each entire dataset. These results quantitatively validate the effectiveness of our method in improving reconstruction quality.
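The claim that the gains are largest on LPIPS can be checked with simple arithmetic on the dataset-level averages reported in Tables 1 and 2. The sketch below only re-uses those reported numbers; `RESULTS` and `improvement` are illustrative names.

```python
# Dataset-level averages (PSNR, SSIM, LPIPS) copied from Tables 1 and 2.
RESULTS = {
    "Mip-NeRF 360": {"3DGS*": (27.71, 0.826, 0.202), "Pixel-GS": (27.88, 0.834, 0.176)},
    "Tanks & Temples": {"3DGS*": (24.19, 0.844, 0.194), "Pixel-GS": (24.38, 0.850, 0.178)},
}

def improvement(baseline, ours):
    """Return (PSNR gain in dB, SSIM gain, relative LPIPS reduction in %)."""
    psnr_gain = round(ours[0] - baseline[0], 2)
    ssim_gain = round(ours[1] - baseline[1], 3)
    # LPIPS is lower-is-better, so a positive value here means an improvement.
    lpips_drop_pct = round(100.0 * (baseline[2] - ours[2]) / baseline[2], 1)
    return psnr_gain, ssim_gain, lpips_drop_pct

for dataset, rows in RESULTS.items():
    print(dataset, improvement(rows["3DGS*"], rows["Pixel-GS"]))
```

The PSNR gains are below 0.2 dB on both datasets, while LPIPS drops by about 12.9% on Mip-NeRF 360 and about 8.2% on Tanks & Temples relative to 3DGS∗.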

Table 1: Quantitative results on the Mip-NeRF 360 dataset. Cells are highlighted as follows: best, second best, and third best. We also show the results of three challenging scenes. 3DGS∗ is our retrained 3DGS model with better performance.

| Method | All scenes PSNR↑ | All scenes SSIM↑ | All scenes LPIPS↓ | Flowers PSNR↑ | Flowers SSIM↑ | Flowers LPIPS↓ | Bicycle PSNR↑ | Bicycle SSIM↑ | Bicycle LPIPS↓ | Stump PSNR↑ | Stump SSIM↑ | Stump LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Plenoxels [13] | 23.08 | 0.625 | 0.463 | 20.10 | 0.431 | 0.521 | 21.91 | 0.496 | 0.506 | 20.66 | 0.523 | 0.503 |
| INGP-Base [37] | 25.30 | 0.671 | 0.371 | 20.35 | 0.450 | 0.481 | 22.19 | 0.491 | 0.487 | 23.63 | 0.574 | 0.450 |
| INGP-Big [37] | 25.59 | 0.699 | 0.331 | 20.65 | 0.486 | 0.441 | 22.17 | 0.512 | 0.446 | 23.47 | 0.594 | 0.421 |
| Mip-NeRF 360 [3] | 27.69 | 0.792 | 0.237 | 21.73 | 0.583 | 0.344 | 24.37 | 0.685 | 0.301 | 26.40 | 0.744 | 0.261 |
| 3DGS [21] | 27.21 | 0.815 | 0.214 | 21.52 | 0.605 | 0.336 | 25.25 | 0.771 | 0.205 | 26.55 | 0.775 | 0.210 |
| 3DGS∗ [21] | 27.71 | 0.826 | 0.202 | 21.89 | 0.622 | 0.328 | 25.63 | 0.778 | 0.204 | 26.90 | 0.785 | 0.207 |
| Pixel-GS (Ours) | 27.88 | 0.834 | 0.176 | 21.94 | 0.652 | 0.251 | 25.74 | 0.793 | 0.173 | 27.11 | 0.796 | 0.181 |

Table 2: Quantitative results on the Tanks & Temples dataset. We also show the results of three challenging scenes. ∗ indicates retraining for better performance.

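| Method | All scenes PSNR↑ | All scenes SSIM↑ | All scenes LPIPS↓ | Train PSNR↑ | Train SSIM↑ | Train LPIPS↓ | Barn PSNR↑ | Barn SSIM↑ | Barn LPIPS↓ | Caterpillar PSNR↑ | Caterpillar SSIM↑ | Caterpillar LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DGS∗ [21] | 24.19 | 0.844 | 0.194 | 22.02 | 0.812 | 0.209 | 28.46 | 0.869 | 0.182 | 23.79 | 0.809 | 0.211 |
| Pixel-GS (Ours) | 24.38 | 0.850 | 0.178 | 22.13 | 0.823 | 0.180 | 29.00 | 0.888 | 0.144 | 24.08 | 0.832 | 0.173 |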

Qualitative results. Figures 1 and 3 compare our method with 3DGS∗. Our approach significantly reduces blurring and needle-like artifacts compared with 3DGS∗, e.g., in the flower region in the second row and the blow-up region in the last row. These regions receive too few initializing points from SfM, and our method effectively grows points there, leading to a more accurate and detailed reconstruction. The point cloud comparison is provided in the supplementary material. These examples show that our method is more robust to the quality of the initialization point cloud and can reconstruct high-fidelity details.

4.3 Ablation Studies

To evaluate the effectiveness of the individual components of our method, i.e., the pixel-aware gradient and the scaled gradient field, we conducted ablation studies on the Mip-NeRF 360 and Tanks & Temples datasets. The quantitative and qualitative results are presented in Table 3 and Figure 4, respectively. On Mip-NeRF 360, both the pixel-aware gradient and the scaled gradient field contribute to improving reconstruction quality. On Tanks & Temples, however, the pixel-aware gradient alone reduces reconstruction quality. This is mainly because floaters tend to appear near the camera in some large Tanks & Temples scenes, and the pixel-aware gradient encourages more Gaussians to grow there, as shown in column (b) of Figure 4. Notably, the same phenomenon occurs for 3DGS when the threshold $\tau_{pos}$ is lowered, which also promotes more Gaussians, as shown in Table 4. Importantly, combining both strategies achieves the best performance on Tanks & Temples (Table 3), since the scaled gradient field suppresses the growth of floaters near the camera. In summary, the ablation studies demonstrate the effectiveness of each proposed component and the necessity of combining them to achieve the best performance.
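The difference between the two growth conditions can be illustrated with a toy sketch. Following the paper's description, the pixel-aware gradient uses the number of pixels a Gaussian covers in each view as the weight when averaging gradients across views, whereas 3DGS takes a plain average over the views in which the Gaussian is visible. This simplified version treats each view's gradient as a scalar norm, and the per-view values are hypothetical; it is not the authors' CUDA implementation.

```python
TAU_POS = 2e-4  # default growth threshold used for 3DGS* and Pixel-GS in Table 4

def mean_gradient(views):
    """3DGS-style condition: plain average of per-view gradient norms."""
    return sum(grad for grad, _ in views) / len(views)

def pixel_weighted_gradient(views):
    """Pixel-aware condition: average weighted by the pixels covered in each view."""
    total_pixels = sum(pixels for _, pixels in views)
    return sum(grad * pixels for grad, pixels in views) / total_pixels

def should_grow(avg_grad, tau=TAU_POS):
    return avg_grad > tau

# Hypothetical large Gaussian seen in 10 views: it covers 1000 pixels with a strong
# gradient in 2 views, but only 10 boundary pixels with a weak gradient in the other 8.
views = [(5e-4, 1000)] * 2 + [(1e-5, 10)] * 8

print(mean_gradient(views), should_grow(mean_gradient(views)))
print(pixel_weighted_gradient(views), should_grow(pixel_weighted_gradient(views)))
```

The plain average is about 1.1e-4, below the threshold, so the large Gaussian never grows; the pixel-weighted average is about 4.8e-4, so it does. The same weighting also lets near-camera Gaussians that cover many pixels grow more easily, which is consistent with the floaters seen in column (b) of Figure 4.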

Figure 3 panels: (a) Ground Truth, (b) Pixel-GS (Ours), (c) 3DGS∗ [21]

Figure 3: Qualitative comparison between Pixel-GS (Ours) and 3DGS∗. The first three scenes are from the Mip-NeRF 360 dataset (Bicycle, Flowers, and Treehill), while the last four scenes are from the Tanks & Temples dataset (Barn, Caterpillar, Playground, and Train). The blow-up regions or arrows highlight the parts with distinct differences in quality. 3DGS∗ is our retrained 3DGS model with better performance.

Figure 4 panels: (a) 3DGS∗, (b) Pixel-aware Gradient, (c) Scaled Gradient Field, (d) Complete Model

Figure 4: Qualitative results of the ablation study. The PSNR↑ of each result is shown on the corresponding image.

Table 3: Ablation study. The metrics are averaged over all scenes of the Mip-NeRF 360 and Tanks & Temples datasets, respectively.

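| Method | Mip-NeRF 360 PSNR↑ | Mip-NeRF 360 SSIM↑ | Mip-NeRF 360 LPIPS↓ | Tanks & Temples PSNR↑ | Tanks & Temples SSIM↑ | Tanks & Temples LPIPS↓ |
|---|---|---|---|---|---|---|
| 3DGS∗ [21] | 27.71 | 0.826 | 0.202 | 24.23 | 0.844 | 0.194 |
| Pixel-aware Gradient | 27.74 | 0.833 | 0.176 | 21.80 | 0.791 | 0.239 |
| Scaled Gradient Field | 27.72 | 0.825 | 0.202 | 24.34 | 0.843 | 0.198 |
| Complete Model | 27.88 | 0.834 | 0.176 | 24.38 | 0.850 | 0.178 |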

Table 4: Impact of lowering $\tau_{pos}$. We report quality and efficiency metrics when lowering the point-growth threshold $\tau_{pos}$ for 3DGS∗, alongside our method at the default threshold.

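| Dataset | Strategy | PSNR↑ | SSIM↑ | LPIPS↓ | Train time | FPS | Memory |
|---|---|---|---|---|---|---|---|
| Mip-NeRF 360 | 3DGS∗ ($\tau_{pos}=2\times10^{-4}$) | 27.71 | 0.826 | 0.202 | 25m40s | 126 | 0.72 GB |
| Mip-NeRF 360 | 3DGS∗ ($\tau_{pos}=1.28\times10^{-4}$) | 27.83 | 0.833 | 0.181 | 43m23s | 90 | 1.4 GB |
| Mip-NeRF 360 | Ours ($\tau_{pos}=2\times10^{-4}$) | 27.88 | 0.834 | 0.176 | 41m25s | 89 | 1.2 GB |
| Tanks & Temples | 3DGS∗ ($\tau_{pos}=2\times10^{-4}$) | 24.19 | 0.844 | 0.194 | 16m3s | 135 | 0.41 GB |
| Tanks & Temples | 3DGS∗ ($\tau_{pos}=1\times10^{-4}$) | 23.86 | 0.842 | 0.187 | 27m59s | 87 | 0.94 GB |
| Tanks & Temples | Ours ($\tau_{pos}=2\times10^{-4}$) | 24.38 | 0.850 | 0.178 | 26m36s | 92 | 0.84 GB |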


Figure 5: Reconstruction quality (PSNR↑, SSIM↑, and LPIPS↓) vs. dropping rate of initializing points. Here, the dropping rate is the percentage of points removed from the original SfM point clouds used to initialize the Gaussians. Results are obtained on the Mip-NeRF 360 dataset.

4.4 Analysis

The impact of lowering the threshold $\tau_{pos}$. Since the blurring and needle-like artifacts in 3DGS mainly occur in areas with insufficient initializing points, a straightforward solution would be to lower $\tau_{pos}$ to encourage more points to grow. To test this, we lowered $\tau_{pos}$ for 3DGS from $2\times10^{-4}$ to $1.28\times10^{-4}$ on Mip-NeRF 360 and to $1\times10^{-4}$ on Tanks & Temples, so that the final number of optimized points is comparable to ours. Table 4 shows that lowering $\tau_{pos}$ for 3DGS significantly increases memory consumption and decreases rendering speed, while reconstruction quality still falls behind ours. The qualitative comparison in Figure 1 shows why: the point cloud growth mechanism of 3DGS struggles to generate points in areas with insufficient initializing points and instead adds unnecessary points in areas where the initial SfM point cloud is already dense. In contrast, although our method also consumes additional memory, its point cloud distribution is more uniform, effectively growing points in areas with insufficient initializing points. This yields a more accurate and detailed reconstruction while still maintaining real-time rendering speed.
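Why a uniformly lower threshold mostly helps the wrong regions can be shown with a toy example. The averaged gradients below are hypothetical values chosen to mirror the mechanism described above (3DGS-style plain averages), not measurements; `count_growing` is an illustrative helper.

```python
def count_growing(avg_grads, tau):
    """Number of Gaussians whose averaged gradient exceeds the growth threshold tau."""
    return sum(1 for g in avg_grads if g > tau)

# Hypothetical plain-averaged gradients (made-up values for illustration):
dense_region = [1.3e-4, 1.5e-4, 1.9e-4, 2.5e-4]  # small Gaussians where SfM points are plentiful
sparse_region = [1.1e-4]  # one large Gaussian covering a poorly initialized area

for tau in (2e-4, 1.28e-4):
    print(tau, count_growing(dense_region, tau), count_growing(sparse_region, tau))
```

Lowering the threshold from 2e-4 to 1.28e-4 raises the number of growing Gaussians in the dense region from 1 to 4, while the large Gaussian in the sparse region, whose average is diluted by many boundary-only views, still does not grow. Memory and rendering cost rise without fixing the under-reconstructed area.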

Robustness to the quality of initialization point clouds. SfM algorithms often fail to produce high-quality point clouds in some areas, e.g., those with too few observations, repetitive textures, or little texture. Yet the SfM point cloud is usually a necessary input for both 3DGS and our method. We therefore examined the robustness of our method to the quality of the initialization point cloud by randomly dropping points from the SfM point clouds used for initialization, and compared the results with those of 3DGS. Figure 5 shows how reconstruction quality varies with the proportion of dropped points. Our method consistently outperforms 3DGS on all metrics (PSNR, SSIM, and LPIPS) and, more importantly, is less affected by the dropping rate. Notably, even when 99% of the initializing points are dropped, our method still surpasses, in terms of LPIPS, 3DGS initialized with the complete SfM point cloud. These results demonstrate the robustness of our method to the quality of initialization point clouds, which is crucial for real-world applications.
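The point-dropping protocol can be sketched as below. The text only says points are dropped randomly, so this assumes a uniform random subset without replacement and a fixed seed for reproducibility; `drop_init_points` and the synthetic point cloud are hypothetical.

```python
import random

def drop_init_points(points, drop_rate, seed=0):
    """Return a uniform random subset of SfM points with a fraction `drop_rate` removed."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed so repeated runs drop the same points
    n_keep = len(points) - round(drop_rate * len(points))
    return rng.sample(points, n_keep)

# Hypothetical point cloud of 1000 (x, y, z) points.
cloud = [(float(i), 0.0, 0.0) for i in range(1000)]
kept = drop_init_points(cloud, drop_rate=0.99)
print(len(kept))
```

At a 99% dropping rate only 10 of the 1000 points survive to initialize the Gaussians, so nearly all scene geometry must come from point growth during optimization.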

5 Conclusion

The blurring and needle-like artifacts in 3DGS are mainly attributed to its inability to grow points in areas with insufficient initializing points. To address this issue, we propose Pixel-GS, which takes into account the number of pixels covered by a Gaussian in each view and uses it to dynamically weight each view's gradient when computing the growth condition. This strategy effectively grows large Gaussians, which are more likely to exist in areas with insufficient initializing points, so our method can adaptively grow points in these areas while avoiding unnecessary growth in areas that already have enough points. We also introduce a simple yet effective strategy to deal with floaters, i.e., scaling the gradient field by the distance to the camera. Extensive experiments demonstrate that our method significantly reduces blurring and needle-like artifacts and effectively suppresses floaters, achieving state-of-the-art rendering quality. Although our method consumes slightly more memory, the additional points are mainly distributed in areas with insufficient initializing points, where they are necessary for high-quality reconstruction, and our method still maintains real-time rendering speed. Finally, thanks to the pixel-aware gradient and the scaled gradient field, our method is more robust to the number of initialization points.
