[Paper Reading] [Skim] Three Papers on Generative Autonomous-Driving Scene Generation: Bevstreet, DisCoScene, BerfScene

Table of Contents

  • 1. Street-View Image Generation from a Bird’s-Eye View Layout
    • 1.1 Problem introduction
    • 1.2 Why
    • 1.3 How
    • 1.4 My takeaway
  • 2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis
    • 2.1 What
    • 2.2 Why
    • 2.3 How
    • 2.4 My takeaway
  • 3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (follow-up to DisCoScene)
    • 3.1 What
    • 3.2 Why
    • 3.3 How
    • 3.4 My takeaway

1. Street-View Image Generation from a Bird’s-Eye View Layout

1.1 Problem introduction

As the title suggests, this paper builds a mapping from a Bev (Bird’s-Eye View) layout to street-view images.

Concretely, the input (Bev) is a two-dimensional, top-down representation of a three-dimensional environment. In the Bev diagram, squares of different colors represent different objects or road features, such as vehicles, pedestrians, and lane lines. The green square marks the ego vehicle, which carries three front-facing cameras.

The task is to generate three street-view images that are aligned with the Bev, respecting the relative positions of these objects.
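
To make the input concrete, here is a minimal sketch of a Bev layout as a small grid of class labels. The class ids, the grid size, and the helper names (`make_bev`, `fill_box`) are my own illustrative assumptions, not the paper's actual data format.

```python
# Toy Bev layout: a small grid of class labels seen from above.
# Class ids and grid size are illustrative assumptions.
EMPTY, ROAD, VEHICLE, EGO = 0, 1, 2, 3


def make_bev(height, width):
    """Create an empty Bev grid (rows = forward axis, cols = lateral axis)."""
    return [[EMPTY for _ in range(width)] for _ in range(height)]


def fill_box(bev, top, left, box_h, box_w, label):
    """Paint an axis-aligned box onto the grid, clipping at the borders."""
    for r in range(max(top, 0), min(top + box_h, len(bev))):
        for c in range(max(left, 0), min(left + box_w, len(bev[0]))):
            bev[r][c] = label


bev = make_bev(8, 8)
fill_box(bev, 0, 2, 8, 4, ROAD)      # a vertical road strip
fill_box(bev, 1, 3, 2, 1, VEHICLE)   # another vehicle ahead
fill_box(bev, 6, 4, 2, 1, EGO)       # the ego vehicle

# Print the grid: '.' empty, '=' road, 'V' vehicle, 'E' ego.
for row in bev:
    print("".join(".=VE"[cell] for cell in row))
```

Later boxes overwrite earlier ones, so the road is painted first and the vehicles on top of it.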

Generating images from such a “layout” has to satisfy several constraints:

  • Cameras with an overlapping field-of-view (FoV) must show the overlapping content consistently.
  • The visual styling of the scene must also be consistent, so that all virtual views appear to be captured in the same geographical area (e.g., urban vs. rural), at the same time of day, under the same weather conditions, and so on.
  • In addition to this consistency, the images must correspond to the HD map, faithfully reproducing the specified road layout, lane lines, and vehicle locations.

1.2 Why

According to the authors, it is the first attempt to explore the generative side of BEV perception for driving scenes.

1.3 How

  1. Methods
    In the pipeline, the Bev layout and the source images are encoded and fed into an autoregressive transformer, together with direction and camera information that helps the model reason about space. The output is a new set of multi-view images.

  2. Experiments
    Three metrics are used.
    FID reflects the quality and diversity of the generated images (lower is better). Road mIoU and Vehicle mIoU measure how well the road and vehicle regions overlap with the Bev input, which verifies that the relative positions given in the layout are respected.
    Scene editing is achieved by changing the Bev layout.
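
As a rough sketch of the overlap metric, the snippet below computes the IoU of one class between two label grids and averages it over several samples. It assumes the layout has already been recovered from the generated images by some segmentation step (not shown), and the averaging over samples is my assumption; the function names are hypothetical.

```python
def iou(pred, target, label):
    """Intersection over union of one class between two label grids."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == label, t == label
            inter += in_pred and in_target
            union += in_pred or in_target
    return inter / union if union else float("nan")


def mean_iou(pairs, label):
    """Average IoU over (pred, target) pairs, skipping pairs without the class."""
    scores = [iou(pred, target, label) for pred, target in pairs]
    scores = [s for s in scores if s == s]  # NaN is the only value != itself
    return sum(scores) / len(scores) if scores else float("nan")


ROAD = 1
target = [[1, 1, 0],
          [1, 1, 0]]
pred = [[1, 1, 1],
        [0, 1, 0]]

print(iou(pred, target, ROAD))                          # 3 shared cells / 5 in the union
print(mean_iou([(pred, target), (target, target)], ROAD))
```

A perfect reconstruction gives an IoU of 1, so the second call averages 0.6 and 1.0.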

1.4 My takeaway

  1. How can the capabilities of an autoregressive transformer be exploited here, and why choose it over other generative models?
  2. I now understand what Bev is.
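
On the first question, the core of autoregressive generation is a loop that samples one token at a time, conditioned on everything generated so far. The sketch below assumes images are represented as sequences of discrete tokens and uses a random stand-in (`toy_next_token_probs`, a hypothetical name) in place of a real transformer, so it only illustrates the control flow.

```python
import random


def toy_next_token_probs(condition, prefix, vocab_size):
    """Stand-in for a transformer: a distribution over the next token.

    A real model would attend over the condition (e.g. Bev tokens) and the
    prefix; here the weights are pseudo-random but depend on both inputs.
    """
    rng = random.Random(hash((tuple(condition), tuple(prefix))))
    weights = [rng.random() + 1e-6 for _ in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(condition, length, vocab_size, rng):
    """Sample `length` tokens one by one, each conditioned on the prefix."""
    tokens = []
    for _ in range(length):
        probs = toy_next_token_probs(condition, tokens, vocab_size)
        next_token = rng.choices(range(vocab_size), weights=probs, k=1)[0]
        tokens.append(next_token)
    return tokens


bev_tokens = [1, 2, 3]  # hypothetical encoded layout
image_tokens = generate(bev_tokens, length=6, vocab_size=8, rng=random.Random(0))
print(image_tokens)
```

Because each step sees the whole prefix, tokens generated for one camera view can influence the next, which is one plausible reason this model family suits multi-view consistency.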

2. DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

2.1 What

An editable 3D generative model that uses object bounding boxes, without semantic annotation, as a layout prior. This allows high-quality scene synthesis and flexible user control of both the camera and the scene objects.

2.2 Why

  • Existing generative models focus on individual objects, lacking the ability to handle non-trivial scenes.

  • Some works, such as GSN, can generate whole scenes but offer no object-level editing, because NeRF lacks an explicit notion of objects.

  • GIRAFFE explicitly composites object-centric radiance fields to support object-level control. Yet it works poorly on mixed scenes due to the absence of proper spatial priors.

  • Interesting references:

    17: LayoutTransformer: Layout generation and completion with self-attention.

    26: LayoutGAN: Generating graphic layouts with wireframe discriminators.

    58: BlockPlanner: City block generation with vectorized graph representation.

2.3 How

Bounding boxes serve as layout priors for generating the objects, which are then combined with a generated background and passed to neural rendering. In addition, an extra object discriminator performs local discrimination, which gives better object-level supervision.
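
To picture how a bounding box can act as a spatial prior, the sketch below maps a world-space point into a box's canonical frame and combines object and background fields at that point. The compositing rule is a density-weighted average in the style of GIRAFFE-like models, which may differ from what DisCoScene actually does; all names and the toy constant fields are my assumptions.

```python
import math


def world_to_box(point, center, size, yaw):
    """Map a world point into a box's canonical frame (interior = [-1, 1]^3).

    yaw is the box rotation about the vertical (z) axis.
    """
    dx, dy, dz = (p - c for p, c in zip(point, center))
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Rotate by -yaw to undo the box orientation.
    lx = cos_y * dx + sin_y * dy
    ly = -sin_y * dx + cos_y * dy
    return (2 * lx / size[0], 2 * ly / size[1], 2 * dz / size[2])


def composite(point, objects, background):
    """Combine object fields and a background field at one 3D point.

    Each field is a function: point -> (density, (r, g, b)).
    """
    samples = [background(point)]
    for obj in objects:
        local = world_to_box(point, obj["center"], obj["size"], obj["yaw"])
        if all(abs(v) <= 1.0 for v in local):  # point lies inside this box
            samples.append(obj["field"](local))
    total = sum(density for density, _ in samples)
    if total == 0:
        return 0.0, (0.0, 0.0, 0.0)
    color = tuple(sum(d * c[i] for d, c in samples) / total for i in range(3))
    return total, color


# Toy constant fields standing in for learned radiance fields.
grey_background = lambda p: (0.1, (0.5, 0.5, 0.5))
red_car = {"center": (4.0, 0.0, 0.0), "size": (4.0, 2.0, 2.0),
           "yaw": math.pi / 2, "field": lambda p: (0.9, (1.0, 0.0, 0.0))}

print(composite((4.0, 1.5, 0.0), [red_car], grey_background))  # inside the box
print(composite((5.5, 0.0, 0.0), [red_car], grey_background))  # outside the box
```

Editing a scene then amounts to changing `center`, `size`, or `yaw` of a box, while the object field itself stays untouched.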

2.4 My takeaway

  1. Is it possible to drop the manually annotated bboxes and instead automatically identify and regenerate the corresponding regions in a Gaussian representation?

3. BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (follow-up to DisCoScene)

3.1 What

Incorporating an equivariant radiance field with the guidance of a BEV map, this method allows us to produce large-scale, even infinite-scale, 3D scenes via synthesizing local scenes and then stitching them with smooth consistency.

It can be understood as stitching together overlapping patches of a BEV map.
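
The patch view can be sketched as a sliding window over a large BEV grid: each crop would drive the generation of one local scene, and the overlap between neighbouring crops is where the stitched result has to stay consistent. The window size, stride, and function names are my assumptions, and how BerfScene actually blends the local scenes is not shown here.

```python
def patch_origins(total, patch, stride):
    """Start offsets of windows of length `patch` that cover [0, total)."""
    if total <= patch:
        return [0]
    origins = list(range(0, total - patch + 1, stride))
    if origins[-1] != total - patch:
        origins.append(total - patch)  # final window flush with the border
    return origins


def crop_patches(bev, patch, stride):
    """Yield ((top, left), crop) for overlapping square crops of a 2D grid."""
    height, width = len(bev), len(bev[0])
    for top in patch_origins(height, patch, stride):
        for left in patch_origins(width, patch, stride):
            crop = [row[left:left + patch] for row in bev[top:top + patch]]
            yield (top, left), crop


# A 6x10 toy BEV map whose cells store their own coordinates.
big_bev = [[(r, c) for c in range(10)] for r in range(6)]
patches = list(crop_patches(big_bev, patch=4, stride=2))

print(len(patches))                         # number of local scenes
print([origin for origin, _ in patches])    # top-left corner of each crop
```

With a stride smaller than the patch size, every interior cell falls into more than one crop, which is what makes smooth stitching possible in the first place.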

3.2 Why

  1. Existing 3D object synthesis techniques cannot simply be applied to large-scale 3D scene generation, since 3D scenes usually have complex spatial configurations and contain many objects at varying scales.
  2. Previous approaches often relied on scene graphs, whose unstructured topology makes them hard to process.
  3. DisCoScene introduces complexity in interpreting the entire scene and faces scalability challenges because it relies on bounding boxes.
  4. BEV maps can clearly specify the composition and scales of objects, but they carry no information about the detailed visual appearance of those objects. Recent attempts like InfiniCity and SceneDreamer try to avoid the ambiguity of BEV maps, but they are inefficient.

3.3 How

To integrate the prior information provided by the BEV map into the radiance field, the authors introduce a generator U that produces a 2D feature map conditioned on the BEV map. The generator U combines a U-Net architecture with StyleGAN blocks.
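
For intuition only, here is a toy version of the U-Net data flow: downsample to get coarse context, upsample back, and concatenate with the fine input through a skip connection. It has no learned weights and no StyleGAN blocks, so it is not the paper's generator; it only shows why the output feature map keeps the spatial layout of the BEV input. All names are hypothetical.

```python
def avg_pool2(grid):
    """Downsample by 2 with 2x2 average pooling (even sizes assumed)."""
    return [[(grid[r][c] + grid[r][c + 1]
              + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]


def upsample2(grid):
    """Upsample by 2 with nearest-neighbour repetition."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def toy_unet(bev):
    """Encoder-decoder with one skip connection.

    Returns a grid of (fine, context) feature pairs, same size as the input.
    """
    coarse = avg_pool2(bev)       # encoder: coarse context
    decoded = upsample2(coarse)   # decoder: back to input resolution
    # Skip connection: concatenate the fine input with the decoded context.
    return [[(fine, ctx) for fine, ctx in zip(fine_row, ctx_row)]
            for fine_row, ctx_row in zip(bev, decoded)]


bev = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 1, 0]]

features = toy_unet(bev)
print(features[0][0])   # occupied cell inside a fully occupied block
print(features[3][2])   # occupied cell in a mostly empty block
```

Each output cell carries both its own value and a summary of its neighbourhood, which is the property a radiance field conditioned on the feature map can then query per location.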

3.4 My takeaway

  1. I am still confused about how this U-Net is used; I need to set aside some time to fill in the background knowledge. 🤡
