Voice Conversion、DreamScene、X-SLAM、Panoptic-SLAM、DiffMap、TinySeg

本文首发于公众号:机器感知

Voice Conversion、DreamScene、X-SLAM、Panoptic-SLAM、DiffMap、TinySeg

图片

Converting Anyone's Voice: End-to-End Expressive Voice Conversion with a  Conditional Diffusion Model

图片

Expressive voice conversion (VC) conducts speaker identity conversion for emotional speakers by jointly converting speaker identity and emotional style. Emotional style modeling for arbitrary speakers in expressive VC has not been extensively explored. Previous approaches have relied on vocoders for speech reconstruction, which makes speech quality heavily dependent on the performance of vocoders. A major challenge of expressive VC lies in emotion prosody modeling. To address these challenges, this paper proposes a fully end-to-end expressive VC framework based on a conditional denoising diffusion probabilistic model (DDPM). We utilize speech units derived from self-supervised speech models as content conditioning, along with deep features extracted from speech emotion recognition and speaker verification systems to model emotional style and speaker identity. Objective and subjective evaluations show the effectiveness of our framework. Codes and samples are publicly available......

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular  Videos

图片

Existing VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a "decompose-then-recompose" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be infe......

X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD

图片

We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters through Taylor series expansion within the complex domain. Our system allows for the real-time calculation of not just the gradient, but also higher-order differentiation. This facilitates the use of high-order optimizers to achieve better accuracy and faster convergence. Building on X-SLAM, we implemented end-to-end optimization frameworks for two important tasks: camera relocalization in wide outdoor scenes and active robotic scanning in complex indoor environments. Comprehensive evaluations on public benchmarks and intricate real scenes underscore the improvements in the accuracy of cam......

Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic  Segmentation

图片

The majority of visual SLAM systems are not robust in dynamic scenarios. The ones that deal with dynamic objects in the scenes usually rely on deep-learning-based methods to detect and filter these objects. However, these methods cannot deal with unknown moving objects. This work presents Panoptic-SLAM, an open-source visual SLAM system robust to dynamic environments, even in the presence of unknown objects. It uses panoptic segmentation to filter dynamic objects from the scene during the state estimation process. Panoptic-SLAM is based on ORB-SLAM3, a state-of-the-art SLAM system for static environments. The implementation was tested using real-world datasets and compared with several state-of-the-art systems from the literature, including DynaSLAM, DS-SLAM, SaD-SLAM, PVO and FusingPanoptic. For example, Panoptic-SLAM is on average four times more accurate than PVO, the most recent panoptic-based approach for visual SLAM. Also, experiments were performed using a quadruped ro......

Characterized Diffusion and Spatial-Temporal Interaction Network for  Trajectory Prediction in Autonomous Driving

图片

Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module, an innovative module designed to simulate traffic scenarios with inherent uncertainty. This module enriches the predictive process by infusing it with detailed semantic information, thereby enhancing trajectory prediction accuracy. Complementing this, our Spatio-Temporal (ST) Interaction Module captures the nuanced effects of traffic scenarios on vehicle dynamics across both spatial and temporal dimensions with remarkable effectiveness. Demonstrated through exhaustive evaluations, our model sets a new standard in trajectory prediction, achieving state-of-the-art (SOTA) results on t......

Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose  Estimation

图片

The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and 2D to 3D ill-posed challenges, which have drawn great attention to Multi-Hypothesis HPE research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the Single-Hypothesis HPE model and then reverse-map the distribution to the 2D pose input through an adaptive noise sampling strategy to generate reasonable multi-hypothesis samples effectively. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com......

DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model

图片

Constructing high-definition (HD) maps is a crucial requirement for enabling autonomous driving. In recent years, several map segmentation algorithms have been developed to address this need, leveraging advancements in Bird's-Eye View (BEV) perception. However, existing models still encounter challenges in producing realistic and consistent semantic map layouts. One prominent issue is the limited utilization of structured priors inherent in map segmentation masks. In light of this, we propose DiffMap, a novel approach specifically designed to model the structured priors of map segmentation masks using latent diffusion model. By incorporating this technique, the performance of existing semantic segmentation methods can be significantly enhanced and certain structural errors present in the segmentation outputs can be effectively rectified. Notably, the proposed module can be seamlessly integrated into any map segmentation model, thereby augmenting its capability to accurately d......

TinySeg: Model Optimizing Framework for Image Segmentation on Tiny  Embedded Systems

图片

Image segmentation is one of the major computer vision tasks, which is applicable in a variety of domains, such as autonomous navigation of an unmanned aerial vehicle. However, image segmentation cannot easily materialize on tiny embedded systems because image segmentation models generally have high peak memory usage due to their architectural characteristics. This work finds that image segmentation models unnecessarily require large memory space with an existing tiny machine learning framework. That is, the existing framework cannot effectively manage the memory space for the image segmentation models. This work proposes TinySeg, a new model optimizing framework that enables memory-efficient image segmentation for tiny embedded systems. TinySeg analyzes the lifetimes of tensors in the target model and identifies long-living tensors. Then, TinySeg optimizes the memory usage of the target model mainly with two methods: (i) tensor spilling into local or remote storage and (ii) ......

Efficient and Economic Large Language Model Inference with Attention  Offloading

图片

Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators. This mismatch arises from the autoregressive nature of LLMs, where the generation phase comprises operators with varying resource demands. Specifically, the attention operator is memory-intensive, exhibiting a memory access pattern that clashes with the strengths of modern accelerators, especially as context length increases. To enhance the efficiency and cost-effectiveness of LLM serving, we introduce the concept of attention offloading. This approach leverages a collection of cheap, memory-optimized devices for the attention operator while still utilizing high-end accelerators for other parts of the model. This heterogeneous setup ensures that each component is tailored to its specific workload, maximizing overall performance and cost efficienc......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/320307.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

数塔问题(蛮力算法和动态规划)

题目:如下图是一个数塔,从顶部出发在每一个节点可以选择向左或者向右走,一直走到底层,要求找出一条路径,使得路径上的数字之和最大,及路径情况。(使用蛮力算法和动态规划算法分别实现) #include…

c语言从入门到函数速成(1)

温馨提醒:本篇文章适合人群:刚学c又感觉那个地方不怎么懂的同学以及以及学了一些因为自身原因停学一段时间后又继续学c的同学 好,正片开始。 主函数 学c时最先学的是我们c语言程序的主体函数,c的主函数有两种写法,这…

【Docker】★★★

docker 的网络模式 ●host模式:使用 --nethost 指定 容器与宿主机共享网络命名空间、ip和端口 ●container模式:使用 --netcontainer:NAME_or_ID 指定 新建的容器共享已有容器的网络命名空间、ip和端口 ●none模式:使用 --netnone 指定 不进行…

被问了n遍的小程序地理位置权限开通方法

小程序地理位置接口有什么功能? 在平时我们在开发小程序时,难免会需要用到用户的地理位置信息的功能,小程序开发者开放平台新规要求如果没有申请开通微信小程序地理位置接口( getLocation ),但是在代码中却使用到了相关接口&#…

macOS sonoma 14.4.1编译JDK 12

macOS sonoma 14.4.1编译JDK 12 环境参考文档开始简述问题心路历程着手解决最终解决(前面有点啰嗦了,可以直接看这里) 记录一次靠自己看代码解决问题的经历(总之就是非常开心)。 首先,先diss一下bing,我差一点就放弃了。 环境 macOS sonom…

分布式与一致性协议之Raft算法(二)

Raft算法 什么是任期 我们知道,议会选举中的领导者是有任期的,当领导者任命到期后,需要重新再次选举。Raft算法中的领导者也是有任期,每个任期由单调递增的数字(任期编号)标识。比如,节点A的任期编号是1。任期编号会…

自定义类型

引言: C语言分为:1、内置类型 2、自定义类型: 1》结构体——每个成员有自己的空间 2》枚举型——定义多个常量 3》联合体——所有成员共用一快内存空间 一、结构体: 单独写过了一篇 关于结构体的介绍以及使用 二、 枚举型&…

论文辅助笔记:Tempo之modules/prompt.py

1 get_prompt_param_cls 2 get_prompt_value 3 Prompt 类 3.1 _init_weights 3.2 forward

表空间的概述

目录 表空间的属性 表空间的类型 永久性表空间(PermanentTablespace) 临时表空间(Temp Tablespace ) 撤销表空间(Undo Tablespace) 大文件表空间(BigfileTablespace) 表空间的状态 联机状态(Online) 读写状态(Read Write) 只读状态(Read) 脱机状态(Offline) Oracle从…

Elasticsearch初步认识

Elasticsearch初步认识 ES概述基本概念正向索引和倒排索引IK分词器ik_smart最少切分ik_max_word为最细粒度划分 ES索引库基本操作对索引库操作对文档操作 ES概述 Elasticsearch,简称为 ES,是一款非常强大的开源的高扩展的分布式全文检索引擎&#xff0c…

谁能取代迈巴赫,征服互联网安全大佬周鸿祎?

‍作者 |老缅 编辑 |德新 4月18日,「周鸿祎卖车」登上了微博热搜。这位360创始人、董事长发微博称:自己做了一个艰难的决定,将把陪伴9年的迈巴赫600给卖掉。 随后,他解释道:「这是因为我需要体验新一代车的感觉。古人…

opencv图像处理详细讲

传统的计算机视觉框架: SimpleCV BoofCV Dlib JavaCV 深度学习计算机视觉框架 Caffe Tensorflow Pytorch Paddlepaddle Keras 深度视觉计算机视觉框架 OpenVINO TensorRT onnxruntime Deepface YOLO/DarkNet mmdetection Paddle-detection/seg/ocr …

重学java 29.经典接口

光阴似箭,我好像跟不上 —— 24.5.6 一、java.lang.Comparable 我们知道基本数据类型的数据(除boolean类型外)需要比较大小的话,直接使用比较运算符即可,但是引用数据类型是不能直接使用比较运算符来比较大小的。那么,如何解决这个…

ECC 号码总结

1、问题背景 在手机开发过程中,经常遇见各种紧急号码问题,在此特意总结下紧急号码相关知识。 2、紧急号码来源 在MTK RILD EccNumberSource.h中,定义了如下几种紧急号码来源。 按优先级排序介绍如下 2.1、SOURCE_NETWORK 网络下发&#xff…

本地大语言模型LLM的高效运行专家 | Ollama

Ollama简介 Ollama是一个开源的大型语言模型服务工具,它帮助用户快速在本地运行大模型。通过简单的安装指令,用户可以执行一条命令就在本地运行开源大型语言模型,如Llama 2。Ollama极大地简化了在Docker容器内部署和管理LLM的过程&#xff0…

Linux网络设置

配置网络相关设置 一般包括如下内容: 主机名 IP/netmask A B 路由:默认网关 DNS服务器 主DNS服务器 次DNS服务器 第三个DNS服务器 ping baidu 网络配置命令 ifconfig ifconfig -a #表示显示所有网卡包括没有启动的网卡 ifconfig 网卡名称 [up|down…

考研数学|基础跟张宇,强化直接1000题还是先做660?

跟宇哥用1000题的,我愿称之为卷王之王!660对基础阶段是绝佳的查漏补缺,必做! 自我介绍一下:我21年一战数学83,总分没过线,22年二战143,逆袭上岸211!660是我的心头好&…

js api part4

其他事件 页面加载事件 外部资源(如图片、外联CSS和JavaScript等)加载完毕时触发的事件 原因:有些时候需要等页面资源全部处理完了做一些事情,老代码喜欢把 script 写在 head 中,这时候直接找 dom 元素找不到。 事件…

简单介绍IIC通信协议

文章目录 一,简单介绍二,IIC物理层三,IIC通信时序1.起始位与停止位2.IIC读写地址位信号3.IIC应答信号4.IIC数据位收发信号 四,总线速率五,主机发送数据流程六,主机接收数据流程七,IIC的时钟延展…

力扣每日一题109:有序链表转换二叉搜索树

题目 中等 给定一个单链表的头节点 head ,其中的元素 按升序排序 ,将其转换为 平衡 二叉搜索树。 示例 1: 输入: head [-10,-3,0,5,9] 输出: [0,-3,9,-10,null,5] 解释: 一个可能的答案是[0,-3,9,-10,null,5],它…