PTQ4SAM、Mamba-Attention、AniTalker、IceFormer、U-DiTs、CogDPM

本文首发于公众号:机器感知

PTQ4SAM、Mamba-Attention、AniTalker、IceFormer、U-DiTs、CogDPM

图片

PTQ4SAM: Post-Training Quantization for Segment Anything

图片

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution into a relatively easy-quantized normal distribution offline. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax th......

AniTalker: Animate Vivid and Diverse Talking Faces through  Identity-Decoupled Facial Motion Encoding

图片

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first involves reconstructing target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the n......

Matten: Video Generation with Mamba-Attention

图片

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten has competitive performance with the current Transformer-based and GAN-based models in benchmark performance, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten. ......

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

图片

Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most of the current work in this field adopts the GAN, which may lead to instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply Mamba model for the first time in the motion style transfer field, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD......

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs

图片

One limitation of existing Transformer-based models is that they cannot handle very long sequences as input since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference time that works with pretrained Transformer models out-of-the-box without requiring retraining. We experiment using our method to accelerate various long-sequence Transformers, including a leading LLaMA 2-based LLM, on various benchmarks and demonstrate a greater speedup of 2.73x - 7.63x while retaining 98.6% - 99.6% of the accuracy of the original pretrained models. The code is available on our project website at https://yuzhenmao.github.io/IceFormer/. ......

Efficient Text-driven Motion Generation via Latent Consistency Training

图片

Motion diffusion models have recently proven successful for text-driven human motion generation. Despite their excellent generation performance, they are challenging to infer in real time due to the multi-step sampling mechanism that involves tens or hundreds of repeat function evaluation iterations. To this end, we investigate a motion latent consistency Training (MLCT) for motion generation to alleviate the computation and time consumption during iteration inference. It applies diffusion pipelines to low-dimensional motion latent spaces to mitigate the computational burden of each function evaluation. Explaining the diffusion process with probabilistic flow ordinary differential equation (PF-ODE) theory, the MLCT allows extremely few steps infer between the prior distribution to the motion latent representation distribution via maintaining consistency of the outputs over the trajectory of PF-ODE. Especially, we introduce a quantization constraint to optimize motion latent r......

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

图片

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of U-Net by DiTs and their following improvements is worth rethinking. To this end, we conduct a simple toy experiment by comparing a U-Net architectured DiT with an isotropic one. It turns out that the U-Net architecture only gain a slight advantage amid the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention and bring further improvements despite a considerable amount of reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the ......

From Generalization Analysis to Optimization Designs for State Space  Models

图片

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown as an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a \textit{data-dependent} generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales on SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical results are conducted to validate our results. ......

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

图片

Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on the PC theory, emulating its two core mechanisms: Correcting predictions from residuals and hierarchical learning. However, these models do not show the enhancement of prediction skills on real-world forecasting tasks and ignore the Precision Weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and we......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.rhkb.cn/news/321277.html

如若内容造成侵权/违法违规/事实不符,请联系长河编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Linux\_c输出

第一条Linux_c输出 初界面 : ls # 显示目录下的文件cd # 进入到某个目录 # 比如 我进入了Codels # 发现没有显示, 说明为文件下为空vim cpucdoe.c # 创建一个 .c的源码文件进入到了vim的编辑界面: i # 按i 就可以进行编辑 , 下面显示插入标识在编辑模式下, 可以通…

【Linux】文件内容相关的命令,补充:管道符

1、查看文件内容 (1-1)查看文件内容:cat,tac,head,tail 查看文件内容cat 文件名查看文件内容并显示行号cat -n 文件名倒着查看文件内容(从最后一行开始)tac 文件名查看文件前10行…

KDTree空间搜索算法学习

目录 KDTree(K-Dimensional Tree)原理步骤空间索引建立例子[^1]回溯搜索例子[^2] 相关包案例[^3]数据KDTree 识别轨道衔接出行轨道衔接单车骑行范围分析结果保存 KDTree(K-Dimensional Tree)原理 将需要匹配的 K 维空间点建立 K …

Git中单独的功能特性分支是什么含义

在Git中,一个"功能特性分支"(通常简称为“特性分支”)是指从主开发分支(比如main或master)独立出来的分支,专门用于开发一个新功能、修复一个bug,或者进行实验性的尝试。使用特性分支…

开源的聊天服务器tigase 7.1.3 相关文档

官方的api文档 7.1.3: Tigase Administration Guide github地址: Release 7.1.3 tigase/tigase-server GitHub 安装教程: Tigase手动安装过程-腾讯云开发者社区-腾讯云

Pascal Content数据集

如果您想使用Pascal Context数据集,请安装Detail,然后运行以下命令将注释转换为正确的格式。 1.安装Detail 进入项目终端 #即 这是在我自己的项目下直接进行克隆操作: git clone https://github.com/zhanghang1989/detail-api.git $PASCAL…

标准IO函数-将bmp图片修改为德国国旗样式

代码&#xff1a; #include <stdio.h> #include <string.h> #include <stdlib.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <pthread.h> #include <semaphore.h…

C#修改默认参数settings文件

右击项目在设置中进行修改&#xff1a; 千万不要在这里改。 如果要在自己的项目里添加这个文件&#xff0c;首先新建个文件夹&#xff0c;然后添加.setting文件&#xff0c;然后再像上面说的那样添加属性。

unity华为sdk接入指路指南

目前比较靠谱的几个方案&#xff1a;试过几个仅供参考 温馨提示&#xff1a;最高目前可支持方案到unity2021版本以下&#xff0c;以上请联系华为官方寻求技术支持 Unity集成华为游戏服务SDK方式&#xff08;一&#xff09;&#xff1a;集成Unity官方游戏SDK&#xff1a; 华为…

Quora 首席执行官亚当·德安杰洛 (Adam D’Angelo) 谈论了 AI、聊天机器人平台 Poe,以及 OpenAI 为什么不是竞争对手

每周跟踪AI热点新闻动向和震撼发展 想要探索生成式人工智能的前沿进展吗&#xff1f;订阅我们的简报&#xff0c;深入解析最新的技术突破、实际应用案例和未来的趋势。与全球数同行一同&#xff0c;从行业内部的深度分析和实用指南中受益。不要错过这个机会&#xff0c;成为AI领…

精酿啤酒:种类与风格的多样性探索

啤酒&#xff0c;这一古老的酒精饮品&#xff0c;随着时代的发展与技术的进步&#xff0c;已经衍生出了无数种类与风格。其中&#xff0c;精酿啤酒在近年来备受瞩目&#xff0c;以其与众不同的酿造工艺和风味&#xff0c;成为了啤酒爱好者们的新宠。Fendi club 啤酒&#xff0c…

【PCIE】基于PCIE4C的数据传输(四)——使用MSIX中断

基于PCIE4C的数据传输&#xff08;三&#xff09;——遗留中断与MSI中断 一文介绍了遗留中断与MSI中断两种中断方式的代码实现&#xff0c;本文继续基于Xilinx UltrascaleHBM VCU128开发板与linux&#xff08;RHEL8.9&#xff09;&#xff0c;介绍MSIX中断方式的代码实现。本文…

ROS机器人实用技术与常见问题解决

问题速查手册&#xff08;时实更新&#xff09;更加全面丰富的问题手册记录 1.机器人使用GPARTED挂载未分配空间 需要在图型界面下操作&#xff0c;建议使用no machine连接 安装gparted磁盘分区工具, sudo apt-get install gparted -y 启动软件 sudo gparted 点击磁盘/内存…

ILI9341显示驱动芯片的使用

ILI9341是一种常见的TFT LCD显示驱动芯片&#xff0c;它在众多的应用中都有广泛的使用。这种芯片的一个显著特点是它支持16位RGB565颜色&#xff0c;这意味着它可以显示多达65536种不同的颜色。这使得ILI9341能够提供鲜艳、生动的色彩效果&#xff0c;对于需要表现丰富色彩的应…

外网禅道配置

exportfs -avrf 修改代码&#xff0c;避免启动太慢&#xff1a;vi /opt/zbox/bin/zbox.php 启动和停止 /opt/zbox/zbox start /opt/zbox/zbox stop

【MATLAB源码-第204期】基于matlab的语音降噪算法对比仿真,谱减法、维纳滤波法、自适应滤波法;参数可调。

操作环境&#xff1a; MATLAB 2022a 1、算法描述 语音降噪技术的目的是改善语音信号的质量&#xff0c;通过减少或消除背景噪声&#xff0c;使得语音更清晰&#xff0c;便于听者理解或进一步的语音处理任务&#xff0c;如语音识别和语音通讯。在许多实际应用中&#xff0c;如…

保研面试408复习 1——操作系统、计网、计组

文章目录 1、操作系统一、操作系统的特点和功能二、中断和系统调用的区别 2、计算机组成原理一、冯诺依曼的三个要点二、MIPS&#xff08;每秒百万条指令&#xff09;三、CPU执行时间和CPI 3、计算机网络一、各个层常用协议二、网络协议实验——数据链路层a.网络速率表示b.数据…

Linux中的YUM源仓库和NFS文件共享服务

目录 1.YUM仓库服务 1.1 YUM概述 1.2 准备安装源 1.3 搭建yum本地ftp源仓库 1.4 yum在线源替换方法 1.5 yum的常用操作命令 2.NFS文件共享服务 2.1 NFS&#xff08;共享存储服务&#xff09;简介 2.2 NFS服务的实现 2.3 使用NFS发布共享资源 2.4 NSF配置 2.5 如何指…

matlab

图像配准&#xff1a; %手动选择执行图片(由于程序为分开&#xff0c;此处保存的mat文件为图MRI6的信息&#xff0c;所以请选择图MRI6) [filename,pathname]uigetfile({*.jpg;*.bmp;*.tif;*.png;*.gif,All Image Files;*.*,All Files}); image imread([pathname,filename]); …

LNMP一键安装包

LNMP一键安装包是什么? LNMP一键安装包是一个用Linux Shell编写的可以为CentOS/RHEL/Fedora/Debian/Ubuntu/Raspbian/Deepin/Alibaba/Amazon/Mint/Oracle/Rocky/Alma/Kali/UOS/银河麒麟/openEuler/Anolis OS Linux VPS或独立主机安装LNMP(Nginx/MySQL/PHP)、LNMPA(Nginx/MySQ…