3 - Robot Vision - Robot Grasping and Manipulation

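Table of Contents

3 Robot Vision
1. Sensors and Calibration
- Camera Model
  - Intrinsic Matrix
  - Extrinsic Matrix
- Calibration
  - Intrinsic Calibration
  - Hand-Eye Calibration and Extrinsic Calibration
- Force Sensors & Other Sensors
  - Other Sensors
2. Neural Networks and Image Processing
- 2D Feature Processing
  - Common Architectures
- Training Pipeline
- Inference Pipeline
- Deployment Pipeline
- 2D Image Tasks
- 3D Point Cloud Feature
  - PointNet Application
3. 3D Pose Estimation
- Classification
- Pose Estimation Datasets
  - BOP: Benchmark for 6D Object Pose Estimation
  - YCB Dataset
- Pose Estimation Metrics
- Traditional Methods
- PoseCNN
- DenseFusion
- YOLO6D
Category-level
- Unseen Object Pose Estimation
- Foundation Model
- Foundation Pose
Pose Estimation for Grasping
REF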

3 Robot Vision

1. Sensors and Calibration

Camera Model

Pinhole Camera Model
Three coordinate frames:
• World coordinates
• Camera coordinates
• Image coordinates (2D pixels)

Task:
• Given: a pixel (u, v) and its depth z;
• Compute: the world coordinates ($x_w, y_w, z_w$)
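
The task above (pixel plus depth to world coordinates) can be sketched as follows. This is a minimal illustration under the ideal pinhole model with no lens distortion; the intrinsic values and the camera pose `T_world_cam` are made-up numbers, not taken from any real camera.

```python
import numpy as np

def pixel_to_world(u, v, z, K, T_world_cam):
    """Back-project pixel (u, v) with depth z into world coordinates.

    K is the 3x3 intrinsic matrix. T_world_cam is the 4x4 pose of the camera
    in the world frame, i.e. it maps camera coordinates to world coordinates.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Invert the pinhole projection: pixel -> camera frame
    x_c = (u - cx) * z / fx
    y_c = (v - cy) * z / fy
    p_cam = np.array([x_c, y_c, z, 1.0])
    # Camera frame -> world frame
    return (T_world_cam @ p_cam)[:3]

# Illustrative values only
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
T_world_cam = np.eye(4)
T_world_cam[:3, 3] = [0.1, 0.0, 0.5]

p_world = pixel_to_world(380, 240, 2.0, K, T_world_cam)
print(p_world)
```

If the extrinsics are instead given as world-to-camera, invert the matrix before passing it in.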

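Intrinsic Matrix

T: maps coordinates in the external world into camera coordinates

Parameters: $f_x, f_y, c_x, c_y$
Distortion: S

$c_x, c_y$ are approximately half the image resolution in pixels.
• Suppose the camera sensor is 36 mm wide and 24 mm high, and the image resolution is 6000×4000 pixels. If the 35 mm-equivalent focal length is 50 mm, the focal length in pixels is the focal length in mm divided by the pixel size (sensor size / resolution), and $c_x, c_y$ are half the resolution.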
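
The numbers in the example above can be worked out as follows. This sketch assumes that the sensor is full-frame (36 mm × 24 mm), so the 35 mm-equivalent focal length equals the physical focal length, and that the principal point sits exactly at the image center. The function name is hypothetical.

```python
def intrinsics_from_sensor(focal_mm, sensor_w_mm, sensor_h_mm, width_px, height_px):
    """Convert a focal length in mm into pixel-unit intrinsics."""
    pixel_w_mm = sensor_w_mm / width_px    # physical width of one pixel
    pixel_h_mm = sensor_h_mm / height_px   # physical height of one pixel
    fx = focal_mm / pixel_w_mm
    fy = focal_mm / pixel_h_mm
    cx = width_px / 2                      # assumption: principal point at image center
    cy = height_px / 2
    return fx, fy, cx, cy

fx, fy, cx, cy = intrinsics_from_sensor(50, 36, 24, 6000, 4000)
print(round(fx, 1), round(fy, 1), cx, cy)
```

Under these assumptions $f_x = f_y \approx 8333.3$ pixels and $(c_x, c_y) = (3000, 2000)$. A real camera's principal point usually deviates slightly from the center, which is one reason calibration is needed.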

Extrinsic Matrix

T:
• Models the transformation between the camera frame and the world frame
• Handles a "variable" frame (i.e. the camera frame is moving)

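Calibration

Intrinsic Calibration

• Intrinsic matrix K
• Distortion coefficients $k_1, k_2, p_1, p_2, k_3$

Tools:
• ROS, OpenCV (both provide built-in calibration tools)

Photograph a checkerboard from different angles and measure:
• Zhang's calibration method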
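
To make the role of the five distortion coefficients concrete, here is a minimal sketch of the radial plus tangential distortion model, written in the form OpenCV uses for the coefficient order $k_1, k_2, p_1, p_2, k_3$. The inputs are normalized image coordinates (after dividing by depth, before applying K); the function name and sample values are illustrative.

```python
def distort(x, y, k1, k2, p1, p2, k3):
    """Apply radial and tangential distortion to normalized coordinates (x, y)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# With all coefficients at zero the point is unchanged
print(distort(0.5, 0.0, 0, 0, 0, 0, 0))
# A small positive k1 pushes the point outward
print(distort(0.5, 0.0, 0.1, 0, 0, 0, 0))
```

Calibration estimates K and these coefficients jointly so that the projected checkerboard corners match the detected ones.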

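Hand-Eye Calibration and Extrinsic Calibration

Hand-eye calibration
• Tools: ROS, OpenCV
• Procedure (eye-in-hand):
  - Move the robot arm to different poses and photograph the calibration board
  - Record the arm's flange pose together with the corresponding image

Eye-in-hand: solve for the pose of the gripper in the camera frame.
Eye-to-hand: solve for the pose of base_link in the camera_link frame.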
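
The eye-in-hand procedure above leads to the classic $AX = XB$ constraint. The sketch below does not solve the calibration; it only builds synthetic poses and checks that the constraint holds when X is the true camera pose in the gripper frame (the inverse of the gripper pose in the camera frame). All poses and helper names are made up for illustration. In practice a solver such as OpenCV's `cv2.calibrateHandEye` estimates X from the recorded pose pairs.

```python
import numpy as np

def make_T(rotvec, t):
    """Build a 4x4 rigid transform from a rotation vector (Rodrigues formula) and a translation."""
    rotvec = np.asarray(rotvec, dtype=float)
    angle = np.linalg.norm(rotvec)
    T = np.eye(4)
    if angle > 1e-12:
        k = rotvec / angle
        Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        T[:3, :3] = np.eye(3) + np.sin(angle) * Kx + (1 - np.cos(angle)) * (Kx @ Kx)
    T[:3, 3] = t
    return T

# Ground truth used to synthesize data (unknown in a real calibration)
X = make_T([0.1, -0.2, 0.3], [0.02, 0.0, 0.08])            # camera pose in the gripper frame
T_base_target = make_T([0.0, 0.0, 0.5], [0.6, 0.1, 0.0])   # fixed calibration board

rng = np.random.default_rng(0)
T_base_gripper = [make_T(rng.normal(size=3) * 0.5, rng.normal(size=3) * 0.3) for _ in range(2)]
# What the camera would observe at each arm pose
T_cam_target = [np.linalg.inv(X) @ np.linalg.inv(Tbg) @ T_base_target for Tbg in T_base_gripper]

# Relative motions between the two stations
A = np.linalg.inv(T_base_gripper[1]) @ T_base_gripper[0]
B = T_cam_target[1] @ np.linalg.inv(T_cam_target[0])
residual = np.abs(A @ X - X @ B).max()
print(residual)
```

The residual is at floating-point noise level because the board is fixed in the base frame, so `T_base_gripper[i] @ X @ T_cam_target[i]` is the same for every station.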

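Robot kinematic (intrinsic) parameter calibration:

• Errors in the robot's kinematic parameters are generally smaller than the errors introduced by the camera extrinsics
• Products still need to verify the robot's kinematic parameter error (this requires engineering validation)
• Calibration methods
  - Measurement: typically a laser tracker, or manually guiding the arm to specific positions
  - Algorithm: model the kinematics with POE or DH parameters, then solve for the parameters iteratively

See: https://www.universal-robots.com/articles/ur/robot-care-maintenance/kinematic-robot-calibration/

Practical issues with hand-eye systems (RGB-D-based measurement)
• When used in a classical pipeline:
  - There are many error sources (robot kinematic parameters, camera intrinsics and extrinsics, tool-to-flange transform, camera depth and RGB measurements, etc.), and they are hard to separate
  - Calibration and verification take a long time, and accuracy degradation during long-term use is hard to trace
• When used for data production, the trained model may end up hardware-dependent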

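Depth cameras:
• Structured light: sensitive to interference; performs poorly outdoors
• Time of Flight (ToF): sensitive to interference
• Stereo vision: struggles on low-texture surfaces
Depth information:
• Point cloud and depth image
• Contains missing values

Depth-RGB relative pose and calibration
• Structured light: calibrate the relative pose of the IR and RGB sensors
Other notes:
• Point cloud data can be used directly for tasks such as recognition and segmentation
• Recognition on RGB images requires a 2D-to-3D projection

Depth camera issues
In practice: missing and inaccurate depth
• Materials, lighting, edges
• Depth on human hair
• Depth on glass
In real mass-production scenarios
• Incoming inspection
• Functional & parameter testing
• System testing
• Supplier issues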

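Force Sensors & Other Sensors

End-effector force sensor
• End-effector force control

Single-axis, 3-axis, or 6-axis
• Installation and usage

Joint torque sensors
• Current-based estimation, electromagnetic, strain-gauge
• Can be used for joint force control
• Can be used to estimate the 6D end-effector wrench and for end-effector force control (reliability and performance are somewhat worse than direct measurement)

Practical issues with force sensors
• Zero drift, abnormal data, etc.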
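Other Sensors

• Encoders
Motor-side encoder: 18 bits, $2^{18} = 262144$
Output-side multi-turn absolute encoder: 19 bits, $2^{19} = 524288$
Joint gear ratio: 1:101
Question: if the joint rotates 90°, what is the output-side encoder reading, and how many degrees has the motor turned?

Output-side encoder reading $= (90/360) \times 2^{19} = 131072$
The motor turns $101 \times 90 = 9090$ degrees.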
Joint position information
• Tactile sensors

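2. Neural Networks and Image Processing

Optimization view
• Find the combination of network parameters that minimizes the loss on the training data.
Main elements
• Network architecture
  - Feature processing and task heads
• Dataset and dataloader
• Loss function and optimizer
• Training and inference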

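2D Feature Processing

• Conv2d
[Figure: how Conv2d transforms tensor dimensions]
This dimension-transformation diagram comes up frequently in image convolution.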
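
The dimension change of a Conv2d layer can also be computed directly. The sketch below uses the standard output-size formula (the same one given in the PyTorch `nn.Conv2d` documentation), assuming a square kernel and equal stride, padding, and dilation in both directions. The function name is hypothetical.

```python
def conv2d_output_shape(c_out, h_in, w_in, kernel, stride=1, padding=0, dilation=1):
    """Return (channels, height, width) of a Conv2d output for one input image."""
    h_out = (h_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
    w_out = (w_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1
    return c_out, h_out, w_out

# A 7x7 stride-2 stem halves a 224x224 input
print(conv2d_output_shape(64, 224, 224, kernel=7, stride=2, padding=3))
# A 3x3 convolution with padding 1 preserves the spatial size
print(conv2d_output_shape(16, 32, 32, kernel=3, padding=1))
```

The number of output channels is set by the layer, while height and width follow from the kernel, stride, and padding.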

• MLP (Linear)
• Other layers
  - Pooling (e.g. max pooling)
  - Activation
  - Normalization

Normalization

See the article "Understanding Batch Norm / Layer Norm / Instance Norm / Group Norm in One Article" for a comparison of these normalization methods.

Common Architectures

CNN
• Residual block
• U-Net

Transformer:
• ViT
Training Pipeline

• 1 Prepare the dataset
• 2 Prepare the model
• 3 Prepare the loss function and optimizer
• 4 Training loop (with model evaluation)
  - 4.1 optimizer.zero_grad()
  - 4.2 outputs = model(images)
  - 4.3 loss = criterion(outputs, labels)
  - 4.4 loss.backward()
  - 4.5 optimizer.step()
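
The four steps above can be sketched end to end in PyTorch. The dataset here is synthetic (random 8×8 "images" labeled by whether their mean is positive) and the tiny network is a stand-in chosen only to keep the example self-contained. The last lines show the matching inference pattern with `model.eval()` and `torch.no_grad()`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# 1. Dataset (synthetic stand-in): label is 1 when the image mean is positive
images = torch.randn(256, 1, 8, 8)
labels = (images.mean(dim=(1, 2, 3)) > 0).long()
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

# 2. Model: one conv layer as feature extractor, one linear layer as task head
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 8 * 8, 2),
)

# 3. Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# 4. Training loop
epoch_losses = []
for epoch in range(10):
    model.train()
    total = 0.0
    for batch_images, batch_labels in loader:
        optimizer.zero_grad()                       # 4.1 clear old gradients
        outputs = model(batch_images)               # 4.2 forward pass
        loss = criterion(outputs, batch_labels)     # 4.3 compute the loss
        loss.backward()                             # 4.4 backpropagate
        optimizer.step()                            # 4.5 update the parameters
        total += loss.item() * batch_images.size(0)
    epoch_losses.append(total / len(loader.dataset))

# Inference: switch to eval mode and disable gradient tracking
model.eval()
with torch.no_grad():
    accuracy = (model(images).argmax(dim=1) == labels).float().mean().item()
print(epoch_losses[0], epoch_losses[-1], accuracy)
```

A real project would evaluate on a held-out validation set rather than on the training images.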

Inference Pipeline

• 1 Read the image
• 2 Load the model parameters
• 3 Run the model's forward pass

Deployment Pipeline
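2D Image Tasks

• Common tasks
  - Classification
  - Detection
  - Segmentation
  - Others: generation, faces, OCR, matting, denoising, retrieval, etc.
• Robotics-related:
  - Pose estimation and tracking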

3D Point Cloud Feature

Note that a T-Net is used here to generate a transformation matrix, which accounts for coordinate transformations of the point cloud in space.

PointNet example
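
The core idea behind PointNet's global feature, a shared per-point MLP followed by a symmetric max pooling, can be sketched in a few lines of NumPy. The weights are random and the T-Net alignment step is omitted, so this only demonstrates that the feature does not depend on the order of the points. Layer sizes are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def global_feature(points):
    """Map an (N, 3) point cloud to a (32,) global feature."""
    h = np.maximum(points @ W1 + b1, 0)   # shared MLP layer 1 + ReLU, applied to every point
    h = np.maximum(h @ W2 + b2, 0)        # shared MLP layer 2 + ReLU
    return h.max(axis=0)                  # max pooling over points: order-independent

cloud = rng.normal(size=(128, 3))
shuffled = cloud[rng.permutation(len(cloud))]
same = np.allclose(global_feature(cloud), global_feature(shuffled))
print(same)
```

Because the same weights are applied to every point and the maximum ignores ordering, shuffling the input leaves the global feature unchanged.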

PointNet++
Recommended reading: https://zhuanlan.zhihu.com/p/266324173
Focuses on points within local neighborhoods

PointNet Application

Autonomous Driving Prediction and ML Planning:
• PointNet for subgraph feature extraction

3. 3D Pose Estimation

Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review

Pose Estimation Datasets

BOP: Benchmark for 6D Object Pose Estimation

• BOP Toolkit
• BOP Datasets: https://bop.felk.cvut.cz/datasets/
  - The datasets can be prepared directly with the Hugging Face CLI and the toolkit
  - Some datasets have been slightly modified
  - Data: depth, rgb, model, camera_info, mask
• BOP Format:
  - https://github.com/thodan/bop_toolkit/blob/master/docs/bop_datasets_format.md
• Leaderboard

YCB Dataset

• 21 YCB objects captured in 92 videos.
• Common everyday items that can be bought in a supermarket
• Extended datasets:
  - DexYCB, YCB Affordance
  - YCB-Sight: a visuo-tactile dataset (visual and tactile modalities)

Pose Estimation Metrics

ADD and ADD-S are the most commonly used.
• Visible Surface Discrepancy (VSD)
• Maximum Symmetry-Aware Surface Distance (MSSD)
• Maximum Symmetry-Aware Projection Distance (MSPD)
• Average point distance (ADD)
• Average closest point distance (ADD-S)
• Others:
  - Intersection-over-Union (IoU) 3D
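
ADD and ADD-S from the list above can be sketched as follows. ADD averages the distance between corresponding model points under the ground-truth and estimated poses; ADD-S replaces the correspondence with the closest point, which makes it tolerant of object symmetries. The cube model and the poses are illustrative, and the brute-force nearest-neighbor search is only suitable for small point sets.

```python
import numpy as np

def transform(points, R, t):
    """Apply a rigid transform to an (N, 3) array of points."""
    return points @ R.T + t

def add_metric(points, R_gt, t_gt, R_est, t_est):
    """Average distance between corresponding transformed model points."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    return np.linalg.norm(gt - est, axis=1).mean()

def add_s_metric(points, R_gt, t_gt, R_est, t_est):
    """Average distance from each ground-truth point to its closest estimated point."""
    gt = transform(points, R_gt, t_gt)
    est = transform(points, R_est, t_est)
    d = np.linalg.norm(gt[:, None, :] - est[None, :, :], axis=2)  # (N, N) pairwise distances
    return d.min(axis=1).mean()

# Cube corners: symmetric under a 90-degree rotation about z
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], dtype=float)
R_gt, t = np.eye(3), np.zeros(3)
R_est = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z

add_value = add_metric(cube, R_gt, t, R_est, t)
add_s_value = add_s_metric(cube, R_gt, t, R_est, t)
print(add_value, add_s_value)
```

The rotated cube is indistinguishable from the original, so ADD-S is zero while ADD reports a large error. This is why ADD-S is preferred for symmetric objects.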

• BOP evaluation protocol
  - Apply thresholds to VSD, MSSD, and MSPD
  - Compute the Average Recall (AR) on each dataset, then take the mean
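
A simplified sketch of the thresholded-recall idea described above. The error values are invented, and a single shared list of thresholds is used for all three error functions. The actual BOP protocol defines its own thresholds per error function, so treat this only as an illustration of the averaging structure.

```python
import numpy as np

def average_recall(errors, thresholds):
    """Recall (fraction of estimates with error below a threshold), averaged over thresholds."""
    errors = np.asarray(errors, dtype=float)
    recalls = [(errors < th).mean() for th in thresholds]
    return float(np.mean(recalls))

# Hypothetical normalized errors for four pose estimates on one dataset
errors = {
    "VSD": [0.02, 0.10, 0.30, 0.90],
    "MSSD": [0.04, 0.20, 0.45, 0.80],
    "MSPD": [0.03, 0.15, 0.25, 0.60],
}
thresholds = np.linspace(0.05, 0.5, 10)   # illustrative thresholds, shared across metrics

ar_per_metric = {name: average_recall(e, thresholds) for name, e in errors.items()}
ar_dataset = float(np.mean(list(ar_per_metric.values())))
print(ar_per_metric, ar_dataset)
```

The per-dataset scores computed this way would then be averaged across datasets to obtain the overall ranking score.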

Traditional Methods

SfM (Structure from Motion)

• 2D image
To handle incorrect point matches:
  - PnP (Perspective-n-Point)
  - PnP + RANSAC (Random Sample Consensus)

OpenCV: solvePnPRansac()

3D point cloud
• ICP (Iterative Closest Point)
• pcl::IterativeClosestPoint

ICP Algorithm: Theory, Practice And Its SLAM-oriented Taxonomy
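
A compact point-to-point ICP sketch in NumPy, to show what the algorithm iterates over. It uses brute-force nearest neighbors and the SVD-based (Kabsch) closed-form alignment, and assumes a good initial guess. It is not a substitute for `pcl::IterativeClosestPoint`, which adds KD-tree search, convergence criteria, and outlier handling.

```python
import numpy as np

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst, given matched rows."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, iterations=30):
    """Align src to dst; returns the accumulated (R, t) and the aligned points."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iterations):
        # 1. Match each current point to its nearest neighbor in dst
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[d.argmin(axis=1)]
        # 2. Solve for the best rigid transform for these matches
        R, t = best_fit_transform(cur, matched)
        # 3. Apply it and accumulate
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, cur

rng = np.random.default_rng(0)
src = rng.uniform(-0.5, 0.5, size=(100, 3))
angle = np.deg2rad(1.0)
R_gt = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                 [np.sin(angle), np.cos(angle), 0.0],
                 [0.0, 0.0, 1.0]])
t_gt = np.array([0.005, -0.005, 0.005])
dst = src @ R_gt.T + t_gt

R_est, t_est, aligned = icp(src, dst)
rms = np.sqrt(((aligned - dst) ** 2).sum(axis=1).mean())
print(rms)
```

ICP only converges to the correct pose from a nearby starting point, which is why it is typically used to refine a coarse pose estimate.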

Instance-level Pose Estimation

• Correspondence-based methods
• Template-based methods
• Voting-based & regression-based methods

PoseCNN

Pose estimation with RGB input

Model:
• Feature extraction
• Segmentation
• Center point prediction
• Rotation and translation regression

Task and Loss:
• Segmentation
• Center point prediction
  - Regress the direction to the center for each pixel
  - Hough voting
• Transformation prediction:
  - PLoss: pose loss
  - SLoss: shape match loss
Two losses are computed: the pose loss and the shape-match loss.

Center point: for each pixel, predict the x and y direction toward the object center, together with the center depth (Td).
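
Once the 2D object center and its depth are known, the 3D translation follows from the pinhole model, which is how PoseCNN recovers translation from its center prediction. A minimal sketch, with made-up intrinsics and center location; here `(cx, cy)` denotes the voted object center in pixels and `(px, py)` the principal point.

```python
def translation_from_center(cx, cy, Tz, fx, fy, px, py):
    """Recover the 3D translation (Tx, Ty, Tz) from the 2D object center and its depth."""
    Tx = (cx - px) * Tz / fx
    Ty = (cy - py) * Tz / fy
    return Tx, Ty, Tz

# Illustrative numbers: center voted at pixel (380, 210), depth 1.0 m
Tx, Ty, Tz = translation_from_center(380, 210, 1.0, fx=600, fy=600, px=320, py=240)
print(Tx, Ty, Tz)
```

Predicting the center in image space and the depth separately is easier for a network than regressing a metric 3D translation directly.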

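DenseFusion

Pose estimation with RGB and depth images

Features:
• On the segmented object, a CNN and a PointNet encoder extract image features and point cloud features respectively
• Fuse the two features per pixel (concat)
• Extract a global feature
• Fuse the global feature with the local features (concat)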
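
The fusion steps listed above can be followed by tracking tensor shapes. In this sketch random arrays stand in for the CNN and PointNet embeddings, the feature sizes are arbitrary, and average pooling is assumed for the global feature. It illustrates the data flow only, not the network itself.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500                                   # pixels/points sampled on the segmented object

color_feat = rng.normal(size=(N, 32))     # stand-in for per-pixel CNN embeddings
geo_feat = rng.normal(size=(N, 32))       # stand-in for per-point PointNet embeddings

# Per-pixel fusion: concatenate the color and geometry feature of the same pixel
pixel_feat = np.concatenate([color_feat, geo_feat], axis=1)            # (N, 64)

# Global feature: shared layer + pooling over all pixels (average pooling assumed)
W = rng.normal(size=(64, 128))
global_feat = np.maximum(pixel_feat @ W, 0).mean(axis=0)               # (128,)

# Dense fusion: append the same global feature to every per-pixel feature
dense_feat = np.concatenate([pixel_feat, np.tile(global_feat, (N, 1))], axis=1)
print(dense_feat.shape)
```

Each fused per-pixel feature then feeds the prediction head, so every pixel can vote for a pose with both local detail and global context.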

• Head
  - Translation, rotation, confidence
• Loss
• Pose refinement
  - Pose residual estimator

YOLO6D

• Simple and fast
• Feature extraction
  - CNN
• Detection architecture
  - Predicts 8 bounding-box corner points + 1 center point, plus the class
  - PnP for 3D estimation uses the 9 control-point correspondences
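
The 9 control points and their 2D projections (the targets the network regresses) can be sketched as below. This assumes an axis-aligned bounding box in the object frame and uses the box center as the ninth point; the vertices, intrinsics, and pose are invented for illustration.

```python
import numpy as np
from itertools import product

def control_points(vertices):
    """Return 9 control points: the 3D bounding-box center followed by its 8 corners."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    corners = np.array(list(product(*zip(lo, hi))))   # all min/max combinations, (8, 3)
    center = (lo + hi) / 2
    return np.vstack([center, corners])               # (9, 3)

def project(points, K, R, t):
    """Project 3D object-frame points into the image with pose (R, t)."""
    cam = points @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

vertices = np.array([[-1.0, -2.0, -3.0], [1.0, 2.0, 3.0], [0.5, 0.0, 0.0]])
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])

cps = control_points(vertices)
uv = project(cps, K, np.eye(3), np.array([0.0, 0.0, 10.0]))
print(cps.shape, uv[0])
```

At inference time the direction is reversed: the network outputs the nine 2D points, and PnP with the known 3D control points gives the 6D pose.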

Category-level

Example application of category-level perception:

Object detection in autonomous driving:
- Hierarchical categories:
- Car: Truck, SUV, Sedan, etc.

Category-level pose estimation
• Estimates the pose of objects that belong to the same category (e.g. cups)
• Generalizes to objects within established categories

NOCS:
https://github.com/hughw19/NOCS_CVPR2019
• Represents a category of objects
• Normalized Object Coordinate Space
• Predicts a NOCS map (x, y, z)
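
The normalization behind NOCS can be sketched as follows: an object's points are centered and scaled so that the diagonal of its tight bounding box has length 1, which places every instance of a category inside the same unit cube. The sketch assumes the object is already in its canonical orientation, and the sample points are random.

```python
import numpy as np

def to_nocs(points):
    """Map object-frame points into the unit cube [0, 1]^3 (NOCS-style normalization)."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center = (lo + hi) / 2
    diagonal = np.linalg.norm(hi - lo)     # bounding-box diagonal becomes length 1
    return (points - center) / diagonal + 0.5

rng = np.random.default_rng(0)
mug_points = rng.uniform(-1, 1, size=(200, 3)) * np.array([0.04, 0.04, 0.06])  # metres
nocs = to_nocs(mug_points)
print(nocs.min(axis=0), nocs.max(axis=0))
```

A network that predicts these normalized coordinates per pixel gives 2D-3D (or, with depth, 3D-3D) correspondences, from which pose and scale can be recovered by alignment.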

Data generation:

Mixed Reality data generation
- Real background with sim object
- Rendering with different lighting


Unseen Object Pose Estimation

• Input: CAD model, reference image
  - No training on the novel object
  - Unlike category-level methods, which require training on the category, and alignment if using NOCS
• Traditional:
  - Template-based or feature-based methods
• Foundation models

Foundation Model

• FoundationPose
• SAM-6D
• FreeZe

Foundation Pose

https://github.com/NVlabs/FoundationPose/tree/main/learning/models

• Input:
  - Model-based, where a textured 3D CAD model of the object is provided
  - Model-free, where a set of reference images of the object is provided
• Good performance in these tasks:
  - Model-based, model-free
  - Pose estimation, pose tracking

• Pose generation data pipeline
  - Hierarchical LLM for data generation
    - LLM prompt for object description
    - LLM description for texture generation
  - Physics engine for rendering

Pose Estimation for Grasping

• 6D object pose estimation
• Grasp pose generation
• Pre-grasp pose
• Path planning and trajectory generation
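
The first three stages of this pipeline reduce to chaining homogeneous transforms. In the sketch below every pose is invented: the object pose stands in for the output of a pose estimator, the camera pose for the result of hand-eye calibration, and the grasp is defined in the object frame. The gripper's approach direction is assumed to be the +z axis of the grasp frame.

```python
import numpy as np

def make_T(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

I3 = np.eye(3)
flip_x = np.diag([1.0, -1.0, -1.0])                  # 180-degree rotation about x

T_base_cam = make_T(I3, [0.3, 0.0, 0.5])             # from extrinsic / hand-eye calibration
T_cam_obj = make_T(I3, [0.0, 0.1, 0.6])              # from 6D object pose estimation
T_obj_grasp = make_T(flip_x, [0.0, 0.0, 0.05])       # grasp pose defined on the object

# Grasp pose in the robot base frame
T_base_grasp = T_base_cam @ T_cam_obj @ T_obj_grasp

# Pre-grasp pose: back off 10 cm along the approach axis (assumed +z of the grasp frame)
T_base_pregrasp = T_base_grasp @ make_T(I3, [0.0, 0.0, -0.10])

print(T_base_grasp[:3, 3], T_base_pregrasp[:3, 3])
```

The planner then moves the arm to the pre-grasp pose first and approaches the grasp pose along a straight line, which reduces the risk of colliding with the object.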

REF

https://www.shenlanxueyuan.com/course/727/task/29418/show

• Deep Learning-Based Object Pose Estimation: A Comprehensive Survey; https://github.com/CNJianLiu/Awesome-Object-Pose-Estimation
• Vision-based Robotic Grasping From Object Localization, Object Pose Estimation to Grasp Estimation for Parallel Grippers: A Review
• Challenges for Monocular 6D Object Pose Estimation in Robotics
• BOP: Benchmark for 6D Object Pose Estimation
• FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
• DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
• Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation
• PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes
• Real-Time Seamless Single Shot 6D Object Pose Prediction
• Computer Vision: A Modern Approach
• https://deeprob.org/w24/projects/project3/, Project 3 PoseCNN