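0. References:
Pose - Ultralytics YOLO Docs
These notes only assess the model's input/output interface and its efficiency; training is not covered yet.
Pose and segment are fairly natural extensions of class detection: a subclass assigned inside the object box is seg, and a point array inside the object box is pose.
Face recognition can be made extremely fast, so segment and pose can in principle be made extremely fast as well.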
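The "box plus extra payload" view can be sketched as a small data structure. The class and field names below are hypothetical and exist only for illustration; they are not Ultralytics types.

```python
from dataclasses import dataclass, field


@dataclass
class Detection:
    """Plain detection: a class plus a box (hypothetical type)."""
    class_id: int
    box: tuple  # (x_center, y_center, width, height)


@dataclass
class SegDetection(Detection):
    """Detection extended with an outline of the object inside the box."""
    mask_polygon: list = field(default_factory=list)  # [(x, y), ...]


@dataclass
class PoseDetection(Detection):
    """Detection extended with a point array inside the box."""
    keypoints: list = field(default_factory=list)  # [(x, y, score), ...]


person = PoseDetection(
    class_id=0,
    box=(0.5, 0.5, 0.2, 0.4),
    keypoints=[(0.5, 0.35, 0.9)],
)
print(isinstance(person, Detection), len(person.keypoints))
```

Both extensions keep the detection fields unchanged and only append data, which fits the observation that they cost little extra at inference time.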
1. Measured speed comparison [yolo-class vs yolo-pose]
Conclusion: the pose model runs at about the same speed as the ordinary Detect model.
Timings from one pose run:
image 1/1 /dataset/pose/body.jpg: 448x640 2 persons, 44.7ms
Speed: 2.7ms preprocess, 44.7ms inference, 59.1ms postprocess per image at shape (1, 3, 448, 640)
For comparison, timings when running object detection only:
image 1/1 /dataset/pose/body.jpg: 448x640 2 persons, 48.0ms
Speed: 5.8ms preprocess, 48.0ms inference, 132.0ms postprocess per image at shape (1, 3, 448, 640)
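To compare the two runs stage by stage, the "Speed:" lines can be parsed and summed. This is a minimal sketch that only uses the two log lines quoted above; the regular expression assumes the log format shown here. Each figure comes from a single run, so the totals are noisy, and the inference column is probably the fairest number to compare.

```python
import re

# Log lines copied from the two runs above.
POSE_LOG = ("Speed: 2.7ms preprocess, 44.7ms inference, "
            "59.1ms postprocess per image at shape (1, 3, 448, 640)")
DETECT_LOG = ("Speed: 5.8ms preprocess, 48.0ms inference, "
              "132.0ms postprocess per image at shape (1, 3, 448, 640)")

# Matches e.g. "44.7ms inference" and captures the number and the stage name.
STAGE_RE = re.compile(r"([\d.]+)ms (preprocess|inference|postprocess)")


def parse_speed(line):
    """Return {stage: milliseconds} parsed from one 'Speed:' log line."""
    return {stage: float(ms) for ms, stage in STAGE_RE.findall(line)}


pose = parse_speed(POSE_LOG)
detect = parse_speed(DETECT_LOG)

for name, stages in (("pose", pose), ("detect", detect)):
    total = sum(stages.values())
    print(f"{name}: {stages} total={total:.1f}ms")
```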
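2. Theoretical speed comparison
In the benchmark tables below, a detect model and a pose model of the same size have nearly the same inference speed.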
Detect models:

Model | size (pixels) | mAP val 50-95 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
---|---|---|---|---|---|---|
YOLO11n | 640 | 39.5 | 56.1 ± 0.8 | 1.5 ± 0.0 | 2.6 | 6.5 |
YOLO11s | 640 | 47.0 | 90.0 ± 1.2 | 2.5 ± 0.0 | 9.4 | 21.5 |
YOLO11m | 640 | 51.5 | 183.2 ± 2.0 | 4.7 ± 0.1 | 20.1 | 68.0 |
YOLO11l | 640 | 53.4 | 238.6 ± 1.4 | 6.2 ± 0.1 | 25.3 | 86.9 |
YOLO11x | 640 | 54.7 | 462.8 ± 6.7 | 11.3 ± 0.2 | 56.9 | 194.9 |

Pose models:

Model | size (pixels) | mAP pose 50-95 | mAP pose 50 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
---|---|---|---|---|---|---|---|
YOLO11n-pose | 640 | 50.0 | 81.0 | 52.4 ± 0.5 | 1.7 ± 0.0 | 2.9 | 7.6 |
YOLO11s-pose | 640 | 58.9 | 86.3 | 90.5 ± 0.6 | 2.6 ± 0.0 | 9.9 | 23.2 |
YOLO11m-pose | 640 | 64.9 | 89.4 | 187.3 ± 0.8 | 4.9 ± 0.1 | 20.9 | 71.7 |
YOLO11l-pose | 640 | 66.1 | 89.9 | 247.7 ± 1.1 | 6.4 ± 0.1 | 26.2 | 90.7 |
YOLO11x-pose | 640 | 69.5 | 91.1 | 488.0 ± 13.9 | 12.1 ± 0.2 | 58.8 | 203.3 |
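The claim that the two families are close in speed can be checked by computing the relative overhead of pose over detect for each size. The sketch below copies the latency means from the two tables and ignores the ± spreads. The T4 figures are given to one decimal place only, so the percentages for the small models are dominated by rounding.

```python
# (detect_ms, pose_ms) latency means copied from the two tables above.
CPU_ONNX = {
    "n": (56.1, 52.4),
    "s": (90.0, 90.5),
    "m": (183.2, 187.3),
    "l": (238.6, 247.7),
    "x": (462.8, 488.0),
}
T4_TENSORRT = {
    "n": (1.5, 1.7),
    "s": (2.5, 2.6),
    "m": (4.7, 4.9),
    "l": (6.2, 6.4),
    "x": (11.3, 12.1),
}


def overhead_pct(detect_ms, pose_ms):
    """Extra latency of the pose model relative to detect, in percent."""
    return (pose_ms - detect_ms) / detect_ms * 100.0


for size in "nsmlx":
    cpu = overhead_pct(*CPU_ONNX[size])
    gpu = overhead_pct(*T4_TENSORRT[size])
    print(f"YOLO11{size}: CPU {cpu:+.1f}%  T4 {gpu:+.1f}%")
```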
3. Output data
3.1 The 17 keypoints listed in the YOLO docs:
- Nose
- Left Eye
- Right Eye
- Left Ear
- Right Ear
- Left Shoulder
- Right Shoulder
- Left Elbow
- Right Elbow
- Left Wrist
- Right Wrist
- Left Hip
- Right Hip
- Left Knee
- Right Knee
- Left Ankle
- Right Ankle
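Assuming the order of this list is also the index order of the keypoints in the model output (the formatted data in section 3.2.1 is consistent with that), a name-to-index lookup is straightforward to build. The same table yields the left/right swap permutation that a horizontal image flip would need. This is a sketch derived purely from the names; `mirror_name` is a hypothetical helper.

```python
KEYPOINT_NAMES = [
    "Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
    "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
    "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
    "Left Knee", "Right Knee", "Left Ankle", "Right Ankle",
]

# Index of each keypoint, assuming list order == output order.
NAME_TO_INDEX = {name: i for i, name in enumerate(KEYPOINT_NAMES)}


def mirror_name(name):
    """Swap the 'Left'/'Right' prefix; names without a side are unchanged."""
    if name.startswith("Left "):
        return "Right " + name[len("Left "):]
    if name.startswith("Right "):
        return "Left " + name[len("Right "):]
    return name


# FLIP_INDEX[i] is the index that keypoint i maps to after a horizontal flip.
FLIP_INDEX = [NAME_TO_INDEX[mirror_name(name)] for name in KEYPOINT_NAMES]

print(NAME_TO_INDEX["Left Wrist"])
print(FLIP_INDEX)
```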
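3.2 Actual detection data
Row length 56 = 1 class value (0: person) + 4 box values (x_center, y_center, width, height) + 17*(x, y, prob_value); all coordinates are normalized to the image size.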
0 0.60229 0.518654 0.305345 0.725004 0.5296 0.247867 0.989055 0.538345 0.229076 0.987131 0.523728 0.236682 0.803624 0.565252 0.220562 0.97476 0 0 0.178567 0.605952 0.272757 0.99945 0.524902 0.320555 0.996575 0.67903 0.325687 0.998379 0.477179 0.385596 0.969531 0.720728 0.381518 0.997152 0.460585 0.311368 0.971978 0.632899 0.481503 0.999804 0.596195 0.502827 0.999446 0.554051 0.568819 0.999557 0.600891 0.602427 0.998719 0.601888 0.723575 0.994931 0.714637 0.736034 0.991084
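Because the coordinates are normalized, they have to be scaled by the image size before drawing. The sketch below parses the row above, converts the center-format box to pixel corners, and scales one keypoint. The 640x448 image size is an assumption for illustration only; the size of the original image is not given here. `PoseRow`, `parse_pose_row`, and `box_to_pixels` are hypothetical names.

```python
from dataclasses import dataclass

# The detection row shown above, copied verbatim.
ROW = (
    "0 0.60229 0.518654 0.305345 0.725004 0.5296 0.247867 0.989055 "
    "0.538345 0.229076 0.987131 0.523728 0.236682 0.803624 "
    "0.565252 0.220562 0.97476 0 0 0.178567 0.605952 0.272757 0.99945 "
    "0.524902 0.320555 0.996575 0.67903 0.325687 0.998379 "
    "0.477179 0.385596 0.969531 0.720728 0.381518 0.997152 "
    "0.460585 0.311368 0.971978 0.632899 0.481503 0.999804 "
    "0.596195 0.502827 0.999446 0.554051 0.568819 0.999557 "
    "0.600891 0.602427 0.998719 0.601888 0.723575 0.994931 "
    "0.714637 0.736034 0.991084"
)


@dataclass
class PoseRow:
    class_id: int
    box_xywh: tuple  # normalized (x_center, y_center, width, height)
    keypoints: list  # 17 normalized (x, y, score) triples


def parse_pose_row(row):
    """Split one 56-value row into class, box, and keypoint triples."""
    values = [float(v) for v in row.split()]
    if len(values) != 56:
        raise ValueError(f"expected 56 values, got {len(values)}")
    keypoints = [tuple(values[i:i + 3]) for i in range(5, 56, 3)]
    return PoseRow(int(values[0]), tuple(values[1:5]), keypoints)


def box_to_pixels(box_xywh, img_w, img_h):
    """Convert a normalized center-format box to pixel (left, top, right, bottom)."""
    xc, yc, w, h = box_xywh
    return ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)


IMG_W, IMG_H = 640, 448  # assumed image size, for illustration only

row = parse_pose_row(ROW)
left, top, right, bottom = box_to_pixels(row.box_xywh, IMG_W, IMG_H)
nose_x, nose_y, nose_score = row.keypoints[0]

print(f"box px: ({left:.1f}, {top:.1f}) - ({right:.1f}, {bottom:.1f})")
print(f"nose px: ({nose_x * IMG_W:.1f}, {nose_y * IMG_H:.1f}) score={nose_score}")
```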
3.2.1 Formatted data
Class ID: 0.0
Bounding box (x_center, y_center, width, height): [0.60229, 0.518654, 0.305345, 0.725004]
Keypoints:
Nose: x=0.5296, y=0.247867, visibility=0.989055
Left Eye: x=0.538345, y=0.229076, visibility=0.987131
Right Eye: x=0.523728, y=0.236682, visibility=0.803624
Left Ear: x=0.565252, y=0.220562, visibility=0.97476
Right Ear: x=0.0, y=0.0, visibility=0.178567
Left Shoulder: x=0.605952, y=0.272757, visibility=0.99945
Right Shoulder: x=0.524902, y=0.320555, visibility=0.996575
Left Elbow: x=0.67903, y=0.325687, visibility=0.998379
Right Elbow: x=0.477179, y=0.385596, visibility=0.969531
Left Wrist: x=0.720728, y=0.381518, visibility=0.997152
Right Wrist: x=0.460585, y=0.311368, visibility=0.971978
Left Hip: x=0.632899, y=0.481503, visibility=0.999804
Right Hip: x=0.596195, y=0.502827, visibility=0.999446
Left Knee: x=0.554051, y=0.568819, visibility=0.999557
Right Knee: x=0.600891, y=0.602427, visibility=0.998719
Left Ankle: x=0.601888, y=0.723575, visibility=0.994931
Right Ankle: x=0.714637, y=0.736034, visibility=0.991084
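In the data above, Right Ear has coordinates (0, 0) together with a low third value (0.178567), which suggests that low-confidence keypoints come back with zeroed coordinates. Such points should be skipped rather than drawn at the image corner. A minimal sketch of that filter follows; the 0.5 threshold is an assumption, not a value taken from the Ultralytics docs.

```python
# (name, x, y, score) for a few of the keypoints listed above.
KEYPOINTS = [
    ("Nose", 0.5296, 0.247867, 0.989055),
    ("Right Eye", 0.523728, 0.236682, 0.803624),
    ("Left Ear", 0.565252, 0.220562, 0.97476),
    ("Right Ear", 0.0, 0.0, 0.178567),
    ("Left Shoulder", 0.605952, 0.272757, 0.99945),
]

MIN_SCORE = 0.5  # assumed threshold; tune it for the application


def split_by_score(keypoints, min_score=MIN_SCORE):
    """Return (kept, dropped) lists, splitting on the per-keypoint score."""
    kept, dropped = [], []
    for keypoint in keypoints:
        score = keypoint[3]
        (kept if score >= min_score else dropped).append(keypoint)
    return kept, dropped


kept, dropped = split_by_score(KEYPOINTS)
print("kept:", [name for name, *_ in kept])
print("dropped:", [name for name, *_ in dropped])
```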
Appendix A: pose data formatting program

    # Body keypoint names, in output order
    keypoint_names = [
        "Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
        "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
        "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
        "Left Knee", "Right Knee", "Left Ankle", "Right Ankle",
    ]

    # A hypothetical label row from a YOLO Pose training set
    label_data_str = " 0 0.844458 0.649356 0.310934 0.681836 0 0 0.18451 0 0 0.0718303 0 0 0.140077 0 0 0.0899211 0 0 0.225879 0 0 0.0909999 0.91247 0.856484 0.628418 0 0 0.022336 0 0 0.262855 0 0 0.10164 0 0 0.426727 0 0 0.009039 0 0 0.0300809 0 0 0.0136058 0 0 0.035202 0 0 0.0130479 0 0 0.0247065"

    # Convert the string into a list of floats
    label_data = [float(num) for num in label_data_str.split()]

    # Extract the class ID and the bounding box
    class_id = label_data[0]
    bbox = label_data[1:5]

    # Extract the keypoint values
    keypoints_data = label_data[5:]

    # Make sure the number of keypoint values matches the name list
    if len(keypoints_data) == len(keypoint_names) * 3:
        print(f"Class ID: {class_id}")
        print(f"Bounding box (x_center, y_center, width, height): {bbox}")
        print("Keypoints:")
        for i in range(len(keypoint_names)):
            start_index = i * 3
            x, y, visibility = keypoints_data[start_index:start_index + 3]
            print(f"{keypoint_names[i]}: x={x}, y={y}, visibility={visibility}")
    else:
        print("Keypoint data length does not match the name list; cannot map values to names.")