YOLO pose detection: input/output interface and execution speed test

0. References:

Pose - Ultralytics YOLO Docs

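The notes below only examine this model's input/output interface and speed; training is not covered yet.

Pose and segment are fairly natural extensions of plain object (class) detection. Roughly speaking, segmentation is a sub-classification of what lies inside the object box, and pose is an array of points inside the object box.

Face detection can already be made extremely fast, so by the same reasoning segment and pose should be able to run very fast as well.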

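1. Measured speed comparison (yolo-class vs yolo-pose)

Conclusion: the pose model runs at about the same speed as the ordinary detect model.

One set of timings from the pose model:

image 1/1 /dataset/pose/body.jpg: 448x640 2 persons, 44.7ms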
Speed: 2.7ms preprocess, 44.7ms inference, 59.1ms postprocess per image at shape (1, 3, 448, 640)

For comparison, the timings when only object detection is run on the same image:

image 1/1 /dataset/pose/body.jpg: 448x640 2 persons, 48.0ms
Speed: 5.8ms preprocess, 48.0ms inference, 132.0ms postprocess per image at shape (1, 3, 448, 640)
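To compare the two runs stage by stage, the "Speed:" lines can be parsed and summed. The following is only a minimal sketch: the helper names `parse_speed` and `total_ms` are made up for illustration, and the regular expression assumes the log format stays exactly as quoted above.

```python
import re

# The two "Speed:" log lines quoted above (pose first, then plain detection).
POSE_LOG = "Speed: 2.7ms preprocess, 44.7ms inference, 59.1ms postprocess per image at shape (1, 3, 448, 640)"
DETECT_LOG = "Speed: 5.8ms preprocess, 48.0ms inference, 132.0ms postprocess per image at shape (1, 3, 448, 640)"

# Matches fragments such as "44.7ms inference".
STAGE_RE = re.compile(r"([\d.]+)ms (preprocess|inference|postprocess)")


def parse_speed(line):
    """Return {stage: milliseconds} for one 'Speed:' log line (hypothetical helper)."""
    return {stage: float(ms) for ms, stage in STAGE_RE.findall(line)}


def total_ms(stages):
    """Sum the three stages to get the end-to-end time per image."""
    return sum(stages.values())


if __name__ == "__main__":
    for name, line in (("pose", POSE_LOG), ("detect", DETECT_LOG)):
        stages = parse_speed(line)
        print(f"{name}: {stages} -> total {total_ms(stages):.1f} ms")
```

For the two runs quoted above this gives totals of about 106.5 ms (pose) and 185.8 ms (detect). Most of the gap comes from the postprocess stage, and a single run is a noisy measurement, so the inference figures (44.7 ms vs 48.0 ms) are probably the fairer comparison.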

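2. Theoretical speed comparison

For a given model size, the published inference speeds of the detect and pose models differ only slightly. The first table below covers the detect models and the second the pose models.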

Model	size (pixels)	mAP val 50-95	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B)
YOLO11n	640	39.5	56.1 ± 0.8	1.5 ± 0.0	2.6	6.5
YOLO11s	640	47.0	90.0 ± 1.2	2.5 ± 0.0	9.4	21.5
YOLO11m	640	51.5	183.2 ± 2.0	4.7 ± 0.1	20.1	68.0
YOLO11l	640	53.4	238.6 ± 1.4	6.2 ± 0.1	25.3	86.9
YOLO11x	640	54.7	462.8 ± 6.7	11.3 ± 0.2	56.9	194.9

Model	size (pixels)	mAP pose 50-95	mAP pose 50	Speed CPU ONNX (ms)	Speed T4 TensorRT10 (ms)	params (M)	FLOPs (B)
YOLO11n-pose	640	50.0	81.0	52.4 ± 0.5	1.7 ± 0.0	2.9	7.6
YOLO11s-pose	640	58.9	86.3	90.5 ± 0.6	2.6 ± 0.0	9.9	23.2
YOLO11m-pose	640	64.9	89.4	187.3 ± 0.8	4.9 ± 0.1	20.9	71.7
YOLO11l-pose	640	66.1	89.9	247.7 ± 1.1	6.4 ± 0.1	26.2	90.7
YOLO11x-pose	640	69.5	91.1	488.0 ± 13.9	12.1 ± 0.2	58.8	203.3
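To put a number on "differ only slightly", the pose/detect latency ratio can be computed per model size. This is a small sketch using only the mean values from the two tables above (the ± spreads are ignored); the function name `pose_overhead` is made up for illustration.

```python
# Mean latencies copied from the two tables above: (CPU ONNX ms, T4 TensorRT10 ms).
DETECT = {
    "n": (56.1, 1.5),
    "s": (90.0, 2.5),
    "m": (183.2, 4.7),
    "l": (238.6, 6.2),
    "x": (462.8, 11.3),
}
POSE = {
    "n": (52.4, 1.7),
    "s": (90.5, 2.6),
    "m": (187.3, 4.9),
    "l": (247.7, 6.4),
    "x": (488.0, 12.1),
}


def pose_overhead(size):
    """Return (CPU, T4) pose/detect latency ratios for one model size."""
    (d_cpu, d_gpu), (p_cpu, p_gpu) = DETECT[size], POSE[size]
    return p_cpu / d_cpu, p_gpu / d_gpu


if __name__ == "__main__":
    for size in DETECT:
        cpu, gpu = pose_overhead(size)
        print(f"YOLO11{size}: CPU x{cpu:.2f}, T4 x{gpu:.2f}")
```

With these figures every ratio falls between about 0.93 and 1.14, which is consistent with the measured result in section 1.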

3. Output data

3.1 The 17 keypoints listed in the YOLO docs:

Nose
Left Eye
Right Eye
Left Ear
Right Ear
Left Shoulder
Right Shoulder
Left Elbow
Right Elbow
Left Wrist
Right Wrist
Left Hip
Right Hip
Left Knee
Right Knee
Left Ankle
Right Ankle

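3.2 Actual detection data

Each line holds 56 values = 1 class ID (0: person) + 4 box values (x_center, y_center, width, height) + 17 * (x, y, prob_value). All coordinates are normalized to the image size.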

0 0.60229 0.518654 0.305345 0.725004 0.5296 0.247867 0.989055 0.538345 0.229076 0.987131 0.523728 0.236682 0.803624 0.565252 0.220562 0.97476 0 0 0.178567 0.605952 0.272757 0.99945 0.524902 0.320555 0.996575 0.67903 0.325687 0.998379 0.477179 0.385596 0.969531 0.720728 0.381518 0.997152 0.460585 0.311368 0.971978 0.632899 0.481503 0.999804 0.596195 0.502827 0.999446 0.554051 0.568819 0.999557 0.600891 0.602427 0.998719 0.601888 0.723575 0.994931 0.714637 0.736034 0.991084
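The layout described above can be checked directly against the sample line. The sketch below splits the line into its three parts and converts the normalized center box into pixel corners. The function names are made up for illustration, and the image size passed to `box_to_pixels` is an assumption: the log only gives the inference shape (448x640), not the size of the original image.

```python
# The detection line quoted above, copied verbatim.
LINE = (
    "0 0.60229 0.518654 0.305345 0.725004 0.5296 0.247867 0.989055 "
    "0.538345 0.229076 0.987131 0.523728 0.236682 0.803624 "
    "0.565252 0.220562 0.97476 0 0 0.178567 0.605952 0.272757 0.99945 "
    "0.524902 0.320555 0.996575 0.67903 0.325687 0.998379 "
    "0.477179 0.385596 0.969531 0.720728 0.381518 0.997152 "
    "0.460585 0.311368 0.971978 0.632899 0.481503 0.999804 "
    "0.596195 0.502827 0.999446 0.554051 0.568819 0.999557 "
    "0.600891 0.602427 0.998719 0.601888 0.723575 0.994931 "
    "0.714637 0.736034 0.991084"
)

NUM_KEYPOINTS = 17


def parse_pose_line(line):
    """Split one line into (class_id, box, keypoints) (hypothetical helper)."""
    values = [float(v) for v in line.split()]
    expected = 1 + 4 + NUM_KEYPOINTS * 3  # = 56
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    class_id = int(values[0])
    box = tuple(values[1:5])  # (x_center, y_center, width, height), normalized
    # Group the remaining 51 values into 17 (x, y, prob_value) triples.
    keypoints = [tuple(values[i:i + 3]) for i in range(5, expected, 3)]
    return class_id, box, keypoints


def box_to_pixels(box, img_w, img_h):
    """Convert a normalized center box to pixel corners (x1, y1, x2, y2)."""
    xc, yc, w, h = box
    return ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)


if __name__ == "__main__":
    class_id, box, keypoints = parse_pose_line(LINE)
    print(class_id, box, len(keypoints))
    # 640x448 is an assumed image size, used only to illustrate the conversion.
    print(box_to_pixels(box, img_w=640, img_h=448))
```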

3.2.1 Formatted data

Class ID: 0.0
Bounding box (x_center, y_center, width, height): [0.60229, 0.518654, 0.305345, 0.725004]
Keypoints:
Nose: x=0.5296, y=0.247867, visibility=0.989055
Left Eye: x=0.538345, y=0.229076, visibility=0.987131
Right Eye: x=0.523728, y=0.236682, visibility=0.803624
Left Ear: x=0.565252, y=0.220562, visibility=0.97476
Right Ear: x=0.0, y=0.0, visibility=0.178567
Left Shoulder: x=0.605952, y=0.272757, visibility=0.99945
Right Shoulder: x=0.524902, y=0.320555, visibility=0.996575
Left Elbow: x=0.67903, y=0.325687, visibility=0.998379
Right Elbow: x=0.477179, y=0.385596, visibility=0.969531
Left Wrist: x=0.720728, y=0.381518, visibility=0.997152
Right Wrist: x=0.460585, y=0.311368, visibility=0.971978
Left Hip: x=0.632899, y=0.481503, visibility=0.999804
Right Hip: x=0.596195, y=0.502827, visibility=0.999446
Left Knee: x=0.554051, y=0.568819, visibility=0.999557
Right Knee: x=0.600891, y=0.602427, visibility=0.998719
Left Ankle: x=0.601888, y=0.723575, visibility=0.994931
Right Ankle: x=0.714637, y=0.736034, visibility=0.991084
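In the output above, the Right Ear has coordinates (0, 0) and a low visibility score, which suggests the point was not actually located. Downstream code will usually want to drop such points. A minimal sketch follows; the 0.5 threshold and the name `split_by_visibility` are assumptions for illustration, not something prescribed by the YOLO docs.

```python
KEYPOINT_NAMES = [
    "Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
    "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
    "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
    "Left Knee", "Right Knee", "Left Ankle", "Right Ankle",
]

# (x, y, visibility) triples from the formatted output above, in keypoint order.
KEYPOINTS = [
    (0.5296, 0.247867, 0.989055),
    (0.538345, 0.229076, 0.987131),
    (0.523728, 0.236682, 0.803624),
    (0.565252, 0.220562, 0.97476),
    (0.0, 0.0, 0.178567),
    (0.605952, 0.272757, 0.99945),
    (0.524902, 0.320555, 0.996575),
    (0.67903, 0.325687, 0.998379),
    (0.477179, 0.385596, 0.969531),
    (0.720728, 0.381518, 0.997152),
    (0.460585, 0.311368, 0.971978),
    (0.632899, 0.481503, 0.999804),
    (0.596195, 0.502827, 0.999446),
    (0.554051, 0.568819, 0.999557),
    (0.600891, 0.602427, 0.998719),
    (0.601888, 0.723575, 0.994931),
    (0.714637, 0.736034, 0.991084),
]


def split_by_visibility(names, keypoints, threshold=0.5):
    """Return ({name: (x, y)} for visible points, [names of low-score points])."""
    visible, hidden = {}, []
    for name, (x, y, score) in zip(names, keypoints):
        if score >= threshold:
            visible[name] = (x, y)
        else:
            hidden.append(name)
    return visible, hidden


if __name__ == "__main__":
    visible, hidden = split_by_visibility(KEYPOINT_NAMES, KEYPOINTS)
    print(f"{len(visible)} usable keypoints, dropped: {hidden}")
```

With the assumed threshold of 0.5, only the Right Ear is dropped for this detection; the other 16 keypoints all score above 0.8.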

Appendix A: Program for formatting pose data

# Names of the 17 body keypoints, in output order
keypoint_names = [
    "Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
    "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
    "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
    "Left Knee", "Right Knee", "Left Ankle", "Right Ankle",
]

# A hypothetical YOLO pose training-set label line
label_data_str = " 0 0.844458 0.649356 0.310934 0.681836 0 0 0.18451 0 0 0.0718303 0 0 0.140077 0 0 0.0899211 0 0 0.225879 0 0 0.0909999 0.91247 0.856484 0.628418 0 0 0.022336 0 0 0.262855 0 0 0.10164 0 0 0.426727 0 0 0.009039 0 0 0.0300809 0 0 0.0136058 0 0 0.035202 0 0 0.0130479 0 0 0.0247065"

# Convert the string into a list of floats
label_data = [float(num) for num in label_data_str.split()]

# Extract the class ID and the bounding box
class_id = label_data[0]
bbox = label_data[1:5]

# Extract the keypoint values
keypoints_data = label_data[5:]

# Make sure the number of keypoint values matches the name list
if len(keypoints_data) == len(keypoint_names) * 3:
    print(f"Class ID: {class_id}")
    print(f"Bounding box (x_center, y_center, width, height): {bbox}")
    print("Keypoints:")
    for i in range(len(keypoint_names)):
        # Each keypoint occupies three consecutive values: x, y, visibility
        start_index = i * 3
        x, y, visibility = keypoints_data[start_index:start_index + 3]
        print(f"{keypoint_names[i]}: x={x}, y={y}, visibility={visibility}")
else:
    print("The keypoint data length does not match the name list, so the values cannot be mapped.")
