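Contents
I. Pipeline and toolchain
II. Hardware overview
III. GPU video encoding/decoding frameworks
IV. VPI build and usage example
V. jetson_multimedia_api build and usage example
I. Pipeline and toolchain
II. Hardware overview
III. GPU video encoding/decoding frameworks
- Jetson devices do not currently support the VPF framework. I will install and demonstrate VPF on an x86 PC in the next section.
- The GPU codec frameworks currently supported on Jetson are VPI and jetson_multimedia_api.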
# 1. Host side
agx@ubuntu:~$ ls /usr/src/jetson_multimedia_api/
argus data include LEGAL LICENSE Makefile README samples tools
agx@ubuntu:~$ ls /opt/
containerd/ genymobile/ nvidia/ ota_package/ todesk/
agx@ubuntu:~$ ls /opt/nvidia/vpi2/
bin doc etc include lib lib64 samples share
# 2. Docker side
agx@ubuntu:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
nvcr.io/nvidia/l4t-pytorch r35.2.1-pth2.0-py3 853b58c1dce6 2 years ago 11.7GB
agx@ubuntu:~$ docker exec -it nvpy bash
root@7666a2ca87d3:/# ls /usr/src/jetson_multimedia_api/
LEGAL LICENSE Makefile README argus data include samples tools
root@7666a2ca87d3:/# ls /opt/nvidia/vpi2/
bin doc etc include lib lib64 samples share
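The same presence check can be scripted so it runs identically on the host and inside the container. The following is a minimal sketch: the two directories are the ones listed above, while the function name `find_frameworks` and the `EXPECTED_DIRS` table are illustrative names, not part of any NVIDIA tool.

```python
import os

# Install locations taken from the listings above; other JetPack
# releases may use different paths (for example a different VPI version).
EXPECTED_DIRS = {
    "jetson_multimedia_api": "/usr/src/jetson_multimedia_api",
    "VPI 2": "/opt/nvidia/vpi2",
}


def find_frameworks(expected=EXPECTED_DIRS):
    """Return {name: bool} telling whether each expected directory exists."""
    return {name: os.path.isdir(path) for name, path in expected.items()}


if __name__ == "__main__":
    for name, present in find_frameworks().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Running it once on the host and once via `docker exec` should report both frameworks as found in each environment if the setup matches the listings.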
IV. VPI build and usage example
1. Running the sample
root@7666a2ca87d3:/opt/nvidia/vpi2# cd s
samples/ share/
root@7666a2ca87d3:/opt/nvidia/vpi2# cd samples/
01-convolve_2d/ 03-harris_corners/ 05-benchmark/ 07-fft/ 09-tnr/ 11-fisheye/ 13-optflow_dense/ 15-image_view/ 17-template_matching/ assets/
02-stereo_disparity/ 04-rescale/ 06-klt_tracker/ 08-cross_aarch64_l4t/ 10-perspwarp/ 12-optflow_lk/ 14-background_subtractor/ 16-vpi_pytorch/ 18-orb_feature_detector/ tutorial_blur/
root@7666a2ca87d3:/opt/nvidia/vpi2# cd samples/01-convolve_2d/
root@7666a2ca87d3:/opt/nvidia/vpi2/samples/01-convolve_2d# python3 main.py --backend=cuda --input "/opt/nvidia/vpi2/share/backgrounds/NVIDIA_icon.png"
root@7666a2ca87d3:/opt/nvidia/vpi2/samples/01-convolve_2d#
2. Source code
import sys
import vpi
import numpy as np
from PIL import Image
from argparse import ArgumentParser

# Parse command line arguments
parser = ArgumentParser()
parser.add_argument('--backend', choices=['cpu', 'cuda', 'pva'], default="cuda",
                    help='Backend to be used for processing')
parser.add_argument('--input', default="/opt/nvidia/vpi2/share/backgrounds/NVIDIA_icon.png",
                    help='Image to be used as input')
args = parser.parse_args()

if args.backend == 'cpu':
    backend = vpi.Backend.CPU
elif args.backend == 'cuda':
    backend = vpi.Backend.CUDA
else:
    assert args.backend == 'pva'
    backend = vpi.Backend.PVA

# Load input into a vpi.Image
try:
    input = vpi.asimage(np.asarray(Image.open(args.input)))
except IOError:
    sys.exit("Input file not found")
except:
    sys.exit("Error with input file")

# Convert it to grayscale
input = input.convert(vpi.Format.U8, backend=vpi.Backend.CUDA)

# Define a simple edge detection kernel
kernel = [[ 1, 0, -1],
          [ 0, 0,  0],
          [-1, 0,  1]]

# Using the chosen backend,
with backend:
    # Run input through the convolution filter
    output = input.convolution(kernel, border=vpi.Border.ZERO)

# Save result to disk
Image.fromarray(output.cpu()).save('edges_python'+str(sys.version_info[0])+'_'+args.backend+'.png')
3. Results (the sample above applies a convolution filter)
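To make clear what this filter computes, here is a pure-Python reference of a 3x3 convolution with a zero border, using the same kernel as the sample. It is only a sketch for understanding the arithmetic, not VPI's implementation: the function name `convolve2d_zero_border` is made up for illustration, and the way VPI stores out-of-range sums in a U8 image is not modeled (raw integer sums are returned). This kernel is unchanged by a 180-degree rotation, so convolution and correlation give the same result for it.

```python
def convolve2d_zero_border(image, kernel):
    """Convolve a 2D list of numbers with an odd-sized kernel.

    Pixels outside the image are treated as 0 (zero border).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    # True convolution: the kernel is flipped, so kernel
                    # offset (dy, dx) samples the pixel at (y - dy, x - dx).
                    sy, sx = y - (ky - oy), x - (kx - ox)
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = acc
    return out


# Same edge detection kernel as in the sample.
kernel = [[1, 0, -1],
          [0, 0, 0],
          [-1, 0, 1]]

flat = [[5] * 4 for _ in range(4)]        # uniform 4x4 image
edges = convolve2d_zero_border(flat, kernel)
print(edges[1][1])  # interior pixel: kernel weights sum to 0, so this prints 0
print(edges[0][0])  # corner pixel: the zero border leaves one weight, prints 5
```

The interior of a flat region maps to zero because the kernel weights cancel, which is why only edges remain visible in the output image. The non-zero corner value shows the effect of choosing a zero border.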
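V. jetson_multimedia_api build and usage example
1. CUDA H.264 encoding (bug warning: the sample compiles, but OSD does not work; the next two experiments were therefore run directly on the Jetson desktop)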
root@ubuntu:/usr/src/jetson_multimedia_api/samples/03_video_cuda_enc# make clean
root@ubuntu:/usr/src/jetson_multimedia_api/samples/03_video_cuda_enc# make
Compiling: video_cuda_enc_csvparser.cpp
Compiling: video_cuda_enc_main.cpp
make[1]: Entering directory '/usr/src/jetson_multimedia_api/samples/common/classes'
Compiling: NvElementProfiler.cpp
Compiling: NvElement.cpp
Compiling: NvApplicationProfiler.cpp
Compiling: NvVideoDecoder.cpp
Compiling: NvJpegEncoder.cpp
Compiling: NvBuffer.cpp
Compiling: NvLogging.cpp
Compiling: NvEglRenderer.cpp
Compiling: NvUtils.cpp
Compiling: NvDrmRenderer.cpp
Compiling: NvJpegDecoder.cpp
Compiling: NvVideoEncoder.cpp
Compiling: NvV4l2ElementPlane.cpp
Compiling: NvBufSurface.cpp
Compiling: NvV4l2Element.cpp
make[1]: Leaving directory '/usr/src/jetson_multimedia_api/samples/common/classes'
make[1]: Entering directory '/usr/src/jetson_multimedia_api/samples/common/algorithm/cuda'
Compiling: NvAnalysis.cu
Compiling: NvCudaProc.cpp
make[1]: Leaving directory '/usr/src/jetson_multimedia_api/samples/common/algorithm/cuda'
Linking: video_cuda_enc
root@ubuntu:/usr/src/jetson_multimedia_api/samples/03_video_cuda_enc# ./video_cuda_enc ../../data/Video/sample_outdoor_car_1080p_10fps.yuv 1920 1080 H264 test.h264
Segmentation fault (core dumped)
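The cause of this crash is not diagnosed here. One inexpensive check before digging into the encoder is to confirm that the raw input file is consistent with the width and height passed on the command line. The sketch below assumes the file holds 8-bit planar YUV 4:2:0 frames, which is an assumption about the sample data rather than something verified above; the helper names are illustrative.

```python
import os


def yuv420_frame_bytes(width, height):
    """Bytes per 8-bit YUV 4:2:0 frame: full-size Y plus quarter-size U and V."""
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling needs even dimensions")
    return width * height * 3 // 2


def count_frames(file_size, width, height):
    """Return (whole_frames, leftover_bytes) for a raw file of this size."""
    return divmod(file_size, yuv420_frame_bytes(width, height))


if __name__ == "__main__":
    # Path and resolution taken from the encode command above.
    path = "../../data/Video/sample_outdoor_car_1080p_10fps.yuv"
    if os.path.exists(path):
        frames, leftover = count_frames(os.path.getsize(path), 1920, 1080)
        print(f"{frames} frames, {leftover} leftover bytes")
    else:
        print(f"{path} not found")
```

A non-zero leftover, or zero frames, would point to a mismatch between the file and the stated resolution or pixel format. A clean result only rules out that one cause.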
2. CUDA H.264 decoding
root@ubuntu:/usr/src/jetson_multimedia_api/samples/02_video_dec_cuda# ./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Input file read complete
Video Resolution: 1920x1080
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 1920 height 1080
Query and set capture successful
Exiting decoder capture loop thread
App run was successful
3. CUDA H.264 decoding + TensorRT object detection:
GPU detection and result caching
root@ubuntu:/usr/src/jetson_multimedia_api/samples/02_video_dec_cuda# cd ../04_video_dec_trt/
root@ubuntu:/usr/src/jetson_multimedia_api/samples/04_video_dec_trt# ./video_dec_trt 2 ../../data/Video/sample_outdoor_car_1080p_10fps.h264 ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --trt-onnxmodel ../../data/Model/resnet10/resnet10_dynamic_batch.onnx --trt-mode 0
set onnx modefile: ../../data/Model/resnet10/resnet10_dynamic_batch.onnx
Using cached TRT model
Deserialization required 13048 microseconds.
Total per-runner device persistent memory is 5632
Total per-runner host persistent memory is 45440
Allocated activation device memory of size 22138880
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Input file read complete
Video Resolution: 1920x1080
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Resolution change successful
Starting decoder capture loop thread
Input file read complete
Video Resolution: 1920x1080
Resolution change successful
Time elapsed:1 ms per frame in past 100 frames
Time elapsed:1 ms per frame in past 100 frames
Time elapsed:1 ms per frame in past 100 frames
Time elapsed:1 ms per frame in past 100 frames
Time elapsed:1 ms per frame in past 100 frames
Time elapsed:1 ms per frame in past 100 frames
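In the `video_dec_trt` command above, the leading `2` is followed by two input paths, one per decoded stream. The helper below is a small sketch that assembles such an argument list so the count always matches the number of files. Only the flags that appear in the command above are used; the function name `build_dec_trt_argv` and its parameters are hypothetical, and the assumption that the first argument is the stream count is read off the command rather than taken from the sample's documentation.

```python
def build_dec_trt_argv(inputs, codec="H264", onnx_model=None, trt_mode=0,
                       binary="./video_dec_trt"):
    """Build argv as: binary, stream count, one path per stream, codec, options."""
    if not inputs:
        raise ValueError("at least one input stream is required")
    argv = [binary, str(len(inputs)), *inputs, codec]
    if onnx_model is not None:
        argv += ["--trt-onnxmodel", onnx_model]
    argv += ["--trt-mode", str(trt_mode)]
    return argv


if __name__ == "__main__":
    video = "../../data/Video/sample_outdoor_car_1080p_10fps.h264"
    model = "../../data/Model/resnet10/resnet10_dynamic_batch.onnx"
    # Reproduces the two-stream command shown above.
    print(" ".join(build_dec_trt_argv([video, video], onnx_model=model)))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the `04_video_dec_trt` directory.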
CUDA H.264 video decoding + OSD
root@ubuntu:/usr/src/jetson_multimedia_api/samples/02_video_dec_cuda# ./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --bbox-file result0.txt
ctx.osd_file_path:result0.txt
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Input file read complete
Video Resolution: 1920x1080
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 1920 height 1080
Query and set capture successful
Exiting decoder capture loop thread
App run was successful
root@ubuntu:/usr/src/jetson_multimedia_api/samples/02_video_dec_cuda# ls
Makefile resuilt.txt result0.txt result1.txt result.txt videodec_csvparser.cpp videodec_csvparser.o video_dec_cuda videodec.h videodec_main.cpp videodec_main.o
root@ubuntu:/usr/src/jetson_multimedia_api/samples/02_video_dec_cuda# ./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
Starting decoder capture loop thread
Input file read complete
Video Resolution: 1920x1080
[INFO] (NvEglRenderer.cpp:110) <renderer0> Setting Screen width 1920 height 1080
Query and set capture successful