Installing CUDA and cuDNN on Ubuntu and Verifying the Installation

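This tutorial explains how to install CUDA (NVIDIA's parallel computing platform) and cuDNN (NVIDIA's deep neural network library) on Ubuntu, and how to verify that the installation succeeded. Following these steps configures your system to use GPU acceleration for deep learning and other compute-intensive workloads. It also covers setting environment variables and compiling and running sample code to confirm that CUDA and cuDNN work correctly.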

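  • Installing CUDA
  • Installing CUDA from the network repository (Ubuntu)
  • Configuring environment variables
  • Verifying the installation
  • Installing cuDNN
  • Verifying cuDNN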

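Installing CUDA

Before installing CUDA, a few pre-installation steps are required. First, install the kernel headers and development packages for the currently running kernel. Open a terminal and run: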

sudo apt-get install linux-headers-$(uname -r)

Next, remove the outdated signing key:

sudo apt-key del 7fa2af80

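Installing CUDA from the network repository (Ubuntu)

The GPG public key for the new CUDA repository is 3bf863cc. You can add it to the system with the cuda-keyring package or manually; using the apt-key command is not recommended. Proceed as follows:

  1. Install the new cuda-keyring package, replacing $distro/$arch to match your system: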
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb

Replace $distro/$arch with one of the following:

  • ubuntu1604/x86_64: Ubuntu 16.04, 64-bit x86.
  • ubuntu1804/cross-linux-sbsa: Ubuntu 18.04, cross-compilation for the SBSA architecture.
  • ubuntu1804/ppc64el: Ubuntu 18.04, 64-bit PowerPC.
  • ubuntu1804/sbsa: Ubuntu 18.04, SBSA architecture.
  • ubuntu1804/x86_64: Ubuntu 18.04, 64-bit x86.
  • ubuntu2004/cross-linux-aarch64: Ubuntu 20.04, cross-compilation for the AArch64 architecture.
  • ubuntu2004/arm64: Ubuntu 20.04, 64-bit ARM.
  • ubuntu2004/cross-linux-sbsa: Ubuntu 20.04, cross-compilation for the SBSA architecture.
  • ubuntu2004/sbsa: Ubuntu 20.04, SBSA architecture.
  • ubuntu2004/x86_64: Ubuntu 20.04, 64-bit x86.
  • ubuntu2204/sbsa: Ubuntu 22.04, SBSA architecture.
  • ubuntu2204/x86_64: Ubuntu 22.04, 64-bit x86.
    Choose the entry that matches your Ubuntu version and architecture.
  2. Update the APT repository cache:
sudo apt-get update
  3. Install the CUDA SDK:
    You can list the available CUDA packages with:
cat /var/lib/apt/lists/*cuda*Packages | grep "Package:"

Alternatively, refer to the following list of meta packages and their purposes:

  • cuda: Installs all CUDA Toolkit and Driver packages. Handles upgrading to the next version of the cuda package when it's released.
  • cuda-12-2: Installs all CUDA Toolkit and Driver packages. Remains at version 12.2 until an additional version of CUDA is installed.
  • cuda-toolkit-12-2: Installs all CUDA Toolkit packages required to develop CUDA applications. Does not include the driver.
  • cuda-toolkit-12: Installs all CUDA Toolkit packages required to develop applications. Will not upgrade beyond the 12.x series toolkits. Does not include the driver.
  • cuda-toolkit: Installs all CUDA Toolkit packages required to develop applications. Handles upgrading to the next 12.x version of CUDA when it's released. Does not include the driver.
  • cuda-tools-12-2: Installs all CUDA command line and visual tools.
  • cuda-runtime-12-2: Installs all CUDA Toolkit packages required to run CUDA applications, as well as the Driver packages.
  • cuda-compiler-12-2: Installs all CUDA compiler packages.
  • cuda-libraries-12-2: Installs all runtime CUDA Library packages.
  • cuda-libraries-dev-12-2: Installs all development CUDA Library packages.
  • cuda-drivers: Installs all Driver packages. Handles upgrading to the next version of the Driver packages when they're released.

Choose the package you need. This tutorial installs CUDA 11.8 (package cuda-11-8):

sudo apt-get install cuda-11-8

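This package includes the GPU driver. During installation you are prompted to set a password (this typically happens when UEFI Secure Boot is enabled). Remember it: you will need it after the reboot, on the Perform MOK management screen.

  4. After the installation finishes, reboot the system: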
sudo reboot
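
The $distro/$arch placeholder from step 1 can be derived from the Ubuntu version and the machine architecture. The following is a minimal sketch and not part of the original procedure: the function name cuda_repo_path is hypothetical, and it deliberately handles only x86_64, because ARM systems map to either arm64 or sbsa depending on the hardware and should be picked by hand from the list in step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the "$distro/$arch" path segment from an Ubuntu
# VERSION_ID (for example "22.04") and a machine architecture ("x86_64").
cuda_repo_path() {
    version_id="$1"
    machine="$2"
    # "22.04" -> "ubuntu2204": drop the dot and add the "ubuntu" prefix.
    distro="ubuntu$(printf '%s' "$version_id" | tr -d '.')"
    case "$machine" in
        x86_64)
            printf '%s/%s\n' "$distro" "$machine"
            ;;
        *)
            # arm64 vs. sbsa cannot be decided from the architecture alone,
            # so this sketch refuses to guess.
            echo "unsupported architecture: $machine; pick a path manually" >&2
            return 1
            ;;
    esac
}

# Example with explicit values (Ubuntu 22.04 on 64-bit x86).
cuda_repo_path 22.04 x86_64
```

On a running system the two arguments could come from the VERSION_ID field in /etc/os-release and from `uname -m`. Only paths that appear in the list in step 1 are valid, so check the result against that list before using it in the download URL.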

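Completing Perform MOK management

On the MOK management screen, select Enroll MOK -> Continue -> Enroll the key -> Yes, enter the password you set while installing the CUDA package in step 3, then select Reboot. After the restart, the NVIDIA driver installation is complete.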

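Configuring environment variables

  1. Open ~/.bashrc in an editor such as vim. The file belongs to your user, so sudo is not needed:
vim ~/.bashrc
  2. Append the following lines to the end of the file:
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}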

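${PATH:+:${PATH}} is a Bash parameter expansion. ${VAR:+word} expands to word when VAR is set and non-empty, and to nothing otherwise. Here word is :${PATH}, so:
If $PATH is set and non-empty, the expansion yields a colon followed by the existing value. The new directory therefore comes first and the existing paths follow it, separated by a colon.
If $PATH is unset or empty, the expansion yields nothing, so the variable contains only the new directory, with no trailing colon.
This keeps the variable well formed in both cases. Avoiding a stray colon matters because an empty entry in PATH is interpreted as the current directory.

LD_LIBRARY_PATH is an environment variable that tells the dynamic linker which directories to search for shared libraries (shared object files) when it starts an executable. On Linux and other Unix-like systems, shared libraries let multiple programs use the same library code, which reduces memory use.

  3. Reload the configuration
    Run the following command in the terminal so the new environment variables take effect: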
source ~/.bashrc
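
The behaviour of the ${VAR:+:${VAR}} expansion used in the export lines can be checked in isolation, without touching the real PATH. This is a minimal sketch; the function name prepend_path and the sample directories are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the export lines: print "$1" placed in front
# of the existing value "$2", adding ":" only when "$2" is non-empty.
prepend_path() {
    new_dir="$1"
    current="$2"
    printf '%s%s\n' "$new_dir" "${current:+:${current}}"
}

# Existing value present: the new directory is placed first.
prepend_path /usr/local/cuda-11.8/bin "/usr/bin:/bin"

# Existing value empty: no colon is added.
prepend_path /usr/local/cuda-11.8/bin ""
```

The first call prints /usr/local/cuda-11.8/bin:/usr/bin:/bin and the second prints /usr/local/cuda-11.8/bin, which shows that the new directory is prepended, not appended.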

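Verifying the installation

First install the third-party libraries that some CUDA samples depend on. The sample build usually detects these libraries, but any that are missing must be installed manually. Open a terminal and run:

sudo apt-get install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev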

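Once the dependencies are installed, download the sample source code from https://github.com/nvidia/cuda-samples and check out the tag that matches your installed CUDA version, here v11.8.

Then build the samples (sudo is not needed for the build):

git clone https://github.com/nvidia/cuda-samples.git
cd cuda-samples
git checkout v11.8
make

If the whole build completes, the installation is working.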

From the source directory, run ./bin/x86_64/linux/release/deviceQuery. The output looks like this:

cheungxiongwei@root:~/Source/cuda-samples$ ./bin/x86_64/linux/release/deviceQuery
./bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
  CUDA Driver Version / Runtime Version          12.2 / 11.8
  CUDA Capability Major/Minor version number:    8.9
  Total amount of global memory:                 7940 MBytes (8325824512 bytes)
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
  (024) Multiprocessors, (128) CUDA Cores/MP:    3072 CUDA Cores
  GPU Max Clock rate:                            2250 MHz (2.25 GHz)
  Memory Clock rate:                             8001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 33554432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 1
Result = PASS
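
The two lines that matter most in this output are the summary line with the driver and runtime versions and the final Result line. As a rough sketch, they can be checked with standard text tools. The function names below are hypothetical, and the sample text is copied from the summary of the run shown above so the snippet works without a GPU.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if deviceQuery output read from stdin
# contains a line starting with "Result = PASS".
device_query_passed() {
    grep -q '^Result = PASS'
}

# Hypothetical helper: print "<driver version> <runtime version>" parsed from
# the deviceQuery summary line read from stdin.
device_query_versions() {
    sed -n 's/.*CUDA Driver Version = \([0-9.]*\), CUDA Runtime Version = \([0-9.]*\).*/\1 \2/p'
}

# Summary lines taken from the run shown above.
sample_output='deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 1
Result = PASS'

printf '%s\n' "$sample_output" | device_query_versions
if printf '%s\n' "$sample_output" | device_query_passed; then
    echo "deviceQuery passed"
fi
```

With a real installation the same functions can read the program's output through a pipe, for example `./bin/x86_64/linux/release/deviceQuery | device_query_passed`. A driver version (12.2) that is newer than the runtime version (11.8), as in this run, is a supported combination because the driver is backward compatible with older runtimes.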

Installing cuDNN

Install the cuDNN runtime library, the development package, and the cuDNN samples:

sudo apt-get install libcudnn8=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-dev=${cudnn_version}-1+${cuda_version}
sudo apt-get install libcudnn8-samples=${cudnn_version}-1+${cuda_version}

Substitute the variables as follows:
${cudnn_version} is 8.9.4.*
${cuda_version} is cuda12.2 or cuda11.8

To list the libcudnn8 packages available in the repository, run:

cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"

The output looks like this:

cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ cat /var/lib/apt/lists/*cuda*Packages | grep "./libcudnn8"
Filename: ./libcudnn8_8.5.0.96-1+cuda11.7_amd64.deb
Filename: ./libcudnn8-dev_8.5.0.96-1+cuda11.7_amd64.deb
Filename: ./libcudnn8_8.6.0.163-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.6.0.163-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.7.0.84-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.7.0.84-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.0.121-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.0.121-1+cuda12.0_amd64.deb
Filename: ./libcudnn8-dev_8.8.0.121-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.8.0.121-1+cuda12.0_amd64.deb
Filename: ./libcudnn8_8.8.1.3-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb
Filename: ./libcudnn8-dev_8.8.1.3-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
Filename: ./libcudnn8_8.9.0.131-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.0.131-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.0.131-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.0.131-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.1.23-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.1.23-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.2.26-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.2.26-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-samples_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda12.2_amd64.deb
Filename: ./libcudnn8-dev_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-dev_8.9.4.25-1+cuda12.2_amd64.deb
Filename: ./libcudnn8-samples_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8-samples_8.9.4.25-1+cuda12.2_amd64.deb

This tutorial uses the newest cuDNN version, 8.9.4.25, together with CUDA 11.8. After substitution, the full commands are:

sudo apt-get install libcudnn8=8.9.4.25-1+cuda11.8
sudo apt-get install libcudnn8-dev=8.9.4.25-1+cuda11.8
sudo apt-get install libcudnn8-samples=8.9.4.25-1+cuda11.8
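
Picking the newest cuDNN build for a given CUDA version out of the Filename listing can also be scripted. The sketch below is illustrative only: latest_cudnn_for is a hypothetical name, the sample lines are a small excerpt of the listing shown earlier, and it assumes GNU sort (for the -V version sort), which Ubuntu provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "Filename: ./libcudnn8_<ver>-1+<cuda>_amd64.deb"
# lines from stdin and print the newest <ver> built for the CUDA tag in "$1".
# Lines for libcudnn8-dev and libcudnn8-samples do not match and are skipped.
latest_cudnn_for() {
    cuda_tag="$1"
    sed -n "s|^Filename: \./libcudnn8_\(.*\)-1+${cuda_tag}_amd64\.deb\$|\1|p" \
        | sort -V \
        | tail -n 1
}

# Excerpt of the package listing shown earlier in this section.
sample_index='Filename: ./libcudnn8_8.9.3.28-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.3.28-1+cuda12.1_amd64.deb
Filename: ./libcudnn8-dev_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda11.8_amd64.deb
Filename: ./libcudnn8_8.9.4.25-1+cuda12.2_amd64.deb'

cudnn_version="$(printf '%s\n' "$sample_index" | latest_cudnn_for cuda11.8)"
echo "libcudnn8=${cudnn_version}-1+cuda11.8"
```

For the excerpt above this prints libcudnn8=8.9.4.25-1+cuda11.8, the same package specification used in the install commands. On a real system the listing would come from the `cat /var/lib/apt/lists/*cuda*Packages` command instead of the sample variable.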

Verifying cuDNN

To verify that cuDNN is installed and working, build the mnistCUDNN sample in the `/usr/src/cudnn_samples_v8` directory.

  1. Copy the cuDNN samples to your home directory:
cp -r /usr/src/cudnn_samples_v8/ $HOME
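  2. Change to the sample directory:
cd $HOME/cudnn_samples_v8/mnistCUDNN
  3. Build the mnistCUDNN sample:
make clean && make

If the build fails because FreeImage.h cannot be found, install the dependency with `sudo apt-get install libfreeimage-dev`.

  4. Run the mnistCUDNN sample:
./mnistCUDNN

If cuDNN is installed correctly and the sample builds and runs, you will see output similar to the following: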

cheungxiongwei@root:~/cudnn_samples_v8/mnistCUDNN$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8904 , CUDNN_VERSION from cudnn.h : 8904 (8.9.4)
Host compiler version : GCC 11.4.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 24  Capabilities 8.9, SmClock 2250.0 Mhz, MemSize (Mb) 7940, MemClock 8001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.032992 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.047104 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.051200 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.049152 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.058368 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063648 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.130112 time requiring 128848 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.007328 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.024576 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.026624 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025376 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.030720 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.036864 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063488 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.065536 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.021504 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.022592 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.025600 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.033792 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.074752 time requiring 4608 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.031744 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.040960 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.069632 time requiring 1536 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.009216 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.012288 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.021312 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.023552 time requiring 4608 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.024352 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.029696 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 1536 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.025600 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.035840 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.051200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.060416 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.064512 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.065536 time requiring 1536 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!
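
The run above reports Test passed! once for single precision and once for half precision. A quick way to confirm both without reading the whole log is to count those markers. This is a minimal sketch with a hypothetical function name and a shortened sample log, not a tool that ships with cuDNN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Test passed!" markers in mnistCUDNN output read
# from stdin. grep -o counts occurrences, so a marker still counts when it is
# run together with other text on the same line.
count_passed_tests() {
    grep -o 'Test passed!' | wc -l | tr -d ' '
}

# Shortened sample of the log shown above.
sample_log='Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Result of classification: 1 3 5
Test passed!'

passed="$(printf '%s\n' "$sample_log" | count_passed_tests)"
if [ "$passed" -eq 2 ]; then
    echo "cuDNN sample passed in both precision modes"
fi
```

With a real build, the same check can be applied to live output, for example `./mnistCUDNN | count_passed_tests`.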
