Windows 11 WSL2 Deep Learning Environment Setup

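This article describes how to set up a deep learning environment with WSL2 on Windows 11. First, install the NVIDIA graphics driver for GPU support; the official GeForce Experience tool is the recommended way to install or update it. Next, install WSL2 and, inside it, CUDA and cuDNN. Finally, a small and a large neural network are each trained on the CPU and on the GPU, and the run times are printed. The GPU substantially speeds up the large network (about 205 s on the CPU vs 75 s on the GPU), while for the small network the CPU was actually faster (about 27 s vs 55 s).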


CUDA on WSL User Guide

1. Install the NVIDIA driver for GPU support

1.1 Method 1 (recommended)

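The best way to install or update an NVIDIA graphics driver is the official GeForce Experience tool. Note that the highest CUDA version the driver supports may change with every driver update.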

Download GeForce Experience from https://www.nvidia.cn/geforce/geforce-experience/, install it, and use it to install or update the GeForce Game Ready driver.

1.2 Method 2

Get the driver matching your GPU from https://www.nvidia.com/Download/index.aspx?lang=en-us, then install the NVIDIA GeForce Game Ready or NVIDIA RTX Quadro display driver for Windows 11.

Note: this is the only driver you need to install. Do not install any Linux display driver inside WSL.

2. Install WSL2

Open your preferred Windows Terminal / Command Prompt / PowerShell and install WSL:

wsl.exe --install

Make sure you have the latest WSL kernel:

wsl.exe --update

Check the WSL kernel version:

PS Microsoft.PowerShell.Core\FileSystem::\\wsl$\Ubuntu\root> wsl cat /proc/version
Linux version 5.10.102.1-microsoft-standard-WSL2 (oe-user@oe-host) (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220) #1 SMP Wed Mar 2 00:30:59 UTC 2022

Note: WSL 2 Support Constraints

Ensure you are on the latest WSL Kernel or at least 4.19.121+. We recommend 5.10.16.3 or later for better performance and functional fixes.
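To check the running kernel against these two thresholds, the version string printed by cat /proc/version can be parsed and compared numerically. A minimal Python sketch, assuming the /proc/version format shown above; the function names are illustrative, not part of any tool:

```python
import os
import re

# Thresholds quoted from the NVIDIA note above.
MINIMUM = (4, 19, 121)
RECOMMENDED = (5, 10, 16, 3)


def parse_kernel_version(proc_version: str) -> tuple:
    """Extract the numeric part, e.g. '5.10.102.1', from a /proc/version line."""
    match = re.search(r"Linux version (\d+(?:\.\d+)*)", proc_version)
    if match is None:
        raise ValueError("unrecognized /proc/version format")
    return tuple(int(part) for part in match.group(1).split("."))


def check_kernel(proc_version: str) -> str:
    """Classify the kernel as 'too old', 'supported' or 'recommended'."""
    version = parse_kernel_version(proc_version)
    if version < MINIMUM:
        return "too old"
    if version < RECOMMENDED:
        return "supported"
    return "recommended"


if __name__ == "__main__" and os.path.exists("/proc/version"):
    with open("/proc/version") as f:
        print(check_kernel(f.read()))
```

Comparing tuples of integers avoids the pitfall of string comparison, where "5.10" would sort before "5.9".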

3. Install CUDA

3.1 Check the CUDA version supported by the NVIDIA driver

C:\Windows>nvidia-smi
Wed Jul 27 22:57:07 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 516.59 Driver Version: 516.59 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 42C P0 11W / N/A | 0MiB / 4096MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

C:\Windows>nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3050 Laptop GPU (UUID: GPU-f55a7fb2-0dc9-1d4f-dca9-8f7ea311ec67)
  • GPU model: NVIDIA GeForce RTX 3050 Laptop GPU
  • Current driver version: 516.59
  • CUDA version (the highest the driver supports): 11.7

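This means that the current driver on my laptop, 516.59, supports CUDA up to 11.7, and any CUDA version up to and including 11.7 is compatible. To use a newer CUDA version, look up the newest driver available for the NVIDIA GeForce RTX 3050 Laptop GPU and upgrade the driver.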

For the full compatibility table between driver versions and CUDA versions, see the NVIDIA CUDA Toolkit Release Notes.

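The same check can be scripted: read the "CUDA Version" field from the nvidia-smi header and compare it with the toolkit version you plan to install. A minimal sketch, assuming nvidia-smi can be run from the current shell and its header keeps the format shown above; the helper names are hypothetical:

```python
import re
import shutil
import subprocess


def max_cuda_from_nvidia_smi(output: str) -> tuple:
    """Return the 'CUDA Version' from nvidia-smi's header as (major, minor)."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_supported(toolkit: str, driver_max: tuple) -> bool:
    """True if a toolkit such as '11.2.2' is not newer than the driver's maximum."""
    major, minor = (int(part) for part in toolkit.split(".")[:2])
    return (major, minor) <= driver_max


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    limit = max_cuda_from_nvidia_smi(out)
    print("Driver supports CUDA up to", limit)
    print("CUDA 11.2.2 compatible:", toolkit_supported("11.2.2", limit))
```

Only major and minor are compared, because nvidia-smi reports the limit as major.minor.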
3.2 Download CUDA

Get a suitable version of the CUDA Toolkit from https://developer.nvidia.com/cuda-toolkit-archive. Version 11.2.2 is used here as an example.

  • Open the CUDA Toolkit 11.2.2 download page
  • Operating System: Linux
  • Architecture: x86_64
  • Distribution: WSL-Ubuntu
  • Installer Type: runfile (local)

The page then shows the installation instructions:

$ wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
$ sudo sh cuda_11.2.2_460.32.03_linux.run

3.3 Install CUDA in WSL2

Following the instructions from the previous step, download the installer:

wget -c https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run

Run the installer:

sh cuda_11.2.2_460.32.03_linux.run

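Type accept and press Enter to agree to the license agreement. In the component menu, make sure Driver is deselected (the Windows driver from section 1 is the only driver needed) and leave the other components selected, then move the cursor to Install and press Enter. When the installation finishes, it prints the following summary: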

===========
= Summary =
===========

Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.2/
Samples: Installed in /root/, but missing recommended libraries

Please make sure that
- PATH includes /usr/local/cuda-11.2/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or, add /usr/local/cuda-11.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 460.00 is required for CUDA 11.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log

The warning "WARNING: Incomplete installation! This installation did not install the CUDA Driver." is expected: the NVIDIA driver only needs to be installed on Windows, not inside WSL.

3.4 Set environment variables

Edit /etc/profile or ~/.bashrc (this example uses ~/.bashrc):

vim ~/.bashrc

Append the following lines to the end of the file:

# Cuda Environment
export CUDA_HOME=/usr/local/cuda-11.2
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

export PATH=$CUDA_HOME/bin:$PATH

Reload the file so the settings take effect:

source ~/.bashrc
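The installer summary in section 3.3 asks that PATH include /usr/local/cuda-11.2/bin and LD_LIBRARY_PATH include /usr/local/cuda-11.2/lib64. A minimal sketch to confirm both after reloading; the function name is illustrative and the default cuda_home matches the path used above:

```python
import os

CUDA_HOME = "/usr/local/cuda-11.2"


def missing_cuda_paths(env, cuda_home=CUDA_HOME):
    """Return one message per required CUDA directory missing from env."""
    required = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var, directory in required.items():
        # Linux path lists are colon-separated; ignore trailing slashes.
        entries = [e.rstrip("/") for e in env.get(var, "").split(":")]
        if directory not in entries:
            missing.append(f"{var} lacks {directory}")
    return missing


if __name__ == "__main__":
    problems = missing_cuda_paths(os.environ)
    print("\n".join(problems) if problems else "CUDA paths OK")
```

Passing the environment in as a mapping keeps the check testable without touching the real shell.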

3.5 Test CUDA

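The CUDA samples are installed in the home directory by default (/root/ in the installer summary above), under NVIDIA_CUDA-11.2_Samples. Open a new terminal so that the updated ~/.bashrc is loaded, then build and run 1_Utilities -> deviceQuery: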

root@ubuntu:~# cd ~/NVIDIA_CUDA-11.2_Samples/1_Utilities/deviceQuery
root@ubuntu:~/NVIDIA_CUDA-11.2_Samples/1_Utilities/deviceQuery# make
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../common/inc -m64 --threads 0 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o deviceQuery.o -c deviceQuery.cpp
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -o deviceQuery deviceQuery.o
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release
root@ubuntu:~/NVIDIA_CUDA-11.2_Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3050 Laptop GPU"
CUDA Driver Version / Runtime Version 11.7 / 11.2
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 4096 MBytes (4294508544 bytes)
(16) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1057 MHz (1.06 GHz)
Memory Clock rate: 5501 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 102400 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.7, CUDA Runtime Version = 11.2, NumDevs = 1
Result = PASS

If the output ends with "Result = PASS", CUDA is installed correctly.

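When automating this test, the last two lines of the deviceQuery output are the easiest to parse. A minimal sketch, assuming the output format shown above, that extracts the driver and runtime versions and the PASS/FAIL result. In line with section 3.1, it also checks that the runtime (toolkit) version is not newer than the driver version:

```python
import re


def _version_tuple(text: str) -> tuple:
    return tuple(int(part) for part in text.split("."))


def parse_device_query(output: str) -> dict:
    """Pull the summary fields out of deviceQuery's output."""
    summary = re.search(
        r"CUDA Driver Version = ([\d.]+), "
        r"CUDA Runtime Version = ([\d.]+), NumDevs = (\d+)",
        output,
    )
    result = re.search(r"Result = (\w+)", output)
    if summary is None or result is None:
        raise ValueError("output does not look like deviceQuery's summary")
    driver, runtime, num_devs = summary.groups()
    return {
        "driver": driver,
        "runtime": runtime,
        "num_devices": int(num_devs),
        "passed": result.group(1) == "PASS",
        # The runtime (toolkit) must not be newer than what the driver supports.
        "runtime_ok": _version_tuple(runtime) <= _version_tuple(driver),
    }
```

For example, from the deviceQuery directory one could pass it subprocess.run(["./deviceQuery"], capture_output=True, text=True).stdout.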
4. Install cuDNN

4.1 Download cuDNN

Get the cuDNN build matching your CUDA version from https://developer.nvidia.com/rdp/cudnn-archive. Since CUDA 11.2 is used here, choose "Download cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2", then "cuDNN Library for Linux (x86_64)":

wget -c https://developer.nvidia.com/compute/machine-learning/cudnn/secure/8.1.1.33/11.2_20210301/cudnn-11.2-linux-x64-v8.1.1.33.tgz

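Note: downloading cuDNN requires an NVIDIA developer account. Because the link is behind a login, a plain wget may fail; in that case download the archive in a browser and copy it into WSL (Windows drives are mounted under /mnt/, e.g. /mnt/c).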

4.2 Install cuDNN in WSL2

Installing cuDNN amounts to copying its header files into the CUDA include directory and its libraries into the CUDA library directory.

Extract the archive:

root@ubuntu:~/Downloads# tar -zxvf cudnn-11.2-linux-x64-v8.1.1.33.tgz
cuda/include/cudnn.h
cuda/include/cudnn_adv_infer.h
cuda/include/cudnn_adv_infer_v8.h
cuda/include/cudnn_adv_train.h
cuda/include/cudnn_adv_train_v8.h
cuda/include/cudnn_backend.h
cuda/include/cudnn_backend_v8.h
cuda/include/cudnn_cnn_infer.h
cuda/include/cudnn_cnn_infer_v8.h
cuda/include/cudnn_cnn_train.h
cuda/include/cudnn_cnn_train_v8.h
cuda/include/cudnn_ops_infer.h
cuda/include/cudnn_ops_infer_v8.h
cuda/include/cudnn_ops_train.h
cuda/include/cudnn_ops_train_v8.h
cuda/include/cudnn_v8.h
cuda/include/cudnn_version.h
cuda/include/cudnn_version_v8.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.8
cuda/lib64/libcudnn.so.8.1.1
cuda/lib64/libcudnn_adv_infer.so
cuda/lib64/libcudnn_adv_infer.so.8
cuda/lib64/libcudnn_adv_infer.so.8.1.1
cuda/lib64/libcudnn_adv_train.so
cuda/lib64/libcudnn_adv_train.so.8
cuda/lib64/libcudnn_adv_train.so.8.1.1
cuda/lib64/libcudnn_cnn_infer.so
cuda/lib64/libcudnn_cnn_infer.so.8
cuda/lib64/libcudnn_cnn_infer.so.8.1.1
cuda/lib64/libcudnn_cnn_train.so
cuda/lib64/libcudnn_cnn_train.so.8
cuda/lib64/libcudnn_cnn_train.so.8.1.1
cuda/lib64/libcudnn_ops_infer.so
cuda/lib64/libcudnn_ops_infer.so.8
cuda/lib64/libcudnn_ops_infer.so.8.1.1
cuda/lib64/libcudnn_ops_train.so
cuda/lib64/libcudnn_ops_train.so.8
cuda/lib64/libcudnn_ops_train.so.8.1.1
cuda/lib64/libcudnn_static.a
cuda/lib64/libcudnn_static_v8.a

Copy the cuDNN headers:

cp cuda/include/* /usr/local/cuda-11.2/include/

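Copy the cuDNN libraries. The -P flag keeps libcudnn.so and libcudnn.so.8 as symbolic links; without it, cp copies each link as a full duplicate of the library, and tools such as ldconfig may then warn that the files are not symbolic links:

cp -P cuda/lib64/* /usr/local/cuda-11.2/lib64/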

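In the archive, libcudnn.so and libcudnn.so.8 are symbolic links to libcudnn.so.8.1.1. If the copy step is scripted in Python instead of the shell, shutil.copy2 with follow_symlinks=False recreates such links as links (like cp -P) instead of duplicating the library. A minimal sketch; the function name is hypothetical:

```python
import glob
import os
import shutil


def copy_preserving_symlinks(src_dir: str, dst_dir: str) -> None:
    """Copy the files in src_dir to dst_dir, keeping symlinks as symlinks."""
    os.makedirs(dst_dir, exist_ok=True)
    for src in glob.glob(os.path.join(src_dir, "*")):
        if os.path.isdir(src) and not os.path.islink(src):
            continue  # cuda/lib64 contains only files; skip any subdirectory
        dst = os.path.join(dst_dir, os.path.basename(src))
        if os.path.lexists(dst):
            os.remove(dst)  # overwrite existing files or links, as cp does
        # follow_symlinks=False: a link is recreated with the same target.
        shutil.copy2(src, dst, follow_symlinks=False)
```

With the paths used above, the call would be copy_preserving_symlinks("cuda/lib64", "/usr/local/cuda-11.2/lib64"), run as root.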
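Make the headers and libraries readable by all users. The official cuDNN installation guide uses chmod a+r for this; header files do not need execute permission:

chmod a+r /usr/local/cuda-11.2/include/cudnn*.h /usr/local/cuda-11.2/lib64/libcudnn*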

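To confirm which cuDNN version ended up in the CUDA directory, you can read the version macros from cudnn_version.h. In cuDNN 8 they are defined in that header, not in cudnn.h. A minimal sketch, assuming the header was copied to /usr/local/cuda-11.2/include as above:

```python
import os
import re

HEADER = "/usr/local/cuda-11.2/include/cudnn_version.h"


def cudnn_version(header_text: str) -> str:
    """Join CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL into e.g. '8.1.1'."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match only the numeric #define, not CUDNN_VERSION's formula.
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"{name} is not defined in the header")
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__" and os.path.exists(HEADER):
    with open(HEADER) as f:
        print("cuDNN", cudnn_version(f.read()))
```

For the archive used here, the expected result is 8.1.1.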
4.3 Test cuDNN

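Open a new terminal and check from Python that TensorFlow can see the GPU (here TensorFlow is installed in a conda environment named ailab):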

root@ubuntu:~# conda activate ailab
(ailab) root@ubuntu:~# ipython
Python 3.7.13 (default, Mar 29 2022, 02:18:16)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.34.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf

In [2]: tf.test.is_gpu_available()
WARNING:tensorflow:From <ipython-input-2-17bb7203622b>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2022-07-27 23:46:19.645726: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-27 23:46:21.108297: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-07-27 23:46:21.136684: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-07-27 23:46:21.137199: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-07-27 23:46:22.147971: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-07-27 23:46:22.148598: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-07-27 23:46:22.148635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-07-27 23:46:22.149146: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:961] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-07-27 23:46:22.149214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /device:GPU:0 with 1626 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
Out[2]: True

In [3]:

The message "Your kernel may not have been built with NUMA support" is an informational (I-level) log. It is harmless and can be ignored. See:

https://stackoverflow.com/questions/51733128/your-kernel-may-have-been-built-without-numa-support

https://forums.developer.nvidia.com/t/numa-error-running-tensorflow-on-jetson-tx2/56119

https://stackoverflow.com/questions/40426502/is-there-a-way-to-suppress-the-messages-tensorflow-prints
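Two small adjustments make this check quieter and avoid the deprecated call. The NUMA lines are C++ INFO logs, which TensorFlow filters when the environment variable TF_CPP_MIN_LOG_LEVEL is set before import (the topic of the last link above). The deprecation warning itself recommends tf.config.list_physical_devices('GPU'). A minimal sketch combining both; list_gpus is a hypothetical helper name:

```python
import os

# Must be set before TensorFlow is imported.
# "0" shows all C++ logs, "1" hides INFO, "2" also hides WARNING, "3" also hides ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

try:
    import tensorflow as tf
except ImportError:  # e.g. when run outside the ailab environment
    tf = None


def list_gpus() -> list:
    """GPUs visible to TensorFlow; an empty list means none (or no TensorFlow)."""
    if tf is None:
        return []
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) visible:", gpus)
```

Level "1" is enough for the NUMA messages. Higher levels would also hide genuine warnings.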

5. CPU vs GPU

Hardware | Model
CPU | AMD Ryzen 7 5800H with Radeon Graphics
GPU | NVIDIA GeForce RTX 3050 Laptop GPU

5.1 Small neural network

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# TensorFlow and tf.keras
import tensorflow as tf
# Helper libraries
from time import time

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Run on the CPU
startTime1 = time()

with tf.device('/cpu:0'):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    model.fit(x_train, y_train, epochs=10)
    model.evaluate(x_test, y_test)

t1 = time() - startTime1

# Run on the GPU
startTime2 = time()

with tf.device('/gpu:0'):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    model.fit(x_train, y_train, epochs=10)
    model.evaluate(x_test, y_test)

t2 = time() - startTime2

# Print elapsed times
print('Time on CPU:', t1)
print('Time on GPU:', t2)

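Results:

Time on CPU: 26.906126737594604
Time on GPU: 54.50972127914429

For this small network, the GPU run took about twice as long as the CPU run. A likely reason is that each training step (Keras's default batch size is 32) involves very little computation. Kernel-launch and host-to-device transfer overhead therefore outweighs the GPU's parallelism, and the GPU timing may also include one-time GPU library initialization.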

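One way to make the comparison less sensitive to one-time start-up costs is to run the workload once, untimed, before measuring. The following is a minimal, framework-agnostic timing helper, offered as a sketch; the name timed and its parameters are hypothetical and not part of TensorFlow:

```python
from time import perf_counter


def timed(fn, *args, warmup=1, repeats=3, **kwargs):
    """Call fn `warmup` times without timing, then return the fastest of
    `repeats` timed calls, in seconds."""
    for _ in range(warmup):
        fn(*args, **kwargs)
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()
        fn(*args, **kwargs)
        best = min(best, perf_counter() - start)
    return best


if __name__ == "__main__":
    # Toy workload standing in for model.fit(...).
    print(timed(sum, range(1_000_000)))
```

For example, inside a with tf.device(...) block one could call timed(model.fit, x_train, y_train, epochs=1, verbose=0) after compiling. Each call continues training the same model, which is fine for timing but not for accuracy numbers.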
5.2 Large neural network

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# TensorFlow and tf.keras
import tensorflow as tf
# Helper libraries
from time import time

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Run on the CPU
startTime1 = time()

with tf.device('/cpu:0'):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(1000, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1000, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    model.fit(x_train, y_train, epochs=10)
    model.evaluate(x_test, y_test)

t1 = time() - startTime1

# Run on the GPU
startTime2 = time()

with tf.device('/gpu:0'):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(1000, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1000, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    model.fit(x_train, y_train, epochs=10)
    model.evaluate(x_test, y_test)

t2 = time() - startTime2

# Print elapsed times
print('Time on CPU:', t1)
print('Time on GPU:', t2)

Results:

Time on CPU: 205.31678819656372
Time on GPU: 74.53320002555847

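The measured times can be summarized as a speedup ratio, CPU time divided by GPU time; values below 1 mean the GPU was slower. A short worked computation using the numbers reported in sections 5.1 and 5.2:

```python
# Wall-clock times in seconds (CPU, GPU), as reported above.
results = {
    "small (128 units)": (26.906126737594604, 54.50972127914429),
    "large (2 x 1000 units)": (205.31678819656372, 74.53320002555847),
}


def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """GPU speedup over the CPU; below 1 means the GPU was slower."""
    return cpu_seconds / gpu_seconds


for name, (cpu, gpu) in results.items():
    print(f"{name}: {speedup(cpu, gpu):.2f}x")
# small (128 units): 0.49x
# large (2 x 1000 units): 2.75x
```

So the GPU was roughly 2x slower on the small model and roughly 2.75x faster on the large one.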
References

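Deep Learning Environment Setup: Windows + WSL2

Building a Deep Learning Environment on Windows (CUDA)

Installing cuDNN on Ubuntu

CUDA on WSL User Guide

Checking Whether a Library Exists on Linux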

NVIDIA CUDA Toolkit Release Notes