RTX 4060, Win11, TF 2.19.0, CUDA 12.3.2 - GPU not detected despite nvidia-smi/deviceQuery PASS

TensorFlow GPU Detection Issue Troubleshooting Summary

Hello. I am seeking assistance with an issue where TensorFlow is unable to detect the GPU on my system, which has an NVIDIA GeForce RTX 4060. I have attempted various troubleshooting steps, but the problem persists.

1. System and Software Environment:

  • GPU: NVIDIA GeForce RTX 4060
  • Operating System: Windows 11
  • Python Version: 3.11.9
  • NVIDIA Driver Version: 566.36 (Lowest compatible driver version for Windows 11)
  • CUDA Toolkit Version: 12.3.2 (January 2024)
  • cuDNN Version: 8.9.7 (for CUDA 12.x)
  • TensorFlow Version: 2.19.0 (initial attempt), 2.12.0 (intermediate attempt), 2.16.1 (final attempt)
    • Most recently I tested with TensorFlow 2.19.0 and 2.16.1, both of which I understood from the official compatibility table to work with CUDA 12.x and cuDNN 8.9.x.
  • Visual C++ Redistributable Packages: 2015-2022 (x64) repaired.

2. Problem Symptoms:

  • tf.config.list_physical_devices('GPU') consistently returns an empty list.
  • TensorFlow outputs the message “GPU not detected. Using CPU.” and operates solely on the CPU.
  • Even with TF_CPP_MIN_LOG_LEVEL set to 0, no GPU-related loading-failure messages are displayed.
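For reference, this is the minimal script I use for the check (assuming TensorFlow is installed in the active virtual environment). The log level has to be set before the import, or startup messages are suppressed:

```python
import os

# Surface all TensorFlow startup logs, including any DLL-load failures.
# This must be set BEFORE tensorflow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"

import tensorflow as tf

print("TF version:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Physical GPUs:", tf.config.list_physical_devices("GPU"))
```

Note that if `tf.test.is_built_with_cuda()` prints `False`, the installed wheel itself has no GPU support compiled in, and no host-side CUDA/cuDNN configuration can change that.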

3. Troubleshooting Steps Attempted and Results:

  • NVIDIA Driver Clean Reinstallation (using DDU):
    • Used DDU to completely remove existing drivers, then reinstalled the latest driver (576.88) from the NVIDIA official website using the “Perform a clean installation” option.
    • Result: The driver installed, but nvidia-smi reported CUDA Version: 12.9, which I suspected might mismatch TensorFlow 2.19.0 (CUDA 12.3).
    • Subsequently, performed another DDU clean reinstallation with 566.36, the lowest driver version available for Windows 11.
    • Result: nvidia-smi now reported CUDA Version: 12.7 (theoretically compatible with CUDA 12.3, since the driver's reported CUDA version is the maximum runtime it supports).
  • CUDA Toolkit and cuDNN Version Matching:
    • Installed CUDA Toolkit 12.3.2 and cuDNN 8.9.7, precisely matching TensorFlow 2.19.0’s official requirements, and correctly copied cuDNN files to the CUDA Toolkit path.
    • An intermediate attempt involved downgrading to TensorFlow 2.12.0 with CUDA Toolkit 11.8 and cuDNN 8.6.0, but GPU detection still failed. (Later reverted to 12.3.2/8.9.7).
  • Environment Variable Path Setting:
    • The following four paths were placed at the very top of the system Path variable:
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\libnvvp
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\lib\x64
  • Virtual Environment Usage:
    • All TensorFlow installations and tests were performed within clean virtual environments created with python -m venv (using an elevated/administrator Command Prompt).
  • TF_FORCE_GPU_ALLOW_GROWTH Environment Variable Setting:
    • Set the system environment variable TF_FORCE_GPU_ALLOW_GROWTH to true and rebooted the system.
  • NVIDIA Driver and CUDA Toolkit Self-Diagnosis:
    • nvidia-smi output (after installing driver 566.36):
Fri Jul  4 14:24:57 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.36           Driver Version: 566.36        CUDA Version: 12.7      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |           MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060       WDDM |   00000000:0B:00.0  On |                  N/A |
|  0%   49C    P5              N/A /  115W |   1188MiB /   8188MiB |     0%    Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI      PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|   0   N/A  N/A     2252   C+G    ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|  ... (truncated) ...
+-----------------------------------------------------------------------------------------+
    • Result: Driver version is 566.36, and CUDA Version is correctly reported as 12.7.
  • deviceQuery.exe output (after installing CUDA Toolkit 12.3.2):
deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 4060"
  CUDA Driver Version / Runtime Version           12.7 / 12.3
  CUDA Capability Major/Minor version number:      8.9
  Total amount of global memory:                   8188 MBytes (8585216000 bytes)
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
  (24) Multiprocessors, (128) CUDA Cores/MP:       3072 CUDA Cores
  GPU Max Clock rate:                              2505 MHz (2.50 GHz)
  Memory Clock rate:                               8501 Mhz
  Memory Bus Width:                                128-bit
  L2 Cache Size:                                   25165824 bytes
  Maximum Texture Dimension Size (x,y,z)           1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers    1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers    2D=(32768, 32768), 2048 layers
  Total amount of constant memory:                 zu bytes
  Total amount of shared memory per block:         zu bytes
  Total number of registers available per block: 65536
  Warp size:                                       32
  Maximum number of threads per multiprocessor:    1536
  Maximum number of threads per block:             1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                            zu bytes
  Texture alignment:                               zu bytes
  Concurrent copy and kernel execution:            Yes with 1 copy engine(s)
  Run time limit on kernels:                       Yes
  Integrated GPU sharing Host Memory:              No
  Support host page-locked memory mapping:         Yes
  Alignment requirement for Surfaces:              Yes
  Device has ECC support:                          Disabled
  CUDA Device Driver Mode (TCC or WDDM):           WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):        Yes
  Device supports Compute Preemption:              Yes
  Supports Cooperative Kernel Launch:              Yes
  Supports MultiDevice Co-op Kernel Launch:        No
  Device PCI Domain ID / Bus ID / location ID:     0 / 11 / 0
  Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.7, CUDA Runtime Version = 12.3, NumDevs = 1, Device0 = NVIDIA GeForce RTX 4060
Result = PASS
    • Result: Confirmed that the GPU communicates correctly with the CUDA driver (12.7) and runtime (12.3), indicating normal operation at the system level.

4. Conclusion and Request for Assistance:

Both nvidia-smi and deviceQuery clearly show that the GPU and CUDA environment are functioning correctly at the system level. All version compatibilities (driver, CUDA Toolkit, cuDNN, TensorFlow) also align with official requirements. Despite this, TensorFlow fails to detect the GPU, which is highly unusual.

I would appreciate any insights into potential overlooked issues or known problems specific to this environment (Windows 11, RTX 4060, Python 3.11.9). Any additional methods for diagnosing why TensorFlow might be failing to load GPU-related DLLs would be greatly helpful.
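One direct probe I can share (a rough sketch; the DLL names assume the CUDA 12.3 / cuDNN 8.9 installation described above, so adjust the version suffixes for other toolkits) is attempting to load the runtime DLLs manually with ctypes:

```python
import ctypes

# Rough probe: try to load the CUDA/cuDNN runtime DLLs directly from PATH.
# DLL names assume CUDA 12.3 / cuDNN 8.9 as installed above.
for dll in ("cudart64_12.dll", "cublas64_12.dll", "cudnn64_8.dll"):
    try:
        ctypes.CDLL(dll)
        print(f"{dll}: loads OK")
    except OSError as exc:
        print(f"{dll}: FAILED to load ({exc})")
```

A failure here would point to a PATH or installation problem; if all three load, the libraries themselves are reachable and the problem lies elsewhere.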

Currently, I am unable to proceed with GPU-accelerated deep learning training due to this issue. Your assistance would be greatly appreciated.


Any solution? I have literally the exact same situation


I am having the same issue. TensorFlow's latest compatibility table lists TF 2.19.0 with CUDA 12.5 and cuDNN 9.3. When I try exactly that combination, I get the same result: deviceQuery passes, yet TensorFlow cannot see the GPU. This seems to be an ongoing problem.

TensorFlow Not Detecting GPU on Windows 11 (RTX 4060) — Fixed with Docker

On my Windows 11 machine with an NVIDIA GeForce RTX 4060 and the latest driver, TensorFlow on bare metal Python would not detect the GPU. Even with matching CUDA and cuDNN versions (according to TensorFlow’s official compatibility table), the GPU remained invisible and TensorFlow silently fell back to CPU with no error messages.

This issue is caused by TensorFlow dropping GPU support in its native Windows builds after version 2.10: the pip wheels for TensorFlow 2.11 and later on Windows are CPU-only. That is why no combination of host CUDA/cuDNN versions will make them detect the GPU, even when the versions "match" the compatibility table on paper.

The solution that worked for me was to run TensorFlow inside a Docker container with GPU passthrough using the NVIDIA Container Toolkit. This ensures TensorFlow runs with the exact CUDA/cuDNN versions it was built for, bypassing any host-level mismatches.

To reproduce my working setup:

  1. Install the latest NVIDIA GPU driver.

  2. Install Docker Desktop for Windows.

  3. Install NVIDIA Container Toolkit (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

  4. Open Command Prompt or PowerShell and run the following single line (on Linux/macOS, replace %cd% with $(pwd)):

docker run --gpus all --rm -it -v %cd%:/workspace -w /workspace tensorflow/tensorflow:2.10.0-gpu python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Why this works:

  • The container already includes CUDA 11.2 and cuDNN 8.1.x, preconfigured for TensorFlow 2.10.0.

  • No need to install CUDA/cuDNN on the host — only the NVIDIA driver is required.

  • Works reliably with RTX 4060 and newer GPUs.

Expected output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
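To verify actual GPU execution beyond the detection check, here is a small script (a sketch; run it inside the same container) that forces a matrix multiplication onto the GPU and reports which device executed it:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("GPUs:", gpus)

if gpus:
    # Place a small matmul explicitly on the GPU to confirm it executes there.
    with tf.device("/GPU:0"):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        c = tf.matmul(a, b)
    print("Matmul ran on:", c.device)
else:
    print("No GPU visible inside the container")
```

If the detection step succeeded, the device string printed should name `GPU:0` rather than the CPU.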

If you have any questions, feel free to email me at [email protected] and I’ll be happy to provide more details.

Please see my new comment.