A. The GPU baseboard may not be detected. You can determine this by opening the IPMI GUI and reviewing the FW section.
B. If more than 3 GPUs are missing, ROCm commands will not function. Verify in the IPMI GUI that 3 GPUs are missing, then review the GPU component section to make sure that no more than 2 GPUs are missing.
C. You suspect there are GPUs not detected. To determine which GPUs are missing, open the IPMI GUI and review the GPU component section to make sure all GPUs are present.
D. The CPU is not installed correctly in the server. Review the OS dmesg logs for any unusual CPU messages; if any are found, reseat the CPU.
A. One on each side of the GPU tray
B. Bottom of the GPU tray
C. Top of the GPU tray
D. One on top and one on bottom of the GPU tray
A. sudo rocm-gpu, and sudo rocm-baseboardgpu
B. sudo rockm-smc, and sudo rockm-bug.report.txt
C. sudo rocm-csv, and sudo rocm-amd.bug.txt
D. sudo rocm-smi -a, and sudo rocm-smi