NCP-AII Exam Free Practice Questions (197 Questions): NVIDIA AI Infrastructure Certification

Question 1

After updating Docker to a version later than 19.03, a data scientist attempts to run a container designed for GPU-accelerated applications with the following command:

This generates the following error (output might differ slightly depending on the specific version):

What will fix the problem?

A. Use an NGC TensorFlow container

B. The NVIDIA driver should be re-installed

C. The DOCA driver needs to be installed

D. Add the argument "--gpus all" to the docker command

Correct answer: D
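
The original command and error output are not reproduced above. Here is a minimal Python sketch of building the corrected `docker run` invocation, assuming the CUDA base image tag from Question 6. Since Docker 19.03, the `--gpus` flag asks the NVIDIA Container Toolkit to expose GPUs to the container. Without that flag, a container started this way typically sees no GPU. The helper name `gpu_docker_cmd` is hypothetical.

```python
import shlex


def gpu_docker_cmd(image: str, *cmd: str, gpus: str = "all") -> list[str]:
    """Build a `docker run` argv that requests GPUs with --gpus (Docker 19.03+)."""
    # "all" exposes every GPU; Docker also accepts values such as "2" or "device=0".
    return ["docker", "run", "--rm", "--gpus", gpus, image, *cmd]


if __name__ == "__main__":
    argv = gpu_docker_cmd("nvidia/cuda:12.4.0-base-ubuntu22.04", "nvidia-smi")
    # Prints the shell form; pass `argv` to subprocess.run() on a GPU host.
    print(shlex.join(argv))
```

Building an argv list rather than a single string avoids shell-quoting problems if the command is later passed to `subprocess.run()`.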

Question 2

An engineer needs to validate NVLink Switch functionality on a DGX H100 system with 8 GPUs.
Which NCCL command verifies intra-node NVLink bandwidth?

A. all_reduce_perf -b 8 -e 16G -f2 -g 8 with NCCL_TESTS_SPLIT="OR 0x7"

B. broadcast_perf -b 8 -e 16G -f2 -g 8 without split configuration

C. all_reduce_perf -b 8 -e 16G -f2 -g 4 with NCCL_TESTS_SPLIT="MOD 2"

D. all_reduce_perf -b 8 -e 16G -f2 -g 1 repeated 8 times

Correct answer: A
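
Here is a sketch of how a split expression partitions ranks into communicators. It assumes that each rank's color is computed as `rank OP value` and that ranks sharing a color form one communicator. Check the nccl-tests README for the exact semantics in your version; the helper `split_groups` is hypothetical. Under this assumption, "OR 0x7" maps every rank of an 8-GPU node to the same color, so the test stays inside one node and exercises NVLink. "MOD 2" instead splits the ranks into even and odd halves.

```python
import operator
from collections import defaultdict

# Assumed semantics: color = OP(rank, value); equal colors share a communicator.
_OPS = {
    "AND": operator.and_,
    "OR": operator.or_,
    "MOD": operator.mod,
    "DIV": operator.floordiv,
}


def split_groups(spec: str, nranks: int) -> list[list[int]]:
    """Group ranks 0..nranks-1 by the color a spec such as "OR 0x7" yields."""
    op_name, value = spec.split()
    op = _OPS[op_name.upper()]
    val = int(value, 0)  # base 0 accepts both "0x7" and "8"
    groups: dict[int, list[int]] = defaultdict(list)
    for rank in range(nranks):
        groups[op(rank, val)].append(rank)
    return [groups[color] for color in sorted(groups)]


if __name__ == "__main__":
    print(split_groups("OR 0x7", 8))   # one group holding all 8 ranks
    print(split_groups("MOD 2", 8))    # even ranks and odd ranks
    print(split_groups("OR 0x7", 16))  # two nodes: one group per node
```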

Question 3

An administrator notices that a server is not collecting telemetry data such as traffic flows, performance faults, and events. In which network does this information flow?

A. Storage

B. InfiniBand management

C. In-band management

D. Compute

Correct answer: B

Question 4

A shared AI cluster contains H100, L40S, and A30 GPU nodes. Many users submit inference workloads without specifying hardware requirements, causing high-end H100 resources to be consumed by relatively lightweight jobs. Which administrative policy would most effectively improve overall cluster utilization?

A. Disable scheduling on H100 nodes during business hours

B. Restrict every user to a single GPU regardless of workload

C. Configure Slurm partitions, constraints, or Quality of Service (QoS) policies based on workload requirements

D. Assign every job to the first available GPU regardless of model size

Correct answer: C
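
Here is a sketch of the kind of routing policy option C describes. It picks the smallest GPU class whose memory fits a job and emits the matching `sbatch` options (`--partition`, `--constraint`, `--qos`, `--gres`). The partition names and node feature tags are hypothetical; an administrator would define them in `slurm.conf`. The per-GPU memory figures assume 80 GB H100, 48 GB L40S and 24 GB A30 parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuClass:
    feature: str    # hypothetical node Feature tag, e.g. Feature=l40s
    mem_gb: int     # memory per GPU
    partition: str  # hypothetical partition name


GPU_CLASSES = [  # ordered from smallest to largest
    GpuClass("a30", 24, "inference"),
    GpuClass("l40s", 48, "inference"),
    GpuClass("h100", 80, "training"),
]


def sbatch_args(model_mem_gb: float, qos: str = "normal") -> list[str]:
    """Route a single-GPU job to the smallest GPU class that fits its model."""
    for gpu in GPU_CLASSES:
        if model_mem_gb <= gpu.mem_gb:
            return [
                f"--partition={gpu.partition}",
                f"--constraint={gpu.feature}",
                f"--qos={qos}",
                "--gres=gpu:1",
            ]
    raise ValueError(f"{model_mem_gb} GB does not fit on a single GPU")
```

Under this policy, a 10 GB inference model lands on an A30 node, and H100 nodes are left for jobs that need them.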

Question 5

A system administrator wants to enable vGPU virtualization on a DGX A100 system. What action should be taken first?

A. Configure the setting in the BIOS of the DGX system

B. Initialize the NVLinks

C. Bind NVSwitches and GPUs to nvidia.ko

D. Configure the Fabric Manager (FM) service in vGPU virtualization mode

Correct answer: A

Question 6

An engineer wants to verify that their NVIDIA GPU is accessible inside a Docker container for running deep learning workloads. They have installed the NVIDIA Container Toolkit on a machine with working NVIDIA drivers. Which command demonstrates the correct way to run a container that can access all available GPUs?

A. docker run --rm nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

B. docker run --rm -it ubuntu:22.04 nvidia-smi

C. docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi

D. docker run --rm --runtime=docker nvidia/cuda nvidia-smi

Correct answer: C
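
One way to confirm the container really sees every GPU is to run `nvidia-smi -L` instead of `nvidia-smi`, then compare the number of listed GPUs with the count on the host. Below is a minimal parser that assumes the usual one-line-per-GPU format `GPU <n>: <name> (UUID: <uuid>)`. The function name and the sample text used to exercise it are illustrative.

```python
import re

# Matches lines such as "GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-...)"
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (\S+)\)$")


def parse_gpu_list(output: str) -> list[tuple[int, str, str]]:
    """Return (index, name, uuid) for each physical GPU in `nvidia-smi -L` output."""
    gpus = []
    for line in output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus
```

Indented MIG device lines start with `MIG` rather than `GPU `, so they are skipped and the count reflects physical GPUs only.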

Question 7

An engineer is tasked with configuring Out-of-Band (OOB) management for a DGX BasePOD deployment. Which network design will best ensure secure and reliable OOB management operations?

A. Place all BMC and management interfaces on an isolated OOB network with access restricted by firewall rules.

B. Connect OOB management ports to the same switch as user traffic for easier troubleshooting.

C. Use a single VLAN for both OOB management and compute fabric to simplify network design.

D. Configure OOB management interfaces to be accessible from any subnet within the data center for maximum flexibility.

Correct answer: A
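
Here is a sketch of the default-deny idea behind option A, using Python's `ipaddress` module. Only sources in an approved admin subnet may open connections into the isolated OOB network. Both subnets below are a hypothetical addressing plan, not values from any DGX BasePOD reference design.

```python
import ipaddress

# Hypothetical plan: BMCs on an isolated OOB subnet, reachable only from a
# small admin/jump-host subnet.
OOB_NET = ipaddress.ip_network("10.100.0.0/24")
ADMIN_NETS = [ipaddress.ip_network("10.200.10.0/28")]


def is_oob_address(addr: str) -> bool:
    """True if the address belongs to the isolated OOB management subnet."""
    return ipaddress.ip_address(addr) in OOB_NET


def may_reach_oob(src: str) -> bool:
    """Default deny: only admin subnets may open connections into the OOB network."""
    src_ip = ipaddress.ip_address(src)
    return any(src_ip in net for net in ADMIN_NETS)
```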

Question 8

Refer to the exhibit. What is the longest distance that DAC LinkX 25G-NRZ cables can cover?

A. 1 meter

B. 3 meters

C. 10 meters

D. 5 meters

Correct answer: D

Question 9

You are performing storage validation for an H100-based cluster. Your goal is to ensure the storage system is optimized for AI workloads. What should you focus on?

A. Validate high throughput and low latency by running application-specific benchmarks designed for AI workloads.

B. Verify the total storage capacity and confirm it matches the system specifications without running performance benchmarks.

C. Check that the storage system is configured with RAID levels optimized for redundancy, even if it sacrifices performance.

D. Test storage performance by transferring randomized files between nodes while monitoring metrics like transfer speed and error rates.

Correct answer: A

Question 10

A single-node stress test fails during the PCIe bandwidth validation phase. Which troubleshooting step is recommended first?

A. Disable NVLink in BIOS to isolate PCIe performance.

B. Reseat the GPU, then rerun the test.

C. Reduce PCIe Gen4 to Gen3 speed in BIOS settings.

D. Reinstall NVIDIA drivers using apt-get install nvidia-driver-550

Correct answer: B
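
A poorly seated GPU often trains its link at a reduced width, so before and after reseating it helps to compare the negotiated link with the expected one. The sketch below computes theoretical one-direction bandwidth (Gen3 and later use 128b/130b encoding) and flags a link below its capability. The current and maximum values would come from the `nvidia-smi` query fields `pcie.link.gen.current`, `pcie.link.gen.max`, `pcie.link.width.current` and `pcie.link.width.max`. The helper names are hypothetical.

```python
# Per-lane signaling rate in GT/s; Gen3, Gen4 and Gen5 all use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_bw_gbs(gen: int, width: int) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (ignores protocol overhead)."""
    return GT_PER_S[gen] * width * (128 / 130) / 8


def link_degraded(cur_gen: int, cur_width: int, max_gen: int, max_width: int) -> bool:
    """Flag a link that trained below its capability, e.g. x8 instead of x16."""
    return cur_width < max_width or cur_gen < max_gen
```

For example, Gen4 x16 works out to roughly 31.5 GB/s per direction and Gen5 x16 to roughly 63 GB/s. GPUs can lower the link generation when idle, so read the current values while a workload is running before treating a lower generation as a fault.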

Question 11

A system administrator installed a new DPU on a system and needs to connect to the RShim interface using SSH for the first time. What IP address should the system administrator connect to?

A. 172.16.0.2/24

B. 192.168.100.2/24

C. 10.0.0.1/24

D. 192.168.1.2/24

Correct answer: B
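
Here is a minimal sketch of the RShim addressing behind answer B. It assumes the common defaults: the host side of the `tmfifo_net0` link uses 192.168.100.1/24, and the BlueField Arm OS answers on 192.168.100.2/24. The host must have an address on that same /24 before SSH will work. The login user `ubuntu` is an assumption that depends on the installed BFB image, and `ssh_cmd` is a hypothetical helper.

```python
import ipaddress

# Assumed defaults for the RShim tmfifo_net0 link.
HOST_IF = ipaddress.ip_interface("192.168.100.1/24")
DPU_IF = ipaddress.ip_interface("192.168.100.2/24")


def ssh_cmd(user: str = "ubuntu") -> list[str]:
    """Build the SSH argv for the first login to the DPU over RShim."""
    if HOST_IF.network != DPU_IF.network:
        raise ValueError("host and DPU must share the tmfifo subnet")
    return ["ssh", f"{user}@{DPU_IF.ip}"]
```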

Question 12

Which software library provides GPU acceleration for pandas-like DataFrame operations?

A. TensorFlow Data API

B. Intel oneAPI Data Analytics Library

C. R tidyverse

D. RAPIDS cuDF

Correct answer: D

Question 13

What is the purpose of using NCCL in verifying east/west fabric in an NVIDIA AI Factory?
(Choose two.)

A. To measure the latency between GPUs.

B. To measure bandwidth between GPUs.

C. To measure the storage network performance.

D. To measure the power consumption of GPUs.

Correct answer: A, B
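
NCCL tests report both quantities in answers A and B. The time column is dominated by latency at small message sizes. At large sizes, the reported bandwidth shows what the fabric sustains. Below is a sketch of the two bandwidth figures as the nccl-tests performance notes define them for all-reduce: algorithm bandwidth is bytes divided by time, and bus bandwidth scales that by 2(n-1)/n. The function names are hypothetical.

```python
def alg_bw_gbs(size_bytes: float, time_s: float) -> float:
    """Algorithm bandwidth: buffer size divided by elapsed time, in GB/s."""
    return size_bytes / time_s / 1e9


def allreduce_bus_bw_gbs(size_bytes: float, time_s: float, nranks: int) -> float:
    """All-reduce bus bandwidth: algbw scaled by 2(n-1)/n."""
    return alg_bw_gbs(size_bytes, time_s) * 2 * (nranks - 1) / nranks
```

For example, a 16 GB all-reduce across 8 GPUs that takes 0.1 s gives an algbw of 160 GB/s and a busbw of 280 GB/s. Busbw is the figure to compare against the link's hardware bandwidth.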

Question 14

You are tasked with setting up High Availability (HA) for NVIDIA Base Command Manager (BCM) in a new GPU cluster. The cluster consists of a primary and secondary head node, as well as several compute nodes. The requirements are: automatic failover of BCM services, minimal disruption to workloads, and proper cluster health monitoring during and after installation. During your BCM HA installation and configuration process, which two of the following actions are mandatory for ensuring a robust and verified HA cluster configuration? (Choose two.)

A. After configuration is complete, simulate a failover by stopping BCM services on the active head node to verify that all services are running on the secondary node with no interruption.

B. Assign a floating Virtual IP address that can automatically migrate between the primary and secondary head nodes during failover.

C. Configure both head nodes to use independent static IP addresses for BCM services instead of relying on a shared virtual IP address.

D. During configuration, explicitly synchronize both the configuration and state data directories from the primary to the secondary head node to ensure consistency.

E. Compute nodes must be powered on and performing work to initiate the synchronization of the head nodes.

Correct answer: A, B
