NCP-AII Exam Free Practice Questions: "NVIDIA AI Infrastructure Certification"

You are deploying a new AI inference service using Triton Inference Server on a multi-GPU system. After deploying the models, you observe that only one GPU is being utilized, even though the models are configured to use multiple GPUs. What are the possible causes?

Which command is used to verify the installation and configuration of the NGC CLI after initial setup?

You observe that the memory bandwidth achieved by your CUDA application on an NVIDIA A100 GPU is significantly lower than the theoretical peak bandwidth. Which of the following could be potential causes, and what actions can you take to validate or mitigate them? (Select all that apply)

Correct answer: C, D, E
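
To judge how far "significantly lower" is, it helps to compute the theoretical peak first. The sketch below does that arithmetic in Python. The clock and bus-width figures are assumptions taken from the commonly published A100 40GB specification (1215 MHz memory clock, 5120-bit bus, two transfers per clock), and the measured value is a made-up placeholder, so substitute numbers from your own GPU and benchmark.

```python
def peak_bandwidth_gbs(mem_clock_mhz: float, bus_width_bits: int,
                       transfers_per_clock: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = mem_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer / 1e9


def bandwidth_efficiency(achieved_gbs: float, peak_gbs: float) -> float:
    """Fraction of the theoretical peak that was actually achieved."""
    return achieved_gbs / peak_gbs


# Assumed figures for an A100 40GB; check them against your own device.
A100_MEM_CLOCK_MHZ = 1215
A100_BUS_WIDTH_BITS = 5120

peak = peak_bandwidth_gbs(A100_MEM_CLOCK_MHZ, A100_BUS_WIDTH_BITS)
measured = 900.0  # hypothetical benchmark result in GB/s
print(f"peak: {peak:.1f} GB/s, "
      f"efficiency: {bandwidth_efficiency(measured, peak):.1%}")
```

Even well-tuned kernels do not reach 100% of this figure, so the ratio is best used to compare runs before and after a change rather than as a pass/fail threshold.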
You want to automate the NGC CLI installation process across multiple hosts in your infrastructure. What are the best practices to achieve this?

Correct answer: A, C, E
You are using MIG (Multi-Instance GPU) on an NVIDIA A100 GPU within a Kubernetes cluster. You want to configure a pod to use a specific MIG instance. How do you define the GPU resource request in the pod's YAML definition?
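
As an illustration of what such a request can look like, the Python sketch below builds a pod manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. It assumes the NVIDIA Kubernetes device plugin is running with the "mixed" MIG strategy, under which MIG devices are advertised as `nvidia.com/mig-<profile>`. The pod name, image, and the `1g.5gb` profile are hypothetical placeholders, and `mig_pod_manifest` is a helper invented for this example.

```python
import json


def mig_pod_manifest(name: str, image: str, mig_profile: str,
                     count: int = 1) -> dict:
    """Build a pod manifest that requests `count` MIG devices of one profile."""
    # Resource name format assumed from the device plugin's "mixed" strategy.
    resource = f"nvidia.com/mig-{mig_profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested via limits.
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


# Hypothetical pod name and image; replace with your own.
manifest = mig_pod_manifest("mig-demo",
                            "registry.example.com/cuda-app:latest",
                            "1g.5gb")
print(json.dumps(manifest, indent=2))
```

Under the plugin's "single" strategy, MIG devices are instead exposed through the ordinary `nvidia.com/gpu` resource, so the resource name depends on how the cluster was configured.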

You are troubleshooting performance issues in an AI training cluster. You suspect network congestion. Which of the following network monitoring tools would be MOST helpful in identifying the source of the congestion?

You have a BlueField-2 DPU running Ubuntu. After upgrading the MLNX_OFED drivers, the DPU fails to boot properly. You are presented with a GRUB prompt. Which sequence of commands is most likely to help you boot into a working kernel?

You are configuring an NVIDIA A100 GPU in a server, and after installation and driver setup, nvidia-smi reports a power limit much lower than the GPU's specified TDP. What are the possible reasons for this?
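
One way to confirm the symptom is to compare the enforced limit with the default and maximum limits. The Python sketch below parses output in the shape produced by `nvidia-smi --query-gpu=power.limit,power.default_limit,power.max_limit --format=csv,noheader,nounits`. The sample text is invented for illustration rather than captured from a real system, and `parse_power_limits` and `is_capped` are hypothetical helper names.

```python
def parse_power_limits(csv_text: str) -> list:
    """Parse 'current, default, max' rows (watts) into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        current, default, maximum = (float(field) for field in line.split(","))
        rows.append({"current_w": current,
                     "default_w": default,
                     "max_w": maximum})
    return rows


def is_capped(row: dict, tolerance_w: float = 1.0) -> bool:
    """True when the enforced limit sits below the default limit."""
    return row["current_w"] < row["default_w"] - tolerance_w


# Invented sample: GPU 0 is capped at 250 W, GPU 1 is at its default.
sample = "250.00, 400.00, 400.00\n400.00, 400.00, 400.00"
for index, row in enumerate(parse_power_limits(sample)):
    status = "below default" if is_capped(row) else "at default"
    print(f"GPU {index}: {row['current_w']:.0f} W ({status})")
```

The parser assumes every field is numeric; a GPU that reports a field as unsupported would need extra handling before the `float` conversion.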

You have configured two 1g.10gb MIG instances on an NVIDIA A100 GPU. You are running a deep learning training job on one instance and want to ensure that it cannot consume resources from the other MIG instance. Which mechanism ensures isolation between the two MIG instances at the hardware level?

Which of the following are crucial considerations when validating the hardware operation of an NVIDIA-Certified Professional AI Infrastructure server before deploying a production AI workload? (Select all that apply)

Correct answer: B, C, D
You are troubleshooting a performance issue with a GPU-accelerated application running inside a Docker container. The 'nvidia-smi' output inside the container shows the GPU is being utilized, but the performance is significantly lower than expected. Which of the following could be the cause of this performance bottleneck?

Correct answer: A, B, D, E
You are setting up a BlueField-2 SmartNIC and want to offload network functions. Which of the following are valid methods for enabling hardware offload capabilities?

You are tasked with designing a high-performance network for a large-scale recommendation system. The system requires low latency and high throughput for both training and inference. Which interconnect technology is MOST suitable for connecting the nodes within the cluster?

After deploying BlueField OS, you notice that the network interfaces are not automatically configured with IP addresses. Which of the following actions would be the MOST appropriate first step to troubleshoot this issue?

Which of the following statements accurately describe the benefits of using MIG (Multi-Instance GPU) in an AI/HPC environment? (Select all that apply)

A GPU in your AI server consistently overheats during inference workloads. You've ruled out inadequate cooling and software bugs. Running 'nvidia-smi' shows high power draw even when idle. Which of the following hardware issues are the most likely causes?

Correct answer: A, B, D
A large language model (LLM) training job is running across multiple NVIDIA A100 GPUs in a cluster. You observe that the GPUs within a single server are communicating efficiently via NVLink, but the inter-server communication over Ethernet is becoming a bottleneck. Which of the following strategies, focusing on cable and transceiver selection, would MOST effectively address this inter-server communication bottleneck? (Choose TWO)

You're managing a cluster of servers with BlueField-2 DPUs. One server is experiencing intermittent network connectivity issues. You suspect a problem with the DPU's firmware. Which of the following is the MOST reliable method to determine the CURRENT firmware version of the BlueField-2 DPU?
