NCP-AII Exam Free Question Set: "NVIDIA AI Infrastructure" Certification
You are deploying a new AI inference service using Triton Inference Server on a multi-GPU system. After deploying the models, you observe that only one GPU is being utilized, even though the models are configured to use multiple GPUs. What are the possible causes?
Correct answer: B, E
You observe that the memory bandwidth achieved by your CUDA application on an NVIDIA A100 GPU is significantly lower than the theoretical peak bandwidth. Which of the following could be potential causes, and what actions can you take to validate or mitigate them? (Select all that apply)
Correct answer: C, D, E
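To make the comparison in the question above concrete, here is a minimal sketch of how achieved bandwidth can be set against the theoretical peak. It uses the standard formulas (peak = memory clock × bus width in bytes × transfers per clock; effective = bytes read plus bytes written, divided by elapsed time). The 1215 MHz clock and 5120-bit bus are the figures commonly quoted for the 40 GB A100 and are an assumption here, as are the function names and the kernel timing, which are purely illustrative.

```python
def theoretical_bandwidth_gbs(memory_clock_mhz: float,
                              bus_width_bits: int,
                              transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s.

    clock (Hz) * bus width (bytes) * transfers per clock, scaled to 1e9 bytes.
    """
    bytes_per_transfer = bus_width_bits / 8
    return memory_clock_mhz * 1e6 * bytes_per_transfer * transfers_per_clock / 1e9


def effective_bandwidth_gbs(bytes_read: int, bytes_written: int,
                            seconds: float) -> float:
    """Achieved bandwidth in GB/s for one kernel run."""
    return (bytes_read + bytes_written) / seconds / 1e9


# Assumed A100 (40 GB) memory figures; check them against your own device.
peak = theoretical_bandwidth_gbs(memory_clock_mhz=1215, bus_width_bits=5120)

# Hypothetical measurement: a copy kernel that reads and writes 1 GiB in 2 ms.
achieved = effective_bandwidth_gbs(2**30, 2**30, 2.0e-3)

print(f"peak     : {peak:.1f} GB/s")
print(f"achieved : {achieved:.1f} GB/s ({achieved / peak:.0%} of peak)")
```

A large gap between the two numbers is a prompt to profile the kernel (for example, for uncoalesced accesses); the ratio alone does not identify the cause.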
You have configured two 1g.10gb MIG instances on an NVIDIA A100 GPU. You are running a deep learning training job on one instance and want to ensure that it cannot consume resources from the other MIG instance. Which mechanism ensures isolation between the two MIG instances at the hardware level?
Correct answer: A
You are troubleshooting a performance issue with a GPU-accelerated application running inside a Docker container. The 'nvidia-smi' output inside the container shows the GPU is being utilized, but the performance is significantly lower than expected. Which of the following could be the cause of this performance bottleneck?
Correct answer: A, B, D, E
You are tasked with designing a high-performance network for a large-scale recommendation system. The system requires low latency and high throughput for both training and inference. Which interconnect technology is MOST suitable for connecting the nodes within the cluster?
Correct answer: E
A GPU in your AI server consistently overheats during inference workloads. You've ruled out inadequate cooling and software bugs.
Running 'nvidia-smi' shows high power draw even when idle. Which of the following hardware issues are the most likely causes?
Correct answer: A, B, D
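The symptom described in the question above (high power draw while idle) can be checked with a short script. This is a sketch under stated assumptions: it parses the CSV output of `nvidia-smi --query-gpu=index,power.draw,utilization.gpu --format=csv,noheader,nounits`, and the function name, both thresholds, and the sample readings are hypothetical values chosen only for illustration.

```python
import csv
import io

# Command whose output this sketch expects (run it separately and pass the text in).
QUERY = ("nvidia-smi --query-gpu=index,power.draw,utilization.gpu "
         "--format=csv,noheader,nounits")


def flag_high_idle_power(csv_text: str,
                         idle_util_pct: float = 5.0,
                         power_threshold_w: float = 100.0):
    """Return (gpu_index, watts) for GPUs that are idle yet draw high power.

    Both thresholds are illustrative assumptions, not vendor guidance.
    """
    flagged = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        index, power, util = (field.strip() for field in row)
        try:
            watts, util_pct = float(power), float(util)
        except ValueError:
            continue  # skip non-numeric readings such as "[N/A]"
        if util_pct <= idle_util_pct and watts >= power_threshold_w:
            flagged.append((int(index), watts))
    return flagged


# Made-up sample readings: GPU 1 is idle but draws far more than GPU 0.
sample = "0, 62.13, 0\n1, 245.80, 0\n2, 310.02, 97\n"
print(flag_high_idle_power(sample))
```

A GPU flagged this way is only a candidate for closer hardware inspection; the script does not distinguish between possible causes.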
A large language model (LLM) training job is running across multiple NVIDIA A100 GPUs in a cluster. You observe that the GPUs within a single server are communicating efficiently via NVLink, but the inter-server communication over Ethernet is becoming a bottleneck. Which of the following strategies, focusing on cable and transceiver selection, would MOST effectively address this inter-server communication bottleneck? (Choose TWO)
Correct answer: A, C
You're managing a cluster of servers with BlueField-2 DPUs. One server is experiencing intermittent network connectivity issues. You suspect a problem with the DPU's firmware. Which of the following is the MOST reliable method to determine the CURRENT firmware version of the BlueField-2 DPU?
Correct answer: B